
79RV4640-180DUG

Low-Cost Embedded 64-bit RISController w/ DSP Capability

Manufacturer: IDT (Integrated Device Technology)

Manufacturer website: http://www.idt.com/

Features
High-performance embedded 64-bit microprocessor
– 64-bit integer operations
– 64-bit registers
– Based on the MIPS RISC Architecture
– 100MHz, 133MHz, 150MHz, 180MHz, 200MHz and 267MHz
operating frequencies
– 32-bit bus interface brings 64-bit power to 32-bit system cost
High-performance DSP capability
– 133.5 Million Integer Mul-Accumulate
operations/sec @267MHz
– 89 MFlops floating-point operations @267MHz
High-performance microprocessor
– 133.5 M Mul-Add/second @267MHz
– 89 MFlops @267MHz
– >640,000 dhrystone (2.1)/sec capability @267MHz (352
dhrystone MIPS)
High level of integration
– 64-bit, 267 MHz integer CPU
– 8KB instruction cache; 8KB data cache
– Integer multiply unit with 133.5M Mul-Add/sec
Upwardly software compatible with IDT RISController
Family
Easily upgradable to 64-bit system
Low-power operation
– Active power management powers-down inactive units
– Standby mode
Large, efficient on-chip caches
– Separate 8KB Instruction and 8KB Data caches
– Over 3200MB/sec bandwidth from internal caches
– 2-set associative
– Write-back and write-through support
– Cache locking, to facilitate deterministic response
– High performance write protocols, for graphics and data
communications
Bus compatible with RC4000 family
– System interfaces to 125MHz, provides bandwidth up to 500
MB/sec
– Direct interface to 32-bit wide systems
– Synchronized to external reference clock for multi-master operation
– Socket compatible with IDT RC64474 and RC64574
Improved real-time support
– Fast interrupt decode
– Optional cache locking
Note:
“R” refers to 5V parts; “RV” refers to 3.3V parts; “RC”
refers to both
Block Diagram
[Figure: RC4640 block diagram. Blocks shown: 267 MHz 64-bit CPU (64-bit Register File, 64-bit Adder, Load Aligner, Store Aligner, Logic Unit, High-Performance Integer Multiply, Pipeline Control); System Control Coprocessor (Address Translation/Cache Attribute Control, Exception Management Functions); 89 MFlops Single-Precision FPA (FP Register File, Pack/Unpack, FP Add/Sub/Cvt/Div/Sqrt, FP Multiply, Pipeline Control); Control Bus, Data Bus, Instruction Bus; Instruction Cache Set A (Lockable) and Set B; Data Cache Set A (Lockable) and Set B; 32-bit Synchronized System Interface.]
The IDT logo is a trademark and RC4600, RC4650, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
© 2008 Integrated Device Technology, Inc.
December 5, 2008
DSC 3486/2
IDT79RC4640™
Description
The IDT79RC4640 is a low-cost member of the Integrated Device Technology, Inc. RC4000 family, targeted to a variety of performance-hungry embedded applications. The RC4640 continues the RC4000 tradition of high performance through high-speed pipelines, high-bandwidth caches and bus interface, 64-bit architecture, and careful attention to efficient control. The cost of this performance is reduced by removing functional units frequently not required for many embedded applications.
The RC4640 supports a wide variety of embedded processor-based applications, such as internetworking equipment (routers, switches), office automation equipment (printers, scanners), and consumer multimedia game systems. Also, being upwardly software-compatible with the RC32300 family as well as bus- and upwardly software-compatible with the IDT RC4000 family, the RC4640 will serve in many of the same applications. In addition, the RC4640 supports applications that require integer digital signal processing (DSP) functions.
The RC64474 and RC64574 processors offer a direct migration path for designs based on IDT’s RC4640 processors, through full pin and socket compatibility.
The RC4640 brings 64-bit performance levels to lower cost systems. High performance is preserved by retaining large on-chip two-way set-associative caches, a streamlined high-speed pipeline, high bandwidth, 64-bit execution, and facilities such as early restart for data cache misses.
These techniques allow the system designer over 3.2 GB/sec aggregate internal bandwidth, 500 MB/sec bus bandwidth, almost 352 Dhrystone MIPS, 89 MFlops, and 133.5 M Mul-Add/sec. An array of tools facilitates rapid development of RC4640-based systems, allowing a wide variety of customers access to the processor’s high-performance capabilities while maintaining short time-to-market goals.
The extensions result in better code density, greater multiprocessing support, improved performance for commonly used code sequences in operating system kernels, and faster execution of floating-point intensive applications. All resource dependencies are made transparent to the programmer, ensuring transportability among implementations of the MIPS instruction set architecture. In addition, MIPS-III specifies new instructions defined to take advantage of the 64-bit architecture of the processor.
Finally, the RC4640 also implements additional instructions, which are considered extensions to the MIPS-III architecture. These instructions improve the multiply and multiply-add throughput of the CPU, making it well suited to a wide variety of imaging and DSP applications. These extensions, which use opcodes allocated by MIPS Technologies for this purpose, are supported by a wide variety of development tools.
The MIPS integer unit implements a load/store architecture with single cycle ALU operations (logical, shift, add, sub) and an autonomous multiply/divide unit. The 64-bit register resources include: 32 general-purpose orthogonal integer registers, the HI/LO result registers for the integer multiply/divide unit, and the program counter. In addition, the on-chip floating-point co-processor adds 32 floating-point registers, and a floating-point control/status register.
Register File
The RC4640 has 32 general-purpose 64-bit registers. These registers are used for scalar integer operations and address calculation. The register file has two read ports and one write port and is fully bypassed to minimize operation latency in the pipeline.
Arithmetic Logic Unit
The RC4640 ALU consists of the integer adder and logic unit. The
adder performs address calculations in addition to arithmetic operations;
the logic unit performs all of the logic and shift operations. Each unit is
highly optimized and can perform an operation in a single pipeline cycle.
Integer Multiply/Divide
The RC4640 uses a dedicated integer multiply/divide unit, optimized
for high-speed multiply and multiply-accumulate operation. Table 1
shows the performance, expressed in terms of pipeline clocks, achieved
by the RC4640 integer multiply unit.
Opcode           Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U    16 bit         3         2        0
MULT/U, MAD/U    32 bit         4         3        0
MUL              16 bit         3         2        1
MUL              32 bit         4         3        2
DMULT, DMULTU    any            6         5        0
DIV, DIVU        any            36        36       0
DDIV, DDIVU      any            68        68       0
Hardware Overview
Some key elements of the RC4640 are briefly described below. More detailed information is available in the IDT79RC4640/IDT79RC4650 RISC Processor Hardware User’s Manual.
Pipeline
The RC4640 uses a 5-stage pipeline that is similar to that of the IDT79RC3000 and IDT79RC4700 processors. The simplicity of this pipeline allows the RC4640 to cost less than superscalar processors and require less power than super-pipelined processors. Unlike on superscalar processors, applications that have large data dependencies, or that require frequent loads and stores, can still achieve peak performance.
Integer Execution Engine
The RC4640 implements the MIPS-III Instruction Set Architecture
and is fully upward compatible with applications that run on earlier
generation parts. The RC4640 is software-compatible with the RC4650,
and includes the instruction set found in the RC4700 microprocessor,
targeted at higher performance while maintaining binary compatibility
with RC32300 processors.
Table 1 RC4640 Integer Multiply Operation
The MIPS-III architecture defines that the results of a multiply or divide operation are placed in the HI and LO registers. The values can then be transferred to the general purpose register file using the MFHI/MFLO instructions.
The RC4640 adds a new multiply instruction, “MUL”, which can specify that the multiply results bypass the “Lo” register and are placed immediately in the primary register file. By avoiding the explicit “Move-from-Lo” instruction required when using “Lo”, throughput of multiply-intensive operations is increased.
An additional enhancement offered by the RC4640 is an atomic “multiply-add” operation, MAD, used to perform multiply-accumulate operations. This instruction multiplies two numbers and adds the product to the current contents of the HI and LO registers. This operation is used in numerous DSP algorithms, and allows the RC4640 to reduce the cost of systems requiring a mix of DSP and control functions.
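The difference between MULT followed by MFLO, the direct-to-register MUL, and the accumulating MAD can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, operands are treated as signed 32-bit values, HI and LO are modeled as 32-bit halves of one 64-bit accumulator, and MUL is assumed to leave HI/LO untouched. None of these details are confirmed by the text above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


class MulUnit:
    """Toy model of the HI/LO result registers (hypothetical helper)."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def _acc(self):
        return (self.hi << 32) | self.lo

    def _set_acc(self, value):
        value &= MASK64
        self.hi = value >> 32
        self.lo = value & MASK32

    def mult(self, rs, rt):
        """MULT: the product is placed in HI/LO."""
        self._set_acc(to_signed32(rs) * to_signed32(rt))

    def mfhi(self):
        return self.hi

    def mflo(self):
        return self.lo

    def mul(self, rs, rt):
        """MUL: the low word of the product goes straight to a register."""
        return (to_signed32(rs) * to_signed32(rt)) & MASK32

    def mad(self, rs, rt):
        """MAD: the product is added to the current HI/LO contents."""
        self._set_acc(self._acc() + to_signed32(rs) * to_signed32(rt))


# Dot product of two short vectors using MAD as the accumulate step.
unit = MulUnit()
for sample, coeff in zip([1, -2, 3], [4, 5, 6]):
    unit.mad(sample, coeff)
print(unit.mflo())  # 12
```

In this model a dot product needs one MAD per element and a single MFLO at the end, whereas the MULT-based version would need a multiply, a move from LO, and an add for every element.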
Finally, aggressive implementation techniques feature low latency for these operations along with pipelining to allow new operations to be issued before a previous one has fully completed. Table 1 also shows the repeat rate (peak issue rate), latency, and number of processor stalls required for the various operations. The RC4640 performs automatic operand size detection to determine the size of the operand, and implements hardware interlocks to prevent overrun, allowing this high performance to be achieved with simple programming.
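The repeat rate in Table 1 translates directly into a peak issue rate. The following sketch assumes that the quoted 133.5 M Mul-Add/sec figure corresponds to back-to-back 16-bit MAD operations at a repeat rate of 2 clocks; the datasheet does not state this explicitly, and the function and table names are illustrative only.

```python
# (latency, repeat, stall) in pipeline clocks, copied from Table 1
TABLE1 = {
    ("MULT/MAD", 16): (3, 2, 0),
    ("MULT/MAD", 32): (4, 3, 0),
    ("MUL", 16): (3, 2, 1),
    ("MUL", 32): (4, 3, 2),
}


def peak_ops_per_sec(clock_hz, opcode, operand_bits):
    """Peak issue rate when one operation starts every `repeat` clocks."""
    _latency, repeat, _stall = TABLE1[(opcode, operand_bits)]
    return clock_hz / repeat


print(peak_ops_per_sec(267e6, "MULT/MAD", 16) / 1e6)  # 133.5
print(peak_ops_per_sec(267e6, "MULT/MAD", 32) / 1e6)  # 89.0
```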
Floating-Point Coprocessor
The RC4640 incorporates an entire single-precision floating-point coprocessor on chip, including a floating-point register file and execution units. The floating-point coprocessor forms a “seamless” interface with the integer unit, decoding and executing instructions in parallel with the integer unit.
The floating-point unit of the RC4640 directly implements single-precision floating-point operations, which enables the RC4640 to perform functions such as graphics rendering without requiring extensive die area or power consumption. The single-precision unit of the RC4640 is directly compatible with the single-precision operation of the RC4700, and features the same latencies and repeat rates.
The RC4640 does not directly implement the double-precision operations found in the RC4700. However, to maintain software compatibility, the RC4640 will signal a trap when a double-precision operation is initiated, allowing the requested function to be emulated in software. Alternatively, the system architect could use a software library emulation of double-precision functions, selected at compile time, to eliminate the overhead associated with trap and emulation.
Floating-Point Units
The RC4640’s floating-point execution units perform single precision arithmetic, as specified in IEEE Standard 754. The execution unit is broken into a separate multiply unit and a combined add/convert/divide/square root unit. Overlap of multiply and add/subtract is supported. The multiplier is partially pipelined, allowing a new multiplication instruction to begin every 6 cycles.
As in the IDT79RC4700, the RC4640 maintains fully precise floating-point exceptions while allowing both overlapped and pipelined operations. Precise exceptions are extremely important in mission-critical environments, such as ADA, and highly desirable for debugging in any environment.
The floating-point unit’s operation set includes floating-point add, subtract, multiply, divide, square root, conversion between fixed-point and floating-point format, conversion among floating-point formats, and floating-point compare. These operations comply with IEEE Standard 754. Double precision operations are not directly supported; attempts to execute double-precision floating point operations, or refer directly to double-precision registers, result in the RC4640 signalling a “trap” to the CPU, enabling emulation of the requested function. Table 2 gives the latencies of some of the floating-point instructions in internal processor cycles.
Operation   Instruction Latency
ADD         4
SUB         4
MUL         8
DIV         32
SQRT        31
CMP         3
FIX         4
FLOAT       6
ABS         1
MOV         1
NEG         1
LWC1        2
SWC1        1
Table 2 Floating-Point Operation
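As a rough way to use Table 2, the latencies of a strictly dependent instruction sequence can be summed to estimate how long a result takes to become available. This is a simplified sketch: it assumes each instruction waits for the previous result, so latencies simply add, and it ignores the overlap of multiply and add/subtract as well as any issue or stall cycles. The function name is hypothetical.

```python
# Instruction latencies in internal processor cycles, copied from Table 2
FP_LATENCY = {
    "ADD": 4, "SUB": 4, "MUL": 8, "DIV": 32, "SQRT": 31, "CMP": 3,
    "FIX": 4, "FLOAT": 6, "ABS": 1, "MOV": 1, "NEG": 1, "LWC1": 2, "SWC1": 1,
}


def dependent_chain_cycles(ops):
    """Sum latencies, assuming each operation needs the previous result."""
    return sum(FP_LATENCY[op] for op in ops)


# Load a value, multiply it, add to it, then store the result.
print(dependent_chain_cycles(["LWC1", "MUL", "ADD", "SWC1"]))  # 15
```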
Floating-Point General Register File
The floating-point register file is made up of thirty-two 32-bit registers. These registers are used as source or target registers for the single-precision operations.
References to these registers as 64-bit registers (as supported in the RC4700) will cause a trap to be signalled to the integer unit.
The floating-point control register space contains two registers; one for determining configuration and revision information for the coprocessor and one for control and status information. These are primarily involved with diagnostic software, exception handling, state saving and restoring, and control of rounding modes.
System Control Coprocessor (CP0)
The system control coprocessor in the MIPS architecture is responsible for the virtual to physical address translation and cache protocols, the exception control system, and the diagnostics capability of the processor. In the MIPS architecture, the system control coprocessor (and thus the kernel software) is implementation dependent.
In the RC4640, significant changes in CP0 relative to the RC4600
have been implemented. These changes are designed to simplify
memory management, facilitate debug, and speed real-time processing.
System Control Coprocessor Registers
The RC4640 incorporates all system control co-processor (CP0) registers on-chip. These registers provide the path through which the virtual memory system’s address translation is controlled, exceptions are handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or disabled, cache features). In addition, the RC4640 includes registers to implement a real-time cycle counting facility, which aids in cache diagnostic testing, assists in data error detection, and facilitates software debug. Alternatively, this timer can be used as the operating system reference timer, and can signal a periodic interrupt. Table 3 shows the CP0 registers of the RC4640.
Number                   Name       Function
0                        IBase      Instruction address space base
1                        IBound     Instruction address space bound
2                        DBase      Data address space base
3                        DBound     Data address space bound
4-7, 10, 20-25, 29, 31   -          Not used
8                        BadVAddr   Virtual address on address exceptions
9                        Count      Counts every other cycle
11                       Compare    Generate interrupt when Count = Compare
12                       Status     Miscellaneous control/status
13                       Cause      Exception/Interrupt information
14                       EPC        Exception PC
15                       PRId       Processor ID
16                       Config     Cache and system attributes
17                       CAlg       Cache attributes for the 8 512MB regions of the virtual address space
18                       IWatch     Instruction breakpoint virtual address
19                       DWatch     Data breakpoint virtual address
26                       ECC        Used in cache diagnostics
27                       CacheErr   Cache diagnostic information
28                       TagLo      Cache index information
30                       ErrorEPC   CacheError exception PC
Table 3 RC4640 CP0 Registers
Operation Modes
The RC4640 supports two modes of operation: user mode and kernel mode. Kernel mode operation is typically used for exception handling and operating system kernel functions, including CP0 management and access to IO devices. In kernel mode, software has access to the entire address space and all of the co-processor 0 registers, and can select whether to enable co-processor 1 accesses. The processor enters kernel mode at reset, and whenever an exception is recognized.
User mode is typically used for application programs. User mode accesses are limited to a subset of the virtual address space, and can be inhibited from accessing CP0 functions.
Address range             Segment                                          Attributes
0xFFFFFFFF - 0xC0000000   Kernel virtual address space (kseg2)             Unmapped, 1.0GB
0xBFFFFFFF - 0xA0000000   Uncached kernel physical address space (kseg1)   Unmapped, 0.5GB
0x9FFFFFFF - 0x80000000   Cached kernel physical address space (kseg0)     Unmapped, 0.5GB
0x7FFFFFFF - 0x00000000   User virtual address space (useg)                Mapped, 2.0GB
Figure 1 Mode Virtual Addressing (32-bit mode)
Virtual-to-Physical Address Mapping
The 4GB virtual address space of the RC4640 is shown in Figure 1. The 4GB address space is divided into addresses accessible in either kernel or user mode (kuseg), and addresses only accessible in kernel mode (kseg2:0).
The RC4640 supports the use of multiple user tasks sharing common virtual addresses, but mapped to separate physical addresses. This facility is implemented via the “base-bounds” registers contained in CP0.
When a user virtual address is asserted (load, store, or instruction fetch), the RC4640 compares the virtual address with the contents of the appropriate “bounds” register (instruction or data). If the virtual address is “in bounds”, the value of the corresponding “base” register is added to the virtual address to form the physical address for that reference. If the address is not within bounds, an exception is signalled.
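The base-bounds check described above can be sketched as a few lines of Python. This is an illustration only: the function and exception names are hypothetical, and the sketch assumes that “in bounds” means the virtual address is strictly below the bound value. The exact comparison and register formats are defined in the hardware user’s manual, not in this text.

```python
class AddressErrorException(Exception):
    """Stands in for the exception signalled on an out-of-bounds access."""


def translate_user_address(vaddr, base, bound):
    """Form a physical address from a user virtual address.

    Assumption: an address is "in bounds" when vaddr < bound.
    """
    if vaddr >= bound:
        raise AddressErrorException(hex(vaddr))
    return (base + vaddr) & 0xFFFFFFFF


# Two tasks use the same virtual address but different base values,
# so they land in separate regions of physical memory.
print(hex(translate_user_address(0x1000, base=0x00100000, bound=0x8000)))  # 0x101000
print(hex(translate_user_address(0x1000, base=0x00200000, bound=0x8000)))  # 0x201000
```

An operating system would reload the base and bound values on each task switch, which is how several tasks can share one physical memory without a TLB.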
This facility enables multiple user processes in a single physical
memory without the use of a TLB. This type of operation is further
supported by a number of development tools for the RC4640, including
real-time operating systems and “position independent code”.
Kernel mode addresses do not use the base-bounds registers, but
rather undergo a fixed virtual-to-physical address translation.
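The segment boundaries of Figure 1 can be captured in a small lookup, which is a convenient way to check which segment an address belongs to and whether user mode may touch it. The table values come from Figure 1; the function names are hypothetical, and the sketch deliberately does not model the fixed kernel translation itself, since its details are not given here.

```python
# (low, high, name, attributes, accessible in user mode), from Figure 1
SEGMENTS = [
    (0x00000000, 0x7FFFFFFF, "useg", "mapped", True),
    (0x80000000, 0x9FFFFFFF, "kseg0", "unmapped, cached", False),
    (0xA0000000, 0xBFFFFFFF, "kseg1", "unmapped, uncached", False),
    (0xC0000000, 0xFFFFFFFF, "kseg2", "unmapped", False),
]


def classify(vaddr):
    """Return (segment name, attributes, user-accessible) for an address."""
    for low, high, name, attributes, user_ok in SEGMENTS:
        if low <= vaddr <= high:
            return name, attributes, user_ok
    raise ValueError("not a 32-bit virtual address")


def access_allowed(vaddr, kernel_mode):
    """Kernel mode sees the whole space; user mode only sees useg."""
    return kernel_mode or classify(vaddr)[2]


print(classify(0x80001000)[0])                         # kseg0
print(access_allowed(0xA0000000, kernel_mode=False))   # False
```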
Debug Support
To facilitate software debug, the RC4640 adds a pair of “watch” registers to CP0. When enabled, these registers will cause the CPU to take an exception when a “watched” address is appropriately accessed.
Interrupt Vector
The RC4640 also adds the capability to speed interrupt exception
decoding. Unlike the RC4700, which utilizes a single common exception
vector for all exception types (including interrupts), the RC4640 allows
kernel software to enable a separate interrupt exception vector. When
enabled, this vector location speeds interrupt processing by allowing
software to avoid decoding interrupts from general purpose exceptions.
Cache Memory
To keep the RC4640’s high-performance pipeline full and operating efficiently, the RC4640 incorporates on-chip instruction and data caches that can each be accessed in a single processor cycle. Each cache has its own 64-bit data path and can be accessed in parallel. The cache subsystem provides the integer and floating-point units with an aggregate bandwidth of over 3200 MB per second at a pipeline clock frequency of 267MHz. The cache subsystem is similar in construction to that found in the RC4700, although some changes have been implemented. Table 4 is an overview of the caches found on the RC4640.
Instruction Cache
The RC4640 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 8KB in size and is parity protected.
Because the cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access, thus further increasing performance by allowing these two operations to occur simultaneously. The tag holds a 20-bit physical address and valid bit, and is parity protected.
The instruction cache is 64 bits wide, and can be refilled or accessed in a single processor cycle. Instruction fetches require only 32 bits per cycle, for a peak instruction bandwidth of 1068MB/sec at 267MHz. Sequential accesses take advantage of the 64-bit fetch to reduce power dissipation, and cache miss refill can write 64 bits per cycle to minimize the cache miss penalty. The line size is eight instructions (32 bytes) to maximize performance.
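The bandwidth figures quoted for the caches can be reproduced with simple arithmetic. The sketch below assumes that the aggregate figure of over 3200 MB per second is the sum of 32-bit instruction fetches and 64-bit data accesses on every pipeline clock; the datasheet does not spell out this breakdown, and the function name is illustrative.

```python
def bandwidth_mb_per_sec(clock_mhz, bits_per_cycle):
    """Peak bandwidth when `bits_per_cycle` bits move on every clock."""
    return clock_mhz * bits_per_cycle / 8


instruction = bandwidth_mb_per_sec(267, 32)  # 32 bits fetched per cycle
data = bandwidth_mb_per_sec(267, 64)         # 64-bit data path
print(instruction)         # 1068.0
print(data)                # 2136.0
print(instruction + data)  # 3204.0
```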
In addition, the contents of one set of the instruction cache (set “A”) can be “locked” by setting a bit in a CP0 register. Locking the set prevents its contents from being overwritten by a subsequent cache miss; refill then occurs only into set “B”.
This operation effectively “locks” time-critical code into one 4KB set, while allowing the other set to service other instruction streams in a normal fashion. Thus, the benefits of cached performance are achieved, while deterministic real-time response is preserved.
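The effect of locking set A can be seen in a toy, tag-only cache model. This is a sketch under assumptions, not the device's actual behavior: the class name is hypothetical, addresses are treated as physical, and when set A is not locked the model simply alternates refills between the two sets, because the real replacement policy is not described in this document.

```python
class LockableTwoWayCache:
    """Toy tag-only model of an 8KB two-way cache with 32-byte lines."""

    LINE_BYTES = 32
    SET_BYTES = 4096                          # each of the two sets is 4KB
    LINES_PER_SET = SET_BYTES // LINE_BYTES   # 128 lines per set

    def __init__(self):
        self.sets = {"A": [None] * self.LINES_PER_SET,
                     "B": [None] * self.LINES_PER_SET}
        self.lock_a = False
        # Assumption: alternate refills between sets when unlocked.
        self._victim = ["A"] * self.LINES_PER_SET

    def access(self, addr):
        """Return True on a hit; on a miss, refill one set and return False."""
        index = (addr // self.LINE_BYTES) % self.LINES_PER_SET
        tag = addr // self.SET_BYTES
        if any(self.sets[name][index] == tag for name in ("A", "B")):
            return True
        if self.lock_a:
            target = "B"                      # set A is locked: refill set B only
        else:
            target = self._victim[index]
            self._victim[index] = "B" if target == "A" else "A"
        self.sets[target][index] = tag
        return False


cache = LockableTwoWayCache()
cache.access(0x0000)         # miss: the time-critical line is loaded into set A
cache.lock_a = True          # lock set A
for addr in (0x1000, 0x2000, 0x3000):
    cache.access(addr)       # conflicting lines replace each other in set B
print(cache.access(0x0000))  # True: the locked line is still resident
```

Without the lock, the same access pattern would eventually evict the line at 0x0000, so the time to fetch the critical code would depend on what else had run in between.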
Data Cache
For fast, single cycle data access, the RC4640 includes an 8KB on-chip data cache that is two-way set associative with a fixed 32-byte (eight words) line size. Table 4 lists the RC4640 cache attributes.
Characteristics                  Instruction               Data
Size                             8KB                       8KB
Organization                     2-way set associative     2-way set associative
Line size                        32B                       32B
Index                            vAddr 11..0               vAddr 11..0
Tag                              pAddr 31..12              pAddr 31..12
Write policy                     n.a.                      writeback/writethru
Line transfer order              read sub-block order,     read sub-block order,
                                 write sequential          write sequential
Miss restart after transfer of   entire line               first word
Parity                           per-word                  per-byte
Cache locking                    set A                     set A
Table 4 RC4640 Cache Attributes
The data cache is protected with byte parity and its tag is protected with a single parity bit. It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access.
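The index and tag fields listed in Table 4 can be extracted with shifts and masks. In this sketch the split of the 12 index bits into a 7-bit line index and a 5-bit byte offset is derived from the 32-byte line size and 4KB set size rather than stated in the table, and the function name is hypothetical.

```python
LINE_BYTES = 32     # 32-byte line: bits 4..0 select the byte within the line
SET_BYTES = 4096    # 8KB cache, two sets of 4KB: index uses vAddr 11..0


def split_address(vaddr, paddr):
    """Return (line index, byte offset, tag) for one cache lookup.

    The index comes from the virtual address and the tag from the
    physical address, so the lookup and translation can overlap.
    """
    byte_offset = vaddr & (LINE_BYTES - 1)               # vAddr 4..0
    line_index = (vaddr & (SET_BYTES - 1)) // LINE_BYTES  # vAddr 11..5
    tag = (paddr >> 12) & 0xFFFFF                         # pAddr 31..12
    return line_index, byte_offset, tag


print(split_address(0x00001234, 0x00101234))  # (17, 20, 257)
```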
The normal write policy is writeback, which means that a store to a cache line does not immediately cause memory to be updated. This increases system performance by reducing bus traffic and eliminating the bottleneck of waiting for each store operation to finish before issuing a subsequent memory operation. Software can, however, select write-through for certain address ranges, using the CAlg register in CP0.
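The reduction in bus traffic from writeback can be illustrated by counting memory write transactions for a short run of stores. This is a deliberately simplified model with a hypothetical function name: it assumes every store hits in the cache, that a write-through store produces one bus write each, and that a writeback line is written to memory once when it is eventually evicted. It counts transactions, not bytes.

```python
def count_memory_writes(store_addrs, policy, line_bytes=32):
    """Count bus write transactions under a simplified cache model."""
    if policy == "writethrough":
        # Every store is passed on to main memory.
        return len(store_addrs)
    if policy == "writeback":
        # Stores only mark lines dirty; each dirty line is written once.
        dirty_lines = {addr // line_bytes for addr in store_addrs}
        return len(dirty_lines)
    raise ValueError("unknown policy: " + policy)


stores = [0x100, 0x104, 0x108, 0x10C, 0x200]
print(count_memory_writes(stores, "writethrough"))  # 5
print(count_memory_writes(stores, "writeback"))     # 2
```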
Cache protocols supported for the data cache are:
Uncached.
Addresses in a memory area indicated as uncached will not be
read from the cache. Stores to such addresses will be written
directly to main memory, without changing cache contents.
Writeback.
Loads and instruction fetches will first search the cache, reading
main memory only if the desired data is not cache resident. On
data store operations, the cache is first searched to see if the
target address is cache resident. If it is resident, the cache con-