Low-Cost 64-bit RISController w/DSP Capability
IDT79RC4650™
Features
• High-performance embedded 64-bit microprocessor
– 64-bit integer operations
– 64-bit registers
– 100MHz, 133MHz, 150MHz, 180MHz, 200MHz and 267MHz operation frequencies
• High-performance DSP capability
– 133.5 Million Integer Multiply-Accumulate Operations/sec @ 267MHz
• High-performance microprocessor
– 133.5 M Mul-Add/second at 267MHz
– 89 MFLOP/s at 267MHz
– >640,000 dhrystone (2.1)/sec capability at 267MHz (352 dhrystone MIPS)
• High level of integration
– 64-bit, 267MHz integer CPU
– 8KB instruction cache; 8KB data cache
– Integer multiply unit with 133.5M Mul-Add/sec
• Low-power operation
– Active power management powers down inactive units
– Standby mode
• Upwardly software compatible with IDT RISController Family
• Large, efficient on-chip caches
– Separate 8KB Instruction and 8KB Data caches
– Over 3200MB/sec bandwidth from internal caches
– 2-set associative
– Write-back and write-through support
– Cache locking to facilitate deterministic response
• Bus compatible with RC4000 family
– System interface provides bandwidth up to 1000 MB/sec
– Direct interface to 32-bit wide or 64-bit wide systems
– Synchronized to external reference clock for multi-master operation
– Socket compatible with IDT RC64475 and RC64575
• Improved real-time support
– Fast interrupt decode
– Optional cache locking
Note: “R” refers to 5V parts; “RV” refers to 3.3V parts; “RC” refers to both.
Block Diagram
[Block diagram. Blocks shown: 352 MIPS 64-bit CPU (64-bit register file,
64-bit adder, load aligner, store aligner, logic unit, high-performance
integer multiply, pipeline control); System Control Coprocessor (address
translation/cache attribute control, exception management functions);
89 MFLOPS single-precision FPA (FP register file, pack/unpack, FP
add/sub/cvt/div/sqrt, FP multiply, pipeline control); control bus, data
bus and instruction bus; instruction cache set A (lockable) and set B;
data cache set A (lockable) and set B; 32-/64-bit synchronized system
interface.]
The IDT logo is a registered trademark and ORION, RC4600, RC4650, RV4650, RC4700, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
© 2000 Integrated Device Technology, Inc.
March 28, 2000
DSC 3149/3
Description
The IDT79RC4650 is a low-cost member of the IDT Microprocessor
family, targeted to a variety of performance-hungry embedded
applications. The RC4650 continues the IDT tradition of high
performance through high-speed pipelines, high-bandwidth caches and
bus interface, 64-bit architecture, and careful attention to efficient
control. The RC4650 reduces the cost of this performance relative to
the RC4700 by removing functional units that many embedded
applications do not need, such as double-precision floating-point
arithmetic and a TLB.
The RC4650 adds features relative to the RC4700, reflective of its
target applications. These features enable system cost reduction (e.g.,
optional 32-bit system interface) as well as higher performance for
certain types of systems (e.g., cache locking, improved real-time
support, integer DSP capability).
The RC4650 supports a wide variety of embedded processor-based
applications, such as consumer game systems, multi-media functions,
internetworking equipment, switching equipment, and printing systems.
Upwardly software-compatible with the RC3000 family, and bus- and
upwardly software-compatible with the IDT RC4000/RC5000 family, the
RC4650 will serve in many of the same applications and, in addition,
supports other applications such as those requiring integer DSP
functions.
The RC64475 and RC64575 processors offer a direct migration path
for designs based on IDT’s RC4650 processors, through full pin and
socket compatibility.
The RC4650 brings 64-bit performance levels to lower cost systems.
High performance is preserved by retaining large on-chip caches that
are two-way set associative, a streamlined high-speed pipeline,
high-bandwidth, 64-bit execution, and facilities such as early restart
for data cache misses. These techniques combine to give the system
designer 3.2GB/sec aggregate bandwidth, 1000 MB/sec bus bandwidth, 352
Dhrystone MIPS, 89 MFlops, and 133.5 M Multiply-add/second.
The RC4650 provides complete upward application-software
compatibility with the IDT79RC32300™ and IDT79RC64xxx™ families of
microprocessors. An array of development tools facilitates the rapid
development of RC4650-based systems, enabling a wide variety of
customers to take advantage of the high-performance capabilities of the
processor while maintaining short time-to-market goals.
The 64-bit computing capability of the RC4650 enables a wide
variety of capabilities previously limited by the lower bandwidth and
bit-manipulation rates inherent in 32-bit architectures. For example,
the RC4650 can perform loads and stores from cached memory at the rate
of 8 bytes every clock cycle, doubling the bandwidth of an equivalent
32-bit processor. This capability, coupled with the high clock rate for
the RC4650 pipeline, enables new levels of performance to be obtained
from embedded systems.
This data sheet provides an overview of the features and architecture
of the RC4650 CPU. A more detailed description of the processor is
available in the IDT79RC4650 Processor Hardware User’s Manual,
available from IDT. Further information on development support,
application notes, and complementary products is also available from
your local IDT sales representative.
Hardware Overview
The RC4650 family brings a high level of integration designed for
high-performance computing. The key elements of the RC4650 are
briefly described below. A more detailed description of each of these
subsystems is available in the User’s Manual.
Pipeline
The RC4650 uses a 5-stage pipeline similar to the IDT79RC3000
and the IDT79RC4700. The simplicity of this pipeline allows the RC4650
to be lower cost and lower power than superscalar or super-pipelined
processors. Unlike superscalar processors, the RC4650 can still achieve
performance close to its peak on applications that have many data
dependencies or that require a great many loads and stores.
[Figure: General Purpose Registers (bits 63..0; r1, r2, ... r29
labeled); Multiply/Divide Registers HI (Accumulate HI) and LO
(Accumulate LO), each bits 63..0; Program Counter PC, bits 63..0.]
Figure 1 CPU Registers
Integer Execution Engine
The RC4650 implements the MIPS-III Instruction Set Architecture
and is upwardly compatible with applications that run on the earlier
generation parts. The RC4650 includes the same additions to the
instruction set found in the RC4700 family of microprocessors, targeted
at improving performance and capability while maintaining binary
compatibility with earlier RC3000 processors.
The extensions result in better code density, greater multi-processing
support, improved performance for commonly used code sequences in
operating system kernels, and faster execution of floating-point intensive
applications. All resource dependencies are made transparent to the
programmer, ensuring transportability among implementations of the
MIPS instruction set architecture. In addition, MIPS-III specifies new
instructions defined to take advantage of the 64-bit architecture of the
processor.
Finally, the RC4650 also implements additional instructions, which
are considered extensions to the MIPS-III architecture. These
instructions improve the multiply and multiply-add throughput of the
CPU, making it well suited to a wide variety of imaging and DSP
applications. These extensions, which use opcodes allocated by MIPS
Technologies for this purpose, are supported by a wide variety of
development tools.
The MIPS integer unit implements a load/store architecture with
single-cycle ALU operations (logical, shift, add, sub) and an
autonomous multiply/divide unit. The 64-bit register resources include
32 general-purpose orthogonal integer registers, the HI/LO result
registers for the integer multiply/divide unit, and the program
counter. In addition, the on-chip floating-point co-processor adds 32
floating-point registers and a floating-point control/status register.
Register File
The RC4650 has thirty-two general-purpose 64-bit registers. These
registers are used for scalar integer operations and address calculation.
The register file has two read ports and one write port and is fully
bypassed to minimize operation latency in the pipeline. Figure 1
illustrates the RC4650 Register File.
ALU
The RC4650 ALU consists of the integer adder and logic unit. The
adder performs address calculations in addition to arithmetic operations,
and the logic unit performs all logical and shift operations. Each of these
units is highly optimized and can perform an operation in a single
pipeline cycle.
Integer Multiply/Divide
The RC4650 uses a dedicated integer multiply/divide unit, optimized
for high-speed multiply and multiply-accumulate operation. Table 1
shows the performance, expressed in terms of pipeline clocks, achieved
by the RC4650 integer multiply unit.
Opcode           Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U    16 bit         3         2        0
MULT/U, MAD/U    32 bit         4         3        0
MUL              16 bit         3         2        1
MUL              32 bit         4         3        2
DMULT, DMULTU    any            6         5        0
DIV, DIVU        any            36        36       0
DDIV, DDIVU      any            68        68       0
Table 1 RC4650 Integer Multiply Operation
The MIPS-III architecture defines that the results of a multiply or
divide operation are placed in the HI and LO registers. The values can
then be transferred to the general purpose register file using the MFHI/
MFLO instructions.
The RC4650 adds a new multiply instruction, “MUL”, which can
specify that the multiply result bypasses the LO register and is placed
directly in the general-purpose register file. By avoiding the explicit
“Move-from-LO” (MFLO) instruction required when using LO, throughput
of multiply-intensive operations is increased.
An additional enhancement offered by the RC4650 is an atomic
“multiply-add” operation, MAD, used to perform multiply-accumulate
operations. This instruction multiplies two numbers and adds the product
to the current contents of the HI and LO registers. This operation is used
in numerous DSP algorithms, and allows the RC4650 to reduce the cost
of systems requiring a mix of DSP and control functions.
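The multiply-accumulate behavior described above can be sketched in software. The following is a minimal model, not the hardware definition: it assumes signed 32-bit operands and a 64-bit accumulator formed from the low 32 bits of HI and LO, and the function names (mad, dot_product, to_signed) are hypothetical helpers introduced only for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mad(hi, lo, rs, rt):
    """Model of a signed multiply-add: (HI, LO) <- (HI, LO) + rs * rt.

    Assumption: the accumulator is the 64-bit value HI[31:0] || LO[31:0]
    and the operands are signed 32-bit values.
    """
    acc = to_signed(((hi & MASK32) << 32) | (lo & MASK32), 64)
    acc += to_signed(rs, 32) * to_signed(rt, 32)
    acc &= MASK64  # wrap to 64 bits, as a fixed-width accumulator would
    return (acc >> 32) & MASK32, acc & MASK32


def dot_product(xs, ys):
    """Accumulate sum(x * y) with one MAD per element pair."""
    hi, lo = 0, 0  # start from a cleared accumulator
    for x, y in zip(xs, ys):
        hi, lo = mad(hi, lo, x, y)
    return to_signed((hi << 32) | lo, 64)
```

A FIR filter tap loop or a vector dot product maps onto this pattern: one MAD per sample, with a single read of HI/LO after the loop.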
Finally, aggressive implementation techniques feature low latency for
these operations along with pipelining to allow new operations to be
issued before a previous one has fully completed. Table 1 also shows
the repeat rate (peak issue rate), latency, and number of processor stalls
required for the various operations. The RC4650 performs automatic
operand size detection to determine the size of the operand, and
implements hardware interlocks to prevent overrun, allowing this high
performance to be achieved with simple programming.
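As a rough illustration of how the Table 1 figures relate to the headline throughput numbers, the sketch below divides the pipeline clock by the repeat rate. The latency and repeat values are copied from Table 1; the burst-timing formula is a common pipeline approximation assumed here, not a statement from this data sheet, and the function names are hypothetical.

```python
# (latency, repeat) in pipeline clocks, copied from Table 1
TABLE_1 = {
    ("MULT/U, MAD/U", "16 bit"): (3, 2),
    ("MULT/U, MAD/U", "32 bit"): (4, 3),
    ("MUL", "16 bit"): (3, 2),
    ("MUL", "32 bit"): (4, 3),
    ("DMULT, DMULTU", "any"): (6, 5),
    ("DIV, DIVU", "any"): (36, 36),
    ("DDIV, DDIVU", "any"): (68, 68),
}


def peak_ops_per_second(clock_hz, repeat):
    """Peak issue rate: one operation every `repeat` pipeline clocks."""
    return clock_hz / repeat


def cycles_for_burst(count, latency, repeat):
    """Clocks for `count` back-to-back operations.

    Assumption: the first result appears after `latency` clocks and each
    further operation issues `repeat` clocks after the previous one.
    """
    return latency + (count - 1) * repeat


latency, repeat = TABLE_1[("MULT/U, MAD/U", "16 bit")]
print(peak_ops_per_second(267_000_000, repeat))  # 16-bit MAD rate at 267MHz
```

With a repeat rate of 2 clocks for 16-bit MAD operands, a 267MHz pipeline gives the 133.5 million multiply-accumulate operations per second quoted in the feature list.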
Floating-Point Co-Processor
The RC4650 incorporates an entire single-precision floating-point co-
processor on chip, including a floating-point register file and execution
units. The floating-point co-processor forms a “seamless” interface with
the integer unit, decoding and executing instructions in parallel with the
integer unit.
The RC4650’s floating-point unit directly implements single-precision
floating-point operations. This enables the RC4650 to perform functions
such as graphics rendering, without requiring extensive die area or power
consumption.
The RC4650 does not directly implement the double-precision
operations found in the RC64475. However, to maintain software
compatibility, the RC4650 will signal a trap when a double-precision
operation is initiated, allowing the requested function to be emulated
in software. Alternatively, the system architect could use a software
library emulation of double-precision functions, selected at compile
time, to eliminate the overhead associated with trap and emulation.
Floating-Point Units
The RC4650 floating-point execution units perform single precision
arithmetic, as specified in the IEEE Standard 754. The execution unit is
broken into a separate multiply unit and a combined add/convert/divide/
square root unit. Overlap of multiplies and add/subtract is supported.
The multiplier is partially pipelined, allowing a new multiply to begin
every 6 cycles.
As in the IDT79RC64475, the RC4650 maintains fully precise
floating-point exceptions while allowing both overlapped and pipelined
operations. Precise exceptions are extremely important in
mission-critical environments, such as ADA, and highly desirable for
debugging in any environment.
The floating-point unit’s operation set includes floating-point add,
subtract, multiply, divide, square root, conversion between fixed-point
and floating-point format, conversion among floating-point formats, and
floating-point compare. These operations comply with IEEE Standard
754. Double-precision operations are not directly supported; attempts to
execute double-precision floating-point operations, or to refer directly
to double-precision registers, result in the RC4650 signalling a “trap”
to the CPU, enabling emulation of the requested function. Table 2 gives
the latencies of some of the floating-point instructions in internal
processor cycles.
Operation   Instruction Latency
ADD         4
SUB         4
MUL         8
DIV         32
SQRT        31
CMP         3
FIX         4
FLOAT       6
ABS         1
MOV         1
NEG         1
LWC1        2
SWC1        1
Table 2 Floating-Point Operation
Floating-Point General Register File
The floating-point register file is made up of thirty-two 32-bit
registers. These registers are used as source or target registers for the
single-precision operations. References to these registers as 64-bit
registers (as supported in the RC64475) will cause a trap to be
signalled.
The floating-point control register space contains two registers: one
for determining configuration and revision information for the
coprocessor and one for control and status information. These are
primarily involved with diagnostic software, exception handling, state
saving and restoring, and control of rounding modes.
System Control Co-Processor (CP0)
The system control co-processor in the MIPS architecture is
responsible for the virtual-to-physical address translation and cache
protocols, the exception control system, and the diagnostics capability
of the processor. In the MIPS architecture, the system control
co-processor (and thus the kernel software) is implementation dependent.
In the RC4650, significant changes in CP0—relative to the
RC4700—have been implemented. These changes are designed to
simplify memory management, facilitate debug, and speed real-time
processing.
System Control Co-Processor Registers
The RC4650 incorporates all system control co-processor (CP0)
registers on-chip. These registers provide the path through which the
virtual memory system’s address translation is controlled, exceptions
are handled, and operating modes are controlled (kernel vs. user mode,
interrupts enabled or disabled, cache features). In addition, the RC4650
includes registers to implement a real-time cycle counting facility, which
aids in cache diagnostic testing, assists in data error detection, and
facilitates software debug. Alternatively, this timer can be used as the
operating system reference timer, and can signal a periodic interrupt.
Table 3 shows the CP0 registers of the RC4650.
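To illustrate the periodic-interrupt use of the cycle counter, the sketch below computes how far to advance Compare for a given interrupt period. Table 3 lists Count as counting every other cycle; the sketch assumes that "cycle" means the pipeline clock and that Count and Compare are 32-bit registers that wrap, and both function names are hypothetical.

```python
def compare_increment(clock_hz, period_s):
    """Count ticks between interrupts spaced period_s seconds apart."""
    count_hz = clock_hz / 2  # Count advances every other cycle (Table 3)
    return round(count_hz * period_s)


def next_compare(current_compare, increment):
    """Next Compare value, assuming a 32-bit register that wraps."""
    return (current_compare + increment) & 0xFFFFFFFF


# Example: a 1 ms tick on a 200MHz part
ticks = compare_increment(200_000_000, 0.001)
print(ticks, hex(next_compare(0xFFFFFFF0, ticks)))
```

An interrupt handler following this pattern would add the increment to the previous Compare value, rather than to the current Count, so that handler latency does not accumulate as drift.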
Number                        Name       Function
0                             IBase      Instruction address space base
1                             IBound     Instruction address space bound
2                             DBase      Data address space base
3                             DBound     Data address space bound
4-7, 10, 20-25, 29, 31        -          Not used
8                             BadVAddr   Virtual address on address exceptions
9                             Count      Counts every other cycle
11                            Compare    Generate interrupt when Count = Compare
12                            Status     Miscellaneous control/status
13                            Cause      Exception/Interrupt information
14                            EPC        Exception PC
15                            PRId       Processor ID
16                            Config     Cache and system attributes
17                            CAlg       Cache attributes for the eight 512MB regions of the virtual address space
18                            IWatch     Instruction breakpoint virtual address
19                            DWatch     Data breakpoint virtual address
26                            ECC        Used in cache diagnostics
27                            CacheErr   Cache diagnostics
28                            TagLo      Cache index
30                            ErrorEPC   CacheError exception PC
Table 3 RC4650 CP0 Registers
Operation Modes
The RC4650 supports two modes of operation: user mode and
kernel mode. Kernel mode operation is typically used for exception
handling and operating system kernel functions, including CP0
management and access to I/O devices. In kernel mode, software has
access to the entire address space and all of the co-processor 0
registers, and can select whether to enable co-processor 1 accesses.
The processor enters kernel mode at reset, and whenever an exception is
recognized.
User mode is typically used for application programs. User mode
accesses are limited to a subset of the virtual address space and can be
inhibited from accessing CP0 functions.
Virtual-to-Physical Address Mapping
The 4GB virtual address space of the RC4650 is shown in Figure 2.
The 4GB address space is divided into addresses accessible in either
kernel or user mode (kuseg), and addresses only accessible in kernel
mode (kseg2:0).
The RC4650 supports the use of multiple user tasks sharing
common virtual addresses, but mapped to separate physical addresses.
This facility is implemented via the “base-bounds” registers contained in
CP0.
When a user virtual address is asserted (load, store, or instruction
fetch), the RC4650 compares the virtual address with the contents of the
appropriate “bounds” register (instruction or data). If the virtual address
is “in bounds”, the value of the corresponding “base” register is added to
the virtual address to form the physical address for that reference. If the
address is not within bounds, an exception is signalled.
This facility enables multiple user processes in a single physical
memory without the use of a TLB. This type of operation is further
supported by a number of development tools for the RC4650, including
real-time operating systems and “position independent code.”
Kernel mode addresses do not use the base-bounds registers, but
rather undergo a fixed virtual-to-physical address translation.
Address range             Segment                                          Mapping
0xC0000000 - 0xFFFFFFFF   Kernel virtual address space (kseg2)             Unmapped, 1.0GB
0xA0000000 - 0xBFFFFFFF   Uncached kernel physical address space (kseg1)   Unmapped, 0.5GB
0x80000000 - 0x9FFFFFFF   Cached kernel physical address space (kseg0)     Unmapped, 0.5GB
0x00000000 - 0x7FFFFFFF   User virtual address space (useg)                Mapped, 2.0GB
Figure 2 Kernel/User Mode Virtual Addressing (32-bit mode)
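The base-bounds scheme and the segment layout of Figure 2 can be sketched as follows. This is an illustrative model only: the segment boundaries come from Figure 2, but the exact bounds comparison (strictly less than), the 32-bit wrap of the result, and the names segment_of, translate_user, and AddressError are assumptions made for this example. Kernel segment translation is left out because the data sheet says only that it is fixed.

```python
# Segment boundaries taken from Figure 2 (32-bit mode)
SEGMENTS = [
    (0x00000000, 0x7FFFFFFF, "useg"),
    (0x80000000, 0x9FFFFFFF, "kseg0"),
    (0xA0000000, 0xBFFFFFFF, "kseg1"),
    (0xC0000000, 0xFFFFFFFF, "kseg2"),
]


class AddressError(Exception):
    """Stands in for the exception signalled on an out-of-bounds access."""


def segment_of(vaddr):
    """Return the Figure 2 segment name containing a 32-bit virtual address."""
    for low, high, name in SEGMENTS:
        if low <= vaddr <= high:
            return name
    raise ValueError("not a 32-bit address")


def translate_user(vaddr, base, bound):
    """Base-bounds translation of a user address.

    Pass IBase/IBound for instruction fetches and DBase/DBound for loads
    and stores. Assumption: "in bounds" means vaddr < bound.
    """
    if segment_of(vaddr) != "useg":
        raise AddressError("kernel segment accessed as a user address")
    if vaddr >= bound:
        raise AddressError("virtual address out of bounds")
    return (vaddr + base) & 0xFFFFFFFF


# Two tasks share virtual address 0x1000 but land in different physical pages.
print(hex(translate_user(0x1000, 0x00400000, 0x8000)))
print(hex(translate_user(0x1000, 0x00800000, 0x8000)))
```

Switching tasks in this model amounts to reloading the base and bounds values, which is how several user processes can share one physical memory without a TLB.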
Debug Support
To facilitate software debug, the RC4650 adds a pair of “watch”
registers to CP0. When enabled, these registers will cause the CPU to
take an exception when a “watched” address is appropriately accessed.
Interrupt Vector
The RC4650 also adds the capability to speed interrupt exception
decoding. Unlike the RC4700, which utilizes a single common exception
vector for all exception types (including interrupts), the RC4650 allows
kernel software to enable a separate interrupt exception vector. When
enabled, this vector location speeds interrupt processing by allowing
software to avoid decoding interrupts from general purpose exceptions.
Cache Memory
To keep the RC4650’s high-performance pipeline full and operating
efficiently, the RC4650 incorporates on-chip instruction and data caches
that can each be accessed in a single processor cycle. Each cache has
its own 64-bit data path, and the two caches can be accessed in
parallel. The cache subsystem provides the integer and floating-point
units with an aggregate bandwidth of over 3200 MB per second at a
pipeline clock frequency of 267MHz. The cache subsystem is similar in
construction to that found in the RC4700, although some changes have
been implemented. Table 4 is an overview of the caches found on the
RC4650.
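The aggregate cache bandwidth figure can be checked with simple arithmetic. The sketch below assumes peak conditions, that is, both caches delivering a full 64-bit (8-byte) access on every pipeline clock, and takes 1 MB as 10^6 bytes; the function name is hypothetical.

```python
def cache_bandwidth_mb_per_s(clock_mhz, bytes_per_access=8, caches=2):
    """Peak bandwidth in MB/s: each cache moves bytes_per_access per clock."""
    return clock_mhz * bytes_per_access * caches


print(cache_bandwidth_mb_per_s(267))  # instruction + data cache at 267MHz
print(cache_bandwidth_mb_per_s(200))  # same calculation at 200MHz
```

Under these assumptions the two 64-bit paths together exceed the quoted 3200 MB per second at 267MHz, and reach exactly 3200 MB per second at 200MHz.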