• High-performance embedded 64-bit microprocessor
  - 64-bit integer operations
  - 64-bit registers
  - 100MHz, 133MHz, 150MHz, 180MHz, and 200MHz operation frequencies
• High-performance DSP capability
  - 100 Million Integer Multiply-Accumulate Operations/sec @ 200MHz
  - 67 MFlops floating point operations @ 200MHz
• High-performance microprocessor
  - 100 M Mul-Add/second at 200MHz
  - 67 MFLOP/s at 200MHz
  - >500,000 dhrystone (2.1)/sec capability at 200MHz (265 dhrystone MIPS)
• High level of integration
  - 64-bit, 200MHz integer CPU
  - 67 MFlops single-precision floating-point unit
  - 8KB instruction cache; 8KB data cache
  - Integer multiply unit with 100M Mul-Add/sec
• Low-power operation
  - Active power management powers-down inactive units
  - Standby mode
• Upwardly software compatible with IDT RISController Family
• Large, efficient on-chip caches
  - Separate 8kB Instruction and 8kB Data caches
  - Over 2400MB/sec bandwidth from internal caches
  - 2-set associative
  - Write-back and write-through support
  - Cache locking to facilitate deterministic response
• Bus compatible with RC4000 family
  - System interface provides bandwidth up to 800 MB/S
  - Direct interface to 32-bit wide or 64-bit wide systems
  - Synchronized to external reference clock for multi-master operation
• Improved real-time support
  - Fast interrupt decode
  - Optional cache locking

[Block diagram. Labels: 200 MIPS 64-bit ORION CPU; 64-bit register file; 64-bit adder; Load aligner; Store Aligner; Logic Unit; High-Performance Integer Multiply; Pipeline Control; System Control Coprocessor; Address Translation/Cache Attribute Control; Exception Management Functions; 67MFLOPS Single-Precision FPA; FP register file; Pack/Unpack; FP Add/Sub/Cvt/Div/Sqrt; FP Multiply; Pipeline Control; Control Bus; Data Bus; Instruction Bus; Instruction Cache Set A (Lockable); Instruction Cache Set B; Data Cache Set A (Lockable); Data Cache Set B; 32-/64-bit Synchronized System Interface.]
The IDT logo is a registered trademark and ORION, RC4600, RC4650, RV4650, RC4700, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
© 1998 Integrated Device Technology, Inc.
August 1998
DSC3149/2
The 64-bit computing capability of the RC4650 enables a wide
variety of capabilities previously limited by the lower bandwidth and
bit-manipulation rates inherent in 32-bit architectures. For example,
the RC4650 can perform loads and stores from cached memory at
the rate of 8 bytes every clock cycle, doubling the bandwidth of an
equivalent 32-bit processor. This capability, coupled with the high
clock rate of the RC4650 pipeline, enables new levels of performance
to be obtained from embedded systems.
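As a rough check of the load/store claim, the arithmetic can be sketched as follows. This is only an illustration: the function name is hypothetical, 1 MB is taken as 10^6 bytes, and the 200MHz clock is the top frequency from the feature list.

```python
def peak_cache_bandwidth_mb_s(bytes_per_cycle: int, clock_mhz: int) -> int:
    """Peak transfer rate in MB/s, taking 1 MB as 10**6 bytes (assumption)."""
    # bytes/cycle * 10**6 cycles/s = 10**6 bytes/s = MB/s
    return bytes_per_cycle * clock_mhz


# 64-bit datapath: 8 bytes per clock. A 32-bit datapath moves 4 bytes per clock.
wide = peak_cache_bandwidth_mb_s(8, 200)
narrow = peak_cache_bandwidth_mb_s(4, 200)
print(wide, narrow, wide // narrow)  # 1600 800 2
```

The factor of two comes only from the datapath width; the clock rate is held equal for both cases.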
This data sheet provides an overview of the features and architecture
of the RC4650 CPU. A more detailed description of the
processor is available in the IDT79RC4650 Processor Hardware
User’s Manual, available from IDT. Further information on development
support, application notes, and complementary products is
also available from your local IDT sales representative.
The IDT79RC4650 is a low-cost member of the IDT Microprocessor
family, targeted to a variety of performance-hungry
embedded applications. The RC4650 continues the IDT tradition of
high performance through high-speed pipelines, high-bandwidth
caches and bus interface, 64-bit architecture, and careful attention
to efficient control. The RC4650 reduces the cost of this performance
relative to the RC4700 by removing functional units that
are frequently unneeded in embedded applications, such as
double-precision floating-point arithmetic and a TLB.
The RC4650 adds features relative to the RC4700, reflective of
its target applications. These features enable system cost reduction
(e.g. optional 32-bit system interface) as well as higher performance
for certain types of systems (e.g. cache locking, improved
real-time support, integer DSP capability).
The RC4650 supports a wide variety of embedded processor-based
applications, such as consumer game systems, multi-media
functions, internetworking equipment, switching equipment, and
printing systems. Upwardly software-compatible with the RC3000
family, and bus- and upwardly software-compatible with the IDT
RC4000/RC5000 family, the RC4650 will serve in many of the
same applications and, in addition, supports other applications such
as those requiring integer DSP functions.
The RC4650 brings 64-bit performance levels to lower cost
systems. High performance is preserved by retaining large on-chip
caches that are two-way set associative, a streamlined high-speed
pipeline, high-bandwidth 64-bit execution, and facilities such as
early restart for data cache misses. These techniques combine to
give the system designer 3GB/sec aggregate bandwidth, 800 MB/sec
bus bandwidth, 265 Dhrystone MIPS, 67 MFlops, and 100 M
Multiply-add/second.
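To make the two-way set-associative organization concrete, the following sketch computes the number of sets and the set index for an 8KB, two-way cache. The line size is not stated in this section, so the 32-byte value used here is purely an assumption, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int   # total capacity (8KB per cache in the feature list)
    ways: int         # associativity (2 for the RC4650 caches)
    line_bytes: int   # ASSUMPTION: line size is not given in this section

    @property
    def sets(self) -> int:
        """Number of sets: capacity divided by (ways * line size)."""
        return self.size_bytes // (self.ways * self.line_bytes)

    def set_index(self, addr: int) -> int:
        """Set selected by an address in a simple modulo-indexed cache."""
        return (addr // self.line_bytes) % self.sets


cache = CacheGeometry(size_bytes=8 * 1024, ways=2, line_bytes=32)
print(cache.sets)  # 128
# Addresses 4KB apart select the same set; two ways can hold two of them at once.
print(cache.set_index(0x1000), cache.set_index(0x2000), cache.set_index(0x3000))
```

With these assumed parameters, any three addresses that are multiples of 4KB apart compete for the two ways of one set, which is the kind of conflict that cache locking is meant to make predictable.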
The RC4650 provides complete upward application-software
compatibility with the IDT79RC3000™ and IDT79RC4700™ families
of microprocessors. An array of development tools facilitates the
rapid development of RC4650-based systems, enabling a wide
variety of customers to take advantage of the high-performance
capabilities of the processor while maintaining short time to market
goals.
[Figure 1, General Purpose Registers panel: 64-bit registers (bits 63 to 0); rows shown include 0, r1, r2, ... r29.]
The RC4650 family brings a high level of integration designed for
high-performance computing. The key elements of the RC4650 are
briefly described below. A more detailed description of each of these
subsystems is available in the User’s Manual.
The RC4650 uses a 5-stage pipeline similar to the IDT79RC3000
and the IDT79RC4700. The simplicity of this pipeline allows the
RC4650 to be lower cost and lower power than superscalar or
superpipelined processors. Unlike on superscalar processors, applications
that have many data dependencies or that require frequent
loads and stores can still achieve performance close to the peak
performance of the processor. Figure 2 shows the RC4650 pipeline.
The RC4650 implements the MIPS-III Instruction Set Architecture
and is upwardly compatible with applications that run on the earlier
generation parts. The RC4650 includes the same additions to the
instruction set found in the RC4700 family of microprocessors,
targeted at improving performance and capability while maintaining
binary compatibility with earlier RC3000 processors.
[Figure 1, remaining panels: Multiply/Divide Registers HI (Accumulate HI) and LO (Accumulate LO), each bits 63 to 0; Program Counter PC, bits 63 to 0.]
Figure 1: CPU Registers
[Figure 2 diagram: five consecutive instructions, I0 through I4, each pass through the I, R, A, D, and W stages. Each stage takes one cycle and is split into two phases (1I, 2I, 1R, 2R, 1A, 2A, 1D, 2D, 1W, 2W). A new instruction starts every cycle, so the five instructions overlap.]

Phase    Activity
2A-2D    Data cache access and load align
1D       Data virtual-to-physical address translation
1D-2D    Virtual-to-physical address translation
2R       Register file read
2R       Bypass calculation
2R       Instruction decode
2R       Branch address calculation
1A       Issue or slip decision
1A-2A    Integer add, logical, shift
1A       Data virtual address calculation
2A       Store align
1A       Branch decision
2W       Register file write

Figure 2: RC4650 Pipeline
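The overlap shown in Figure 2 can be sketched with a few lines of Python. This is a simplified model that assumes one instruction issues per cycle with no stalls or slips; the function name and chart layout are illustrative only.

```python
STAGES = ["I", "R", "A", "D", "W"]  # stage letters as labeled in Figure 2


def pipeline_chart(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; column c holds the stage in cycle c."""
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(row)
    return rows


chart = pipeline_chart(5)
for i, row in enumerate(chart):
    print(f"I{i}: " + " ".join(row))
```

In the fifth cycle (column index 4) every stage is occupied by a different instruction, which is the steady state the figure illustrates.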
The extensions result in better code density, greater multiprocessing
support, improved performance for commonly used
code sequences in operating system kernels, and faster execution
of floating-point intensive applications. All resource dependencies
are made transparent to the programmer, ensuring transportability
among implementations of the MIPS instruction set architecture. In
addition, MIPS-III specifies new instructions defined to take advantage
of the 64-bit architecture of the processor.
Finally, the RC4650 also implements additional instructions, which
are considered extensions to the MIPS-III architecture. These instructions
improve the multiply and multiply-add throughput of the CPU,
making it well suited to a wide variety of imaging and DSP applications.
These extensions, which use opcodes allocated by MIPS Technologies
for this purpose, are supported by a wide variety of
development tools.
The MIPS integer unit implements a load/store architecture with
single cycle ALU operations (logical, shift, add, sub) and an autonomous
multiply/divide unit. The 64-bit register resources include: 32
general-purpose orthogonal integer registers, the HI/LO result
registers for the integer multiply/divide unit, and the program
counter. In addition, the on-chip floating-point co-processor adds
32 floating-point registers, and a floating-point control/status
register.
The RC4650 adds a new multiply instruction, “MUL”, which can
specify that the multiply results bypass the “Lo” register and are
placed immediately in the primary register file. By avoiding the explicit
“Move-from-Lo” instruction required when using “Lo”, throughput of
multiply-intensive operations is increased.
An additional enhancement offered by the RC4650 is an atomic
“multiply-add” operation, MAD, used to perform multiply-accumulate
operations. This instruction multiplies two numbers and adds the
product to the current contents of the HI and LO registers. This
operation is used in numerous DSP algorithms, and allows the RC4650 to
reduce the cost of systems requiring a mix of DSP and control functions.
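The difference between MULT followed by MFLO, the single-instruction MUL, and the accumulating MAD can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the hardware definition: operands are treated as signed 32-bit values, HI and LO are modeled as the upper and lower 32 bits of one 64-bit accumulator, and MUL is assumed to leave HI and LO untouched. The class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MultiplyUnit:
    """Behavioral sketch of HI/LO; ASSUMES a 32+32 bit accumulator layout."""

    def __init__(self) -> None:
        self.hi = 0
        self.lo = 0

    def _store(self, value: int) -> None:
        value &= MASK64
        self.hi = (value >> 32) & MASK32
        self.lo = value & MASK32

    def accumulator(self) -> int:
        return to_signed((self.hi << 32) | self.lo, 64)

    def mult(self, rs: int, rt: int) -> None:
        """MULT: product goes to HI/LO and must be fetched with MFLO/MFHI."""
        self._store(to_signed(rs, 32) * to_signed(rt, 32))

    def mflo(self) -> int:
        return self.lo

    def mul(self, rs: int, rt: int) -> int:
        """MUL: low 32 bits of the product returned directly (no MFLO)."""
        return (to_signed(rs, 32) * to_signed(rt, 32)) & MASK32

    def mad(self, rs: int, rt: int) -> None:
        """MAD: add the product to the current HI/LO contents."""
        self._store(self.accumulator() + to_signed(rs, 32) * to_signed(rt, 32))


unit = MultiplyUnit()
unit.mult(7, 6)
two_step = unit.mflo()      # MULT then MFLO
one_step = unit.mul(7, 6)   # MUL alone

dsp = MultiplyUnit()
for a, b in zip([1, -2, 3], [4, 5, 6]):
    dsp.mad(a, b)           # running dot product in HI/LO
print(two_step, one_step, dsp.accumulator())  # 42 42 12
```

The dot-product loop shows why MAD suits DSP kernels: each tap costs one instruction, and the running sum never leaves the multiply unit until it is needed.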
Finally, aggressive implementation techniques feature low latency
for these operations along with pipelining to allow new operations to
be issued before a previous one has fully completed. Table 1
shows the repeat rate (peak issue rate), latency, and number of
processor stalls required for the various operations. The RC4650
performs automatic operand size detection to determine the size of
the operand, and implements hardware interlocks to prevent overrun,
allowing this high performance to be achieved with simple programming.
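The relationship between repeat rate, latency, and throughput can be sketched numerically. The values below are the 16-bit and 32-bit MULT/MAD entries of Table 1; reading the three numeric columns as latency, repeat, and stall is an interpretation based on the paragraph above, and the simple issue model (first result after the latency, one more result per repeat interval) is an assumption.

```python
# (latency, repeat) in pipeline clocks, taken from Table 1.
# ASSUMPTION: the column order latency/repeat/stall is inferred from the text.
TABLE1 = {
    ("MULT/MAD", 16): (3, 2),
    ("MULT/MAD", 32): (4, 3),
}


def peak_ops_per_second(clock_hz: int, repeat: int) -> float:
    """Peak issue rate: one new operation every `repeat` clocks."""
    return clock_hz / repeat


def back_to_back_cycles(n_ops: int, latency: int, repeat: int) -> int:
    """Clocks until the last of n_ops independent operations completes."""
    if n_ops <= 0:
        return 0
    return latency + (n_ops - 1) * repeat


latency, repeat = TABLE1[("MULT/MAD", 16)]
print(peak_ops_per_second(200_000_000, repeat))   # 100000000.0
print(back_to_back_cycles(100, latency, repeat))  # 201
```

Under this reading, a repeat rate of 2 clocks at 200MHz gives the 100 M multiply-add/second figure quoted in the feature list.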
The RC4650 has thirty-two general-purpose 64-bit registers.
These registers are used for scalar integer operations and address
calculation. The register file consists of two read ports and one
write port and is fully bypassed to minimize operation latency in the
pipeline. Figure 1 illustrates the RC4650 Register File.
The RC4650 ALU consists of the integer adder and logic unit.
The adder performs address calculations in addition to arithmetic
operations, and the logic unit performs all logical and shift operations.
Each of these units is highly optimized and can perform an
operation in a single pipeline cycle.
The RC4650 uses a dedicated integer multiply/divide unit, optimized
for high-speed multiply and multiply-accumulate operation.
Table 1 shows the performance, expressed in terms of pipeline
clocks, achieved by the RC4650 integer multiply unit.

Opcode           Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U    16 bit         3         2        0
                 32 bit         4         3        0
MUL              16 bit         3         2        1
                 32 bit         4         3        2
DMULT, DMULTU    any            6         5        0
DIV, DIVU        any            36        36       0
DDIV, DDIVU      any            68        68       0

Table 1: RC4650 Integer Multiply Operation

The MIPS-III architecture defines that the results of a multiply or
divide operation are placed in the HI and LO registers. The values
can then be transferred to the general purpose register file using
the MFHI/MFLO instructions.

The RC4650 incorporates an entire single-precision floating-point
co-processor on chip, including a floating-point register file and
execution units. The floating-point co-processor forms a “seamless”
interface with the integer unit, decoding and executing instructions in
parallel with the integer unit.
The RC4650’s floating-point unit directly implements single-precision
floating-point operations. This enables the RC4650 to perform
functions such as graphics rendering, without requiring extensive die
area or power consumption.
The RC4650 does not directly implement the double-precision
operations found in the RC4700. However, to maintain software
compatibility, the RC4650 will signal a trap when a double-precision
operation is initiated, allowing the requested function to be emulated
in software. Alternatively, the system architect could use a software
library emulation of double-precision functions, selected at compile
time, to eliminate the overhead associated with trap and emulation.
The RC4650 floating-point execution units perform single-precision
arithmetic, as specified in the IEEE Standard 754. The execution
unit is broken into a separate multiply unit and a combined add/
convert/divide/square root unit. Overlap of multiplies and add/subtract
is supported. The multiplier is partially pipelined, allowing a new
multiply to begin every 6 cycles.
As in the IDT79RC4700, the RC4650 maintains fully precise
floating-point exceptions while allowing both overlapped and pipelined
operations. Precise exceptions are extremely important in
mission-critical environments, such as ADA, and highly desirable for
debugging in any environment.
The floating-point unit’s operation set includes floating-point
add, subtract, multiply, divide, square root, conversion between
fixed-point and floating-point format, conversion among floating-point
formats, and floating-point compare. These operations comply
with IEEE Standard 754. Double-precision operations are not
directly supported; attempts to execute double-precision floating-point
operations, or to refer directly to double-precision registers,
result in the RC4650 signalling a “trap” to the CPU, enabling
emulation of the requested function.
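The trap-and-emulate flow for double-precision operations can be sketched in software. This is a conceptual model only: the exception class, the function names, and the use of a Python exception to stand in for the hardware trap are all assumptions, and single-precision results are approximated by rounding through a 32-bit IEEE 754 encoding.

```python
import struct


class DoublePrecisionTrap(Exception):
    """Stands in for the trap signalled on a double-precision operation."""


def round_to_single(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fpu_add(a: float, b: float, fmt: str) -> float:
    """Hardware model: only format 'S' (single) is executed directly."""
    if fmt == "D":
        raise DoublePrecisionTrap("add.d")
    return round_to_single(round_to_single(a) + round_to_single(b))


def execute_add(a: float, b: float, fmt: str) -> float:
    """Issue an add; on a trap, emulate the double-precision result."""
    try:
        return fpu_add(a, b, fmt)
    except DoublePrecisionTrap:
        return a + b  # emulation path: Python floats are IEEE 754 doubles


print(execute_add(0.5, 0.25, "S"))  # 0.75
print(execute_add(0.1, 0.2, "D") == 0.1 + 0.2)  # True
```

The emulation path returns the same value a native double-precision add would, at the cost of the trap overhead that the compile-time library approach mentioned in the text avoids.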
Table 2 gives the latencies of some of the floating-point instructions
in internal processor cycles.
The system control co-processor in the MIPS architecture is
responsible for the virtual to physical address translation and cache
protocols, the exception control system, and the diagnostics capability
of the processor. In the MIPS architecture, the system control
co-processor (and thus the kernel software) is implementation dependent.
The RC4650 makes significant changes to CP0 relative to the
RC4700. These changes are designed to
simplify memory management, facilitate debug, and speed real-time
processing.
Floating-point operation latencies (Table 2):

Operation   Latency (cycles)
ADD         4
SUB         4
MUL         8
DIV         32
SQRT        31
CMP         3
FIX         4
FLOAT       6
ABS         1
MOV         1
NEG         1
LWC1        2
SWC1        1
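The floating-point latencies listed for Table 2 can be used for a rough cycle estimate of a dependent instruction sequence. The sketch below assumes each operation waits for the previous result and ignores overlap between the multiply and add units, cache misses, and issue constraints, so it is only a simple serial estimate; the function name is illustrative.

```python
# Latencies in internal processor cycles, as listed for Table 2.
FP_LATENCY = {
    "ADD": 4, "SUB": 4, "MUL": 8, "DIV": 32, "SQRT": 31, "CMP": 3,
    "FIX": 4, "FLOAT": 6, "ABS": 1, "MOV": 1, "NEG": 1, "LWC1": 2, "SWC1": 1,
}


def dependent_chain_cycles(ops: list[str]) -> int:
    """Sum of latencies when every operation depends on the one before it."""
    return sum(FP_LATENCY[op] for op in ops)


# Critical path of sqrt(x*x + y*y): one multiply, then the add, then the root.
chain = ["MUL", "ADD", "SQRT"]
print(dependent_chain_cycles(chain))  # 43
```

Divide and square root dominate any chain that contains them, so code that can replace a divide with a multiply by a precomputed reciprocal shortens the critical path considerably.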
The RC4650 incorporates all system control co-processor (CP0)
registers on-chip. These registers provide the path through which the
virtual memory system’s address translation is controlled, exceptions
are handled, and operating modes are controlled (kernel vs. user
mode, interrupts enabled or disabled, cache features). In addition, the
RC4650 includes registers to implement a real-time cycle counting
facility, which aids in cache diagnostic testing, assists in data error
detection, and facilitates software debug. Alternatively, this timer can
be used as the operating system reference timer, and can signal a
periodic interrupt.
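A periodic timer interrupt built on the Count and Compare registers can be sketched as follows. Table 3 states that Count advances every other cycle and that an interrupt is generated when Count equals Compare; the 32-bit register width and the helper names here are assumptions made for illustration.

```python
def count_ticks(period_us: int, cpu_hz: int) -> int:
    """Count increments in period_us microseconds (Count runs at cpu_hz / 2)."""
    count_hz = cpu_hz // 2  # Count advances every other cycle
    return period_us * count_hz // 1_000_000


def next_compare(count_now: int, period_us: int, cpu_hz: int,
                 width_bits: int = 32) -> int:
    """Compare value for the next interrupt; wraps at the register width.

    ASSUMPTION: Count and Compare are width_bits wide (32 here).
    """
    return (count_now + count_ticks(period_us, cpu_hz)) % (1 << width_bits)


# 1 ms tick on a 200MHz part: Count runs at 100MHz.
print(count_ticks(1000, 200_000_000))               # 100000
print(next_compare(0xFFFFFF00, 1000, 200_000_000))  # 99744
```

The second call shows the wraparound case: the new Compare value is numerically smaller than the current Count, which is why the interrupt condition is equality rather than greater-or-equal.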
Table 3 shows the CP0 registers of the RC4650.
Number                       Name       Function
0                            IBase      Instruction address space base
1                            IBound     Instruction address space bound
2                            DBase      Data address space base
3                            DBound     Data address space bound
4-7, 10, 20-25, 29, 31       -          Not used
8                            BadVAddr   Virtual address on address exceptions
9                            Count      Counts every other cycle
11                           Compare    Generate interrupt when Count = Compare
12                           Status     Miscellaneous control/status
13                           Cause      Exception/Interrupt information
14                           EPC        Exception PC
15                           PRId       Processor ID
16                           Config     Cache and system attributes
17                           CAlg       Cache attributes for the eight 512MB regions of the virtual address space
18                           IWatch     Instruction breakpoint virtual address
Table 2: Floating-Point Operation
The floating-point register file is made up of thirty-two 32-bit
registers. These registers are used as source or target registers for
the single-precision operations.
References to these registers as 64-bit registers (as supported
in the RC4700) will cause a trap to be signalled.
The floating-point control register space contains two registers;
one for determining configuration and revision information for the
coprocessor and one for control and status information. These are
primarily involved with diagnostic software, exception handling,
state saving and restoring, and control of rounding modes.
Table 3: RC4650 CP0 Registers
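The CAlg entry in Table 3 describes cache attributes for eight 512MB regions of the virtual address space. Eight such regions span 4GB, so a region number can be derived from the top three bits of a 32-bit virtual address. The sketch below assumes 32-bit addresses and regions numbered upward from address 0; it does not model how the attribute fields are laid out inside the register.

```python
REGION_BYTES = 512 * 1024 * 1024  # 512MB per region, eight regions = 4GB


def calg_region(vaddr: int) -> int:
    """Region number (0 to 7) of a 32-bit virtual address.

    ASSUMPTION: regions are numbered from address 0 upward.
    """
    return (vaddr & 0xFFFFFFFF) // REGION_BYTES  # same as bits 31..29


for addr in (0x00000000, 0x1FFFFFFF, 0x20000000, 0x80000000, 0xFFFFFFFF):
    print(hex(addr), calg_region(addr))
```

Because the region is just the upper three address bits, software can choose a caching policy for a buffer simply by placing it in the appropriate 512MB window.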