79RV4650-100MS Datasheet


• High-performance embedded 64-bit microprocessor
  - 64-bit integer operations
  - 64-bit registers
  - 100MHz, 133MHz, 150MHz, 180MHz, and 200MHz operation frequencies
• High-performance DSP capability
  - 100 million integer multiply-accumulate operations/sec @ 200MHz
  - 67 MFlops floating-point operations @ 200MHz
• High-performance microprocessor
  - 100M Mul-Add/sec @ 200MHz
  - 67 MFLOP/s @ 200MHz
  - >500,000 Dhrystone (2.1)/sec capability @ 200MHz (265 Dhrystone MIPS)
• High level of integration
  - 64-bit, 200MHz integer CPU
  - 67 MFlops single-precision floating-point unit
  - 8KB instruction cache; 8KB data cache
  - Integer multiply unit with 100M Mul-Add/sec
• Low-power operation
  - Active power management powers down inactive units
  - Standby mode
• Upwardly software compatible with IDT RISController Family
• Large, efficient on-chip caches
  - Separate 8KB instruction and 8KB data caches
  - Over 2400MB/sec bandwidth from internal caches
  - Two-way set associative
  - Write-back and write-through support
  - Cache locking to facilitate deterministic response
• Bus compatible with RC4000 family
  - System interface provides bandwidth up to 800MB/sec
  - Direct interface to 32-bit wide or 64-bit wide systems
  - Synchronized to external reference clock for multi-master operation
• Improved real-time support
  - Fast interrupt decode
  - Optional cache locking

Block Diagram: 200 MIPS 64-bit ORION CPU (64-bit register file, pipeline control, load aligner, store aligner, 64-bit adder, logic unit, high-performance integer multiply); System Control Coprocessor (address translation/cache attribute control, exception management functions); 67MFLOPS single-precision FPA (FP register file, pack/unpack, FP add/sub/cvt/div/sqrt, FP multiply); instruction cache Set A (lockable) and Set B; data cache Set A (lockable) and Set B; control, data, and instruction buses; 32-/64-bit synchronized system interface.

The IDT logo is a registered trademark and ORION, RC4600, RC4650, RV4650, RC4700, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.

1998 Integrated Device Technology, Inc.

August 1998

DSC3149/2

The 64-bit computing capability of the RC4650 enables a wide variety of functions previously limited by the lower bandwidth and bit-manipulation rates inherent in 32-bit architectures. For example, the RC4650 can perform loads and stores from cached memory at a rate of 8 bytes every clock cycle, doubling the bandwidth of an equivalent 32-bit processor. This capability, coupled with the high clock rate of the RC4650 pipeline, enables new levels of performance to be obtained from embedded systems.

This data sheet provides an overview of the features and architecture of the RC4650 CPU. A more detailed description of the processor is available in the IDT79RC4650 Processor Hardware User’s Manual, available from IDT. Further information on development support, application notes, and complementary products is also available from your local IDT sales representative.

The IDT79RC4650 is a low-cost member of the IDT microprocessor family, targeted at a variety of performance-hungry embedded applications. The RC4650 continues the IDT tradition of high performance through high-speed pipelines, high-bandwidth caches and bus interface, 64-bit architecture, and careful attention to efficient control. The RC4650 reduces the cost of this performance relative to the RC4700 by removing functional units that are unneeded in many embedded applications, such as double-precision floating-point arithmetic and a TLB.

The RC4650 adds features relative to the RC4700 that reflect its target applications. These features enable system cost reduction (e.g., an optional 32-bit system interface) as well as higher performance for certain types of systems (e.g., cache locking, improved real-time support, and integer DSP capability).

The RC4650 supports a wide variety of embedded processor-based applications, such as consumer game systems, multimedia functions, internetworking equipment, switching equipment, and printing systems. Upwardly software-compatible with the RC3000 family, and bus- and upwardly software-compatible with the IDT RC4000/RC5000 family, the RC4650 will serve in many of the same applications but, in addition, supports other applications such as those requiring integer DSP functions.

The RC4650 brings 64-bit performance levels to lower-cost systems. High performance is preserved by retaining large on-chip caches that are two-way set associative, a streamlined high-speed pipeline, high-bandwidth 64-bit execution, and facilities such as early restart for data cache misses. These techniques combine to give the system designer 3GB/sec aggregate bandwidth, 800MB/sec bus bandwidth, 265 Dhrystone MIPS, 67 MFlops, and 100M multiply-adds/second.

The RC4650 provides complete upward application-software compatibility with the IDT79RC3000™ and IDT79RC4700™ families of microprocessors. An array of development tools facilitates the rapid development of RC4650-based systems, enabling a wide variety of customers to take advantage of the high-performance capabilities of the processor while meeting short time-to-market goals.


The RC4650 family brings a high level of integration designed for high-performance computing. The key elements of the RC4650 are briefly described below. A more detailed description of each of these subsystems is available in the User’s Manual.

The RC4650 uses a 5-stage pipeline similar to those of the IDT79RC3000 and the IDT79RC4700. The simplicity of this pipeline allows the RC4650 to be lower in cost and power than superscalar or superpipelined processors. Unlike on superscalar processors, applications that have large data dependencies or that perform many loads and stores can still achieve performance close to the peak performance of the processor. Figure 2 shows the RC4650 pipeline.

The RC4650 implements the MIPS-III Instruction Set Architecture and is upwardly compatible with applications that run on earlier-generation parts. The RC4650 includes the same additions to the instruction set found in the RC4700 family of microprocessors, targeted at improving performance and capability while maintaining binary compatibility with earlier RC3000 processors.

Figure 1: CPU Registers (general-purpose registers r0-r31; multiply/divide registers HI (Accumulate HI) and LO (Accumulate LO); program counter)

Figure 2: RC4650 Pipeline (one cycle per stage; activities shown include instruction decode, issue or slip decision, bypass calculation, branch address calculation, integer add/logical/shift, data virtual address calculation, store align, branch decision, virtual-to-physical address translation, data virtual-to-physical address translation, and data cache access and load align)

The extensions result in better code density, greater multiprocessing support, improved performance for commonly used code sequences in operating system kernels, and faster execution of floating-point-intensive applications. All resource dependencies are made transparent to the programmer, ensuring portability among implementations of the MIPS instruction set architecture. In addition, MIPS-III specifies new instructions defined to take advantage of the 64-bit architecture of the processor.

Finally, the RC4650 also implements additional instructions, which are considered extensions to the MIPS-III architecture. These instructions improve the multiply and multiply-add throughput of the CPU, making it well suited to a wide variety of imaging and DSP applications. These extensions, which use opcodes allocated by MIPS Technologies for this purpose, are supported by a wide variety of development tools.

The MIPS integer unit implements a load/store architecture with single-cycle ALU operations (logical, shift, add, sub) and an autonomous multiply/divide unit. The 64-bit register resources include 32 general-purpose orthogonal integer registers, the HI/LO result registers for the integer multiply/divide unit, and the program counter. In addition, the on-chip floating-point co-processor adds 32 floating-point registers and a floating-point control/status register.

The RC4650 adds a new multiply instruction, “MUL”, which can specify that the multiply result bypass the “Lo” register and be placed immediately in the primary register file. By avoiding the explicit “Move-from-Lo” (MFLO) instruction required when using “Lo”, the throughput of multiply-intensive operations is increased.

An additional enhancement offered by the RC4650 is an atomic “multiply-add” operation, MAD, used to perform multiply-accumulate operations. This instruction multiplies two numbers and adds the product to the current contents of the HI and LO registers. This operation is used in numerous DSP algorithms and allows the RC4650 to reduce the cost of systems requiring a mix of DSP and control functions.

Finally, aggressive implementation techniques provide low latency for these operations, along with pipelining that allows new operations to be issued before a previous one has fully completed. Table 1 also shows the repeat rate (peak issue rate), latency, and number of processor stalls required for the various operations. The RC4650 performs automatic operand size detection to determine the size of the operand, and implements hardware interlocks to prevent overrun, allowing this high performance to be achieved with simple programming.

The RC4650 has thirty-two general-purpose 64-bit registers. These registers are used for scalar integer operations and address calculation. The register file has two read ports and one write port and is fully bypassed to minimize operation latency in the pipeline. Figure 1 illustrates the RC4650 register file.

The RC4650 ALU consists of the integer adder and logic unit. The adder performs address calculations in addition to arithmetic operations, and the logic unit performs all logical and shift operations. Each of these units is highly optimized and can perform an operation in a single pipeline cycle.

The RC4650 uses a dedicated integer multiply/divide unit, optimized for high-speed multiply and multiply-accumulate operation. Table 1 shows the performance, expressed in terms of pipeline clocks, achieved by the RC4650 integer multiply unit.

Opcode           Operand Size
MULT/U, MAD/U    16 bit
                 32 bit
MUL              16 bit
                 32 bit
DMULT, DMULTU    any
DIV, DIVU        any
DDIV, DDIVU      any

Table 1: RC4650 Integer Multiply Operation

The MIPS-III architecture defines that the results of a multiply or divide operation are placed in the HI and LO registers. The values can then be transferred to the general-purpose register file using the MFHI/MFLO instructions.

The RC4650 incorporates an entire single-precision floating-point co-processor on chip, including a floating-point register file and execution units. The floating-point co-processor forms a “seamless” interface with the integer unit, decoding and executing instructions in parallel with the integer unit.

The RC4650’s floating-point unit directly implements single-precision floating-point operations. This enables the RC4650 to perform functions such as graphics rendering without requiring extensive die area or power consumption.

The RC4650 does not directly implement the double-precision operations found in the RC4700. However, to maintain software compatibility, the RC4650 signals a trap when a double-precision operation is initiated, allowing the requested function to be emulated in software. Alternatively, the system architect can use a software library emulation of double-precision functions, selected at compile time, to eliminate the overhead associated with trapping and emulation.

The RC4650 floating-point execution units perform single-precision arithmetic, as specified in IEEE Standard 754. The execution unit is divided into a separate multiply unit and a combined add/convert/divide/square root unit. Multiplies can overlap with adds and subtracts. The multiplier is partially pipelined, allowing a new multiply to begin every 6 cycles.

As in the IDT79RC4700, the RC4650 maintains fully precise floating-point exceptions while allowing both overlapped and pipelined operations. Precise exceptions are extremely important in mission-critical environments, such as Ada, and highly desirable for debugging in any environment.

The floating-point unit’s operation set includes floating-point add, subtract, multiply, divide, square root, conversion between fixed-point and floating-point format, conversion among floating-point formats, and floating-point compare. These operations comply with IEEE Standard 754. Double-precision operations are not directly supported; attempts to execute double-precision floating-point operations, or to refer directly to double-precision registers, result in the RC4650 signaling a “trap” to the CPU, enabling emulation of the requested function.

Table 2 gives the latencies of some of the floating-point instructions in internal processor cycles.

Operation
ADD
SUB
MUL
DIV
SQRT
CMP
FIX
FLOAT
ABS
MOV
NEG
LWC1
SWC1

Table 2: Floating-Point Operation

The floating-point register file is made up of thirty-two 32-bit registers. These registers are used as source or target registers for the single-precision operations.

References to these registers as 64-bit registers (as supported in the RC4700) cause a trap to be signaled.

The floating-point control register space contains two registers: one for determining configuration and revision information for the coprocessor, and one for control and status information. These are primarily involved with diagnostic software, exception handling, state saving and restoring, and control of rounding modes.

The system control co-processor in the MIPS architecture is responsible for virtual-to-physical address translation and cache protocols, the exception control system, and the diagnostic capability of the processor. In the MIPS architecture, the system control co-processor (and thus the kernel software) is implementation dependent.

In the RC4650, significant changes have been made to CP0 relative to the RC4700. These changes are designed to simplify memory management, facilitate debugging, and speed real-time processing.

The RC4650 incorporates all system control co-processor (CP0) registers on-chip. These registers provide the path through which the virtual memory system’s address translation is controlled, exceptions are handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or disabled, cache features). In addition, the RC4650 includes registers that implement a real-time cycle-counting facility, which aids in cache diagnostic testing, assists in data error detection, and facilitates software debugging. Alternatively, this timer can be used as the operating system reference timer and can signal a periodic interrupt.

Table 3 shows the CP0 registers of the RC4650.

Register    Description
IBase       Instruction address space base
IBound      Instruction address space bound
DBase       Data address space base
DBound      Data address space bound
(none)      Not used (register numbers 4-7, 10, 20-25, 29, 31)
BadVAddr    Virtual address on address exceptions
Count       Counts every other cycle
Compare     Generate interrupt when Count = Compare
Status      Miscellaneous control/status
Cause       Exception/Interrupt information
EPC         Exception PC
PRId        Processor ID
Config      Cache and system attributes
CAlg        Cache attributes for the eight 512MB regions of the virtual address space
IWatch      Instruction breakpoint virtual address

Table 3: RC4650 CP0 Registers