79RV4640-180DUG Datasheet

IDT79RC4640™
Low-Cost Embedded 64-bit RISController w/ DSP Capability

Features

◆ High-performance embedded 64-bit microprocessor
– 64-bit integer operations
– 64-bit registers
– Based on the MIPS RISC Architecture
– 100MHz, 133MHz, 150MHz, 180MHz, 200MHz and 267MHz operating frequencies
– 32-bit bus interface brings 64-bit power to 32-bit system cost
◆ High-performance DSP capability
– 133.5 Million Integer Mul-Accumulate operations/sec @267MHz
– 89 MFlops floating-point operations @267MHz
◆ High-performance microprocessor
– 133.5 M Mul-Add/second @267MHz
– 89 MFlops @267MHz
– >640,000 Dhrystone (2.1)/sec capability @267MHz (352 Dhrystone MIPS)
◆ High level of integration
– 64-bit, 267 MHz integer CPU
– 8KB instruction cache; 8KB data cache
– Integer multiply unit with 133.5M Mul-Add/sec
◆ Upwardly software compatible with IDT RISController Family
◆ Easily upgradable to 64-bit system
◆ Low-power operation
– Active power management powers-down inactive units
– Standby mode
◆ Large, efficient on-chip caches
– Separate 8KB Instruction and 8KB Data caches
– Over 3200MB/sec bandwidth from internal caches
– 2-set associative
– Write-back and write-through support
– Cache locking, to facilitate deterministic response
– High performance write protocols, for graphics and data communications
◆ Bus compatible with RC4000 family
– System interfaces to 125MHz, provides bandwidth up to 500 MB/sec
– Direct interface to 32-bit wide systems
– Synchronized to external reference clock for multi-master operation
– Socket compatible with IDT RC64474 and RC64574
◆ Improved real-time support
– Fast interrupt decode
– Optional cache locking

Note: “R” refers to 5V parts; “RV” refers to 3.3V parts; “RC” refers to both.

Block Diagram

[Figure: block diagram. 267 MHz 64-bit CPU: 64-bit register file, 64-bit adder, load aligner, store aligner, logic unit, high-performance integer multiply, pipeline control. System control coprocessor: address translation/cache attribute control, exception management functions. 89 MFlops single-precision FPA: FP register file, pack/unpack, FP add/sub/cvt/div/sqrt, FP multiply, pipeline control. Interconnect: control bus, data bus, instruction bus. Instruction cache set A (lockable) and set B; data cache set A (lockable) and set B; 32-bit synchronized system interface.]

The IDT logo is a trademark and RC4600, RC4650, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.

1 of 23

© 2008 Integrated Device Technology, Inc.

December 5, 2008

DSC 3486/2


Description

The IDT79RC4640 is a low-cost member of the Integrated Device Technology, Inc. RC4000 family, targeted to a variety of performance-hungry embedded applications. The RC4640 continues the RC4000 tradition of high performance through high-speed pipelines, high-bandwidth caches and bus interface, 64-bit architecture, and careful attention to efficient control. The cost of this performance is reduced by removing functional units frequently not required for many embedded applications.

The RC4640 supports a wide variety of embedded processor-based applications, such as internetworking equipment (routers, switches), office automation equipment (printers, scanners), and consumer multimedia game systems. Being upwardly software-compatible with the RC32300 family, as well as bus- and upwardly software-compatible with the IDT RC4000 family, the RC4640 will also serve in many of the same applications. The RC4640 also supports applications that require integer digital signal processing (DSP) functions.

The RC64475 and RC64575 processors offer a direct migration path for designs based on IDT’s RC4650 processors, through full pin and socket compatibility.

The RC4640 brings 64-bit performance levels to lower cost systems. High performance is preserved by retaining large on-chip two-way set-associative caches, a streamlined high-speed pipeline, high bandwidth, 64-bit execution, and facilities such as early restart for data cache misses.

These techniques give the system designer over 3.2 GB/sec aggregate internal bandwidth, 500 MB/sec bus bandwidth, almost 352 Dhrystone MIPS, 89 MFlops, and 133.5 M Mul-Add/sec. An array of tools facilitates rapid development of RC4640-based systems, allowing a wide variety of customers access to the processor’s high-performance capabilities while maintaining short time-to-market goals.
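The headline rates quoted above can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the cycles-per-operation values are assumptions chosen because they reproduce the quoted figures at 267MHz, and they are not stated in the text. The bus figure uses the 125MHz, 32-bit system interface from the feature list.

```python
# Back-of-the-envelope check of the quoted performance figures.
PIPELINE_MHZ = 267.0       # fastest listed pipeline clock
BUS_MHZ = 125.0            # maximum system interface clock
BUS_WIDTH_BYTES = 4        # 32-bit bus interface

# Assumptions (not stated in the text): sustained issue intervals that
# happen to reproduce the quoted rates.
ASSUMED_CYCLES_PER_MUL_ADD = 2
ASSUMED_CYCLES_PER_FLOP = 3


def millions_per_second(clock_mhz, cycles_per_operation):
    """Operations per second, in millions, for a fixed issue interval."""
    return clock_mhz / cycles_per_operation


def bus_bandwidth_mb_per_sec(bus_mhz, width_bytes):
    """Peak bus bandwidth assuming one full-width transfer per bus clock."""
    return bus_mhz * width_bytes


mul_add_rate = millions_per_second(PIPELINE_MHZ, ASSUMED_CYCLES_PER_MUL_ADD)
mflops = millions_per_second(PIPELINE_MHZ, ASSUMED_CYCLES_PER_FLOP)
bus_bandwidth = bus_bandwidth_mb_per_sec(BUS_MHZ, BUS_WIDTH_BYTES)

print(mul_add_rate, mflops, bus_bandwidth)
```

Under these assumptions the three values come out to 133.5 M Mul-Add/sec, 89 MFlops, and 500 MB/sec, matching the figures in the text.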

Hardware Overview

Some key elements of the RC4640 are briefly described below. More detailed information is available in the IDT79RC4640/IDT79RC4650 RISC Processor Hardware User’s Manual.

Pipeline

The RC4640 uses a 5-stage pipeline that is similar to the IDT79RC3000 and the IDT79RC4700 processors. The simplicity of this pipeline allows the RC4640 to cost less than superscalar processors and require less power than super-pipelined processors. As a result, unlike with superscalar processors, applications that have large data dependencies or require frequent loads and stores can still achieve peak performance.

Integer Execution Engine

The RC4640 implements the MIPS-III Instruction Set Architecture and is fully upward compatible with applications that run on earlier generation parts. The RC4640 is software-compatible with the RC4650, and includes the instruction set found in the RC4700 microprocessor, targeted at higher performance while maintaining binary compatibility with RC32300 processors.

The instruction set extensions result in better code density, greater multi-processing support, improved performance for commonly used code sequences in operating system kernels, and faster execution of floating-point intensive applications. All resource dependencies are made transparent to the programmer, ensuring transportability among implementations of the MIPS instruction set architecture. In addition, MIPS-III specifies new instructions defined to take advantage of the 64-bit architecture of the processor.

Finally, the RC4640 also implements additional instructions, which are considered extensions to the MIPS-III architecture. These instructions improve the multiply and multiply-add throughput of the CPU, making it well suited to a wide variety of imaging and DSP applications. These extensions, which use opcodes allocated by MIPS Technologies for this purpose, are supported by a wide variety of development tools.

The MIPS integer unit implements a load/store architecture with single cycle ALU operations (logical, shift, add, sub) and an autonomous multiply/divide unit. The 64-bit register resources include: 32 general-purpose orthogonal integer registers, the HI/LO result registers for the integer multiply/divide unit, and the program counter. In addition, the on-chip floating-point co-processor adds 32 floating-point registers, and a floating-point control/status register.

The RC4640 has 32 general-purpose 64-bit registers. These registers are used for scalar integer operations and address calculation. The register file is bypassed to minimize operation latency in the pipeline.

Arithmetic Logic Unit

The RC4640 ALU consists of the integer adder and logic unit. The adder performs address calculations in addition to arithmetic operations; the logic unit performs all of the logic and shift operations. Each unit is highly optimized and can perform an operation in a single pipeline cycle.

Integer Multiply/Divide

The RC4640 uses a dedicated integer multiply/divide unit, optimized for high-speed multiply and multiply-accumulate operation. Table 1 shows the performance, expressed in terms of pipeline clocks, achieved by the RC4640 integer multiply unit.

Opcode           Operand Size   Latency   Repeat   Stall
MULT/U, MAD/U    16 bit
                 32 bit
MUL              16 bit
                 32 bit
DMULT, DMULTU    any
DIV, DIVU
DDIV, DDIVU

Table 1 RC4640 Integer Multiply Operation


The MIPS-III architecture defines that the results of a multiply or divide operation are placed in the HI and LO registers. The values can then be transferred to the general purpose register file using the MFHI/MFLO instructions.

The RC4640 adds a new multiply instruction, “MUL”, which can specify that the multiply results bypass the “Lo” register and are placed immediately in the primary register file. By avoiding the explicit “Move-from-Lo” instruction required when using “Lo”, throughput of multiply-intensive operations is increased.

An additional enhancement offered by the RC4640 is an atomic “multiply-add” operation, MAD, used to perform multiply-accumulate operations. This instruction multiplies two numbers and adds the product to the current contents of the HI and LO registers. This operation is used in numerous DSP algorithms, and allows the RC4640 to reduce the cost of systems requiring a mix of DSP and control functions.
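The multiply-accumulate behaviour described above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the processor's definition of the instruction: it assumes signed 32-bit operands, a 64-bit accumulator held as two 32-bit halves (HI and LO), and wraparound on overflow. The helper names `mad` and `dot_product` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mad(hi, lo, rs, rt):
    """Model of a multiply-add step: HI:LO <- HI:LO + rs * rt.

    Assumption: operands are signed 32-bit values and the accumulator
    wraps modulo 2**64.
    """
    accumulator = ((hi & MASK32) << 32) | (lo & MASK32)
    product = to_signed32(rs) * to_signed32(rt)
    accumulator = (accumulator + product) & MASK64
    return (accumulator >> 32) & MASK32, accumulator & MASK32


def dot_product(xs, ys):
    """Accumulate a dot product with one modelled MAD per element pair."""
    hi, lo = 0, 0
    for x, y in zip(xs, ys):
        hi, lo = mad(hi, lo, x, y)
    accumulator = (hi << 32) | lo
    # Convert the 64-bit accumulator back to a signed Python integer.
    return accumulator - (1 << 64) if accumulator & (1 << 63) else accumulator


result = dot_product([1, 2, 3], [4, 5, 6])
print(result)
```

The loop in `dot_product` is the pattern a DSP kernel such as an FIR filter would follow: clear HI and LO once, issue one multiply-add per tap, then read the accumulated result.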

Finally, aggressive implementation techniques feature low latency for these operations along with pipelining to allow new operations to be issued before a previous one has fully completed. Table 1 also shows the repeat rate (peak issue rate), latency, and number of processor stalls required for the various operations. The RC4640 performs automatic operand size detection to determine the size of the operand, and implements hardware interlocks to prevent overrun, allowing this high performance to be achieved with simple programming.

Floating-Point Coprocessor

The RC4640 incorporates an entire single-precision floating-point coprocessor on chip, including a floating-point register file and execution units. The floating-point coprocessor forms a “seamless” interface with the integer unit, decoding and executing instructions in parallel with the integer unit.

The floating-point unit of the RC4640 directly implements single-precision floating-point operations, which enables the RC4640 to perform functions such as graphics rendering without requiring extensive die area or power consumption. The single-precision unit of the RC4640 is directly compatible with the single-precision operation of the RC4700, and features the same latencies and repeat rates.

The RC4640 does not directly implement the double-precision operations found in the RC4700. However, to maintain software compatibility, the RC4640 will signal a trap when a double-precision operation is initiated, allowing the requested function to be emulated in software. Alternatively, the system architect could use a software library emulation of double-precision functions, selected at compile time, to eliminate the overhead associated with trap and emulation.

Floating-Point Units

The RC4640’s floating-point execution units perform single-precision arithmetic, as specified in IEEE Standard 754. The execution unit is broken into a separate multiply unit and a combined add/convert/divide/square root unit. Overlap of multiply and add/subtract is supported. The multiplier is partially pipelined, allowing a new multiplication instruction to begin every 6 cycles.

As in the IDT79RC4700, the RC4640 maintains fully precise floating-point exceptions while allowing both overlapped and pipelined operations. Precise exceptions are extremely important in mission-critical environments, such as Ada, and highly desirable for debugging in any environment.

The floating-point unit’s operation set includes floating-point add, subtract, multiply, divide, square root, conversion between fixed-point and floating-point format, conversion among floating-point formats, and floating-point compare. These operations comply with IEEE Standard 754. Double-precision operations are not directly supported; attempts to execute double-precision floating-point operations, or refer directly to double-precision registers, result in the RC4640 signalling a “trap” to the CPU, enabling emulation of the requested function. Table 2 gives the latencies of some of the floating-point instructions in internal processor cycles.

Operation   Instruction Latency
ADD
SUB
MUL
DIV
SQRT
CMP
FIX
FLOAT
ABS
MOV
NEG
LWC1
SWC1

Table 2 Floating-Point Operation

Floating-Point General Register File

The floating-point register file is made up of thirty-two 32-bit registers. These registers are used as source or target registers for the single-precision operations.

References to these registers as 64-bit registers (as supported in the RC4700) will cause a trap to be signalled to the integer unit.

The floating-point control register space contains two registers: one for determining configuration and revision information for the coprocessor and one for control and status information. These are primarily involved with diagnostic software, exception handling, state saving and restoring, and control of rounding modes.


System Control Coprocessor (CP0)

The system control coprocessor in the MIPS architecture is responsible for the virtual to physical address translation and cache protocols, the exception control system, and the diagnostics capability of the processor. In the MIPS architecture, the system control coprocessor (and thus the kernel software) is implementation dependent.

The RC4640 implements significant changes in CP0 relative to the RC4600. These changes are designed to simplify memory management, facilitate debug, and speed real-time processing.

System Control Coprocessor Registers

The RC4640 incorporates all system control co-processor (CP0) registers on-chip. These registers provide the path through which the virtual memory system’s address translation is controlled, exceptions are handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or disabled, cache features). In addition, the RC4640 includes registers to implement a real-time cycle counting facility, which aids in cache diagnostic testing, assists in data error detection, and facilitates software debug. Alternatively, this timer can be used as the operating system reference timer, and can signal a periodic interrupt.

Table 3 shows the CP0 registers of the RC4640.

Number                    Name       Function
0                         IBase      Instruction address space base
1                         IBound     Instruction address space bound
2                         DBase      Data address space base
3                         DBound     Data address space bound
4-7, 10, 20-25, 29, 31    -          Not used
8                         BadVAddr   Virtual address on address exceptions
9                         Count      Counts every other cycle
11                        Compare    Generate interrupt when Count = Compare
12                        Status     Miscellaneous control/status
13                        Cause      Exception/Interrupt information
14                        EPC        Exception PC
15                        PRId       Processor ID
16                        Config     Cache and system attributes
17                        CAlg       Cache attributes for the 8 512MB regions of the virtual address space
18                        IWatch     Instruction breakpoint virtual address
19                        DWatch     Data breakpoint virtual address
26                        ECC        Used in cache diagnostics
27                        CacheErr   Cache diagnostic information
28                        TagLo      Cache index information
30                        ErrorEPC   CacheError exception PC

Table 3 RC4640 CP0 Registers

Operation Modes

The RC4640 supports two modes of operation: user mode and kernel mode. Kernel mode operation is typically used for exception handling and operating system kernel functions, including CP0 management and access to IO devices. In kernel mode, software has access to the entire address space and all of the co-processor 0 registers, and can select whether to enable co-processor 1 accesses. The processor enters kernel mode at reset, and whenever an exception is recognized.

User mode is typically used for applications programs. User mode accesses are limited to a subset of the virtual address space, and can be inhibited from accessing CP0 functions.

Address range              Segment                                          Mapping
0xC0000000 - 0xFFFFFFFF    Kernel virtual address space (kseg2)             Unmapped, 1.0GB
0xA0000000 - 0xBFFFFFFF    Uncached kernel physical address space (kseg1)   Unmapped, 0.5GB
0x80000000 - 0x9FFFFFFF    Cached kernel physical address space (kseg0)     Unmapped, 0.5GB
0x00000000 - 0x7FFFFFFF    User virtual address space (useg)                Mapped, 2.0GB

Figure 1 Mode Virtual Addressing (32-bit mode)

Virtual-to-Physical Address Mapping

The 4GB virtual address space of the RC4640 is shown in Figure 1. The 4GB address space is divided into addresses accessible in either kernel or user mode (kuseg), and addresses only accessible in kernel mode (kseg2:0).

The RC4640 supports the use of multiple user tasks sharing common virtual addresses, but mapped to separate physical addresses. This facility is implemented via the “base-bounds” registers contained in CP0.

When a user virtual address is asserted (load, store, or instruction fetch), the RC4640 compares the virtual address with the contents of the appropriate “bounds” register (instruction or data). If the virtual address is “in bounds”, the value of the corresponding “base” register is added to the virtual address to form the physical address for that reference. If the address is not within bounds, an exception is signalled.

This facility enables multiple user processes in a single physical memory without the use of a TLB. This type of operation is further supported by a number of development tools for the RC4640, including real-time operating systems and “position independent code”.

Kernel mode addresses do not use the base-bounds registers, but rather undergo a fixed virtual-to-physical address translation.
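The base-bounds scheme described in this section can be illustrated with a small model. This is a sketch under stated assumptions rather than the hardware's exact rule: it assumes “in bounds” means the virtual address is strictly below the bound value, it wraps the sum to 32 bits, and the names `segment`, `translate_user`, and `AddressError` are hypothetical. The segment boundaries are those of Figure 1.

```python
class AddressError(Exception):
    """Stand-in for the exception signalled on an out-of-bounds access."""


def segment(vaddr):
    """Classify a 32-bit virtual address using the Figure 1 boundaries."""
    vaddr &= 0xFFFFFFFF
    if vaddr >= 0xC0000000:
        return "kseg2"
    if vaddr >= 0xA0000000:
        return "kseg1"
    if vaddr >= 0x80000000:
        return "kseg0"
    return "useg"


def translate_user(vaddr, base, bound):
    """Translate a user virtual address with a base/bound register pair.

    Assumption: an address is in bounds when vaddr < bound.
    """
    if segment(vaddr) != "useg":
        raise ValueError("not a user-segment address")
    if vaddr >= bound:
        raise AddressError(hex(vaddr))
    return (base + vaddr) & 0xFFFFFFFF


# Two tasks use the same virtual address but different base values,
# so they reach different physical addresses.
task_a_physical = translate_user(0x1000, base=0x00100000, bound=0x8000)
task_b_physical = translate_user(0x1000, base=0x00200000, bound=0x8000)
print(hex(task_a_physical), hex(task_b_physical))
```

On a context switch, kernel software in this model would only need to reload the base and bound values for the incoming task, which is what makes the scheme workable without a TLB.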

Debug Support

To facilitate software debug, the RC4640 adds a pair of “watch” registers to CP0. When enabled, these registers will cause the CPU to take an exception when a “watched” address is appropriately accessed.

Interrupt Vector

The RC4640 also adds the capability to speed interrupt exception decoding. Unlike the RC4700, which utilizes a single common exception vector for all exception types (including interrupts), the RC4640 allows kernel software to enable a separate interrupt exception vector. When enabled, this vector location speeds interrupt processing by allowing software to avoid decoding interrupts from general purpose exceptions.

Cache Memory

To keep the RC4640’s high-performance pipeline full and operating efficiently, the RC4640 incorporates on-chip instruction and data caches that can each be accessed in a single processor cycle. Each cache has its own 64-bit data path and can be accessed in parallel. The cache subsystem provides the integer and floating-point units with an aggregate bandwidth of over 3200 MB per second at a pipeline clock frequency of 267MHz. The cache subsystem is similar in construction to that found in the RC4700, although some changes have been implemented. Table 4 is an overview of the caches found on the RC4640.

Instruction Cache

The RC4640 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 8KB in size and is parity protected.

Because the cache is virtually indexed, the virtual-to-physical address translation occurs in parallel with the cache access, thus further increasing performance by allowing these two operations to occur simultaneously. The tag holds a 20-bit physical address and valid bit, and is parity protected.

The instruction cache is 64 bits wide, and can be refilled or accessed in a single processor cycle. Instruction fetches require only 32 bits per cycle, for a peak instruction bandwidth of 1068MB/sec at 267MHz. Sequential accesses take advantage of the 64-bit fetch to reduce power dissipation, and cache miss refill can write 64 bits per cycle to minimize the cache miss penalty. The line size is eight instructions (32 bytes) to maximize performance.
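The bandwidth figures quoted for the caches follow from the path widths and the pipeline clock. The sketch below reproduces them; treating the aggregate figure as the sum of a 32-bit instruction fetch and a 64-bit data access per cycle is an assumption made here because it yields a value just over 3200MB/sec, not a breakdown given in the text.

```python
PIPELINE_MHZ = 267                 # pipeline clock quoted in the text
INSTRUCTION_FETCH_BYTES = 4        # 32 bits fetched per cycle
DATA_PATH_BYTES = 8                # 64-bit data cache path


def mb_per_sec(bytes_per_cycle, clock_mhz=PIPELINE_MHZ):
    """Peak bandwidth for one transfer of bytes_per_cycle every clock."""
    return bytes_per_cycle * clock_mhz


instruction_bandwidth = mb_per_sec(INSTRUCTION_FETCH_BYTES)
data_bandwidth = mb_per_sec(DATA_PATH_BYTES)

# Assumption: the aggregate figure counts both caches accessed in parallel.
aggregate_bandwidth = instruction_bandwidth + data_bandwidth

print(instruction_bandwidth, data_bandwidth, aggregate_bandwidth)
```

The instruction figure matches the 1068MB/sec stated above, and the assumed sum is consistent with the "over 3200 MB per second" aggregate.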

In addition, the contents of one set of the instruction cache (set “A”) can be “locked” by setting a bit in a CP0 register. Locking the set prevents its contents from being overwritten by a subsequent cache miss; refill then occurs only into set “B”.

This operation effectively “locks” time-critical code into one 4KB set, while allowing the other set to service other instruction streams in a normal fashion. Thus, the benefits of cached performance are achieved, while deterministic real-time response is preserved.
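The effect of locking set A can be seen in a toy model of a two-set cache with 32-byte lines and 4KB per set. This is a simplified sketch, not a description of the actual hardware: the class name `TwoSetCache` is hypothetical, the alternating replacement policy used when nothing is locked is an assumption (the text does not describe one), and a single address is used for both index and tag, ignoring the virtual/physical distinction.

```python
LINE_BYTES = 32
LINES_PER_SET = 4096 // LINE_BYTES   # 128 lines in each 4KB set


class TwoSetCache:
    def __init__(self):
        # tags[name][index] is the tag cached at that line, or None if empty.
        self.tags = {"A": [None] * LINES_PER_SET, "B": [None] * LINES_PER_SET}
        self.lock_a = False
        # Assumed policy when unlocked: alternate the refill target per line.
        self.next_victim = ["A"] * LINES_PER_SET

    @staticmethod
    def split(addr):
        index = (addr >> 5) & (LINES_PER_SET - 1)   # address bits 11..5
        tag = addr >> 12                            # address bits 31..12
        return index, tag

    def access(self, addr):
        """Return True on a hit; on a miss, refill one set and return False."""
        index, tag = self.split(addr)
        if any(self.tags[name][index] == tag for name in ("A", "B")):
            return True
        # With set A locked, refill may only go into set B.
        victim = "B" if self.lock_a else self.next_victim[index]
        self.tags[victim][index] = tag
        self.next_victim[index] = "B" if victim == "A" else "A"
        return False


def survives_thrashing(lock):
    """Load one line, optionally lock set A, thrash the same index, re-check."""
    cache = TwoSetCache()
    critical = 0x1000
    cache.access(critical)             # first fill lands in set A
    cache.lock_a = lock
    for other in (0x2000, 0x3000, 0x4000):
        cache.access(other)            # all map to the same cache index
    return cache.access(critical)


locked_hit = survives_thrashing(lock=True)
unlocked_hit = survives_thrashing(lock=False)
print(locked_hit, unlocked_hit)
```

In this model the critical line still hits after conflicting accesses when set A is locked, and is evicted when it is not, which is the deterministic-response property the text describes.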

Data Cache

For fast, single-cycle data access, the RC4640 includes an 8KB on-chip data cache that is two-way set associative with a fixed 32-byte (eight words) line size. Table 4 lists the RC4640 cache attributes.

Characteristics                  Instruction              Data
Size                             8KB                      8KB
Organization                     2-way set associative    2-way set associative
Line size                        32B                      32B
Index                            vAddr 11..0              vAddr 11..0
Tag                              pAddr 31..12             pAddr 31..12
Write policy                     n.a.                     writeback/writethru
Line transfer order              read sub-block order,    read sub-block order,
                                 write sequential         write sequential
Miss restart after transfer of   entire line              first word
Parity                           per-word                 per-byte
Cache locking                    set A                    set A

Table 4 RC4640 Cache Attributes

The data cache is protected with byte parity and its tag is protected with a single parity bit. It is virtually indexed and physically tagged to allow simultaneous address translation and data cache access.

The normal write policy is writeback, which means that a store to a cache line does not immediately cause memory to be updated. This increases system performance by reducing bus traffic and eliminating the bottleneck of waiting for each store operation to finish before issuing a subsequent memory operation. Software can, however, select write-through for certain address ranges, using the CAlg register in CP0.
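The bus-traffic difference between the two write policies can be shown with a rough count. The sketch below is an idealised illustration with assumptions of its own: every store hits in the cache, a write-through store produces exactly one memory write, and a write-back line is written to memory once when it is eventually evicted. The function name `memory_writes` is hypothetical.

```python
def memory_writes(store_addresses, policy, line_bytes=32):
    """Count memory write transactions for a sequence of cached stores."""
    if policy == "writethrough":
        # Assumption: each store is also sent to memory individually.
        return len(store_addresses)
    if policy == "writeback":
        # Assumption: each modified line is written back once, on eviction.
        dirty_lines = {address // line_bytes for address in store_addresses}
        return len(dirty_lines)
    raise ValueError("unknown policy: " + policy)


# Six word stores that touch two 32-byte lines.
stores = [0x00, 0x04, 0x08, 0x0C, 0x20, 0x24]
writethrough_count = memory_writes(stores, "writethrough")
writeback_count = memory_writes(stores, "writeback")
print(writethrough_count, writeback_count)
```

Each write-back transaction in this model moves a whole line rather than a single word, so the saving is in the number of bus transactions, not necessarily in bytes transferred.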

Cache protocols supported for the data cache are:

◆ Uncached. Addresses in a memory area indicated as uncached will not be read from the cache. Stores to such addresses will be written directly to main memory, without changing cache contents.
◆ Writeback. Loads and instruction fetches will first search the cache, reading main memory only if the desired data is not cache resident. On data store operations, the cache is first searched to see if the target address is cache resident. If it is resident, the cache con-
