Features
• 3000 Dhrystone 2.1 MIPS at 1.3 GHz
• Selectable Bus Clock (30 CPU Bus Dividers up to 28x)
• 13 Selectable Core-to-L3 Frequency Divisors
• Selectable MPx/60x Interface Voltage (1.8V, 2.5V)
• Selectable L3 Interface of 1.8V or 2.5V
• P_D Typical 12.6W at 1 GHz at V_DD = 1.3V; 8.3W at 1 GHz at V_DD = 1.1V, Full Operating Conditions
• Nap, Doze and Sleep Modes for Power Saving
• Superscalar (Four Instructions Fetched Per Clock Cycle)
• 4 GB Direct Addressing Range
• Virtual Memory: 4 Petabytes (2^52)
• 64-bit Data and 36-bit Address Bus Interface
• Integrated L1: 32 KB Instruction and 32 KB Data Cache
• Integrated L2: 512 KB
• 11 Independent Execution Units and Three Register Files
• Write-back and Write-through Operations
• f_INT Max = 1 GHz (1.2 GHz to be Confirmed)
• f_BUS Max = 133 MHz/166 MHz
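The address-range figures in the feature list follow directly from the bit widths. The short sketch below works through the arithmetic; it assumes the 4 GB direct addressing range corresponds to 32-bit effective addresses, and the helper name `space_bytes` is purely illustrative.

```python
# Address-space sizes implied by the bit widths quoted in the feature list.
# Binary units: 1 GB = 2**30 bytes, 1 PB = 2**50 bytes.
GIB = 1 << 30
PIB = 1 << 50


def space_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with `bits` address bits."""
    return 1 << bits


effective = space_bytes(32)  # assumed 32-bit effective address
physical = space_bytes(36)   # 36-bit address bus
virtual = space_bytes(52)    # 2^52 virtual address space

print(effective // GIB, "GB direct addressing range")   # 4
print(physical // GIB, "GB physical address range")     # 64
print(virtual // PIB, "PB virtual memory")              # 4
```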
PowerPC® 7447/7457
PC7447/57
Preliminary
Description
This document is primarily concerned with the PowerPC PC7457; however, unless
otherwise noted, all information here also applies to the PC7447. The PC7457 and
PC7447 are implementations of the PowerPC microprocessor family of reduced
instruction set computer (RISC) microprocessors. This document describes pertinent
electrical and physical characteristics of the PC7457.
The PC7457 is the fourth implementation of the fourth generation (G4) microprocessors
from Freescale. The PC7457 implements the full PowerPC 32-bit architecture
and is targeted at networking and computing systems applications. The PC7457
consists of a processor core, a 512 Kbyte L2, and an internal L3 tag and controller which
support a glueless backside L3 cache through a dedicated high-bandwidth interface.
The PC7447 is identical to the PC7457 except it does not support the L3 cache
interface.
The core is a high-performance superscalar design supporting a double-precision
floating-point unit and a SIMD multimedia unit. The memory storage subsystem
supports the MPX bus interface to main memory and other system resources. The L3
interface supports 1, 2, or 4 Mbytes of external SRAM for L3 cache and/or private
memory data. For systems implementing 4 Mbytes of SRAM, a maximum of 2 Mbytes
may be used as cache; the remaining 2 Mbytes must be private memory.
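The L3 SRAM sizing rule can be expressed as a small validity check. This is a minimal sketch that encodes only the two constraints stated above (the supported SRAM sizes and the 2 Mbyte cache ceiling). The function name `l3_split` is hypothetical, and the sketch does not claim to know which intermediate cache/private splits the hardware actually supports.

```python
from typing import Tuple

VALID_SRAM_MB = (1, 2, 4)  # external SRAM sizes named in the text
MAX_L3_CACHE_MB = 2        # at most 2 Mbytes may be used as cache


def l3_split(sram_mb: int, cache_mb: int) -> Tuple[int, int]:
    """Return (cache_mb, private_mb), or raise ValueError if the split
    violates the constraints stated in the description."""
    if sram_mb not in VALID_SRAM_MB:
        raise ValueError(f"unsupported SRAM size: {sram_mb} Mbytes")
    if not 0 <= cache_mb <= sram_mb:
        raise ValueError("cache portion must fit inside the SRAM")
    if cache_mb > MAX_L3_CACHE_MB:
        raise ValueError("no more than 2 Mbytes may be used as cache")
    # Whatever is not used as cache is treated as private memory.
    return cache_mb, sram_mb - cache_mb


print(l3_split(4, 2))  # 4 Mbytes of SRAM: 2 Mbytes cache, 2 Mbytes private
```

With 4 Mbytes of SRAM, a request such as `l3_split(4, 3)` raises `ValueError`, reflecting the rule that the remaining 2 Mbytes must be private memory.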
Note that the PC7457 is a footprint-compatible, drop-in replacement in a PC7455
application if the core power supply is 1.3V.
Rev. 5345C–HIREL–07/05
Screening
• CBGA Upscreenings Based on Atmel Standards
• Full Military Temperature Range (T_j = -55°C, +125°C),
Industrial Temperature Range (T_j = -40°C, +110°C)
• HCTE Package for the 7457
G suffix: CBGA 360 and CBGA 483, Ceramic Ball Grid Array
GH suffix: HITCE 483, Ceramic Ball Grid Array
1. Block Diagram
Figure 1-1. PC7457 Microprocessor Block Diagram
[Block diagram not reproduced. Blocks labeled in the figure:]
- Instruction unit: fetcher, instruction queue (12-word), branch processing unit with BTIC (128-entry), BHT (2048-entry), CTR and LR, dispatch unit
- Instruction MMU: SRs (shadow), IBAT array, 128-entry ITLB; 32-Kbyte I cache with tags
- Data MMU: SRs (original), DBAT array, 128-entry DTLB; 32-Kbyte D cache with tags
- Issue queues: VR issue (4-entry/2-issue), GPR issue (6-entry/3-issue), FPR issue (2-entry/1-issue)
- Register files: VR file, GPR file, and FPR file, each with 16 rename buffers
- Execution units (each fed by reservation stations): vector permute unit, vector integer unit 1, vector integer unit 2, vector FPU, integer unit 1 (3), integer unit 2, floating-point unit (with FPSCR), load/store unit (EA calculation, vector touch engine, vector touch queue, finished and completed stores)
- Completion unit: completion queue (16-entry), completes up to three instructions per clock
- Memory subsystem: L1 store queue (LSQ), L1 load queue (LLQ), L1 load miss (5), L2 prefetch (3), instruction fetch (2), cacheable store request (1), L2 store queue (L2SQ), snoop push/interventions, L1 castouts (4), L1 service queues, bus accumulator
- 512-Kbyte unified L2 cache controller: each line holds block 0 (32-byte) and block 1 (32-byte), with tags and status
- L3 cache controller (see note 1): L3CR, tags and status for line block 0/1; external SRAM (1, 2, or 4 Mbytes) over a 19-bit address and 64-bit data (8-bit parity) interface
- System bus interface: load queue (11), bus store queue, castout queue (9)/push queue (10) (see note 2), 36-bit address bus, 64-bit data bus
- Additional features: time base counter/decrementer, clock multiplier, JTAG/COP interface, thermal/power management, performance monitor
Notes:
1. The L3 cache interface is not implemented on the PC7447.
2. The Castout Queue and Push Queue share resources for a combined total of 10 entries.
The Castout Queue itself is limited to 9 entries, ensuring 1 entry will be available for a push.
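Note 2 to the block diagram describes a shared-resource rule for the castout and push queues: 10 entries in total, with castouts capped at 9. The sketch below models only that admission rule. The class and method names are hypothetical, and real queue behavior (ordering, draining, arbitration) is not described in this document and is not modeled.

```python
class CastoutPushQueues:
    """Toy model of the shared castout/push entry pool from note 2."""

    TOTAL_ENTRIES = 10  # combined castout + push capacity
    MAX_CASTOUTS = 9    # castout queue limit

    def __init__(self) -> None:
        self.castouts = 0
        self.pushes = 0

    def free_entries(self) -> int:
        return self.TOTAL_ENTRIES - self.castouts - self.pushes

    def add_castout(self) -> bool:
        """Accept a castout only if the pool has room and the cap is not hit."""
        if self.free_entries() == 0 or self.castouts >= self.MAX_CASTOUTS:
            return False
        self.castouts += 1
        return True

    def add_push(self) -> bool:
        """Accept a push whenever any shared entry is free."""
        if self.free_entries() == 0:
            return False
        self.pushes += 1
        return True


q = CastoutPushQueues()
accepted = sum(q.add_castout() for _ in range(12))
print(accepted, q.free_entries())  # castouts alone never fill the pool
print(q.add_push())                # so a push can still be accepted
```

Because castouts stop at 9, castout traffic by itself always leaves one shared entry free for a push.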
2. General Parameters
Table 2-1 provides a summary of the general parameters of the PC7457.
Table 2-1. Device Parameters
Parameter            Description
Technology           0.13 µm CMOS, nine-layer metal
Die size             9.1 mm × 10.8 mm
Transistor count     58 million
Logic design         Fully-static
Packages             PC7447: surface mount 360 ceramic ball grid array (CBGA)
                     PC7457: surface mount 483 ceramic ball grid array (CBGA) + HiTCE CBGA
Core power supply    1.3V ±50 mV DC nominal or 1.1V ±50 mV (nominal, see
                     "Recommended Operating Conditions(1)" on page 12)
I/O power supply     1.8V ±5% DC, or 2.5V ±5% for recommended operating conditions
3. Overview
This section summarizes features of the PC7457 implementation of the PowerPC architecture.
Major features of the PC7457 are as follows:
• High-performance, superscalar microprocessor
– As many as 4 instructions can be fetched from the instruction cache at a time
– As many as 3 instructions can be dispatched to the issue queues at a time
– As many as 12 instructions can be in the instruction queue (IQ)
– As many as 16 instructions can be at some stage of execution simultaneously
– Single-cycle execution for most instructions
– One instruction per clock cycle throughput for most instructions
– Seven-stage pipeline control
• Eleven independent execution units and three register files
– Branch processing unit (BPU) features static and dynamic branch prediction
  – 128-entry (32-set, four-way set-associative) branch target instruction cache (BTIC),
    a cache of branch instructions that have been encountered in branch/loop code
    sequences. If a target instruction is in the BTIC, it is fetched into the instruction
    queue a cycle sooner than it can be made available from the instruction cache.
    Typically, a fetch that hits the BTIC provides the first four instructions in the target
    stream
  – 2048-entry branch history table (BHT) with two bits per entry for four levels of
    prediction: not-taken, strongly not-taken, taken, and strongly taken
  – Up to three outstanding speculative branches
  – Branch instructions that don't update the count register (CTR) or link register (LR)
    are often removed from the instruction stream
  – Eight-entry link register stack to predict the target address of Branch Conditional to
    Link Register (BCLR) instructions
– Four integer units (IUs) that share 32 GPRs for integer operands
  – Three identical IUs (IU1a, IU1b, and IU1c) can execute all integer instructions except
    multiply, divide, and move to/from special-purpose register instructions
  – IU2 executes miscellaneous instructions including the CR logical operations, integer
    multiplication and division instructions, and move to/from special-purpose register
    instructions
– Five-stage FPU and a 32-entry FPR file
  – Fully IEEE 754-1985-compliant FPU for both single- and double-precision
    operations
  – Supports non-IEEE mode for time-critical operations
  – Hardware support for denormalized numbers
  – Thirty-two 64-bit FPRs for single- or double-precision operands
– Four vector units and 32-entry vector register file (VRs)
  – Vector permute unit (VPU)
  – Vector integer unit 1 (VIU1) handles short-latency AltiVec™ integer instructions, such
    as vector add instructions (vaddsbs, vaddshs, and vaddsws, for example)
  – Vector integer unit 2 (VIU2) handles longer-latency AltiVec integer instructions, such
    as vector multiply add instructions (vmhaddshs, vmhraddshs, and vmladduhm, for
    example)
  – Vector floating-point unit (VFPU)
– Three-stage load/store unit (LSU)
  – Supports integer, floating-point, and vector instruction load/store traffic
  – Four-entry vector touch queue (VTQ) supports all four architected AltiVec data
    stream operations
  – Three-cycle GPR and AltiVec load latency (byte, half-word, word, vector) with
    one-cycle throughput
  – Four-cycle FPR load latency (single, double) with one-cycle throughput
  – No additional delay for misaligned access within double-word boundary
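The branch history table listed above (2048 entries, two bits per entry, four prediction levels) can be illustrated with a conventional two-bit saturating counter. This is a sketch under stated assumptions, not the PC7457's documented behavior: the state transitions, the initial state, and the indexing by word-aligned instruction address are all assumptions, because this document gives only the table size and the four state names.

```python
BHT_ENTRIES = 2048  # table size from the overview

# Four prediction levels encoded in two bits (encoding is an assumption).
STRONGLY_NOT_TAKEN, NOT_TAKEN, TAKEN, STRONGLY_TAKEN = range(4)


class BranchHistoryTable:
    """Textbook two-bit saturating-counter predictor (illustrative only)."""

    def __init__(self) -> None:
        # Assumed initial state: weakly not-taken for every entry.
        self.counters = [NOT_TAKEN] * BHT_ENTRIES

    @staticmethod
    def _index(address: int) -> int:
        # Assumed indexing: drop the two byte-offset bits of a 4-byte
        # instruction address, then wrap to the table size.
        return (address >> 2) % BHT_ENTRIES

    def predict(self, address: int) -> bool:
        """Predict taken when the counter is in either taken state."""
        return self.counters[self._index(address)] >= TAKEN

    def update(self, address: int, taken: bool) -> None:
        """Move one step toward the observed outcome, saturating at the ends."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONGLY_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONGLY_NOT_TAKEN)


bht = BranchHistoryTable()
loop_branch = 0x1000
for outcome in (True, True, True, False):  # loop taken 3 times, then exits
    bht.update(loop_branch, outcome)
print(bht.predict(loop_branch))  # one loop exit does not flip the prediction
```

The two-bit scheme is chosen here because a single mispredicted loop exit moves the counter only from strongly taken to taken, so the next entry into the loop is still predicted taken.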