PowerPC 476FP Embedded Processor Core User’s Manual Version 2.2

Transcription

Title Page
PowerPC 476FP Embedded Processor Core
User’s Manual
Version 2.2
July 31, 2014
Copyright and Disclaimer
© Copyright International Business Machines Corporation 2009, 2014
Printed in the United States of America July 2014
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
Other company, product, and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Systems and Technology Group
2070 Route 52, Bldg. 330
Hopewell Junction, NY 12533-6351
The IBM home page can be found at ibm.com®.
The IBM microelectronics home page can be found at ibm.com/chips.
Contents
List of Figures
List of Tables
Revision Log
About this Document
1. Overview
1.1 General Features
1.2 Power Control Features
1.2.1 Power Control Modes
1.2.2 Power Control Procedures
1.2.2.1 CPU Sleep Mode
1.2.2.2 CPU Doze Mode
1.2.2.3 Waking up the Processor
1.3 Implemented Instruction Set
1.4 Test and Debug Facilities
1.5 Floating-Point Unit Overview
1.6 Instruction Cache Overview
1.7 Data Cache Unit Overview
1.8 Memory Management Unit Overview
1.9 Timers
2. Programming Model
2.1 Storage Addressing
2.1.1 Storage Operands
2.1.2 Effective Address Calculation
2.1.2.1 Data Storage Addressing Modes
2.1.2.2 Instruction Storage Addressing Modes
2.1.3 Byte Ordering
2.1.3.1 Structure Mapping Examples
2.1.3.2 Instruction Byte Ordering
2.1.3.3 Data Byte Ordering
2.1.3.4 Byte-Reverse Instructions
2.2 Registers
2.2.1 Register Types
2.2.1.1 General Purpose Registers
2.2.1.2 Special Purpose Registers
2.2.1.3 Condition Register
2.2.1.4 Machine State Register
2.2.1.5 Device Control Registers
2.3 Instruction Classes
2.3.1 Defined Instruction Class
2.3.2 Preserved Instruction Class
2.3.3 Reserved Instruction Class
2.4 Implemented Instruction Set Summary
2.4.1 Integer Instructions
2.4.1.1 Integer Storage Access Instructions
2.4.1.2 Integer Arithmetic Instructions
2.4.1.3 Integer Logical Instructions
2.4.1.4 Integer Compare Instructions
2.4.1.5 Integer Trap Instructions
2.4.1.6 Integer Rotate Instructions
2.4.1.7 Integer Shift Instructions
2.4.1.8 Integer Select Instruction
2.4.2 Branch Instructions
2.4.3 Processor Control Instructions
2.4.3.1 Condition Register Logical Instructions
2.4.3.2 Register Management Instructions
2.4.3.3 System Linkage Instructions
2.4.3.4 Processor Synchronization Instruction
2.4.4 Storage Control Instructions
2.4.4.1 Cache Management Instructions
2.4.4.2 TLB Management Instructions
2.4.4.3 Storage Synchronization Instructions
2.4.5 Previous Integer Multiply-Accumulate Instructions
2.5 Branch Processing
2.5.1 Branch Addressing
2.5.2 Branch Instruction BI Field
2.5.3 Branch Instruction BO Field
2.5.4 Branch Prediction
2.5.5 Branch Control Registers
2.5.5.1 Link Register (LR)
2.5.5.2 Count Register (CTR)
2.5.5.3 Condition Register (CR)
2.6 Integer Processing
2.6.1 General Purpose Registers (GPRs)
2.6.2 Fixed-Point Exception Register (XER)
2.6.2.1 Summary Overflow (SO) Field
2.6.2.2 Overflow (OV) Field
2.6.2.3 Carry (CA) Field
2.6.2.4 Transfer Byte Count (TBC) Field
2.7 Processor Control
2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8)
2.7.2 Processor Version Register (PVR)
2.7.3 Processor Identification Register (PIR)
2.7.4 Core Configuration Register 0 (CCR0)
2.7.7 Reset Configuration (RSTCFG)
2.7.8 Device Control Register Immediate Prefix Register (DCRIPR)
2.8 User and Supervisor Modes
2.8.1 Privileged Instructions
2.8.2 Privileged SPRs
2.9 Speculative Accesses
2.10 Synchronization
2.10.1 Context Synchronization
2.10.2 Execution Synchronization
2.10.3 Storage Ordering and Synchronization
2.10.4 SPRs Requiring Context Synchronization
2.10.5 Instructions Requiring a Context Synchronization Instruction
2.11 Storage Model
3. Floating-Point Unit Programming Model
3.1 Floating-Point Exceptions
3.2 Floating-Point Registers
3.2.1 Register Types
3.2.1.1 Floating-Point Registers (FPR0 - FPR31)
3.2.1.2 Floating-Point Status and Control Register (FPSCR)
3.3 Floating-Point Data Formats
3.3.1 Value Representation
3.3.2 Binary Floating-Point Numbers
3.3.2.1 Normalized Numbers
3.3.2.2 Denormalized Numbers
3.3.2.3 Zero Values
3.3.3 Infinities
3.3.3.1 Not a Numbers
3.3.4 Sign of Result
3.3.5 Data Handling and Precision
3.3.6 Rounding
3.4 Floating-Point Instructions
3.4.1 Instructions By Category
3.4.2 Load and Store Instructions
3.4.3 Floating-Point Store Instructions
3.4.4 Floating-Point Move Instructions
3.4.5 Floating-Point Arithmetic Instructions
3.4.5.1 Floating-Point Multiply-Add Instructions
3.4.6 Floating-Point Rounding and Conversion Instructions
3.4.7 Floating-Point Compare Instructions
3.4.8 Floating-Point Status and Control Register Instructions
4. Memory Management Unit
4.1 Overview
4.2 Address Translation
4.3 MMU Implementation
4.3.1 Translation Lookaside Buffer
4.3.2 UTLB Index Address Hash
4.3.3 Initialize a Single UTLB Entry
4.3.4 Tag Array
4.3.5 Comparison
4.3.6 Data Array
4.3.6.1 Hardware Enforced I = 1 = IL1I = IL1D
4.3.7 Writing UTLB Entries
4.3.8 Bolted UTLB Entries
4.3.9 Hardware Assisted Way Selection
4.3.10 Searching UTLB Entries
4.3.10.1 Instruction-Side and Data-Side TLB Miss Searches
4.3.11 Reading UTLB Entries
4.3.12 Invalidating UTLB Entries
4.4 Access Control
4.4.1 Execute Access
4.4.2 Write Access
4.4.3 Read Access
4.4.4 Access Control Applied to Cache Management Instructions
4.5 Storage Attributes
4.5.1 Write-Through (W)
4.5.2 Caching Inhibited (I)
4.5.3 Hardware Enforced IL1I and IL1D
4.5.4 Memory Coherence Required (M)
4.5.5 Guarded (G)
4.5.6 Endian (E)
4.5.7 User-Definable (U0 - U3)
4.5.8 Supported Storage Attribute Combinations
4.5.9 Aliasing
4.6 MMU Registers
4.6.1 Process ID Register (PID)
4.6.2 Real Mode Page Description Register (RMPD)
4.6.3 MMU Bolted Entries 0 Register (MMUBE0)
4.6.4 MMU Bolted Entries 1 Register (MMUBE1)
4.6.5 Search Priority Configuration Registers
4.6.6 Supervisor Search Priority Configuration Register (SSPCR)
4.6.7 Invalidate Search Priority Configuration Register (ISPCR)
4.6.8 User Search Priority Configuration Register (USPCR)
4.6.9 Reset Configuration Register (RSTCFG)
4.6.10 MMU Configuration Register (MMUCR)
4.7 UTLB Block Descriptions
4.7.1 Tag Array
4.8 Software Considerations
4.8.1 TLB Search Indexed (tlbsx)
4.8.2 TLB Read Entry (tlbre)
4.8.3 TLB Write Entry (tlbwe)
4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax)
4.9 UTLB Coherency
4.10 tlbsync Special Operations
4.10.1 Remote tlbsync operation
4.10.1.1 CPU Remote tlbsync Operation
4.10.1.2 L2 Cache Remote tlbsync Operations
5. Instruction and Data Caches
5.1 Cache Array Organization and Operation
5.2 Instruction Cache Controller
5.2.1 I-Cache Operations
5.2.2 Instruction Cache Parity Operations
5.2.2.1 Instruction Cache Block Lock Clear (icblc)
5.2.2.2 Instruction Cache Block Invalidate (icbi)
5.2.2.3 Instruction Cache Invalidate (ici)
5.2.2.4 icbt
5.2.2.5 icbtls
5.2.2.6 icread
5.2.2.7 Instruction Cache Debug Data Register 0 (ICDBDR0)
5.2.2.8 Instruction Cache Debug Data Register 1 (ICDBDR1)
5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL)
5.2.2.10 Instruction Cache Debug Tag Register High (ICDBTRH)
5.2.2.11 Instruction Cache Parity Operations
5.2.3 Speculative Prefetch
5.2.4 Exceptions
5.2.4.1 Instruction Storage Interrupt
5.2.4.2 Instruction-Side UTLB Miss
5.2.4.3 Instruction-Side Machine Check
5.3 ICU Special Purpose Registers
5.3.1 Instruction Cache Error Syndrome Register (ICESR)
5.4 Self-Modifying Code
5.5 Data Cache Controller
5.5.1 DCU Operations
5.5.1.1 Load Operations
5.5.1.2 Store Operations
5.5.2 Store Gathering
5.5.3 Line Flush Operations
5.5.4 Storage Access Ordering
5.5.5 Data Cache Coherency
5.5.6 Data Cache Control and Debug
5.5.7 Data Cache Management and Debug Instruction Summary
5.5.7.1 Data Cache Block Zero (dcbz)
5.5.8 Data Cache Block Lock Clear (dcblc)
5.5.9 Data Cache Block Store (dcbst)
5.5.10 Data Cache Block Flush (dcbf)
5.5.11 Data Cache Block Invalidate (dcbi)
5.5.12 Data Cache Invalidate (dci)
5.5.13 Data Cache Block Touch (dcbt)
5.5.14 Data Cache Block Touch with Lock Set (dcbtls)
5.5.15 Data Cache Block Touch for Store (dcbtst)
5.5.16 Data Cache Block Touch For Store with Lock Set (dcbtstls)
5.5.17 Data Cache Read (dcread)
5.5.18 Memory Barrier Instructions
5.5.18.1 Memory Synchronization (msync)
5.5.18.2 Memory Barrier (mbar)
5.5.18.3 Lightweight Sync (lwsync)
5.5.19 Core Configuration Registers (CCR0, CCR1, and CCR2)
5.5.20 dcbt and dcbtst Operation
5.5.21 dcread Operation
5.5.22 Data Cache Debug Tag Register Low (DCDBTRL)
5.5.23 Data Cache Debug Tag Register High (DCDBTRH)
Version 2.2
July 31, 2014
134
135
135
135
135
136
136
137
137
137
138
138
138
139
139
139
139
139
140
140
141
142
142
142
142
143
143
144
144
144
144
145
145
145
146
146
147
147
148
148
149
149
150
150
150
150
151
151
152
153
5.5.24 Data Cache Parity Operations ........................................................................................... 153
5.5.24.1 Data Cache Exception Status Register (DCESR) ...................................................... 153
5.5.25 Simulating Data Cache Parity Errors for Software Testing ................................................ 155
6. Timer Facilities ........................................................................................................ 157
6.1 Time Base .................... 158
6.1.1 Reading the Time Base .................... 159
6.1.2 Writing the Time Base .................... 159
6.2 Decrementer and Decrementer Autoreload Registers .................... 159
6.3 Fixed-Interval Timer .................... 160
6.4 Watchdog Timer .................... 161
6.5 Timer Control Register .................... 163
6.6 Timer Status Register .................... 164
6.7 Halting the Timer Facilities .................... 165
6.8 Selection of the Timer Clock Source .................... 165
7. Processor Interrupts and Exceptions .................................................................... 167
7.1 Overview .................... 167
7.2 Interrupt Classes .................... 167
7.2.1 Asynchronous Interrupts .................... 167
7.2.2 Synchronous Interrupts .................... 168
7.2.2.1 Synchronous, Precise Interrupts .................... 168
7.2.2.2 Synchronous, Imprecise Interrupts .................... 168
7.2.3 Critical and Noncritical Interrupts .................... 169
7.2.4 Machine Check Interrupts .................... 169
7.3 Interrupt Processing .................... 170
7.3.1 Partially Executed Instructions .................... 172
7.4 Interrupt Processing Registers .................... 173
7.4.1 Machine State Register (MSR) .................... 173
7.4.2 Save/Restore Register 0 (SRR0) .................... 174
7.4.3 Save/Restore Register 1 (SRR1) .................... 175
7.4.4 Critical Save/Restore Register 0 (CSRR0) .................... 175
7.4.5 Critical Save/Restore Register 1 (CSRR1) .................... 176
7.4.6 Machine Check Save/Restore Register 0 (MCSRR0) .................... 176
7.4.7 Machine Check Save/Restore Register 1 (MCSRR1) .................... 177
7.4.8 Data Exception Address Register (DEAR) .................... 177
7.4.9 Interrupt Vector Offset Registers (IVOR0 - IVOR15) .................... 178
7.4.10 Interrupt Vector Prefix Register (IVPR) .................... 179
7.4.11 Exception Syndrome Register (ESR) .................... 179
7.4.12 Machine Check Syndrome Register (MCSR) .................... 181
7.5 Interrupt Definitions .................... 182
7.5.1 Critical Input Interrupt .................... 185
7.5.2 Machine Check Interrupt .................... 186
7.5.3 Data Storage Interrupt .................... 188
7.5.4 Instruction Storage Interrupt .................... 190
7.5.5 External Input Interrupt .................... 191
7.5.6 Alignment Interrupt .................... 192
7.5.7 Program Interrupt .................... 193
7.5.8 Floating-Point Unavailable Interrupt .................... 196
7.5.9 System Call Interrupt .................... 197
7.5.10 Decrementer Interrupt .................... 197
7.5.11 Fixed-Interval Timer Interrupt .................... 198
7.5.12 Watchdog Timer Interrupt .................... 199
7.5.13 Data TLB Error Interrupt .................... 199
7.5.14 Instruction TLB Error Interrupt .................... 201
7.5.15 Debug Interrupt .................... 201
7.6 Interrupt Ordering and Masking .................... 207
7.6.1 Interrupt Ordering Software Requirements .................... 208
7.6.2 Interrupt Order .................... 209
7.7 Exception Priorities .................... 210
7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions .................... 211
7.7.2 Exception Priorities for Floating-Point Load and Store Instructions .................... 211
7.7.3 Exception Priorities for Allocated Load and Store Instructions .................... 212
7.7.4 Exception Priorities for Floating-Point Instructions (Other) .................... 212
7.7.5 Exception Priorities for Allocated Instructions (Other) .................... 213
7.7.6 Exception Priorities for Privileged Instructions .................... 214
7.7.7 Exception Priorities for Trap Instructions .................... 214
7.7.8 Exception Priorities for System Call Instruction .................... 214
7.7.9 Exception Priorities for Branch Instructions .................... 215
7.7.10 Exception Priorities for Return From Interrupt Instructions .................... 215
7.7.11 Exception Priorities for Preserved Instructions .................... 215
7.7.12 Exception Priorities for Reserved Instructions .................... 216
7.7.13 Exception Priorities for All Other Instructions .................... 216
8. Debug Facilities ...................................................................................................... 217
8.1 Development Tool Support .................... 217
8.2 Debug Modes .................... 217
8.2.1 Internal Debug Mode .................... 217
8.2.2 External Debug Mode .................... 218
8.2.3 Trace Mode .................... 218
8.2.4 Debug Wait Enable Mode .................... 218
8.3 Debug Events .................... 219
8.3.1 Broadcast of Debug Events .................... 219
8.3.2 Exceptions .................... 219
8.3.3 Instruction Address Comparison .................... 220
8.3.3.1 IAC Debug Events .................... 220
8.3.3.2 Exact Comparison Mode .................... 221
8.3.3.3 Range Inclusive Comparison Mode .................... 221
8.3.3.4 Range Exclusive Comparison Mode .................... 221
8.3.3.5 IAC User/Supervisor Field .................... 222
8.3.3.6 IAC Effective/Real Address Field .................... 222
8.3.3.7 IAC Range Mode Autotoggle Field .................... 222
8.3.4 Data Address Comparison .................... 223
8.3.4.1 DAC Debug Event Fields .................... 223
8.3.4.2 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses .................... 226
8.3.4.3 DAC Debug Events Applied to Various Instruction Types .................... 226
8.3.4.4 Data Value Compare (DVC) Debug Event .................... 227
8.3.5 Trap .................... 230
8.3.6 Branch Taken .................... 230
8.3.7 Instruction Completed .................... 231
8.3.8 Return Debug Events .................... 232
8.3.9 Interrupt Debug Events .................... 233
8.3.10 Unconditional Debug Events .................... 234
8.4 Debug Timer Freeze .................... 234
8.5 Debug Special Purpose Registers .................... 235
8.5.1 Debug Control Register 0 (DBCR0) .................... 235
236
237
8.5.4 Debug Status Register (DBSR) .................... 239
8.5.5 Setting the DBSR Based on MSR[DE] and DBCR0[IDM] .................... 240
8.5.6 Instruction Address Comparison 1 - 4 (IAC1 - IAC4) .................... 240
8.5.7 Setup Order for IACs, DACs, and DVCs .................... 240
8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment .................... 241
8.6.1 Debug Bus Out Mask Register (DBOMask) .................... 241
8.6.2 Debug Input Mask Register (DBIMask) .................... 242
9. Initialization .............................................................................................................. 243
9.1 Processor Core State after Reset .................... 243
9.2 Reset Types .................... 249
9.3 Reset Sources .................... 250
9.4 Initialization Software Requirements .................... 250
10. L2 Cache and UTLB Synchronous Interfaces ..................................................... 255
10.1 L2 Cache Interface .................... 255
10.2 L2 Cache Features .................... 257
10.2.1 L2 Cache Storage Reservation Management .................... 257
10.2.2 Performance Monitor .................... 258
10.2.2.1 Performance Monitor Unit Core Control Register 0 (PMUCC0) .................... 259
10.2.3 Cache Operations Handling .................... 259
10.2.4 tlbivax, tlbsync, msync, mbar Handling .................... 261
10.3 L1 Cache UTLB Snoop Interface .................... 261
Appendix A. Register Summary ................................................................................. 263
A.1 Data Cache Address Compare 1 Register (DAC1) .................... 266
A.2 Data Cache Address Compare 2 Register (DAC2) .................... 266
A.3 Data Cache Value Compare 1 Register (DVC1) .................... 266
A.4 Data Cache Value Compare 2 Register (DVC2) .................... 266
A.5 Debug Data Register (DBDR) .................... 266
A.6 Data Cache Exception Syndrome Register (DCESR) .................... 268
A.7 Instruction Opcode Compare Control Register (IOCCR) .................... 269
A.8 Instruction Opcode Compare Register 1 (IOCR1) .................... 269
A.9 Instruction Opcode Compare Register 2 (IOCR2) .................... 270
Appendix B. Instruction Summary ............................................................................. 271
B.1 Instructions That Behave Differently from the Power ISA Specification ....................................... 271
B.2 Unsupported Power ISA Instructions ............................................................................................ 271
B.3 Integer Instructions in the PowerPC 476FP Processor ................................................................ 271
B.4 Floating-Point Instructions ............................................................................................................ 276
Appendix C. Instruction Execution Performance for Code Optimization .............. 279
C.1 PowerPC 476FP Pipeline Overview .................... 279
C.1.1 PowerPC 476FP Integer Pipelines .................... 279
C.1.1.1 ICRD, IST, and ISD Pipeline Stages .................... 280
C.1.1.2 DISS Stage .................... 281
C.1.1.3 RACC Stage .................... 281
C.1.1.4 Execution Pipeline Stages .................... 281
C.1.2 PowerPC 476FP Floating-Point Pipelines .................... 283
C.2 Instruction Execution Latency and Penalty .................... 284
C.3 Instruction Fetch and Decode .................... 289
C.3.1 Instruction Fetch Address Arbitration and Fetch Process .................... 290
C.3.2 Instruction Predecode, Instruction Field Adjust, and Endian Adjust .................... 291
C.3.2.1 Instruction Field Adjust .................... 291
C.3.3 Instruction Predecode .................... 291
C.4 Branch Prediction and Branch Instruction Processing .................... 292
C.4.1 Branch History Table Operation .................... 294
C.4.2 Global History Register Operation .................... 295
C.4.3 Branch Target Address CAM (BTAC) Operation .................... 296
C.4.4 Branch Link-Stack Operation .................... 296
C.4.5 Branch Instruction process .................... 297
C.4.6 Branch Information Queue Operation .................... 297
C.5 Instruction Issue Operation .................... 298
C.5.1 L-Pipe Instructions .................... 298
C.5.2 I-Pipe Instructions .................... 299
C.5.3 I-Pipe and J-Pipe Instructions .................... 299
C.5.4 B-Pipe Instructions .................... 299
C.5.5 FA-pipe Instructions .................... 300
C.5.6 FP FL-pipe Instructions .................... 300
C.5.7 Special Issue Rules for System Synchronizing Instructions .................... 300
C.6 Instruction Execution and Penalties .................... 300
C.6.1 Contention for the Same RACC Stage .................... 303
C.6.2 GPR Operand Dependency .................... 303
C.6.3 General CR Operand Dependency .................... 304
C.6.4 Multiply Dependency .................... 304
C.6.5 Multiply-Accumulate (MAC) Dependency .................... 305
C.6.6 Divide Dependency .................... 305
C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency .................... 305
C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency .................... 306
C.6.9 Move from Conditional Register (mfcr) Instruction Dependency .................... 306
C.6.10 Move from Special Purpose Register (mfspr) Dependency .................... 307
C.6.11 Move from Machine State Register (mfmsr) Dependency .................... 307
C.6.12 Move to Special Purpose Register (mtspr) Dependency .................... 307
C.6.13 TLB Management Instruction Dependency .................... 308
C.6.14 DCR Register Managing Instruction Operation Dependency .................... 308
C.6.15 Processor Control Instruction Operation .................... 309
C.6.16 Load Instruction Dependency .................... 310
C.6.17 Load/Store Operations .................... 310
C.6.18 String and Multiple Operations .................... 310
C.6.19 lwarx and stwcx. Operations .................... 311
C.6.20 Storage Ordering and Synchronizing Operations .................... 311
C.6.21 Special TLB Managing Operations .................... 311
C.7 Interrupt Handling .................... 312
Glossary ....................................................................................................................... 313
Index ............................................................................................................................. 317
List of Figures
Figure 1-1. PowerPC 476FP Embedded Processor Core Block Diagram .................... 25
Figure 2-1. User Programming Model Registers .................... 41
Figure 2-2. Supervisor Programming Model Registers .................... 42
Figure 3-1. Approximation to Real Numbers .................... 90
Figure 3-2. Selection of z1 and z2 .................... 95
Figure 4-1. Address Mapping for each Page Size .................... 104
Figure 4-2. MMU Block Diagram .................... 105
Figure 4-3. Supervisor Search Priority Configuration Registers .................... 123
Figure 4-4. Invalidate Search Priority Configuration Register .................... 124
Figure 4-5. User Search Priority Configuration Registers (USPCR) .................... 126
Figure 6-1. Relationship of Timer Facilities to the Time Base .................... 157
Figure 6-2. Watchdog State Machine .................... 163
Figure 8-1. JTAG-Controlled MP DBSR Monitor Capability .................... 241
Figure 8-2. JTAG-Controlled MP Stop and Run Control Capability. .................... 242
Figure 10-1. L2 Cache and Interface Block Diagram .................... 256
Figure C-1. PowerPC 476FP Integer Pipeline Structure .................... 280
Figure C-2. PowerPC 476FP Floating-Point Pipeline Structure .................... 283
Figure C-3. Instruction Sequence Without a Dependency .................... 286
Figure C-4. Instruction sequence with a dependency .................... 288
Figure C-5. Load Instruction Followed by an add with a Dependency on the Load .................... 289
Figure C-6. Typical Branch-Predict-Taken Timing Diagram (Branch Target Address is Computed at ISD) .................... 293
Figure C-7. TBTAC and BHT Based Branch-Predict-Taken Timing Diagram (BTAC Hit and BTAC Contains the Branch Target Address) .................... 294
Figure C-8. Link-Stack Based Branch-Predict-taken Timing Diagram (Link-Stack Pops the Branch Target Address at Clock 3) .................... 294
Figure C-9. GHR use for BHT Lookup .................... 296
Figure C-10. Instruction Sequence Example with no Dependency on the Integer Unit .................... 302
List of Tables
Table 1-1. PowerPC 476FP Power Savings Modes
Table 1-2. PowerPC 476FP Frequency switching
Table 1-3. Frequency Switching Examples
Table 2-1. Data Operand Definitions
Table 2-2. Alignment Effects for Storage Access Instructions
Table 2-3. Big-Endian Mapping of Structure S
Table 2-4. Little-Endian Mapping of Structure S
Table 2-5. PowerPC 476FP SPRs
Table 2-6. Instruction Categories
Table 2-7. Integer Storage Access Instructions
Table 2-8. Integer Arithmetic Instructions
Table 2-9. Integer Logical Instructions
Table 2-10. Integer Compare Instructions
Table 2-11. Integer Trap Instructions
Table 2-12. Integer Rotate Instructions
Table 2-13. Integer Shift Instructions
Table 2-14. Integer Select Instruction
Table 2-15. Branch Instructions
Table 2-16. Condition Register Logical Instructions
Table 2-17. Register Management Instructions
Table 2-18. System Linkage Instructions
Table 2-19. Processor Synchronization Instruction
Table 2-20. Cache Management Instructions
Table 2-21. TLB Management Instructions
Table 2-22. Storage Synchronization Instructions
Table 2-23. Previous Integer Multiply-Accumulate Instructions
Table 2-24. BO Field Definition
Table 2-25. BO Field Examples
Table 2-26. CR Updating Instructions
Table 2-27. XER[SO,OV] Updating Instructions
Table 2-28. XER[CA] Updating Instructions
Table 2-29. Privileged Instructions
Table 3-1. Invalid Operation Exception Categories
Table 3-2. Format Fields
Table 3-3. IEEE 754 Floating-Point Fields
Table 3-4. Rounding Modes
Table 3-5. Floating-Point Load Instructions
Table 3-6. Floating-Point Store Instructions
Table 3-7. Floating-Point Move Instructions
Table 3-8. Floating-Point Elementary Arithmetic Instructions
Table 3-9. Floating-Point Multiply-Add Instructions
Table 3-10. Floating-Point Rounding and Conversion Instructions
Table 3-11. Comparison Sets
Table 3-12. Floating-Point Compare and Select Instructions
Table 3-13. Floating-Point Status and Control Register Instructions
Table 4-1. PowerPC 476FP Processor MMU
Table 4-2. UTLB Set Address Generation Hashing Function
Table 4-3. UTLB Tag Field Description
Table 4-4. EPN and EA Comparison
Table 4-5. UTLB Data Field Description
Table 4-6. Access Control Applied to Cache Management Instructions
Table 4-7. MMU SPR Summary
Table 5-1. Instruction and Data Cache Array Organization
Table 5-2. Cache Size and Parameters
Table 5-3. EA Format icread
Table 5-4. ICU Special Purpose Registers
Table 5-5. Effective Address Format for icread and dcread
Table 6-1. Timer Register Summary
Table 6-2. Fixed-Interval Timer Period Selection
Table 6-3. Watchdog Timer Period Selection
Table 6-4. Watchdog Timer Exception Behavior
Table 7-1. Interrupt Types Associated with each IVOR
Table 7-2. Interrupt and Exception Types
Table 7-3. BRT Debug Event Actions
Table 7-4. TRAP Debug Event Actions
Table 7-5. RET Debug Event Actions
Table 7-6. ICMP Debug Event Actions
Table 7-7. IRPT Debug Event Actions
Table 7-8. UDE Debug Event Actions
Table 8-1. IAC Range Mode Toggle Summary
Table 8-2. Trap Debug Event Actions
Table 8-3. BRT Debug Event Actions
Table 8-4. ICMP Debug Event Actions
Table 8-5. RET Debug Event Actions
Table 8-6. IRPT Debug Event Actions
Table 8-7. UDE Debug Event Actions
Table 8-8. Setting the DBSR based on MSR[DE] and DBCR0[IDM]
Table 9-1. Reset Values of Registers and Other PowerPC 476FP Facilities
Table 10-1. lwarx and stwcx. Actions in the L2 Cache and Processor Core
Table 10-2. CT Field Value and Cache Level
Table 10-3. Cache Operations
Table A-1. Register Categories
Table B-1. New Instructions in the PowerPC 476FP Core
Table B-2. Power ISA V2.05 Integer Instructions
Table B-3. Floating-Point Instructions
Table C-1. Instruction Predecode Bit Definition
Table C-2. Branch Prediction and BHT, GHR, and BTAC Use
Table C-3. Link-Stack Operations
Revision Log
Revision Date
Description
July 31, 2014
Version 2.2
• Revised Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69.
February 26, 2014
Version 2.1
• Revised Section 7.5.15 Debug Interrupt on page 201.
• Revised Section 8.2.1 Internal Debug Mode on page 217.
• Revised Section 8.2.3 Trace Mode on page 218.
January 10, 2014
Version 2.0
• Revised Section 5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL) on page 137.
June 24, 2013
Version 1.9
• Revised Appendix C.2 Instruction Execution Latency and Penalty on page 284.
• Revised Figure C-5 Load Instruction Followed by an add with a Dependency on the Load on page 289.
May 1, 2013
Version 1.8
• Revised Table 4-3 UTLB Tag Field Description on page 108.
• Revised Table 4-5 UTLB Data Field Description on page 110.
• Revised Figure 6-1 Relationship of Timer Facilities to the Time Base on page 157.
• Revised Section 7.5.6 Alignment Interrupt on page 192.
• Revised Section 9.1 Processor Core State after Reset on page 243.
• Revised Section 9.4 Initialization Software Requirements on page 250.
April 20, 2012
Version 1.7
• Revised Section 2.10.3 Storage Ordering and Synchronization on page 78.
• Revised Section 4.3.3 Initialize a Single UTLB Entry on page 107.
• Revised Section 4.8.1 TLB Search Indexed (tlbsx) on page 128.
• Revised Section 7.4.12 Machine Check Syndrome Register (MCSR) on page 181.
• Revised Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244.
• Changed ESR[MCI] to ESR[ISMC] throughout the book.
October 26, 2011
Version 1.6
• Revised the book title.
• Revised About this Document on page 23.
• Revised Related Publications on page 23.
• Revised Section 3 Floating-Point Unit Programming Model on page 85.
• Revised Section 3.2.1.2 Floating-Point Status and Control Register (FPSCR) on page 87.
• Revised Section 4.5.5 Guarded (G) on page 117.
• Revised Section 4.6.2 Real Mode Page Description Register (RMPD) on page 120.
• Revised Section 5.2.2.6 icread on page 136.
• Revised Section 6.8 Selection of the Timer Clock Source on page 165.
• Revised Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244.
• Revised Section 9.4 Initialization Software Requirements on page 250.
April 13, 2011
Version 1.5
• Added “PowerPC 470S Synthesizable Core” to the title page and a reference to the 470S core in About
this Document on page 23.
• Revised Section 4.2 Address Translation on page 103.
• Revised Figure 4-1 Address Mapping for each Page Size on page 104.
• Revised Table 4-3 UTLB Tag Field Description on page 108.
• Revised Table 4-5 UTLB Data Field Description on page 110.
• Revised Table A-1 Register Categories on page 263.
January 19, 2011
Version 1.4
• Changed SSPCR to ISPCR in Section 4.3.12 Invalidating UTLB Entries on page 113.
• Made the following changes in Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on
page 130:
– Removed a reference to tlbivax.
– Changed SSPCR to ISPCR.
– Removed a reference to USPCR entries.
– Changed isync to tlbsync.
• Added references to SPR addresses of x‘23C’ and x‘33C’ in Section 7.4.12 Machine Check Syndrome
Register (MCSR) on page 181.
November 23, 2010
Version 1.3
• Removed a reference to the mfapidi instruction in Section 2.3.1 Defined Instruction Class on page 47.
• Changed SPRG7 to SPRG8 in Section 2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8) on page 67.
• Added bit 21 DPC to Core Configuration Register 1 (CCR1) on page 70.
• Made corrections to text and code in Section 4.8.2 TLB Read Entry (tlbre) on page 128 and Section 4.8.3
TLB Write Entry (tlbwe) on page 129.
• Added information that tlbivax and tlbsync are never executed simultaneously in Section 4.9 UTLB
Coherency on page 130.
• Added information that an isync must follow an ici instruction in Section 5.2.2.3 Instruction Cache Invalidate (ici) on page 135.
• Added information that an isync instead of an msync can follow a dci instruction in Section 5.5.12 Data
Cache Invalidate (dci) on page 146.
• Changed “privileged instructions cannot be executed” to read “the processor is in privileged state” for bit
49 in Section 7.4.1 Machine State Register (MSR) on page 173.
• Removed a reference to tlbiva in Section 7.5.7 Program Interrupt on page 193.
• Changed dccci to dci, iccci to ici, and removed a reference to tlbiva in Section 7.7.6 Exception Priorities for Privileged Instructions on page 214.
• Added Section 8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment on page 241.
• Added substeps to step 4 in Section 9.4 Initialization Software Requirements on page 250.
• Made minor updates to Table 10-3 Cache Operations on page 259.
• Made minor updates to Section A Register Summary on page 263.
• Moved the Debug Bus Out Mask Register (DBOMask) from Appendix A to Section 8.6.1 Debug Bus Out
Mask Register (DBOMask) on page 241.
• Moved the Debug Input Mask Register (DBIMask) from Appendix A to Section 8.6.2 Debug Input Mask
Register (DBIMask) on page 242.
September 16, 2010
Version 1.2
Made minor technical corrections to the following sections:
• Section 1 Overview on page 25.
• Section 4.3.3 Initialize a Single UTLB Entry on page 107.
• Section 4.6.9 Reset Configuration Register (RSTCFG) on page 126.
• Section 4.8.3 TLB Write Entry (tlbwe) on page 129.
• Section 5.1 Cache Array Organization and Operation on page 133.
• Section 9.4 Initialization Software Requirements (step f on page 252).
September 1, 2010
Version 1.1
• Made various changes to the following sections:
– Section 1.2 Power Control Features on page 26
– Section 2.2 Registers on page 40
– Section 2.11 Storage Model on page 81
– Section 3.2 Floating-Point Registers on page 86
– Section 3.4 Floating-Point Instructions on page 95
– Section 4.3 MMU Implementation on page 104
– Section 4.4 Access Control on page 113
– Section 4.10 tlbsync Special Operations on page 131
– Section 5 Instruction and Data Caches on page 133
– Section 5.1 Cache Array Organization and Operation on page 133
– Section 5.2 Instruction Cache Controller on page 134
– Section 5.3 ICU Special Purpose Registers on page 139
– Section 5.4 Self-Modifying Code on page 140
– Section 5.5 Data Cache Controller on page 141
– Section 7.1 Overview on page 167
– Section 7.5 Interrupt Definitions on page 182
– Section 8 Debug Facilities on page 217 (throughout the entire chapter)
– Section 10.2 L2 Cache Features on page 257
– Section 10.3 L1 Cache UTLB Snoop Interface on page 261
– Appendix A Register Summary on page 263
• Changed the register bit numbers from [0:31] to [32:63] in the following sections:
– Section 2.5.5.2 Count Register (CTR) on page 60
– Section 2.5.5.3 Condition Register (CR) on page 60
– Section 2.6.1 General Purpose Registers (GPRs) on page 63
– Section 2.6.2 Fixed-Point Exception Register (XER) on page 64
– Section 2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8) on page 67
– Section 2.7.2 Processor Version Register (PVR) on page 68
– Section 2.7.3 Processor Identification Register (PIR) on page 68
– Section 3.2.1.2 Floating-Point Status and Control Register (FPSCR) on page 87
– Section 6.5 Timer Control Register on page 163
– Section 6.6 Timer Status Register on page 164
– Section 7.4.1 Machine State Register (MSR) on page 173
– Section 7.4.2 Save/Restore Register 0 (SRR0) on page 174
– Section 7.4.4 Critical Save/Restore Register 0 (CSRR0) on page 175
– Section 7.4.5 Critical Save/Restore Register 1 (CSRR1) on page 176
– Section 7.4.6 Machine Check Save/Restore Register 0 (MCSRR0) on page 176
– Appendix A Register Summary on page 263
• Added Appendix C Instruction Execution Performance for Code Optimization on page 279.
September 15, 2009
Initial release.
About this Document
This user’s manual describes the IBM® PowerPC® 476FP core. The core can be embedded into higher-function application-specific integrated circuit (ASIC) designs to provide a comprehensive control and computation device. The document provides information about the registers, facilities, initialization, and use of the
processor core. It is intended for the use of programmers and engineers who are creating software to control
the processor.
Related Publications
The following document can be a helpful reference when reading this user’s manual:
• Power ISA, version 2.05
The following documents are IBM confidential. For access to these documents, contact your IBM representative:
• PowerPC 476FP Embedded Processor Core Support Manual
• PowerPC 470S Synthesizable Core Support Manual
• PowerPC 476FP L2 Cache Core Databook
• DCR Arbiter Core Data Book
• Multiprocessor Interrupt Controller Data Book
Documentation Conventions
This section explains the conventions used in this document for numbers, bit fields, instructions, and signals.
Representation of Numbers
Numbers are generally shown in decimal format, unless designated as follows:
• Hexadecimal values are preceded by an “x” and enclosed in single quotation marks.
For example: x‘0A00’.
• Binary values in sentences are shown in single quotation marks.
For example: ‘1010’.
Note: A bit value that is immaterial, which is called a “don't care” bit, is represented by an “x.”
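The notation above can be turned into ordinary integers mechanically. The following is a minimal Python sketch, not part of the manual; the function name parse_literal is hypothetical. It handles only the hexadecimal, binary, and decimal forms and deliberately rejects binary strings that contain a “don't care” x, because those do not denote a single value.

```python
def parse_literal(text: str) -> int:
    """Convert a literal written in the manual's notation to an integer.

    Hypothetical helper for illustration only.
    """
    # Normalize the typographic quotation marks used in the manual.
    s = text.strip().replace("\u2018", "'").replace("\u2019", "'")
    if s.startswith("x'") and s.endswith("'"):
        return int(s[2:-1], 16)   # hexadecimal form, for example x'0A00'
    if s.startswith("'") and s.endswith("'"):
        return int(s[1:-1], 2)    # binary form, for example '1010'
    return int(s, 10)             # anything else is treated as decimal
```

A binary string with a “don't care” bit raises ValueError, which signals that the caller must treat it as a pattern rather than as a number.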
Bit Significance
In the PowerPC 476FP documentation, the smallest bit number represents the most significant bit of a field,
and the largest bit number represents the least significant bit of a field.
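Because the smallest bit number is the most significant bit, a bit number cannot be used directly as a shift count. The following minimal Python sketch, with the hypothetical helper name msb0_mask, shows one way to convert a bit number in this convention to a mask. The field range is a parameter because this manual numbers some registers [0:31] and others [32:63].

```python
def msb0_mask(bit: int, first: int = 0, last: int = 31) -> int:
    """Return the mask for one bit of a field numbered first..last.

    `first` is the most significant bit and `last` is the least
    significant bit, following the manual's convention.
    """
    if not first <= bit <= last:
        raise ValueError(f"bit {bit} is outside the range {first}:{last}")
    # The largest bit number is the least significant bit, so the shift
    # distance is measured from `last`.
    return 1 << (last - bit)
```

For example, bit 31 of a register numbered [0:31] and bit 63 of a register numbered [32:63] both map to the mask 0x00000001.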
Other Conventions
PowerPC 476FP processor instruction mnemonics are shown in lower-case, bold text. For example: tlbivax.
I/O signal names are shown in upper case.
1. Overview
The PowerPC 476FP embedded processor core is a 4-issue, 6-pipeline (floating-point [FP] L-pipe and integer
L-pipe are shared), superscalar, 32-bit reduced instruction set computer (RISC) processor. The core supports
the Power Instruction Set Architecture (ISA) Version 2.05. The architectural flexibility of this core enhances
IBM application-specific integrated circuit (ASIC) solutions and applications. The core also supports memory
coherency to broaden ASIC solutions into multiprocessing system environments and to increase its scalability
for emerging wired communications, storage, and pervasive computing applications. Figure 1-1 shows the
overall organization of the processor core.
Figure 1-1. PowerPC 476FP Embedded Processor Core Block Diagram
[Block diagram not reproduced. Labeled blocks include: snoop bus and L2 cache interface (snoop interface, 128-bit I-Data and 128-bit D-Data L2 cache interfaces); predecoder; 32 KB instruction cache; 32 KB data cache; ITLB and DTLB; 1024-entry memory management unit; 4 KB branch history table; instruction unit; branch unit; 8-entry, 4-issue issue queue (DISSQ, includes FP); link stack; branch pipeline; multiply and divide pipeline; complex integer pipeline; simple integer pipeline; load and store pipeline; general purpose registers with PGPR buffers; floating-point unit; DCR bus; JTAG; debug trace; timer; interrupt control; clock and power management.]
Legend:
DCR: Device Control Register
DISSQ: Decode issue queue
DTLB: Data translation lookaside buffer
D-Data: Data-cache data
FP: Floating-Point
GPR: General purpose registers
ITLB: Instruction translation lookaside buffer
I-Data: Instruction-cache data
JTAG: Joint Test Action Group
KB: 1024 bytes
L2: Level 2 cache
MAC: Multiply and accumulate
PGPR: Pre-GPR buffers
1.1 General Features
The PowerPC 476FP processor core provides the following features:
• Four-issue architecture (decode and issue [DISS] decode complexity)
• Five pipelines and a separate floating-point (FP) arithmetic pipeline:
– Branch pipeline
– L pipeline (for load and store operations)
– J pipeline (for simple arithmetic and logical operations)
– Instruction pipeline (simple and complex instruction pipeline, miscellaneous instruction pipeline)
– Multiplication and division pipeline
– FP execution pipeline
• Two-cycle pipelined cache accesses
• Real-address tagging for both the instruction cache and the data cache
• Indexing of the instruction cache with the virtual address
• Indexing of the data cache with the real address
• Snoopable instruction and data caches
• 1024-entry unified translation lookaside buffer (UTLB)
• Pre-general purpose register (PGPR) temporary buffers that capture and hold results until commitment time,
when the results are transferred to the general purpose registers (GPRs)
• Early delivery of instructions to the floating-point unit (FPU), which is possible because all instructions are predecoded
1.2 Power Control Features
The following design features minimize the operating power of the PowerPC 476FP processor core:
• All latches are clock gated so that idle functions do not waste power.
• All nonexecuting and idle functions are disabled.
• Static random access memory (SRAM) is partitioned so that only the required portion of the SRAM is
enabled or selected.
• Doze and idle sleep modes are available.
• The central logic and the floating-point unit have separate clock enables.
1.2.1 Power Control Modes
The power control modes for the PowerPC 476FP core are CPU sleep mode, CPU doze mode, and CPU cold
mode.
CPU sleep mode has the following characteristics:
• The CPU clock is turned off by deasserting the clock enable signal, but the timer clock still runs to maintain the time base.
• The CPU can be awakened by enabling the clock using an interrupt such as an external interrupt, decrementer (DEC) interrupt, watchdog timer interrupt, or fixed-interval timer (FIT) interrupt if the application
has implemented those mechanisms.
• Sleep mode can be controlled dynamically and randomly.
• Because the processor does not maintain the cache or memory management unit (MMU) coherency, the
following steps must be taken:
– The L2 cache must be in a state of not having the processor or a special DCR being set.
– The processor caches must be invalidated by using the data cache invalidate (dci) and instruction
cache invalidate (ici) instructions at the beginning of the awakening process.
– The processor MMU must be invalidated by using the translation lookaside buffer write entry (tlbwe)
instruction, except for the entry (or entries) required to handle the exception or interrupt.
CPU doze mode has the following characteristics:
• The processor is in either wait state or halt state when doze mode is used.
• The CPU clock continues to run, but no instructions are executed.
• Doze mode allows the processor to process exceptions and interrupts; however, when the return from
interrupt (rfi) instruction is issued, the processor goes back into doze mode.
• The processor maintains the cache and MMU coherency.
In CPU cold mode, the PowerPC 476FP core is powered off.
Table 1-1 lists the power control modes of the PowerPC 476FP core.
Table 1-1. PowerPC 476FP Power Savings Modes
Mode | Core Clock | Core Power | Core State | Definition | Operation Mode | Effect
Sleep | Off | On | Yes | Low power; timers on | Off | Clock gated; dynamically awakened or put to sleep
Doze | On | On | Yes | Lower power; timer on; interrupt serviced | Standby | Timer running, interrupt serviced
Cold | Off | Off | No | No power | Off | Power off, lose state
Table 1-2 defines how frequency switching affects the power control modes of the PowerPC 476FP core.
Table 1-2. PowerPC 476FP Frequency switching
Mode | Core Clock | Core Power | Core State | Definition | Operation Mode | Allowed Scale
Shift | On | On | Yes | Frequency switch | Running | CPU, L2, PLB6, and DCR clock ratios remain constant; glitchless switching
Table 1-3 provides examples of frequency switching.
Table 1-3. Frequency Switching Examples
Step | CPU Clock | L2 Clock | PLB6 Clock
Initial | 1600 MHz | 800 MHz | 800 MHz
Supervisor write (SW) 1 | 800 MHz | 400 MHz | 400 MHz
SW 2 | 533+ MHz | 266.5 MHz | 266.5 MHz
SW 3 | 400 MHz | 200 MHz | 200 MHz
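The rows of Table 1-3 are consistent with dividing every clock by the same integer (2, 3, and 4), which keeps the clock ratios constant as Table 1-2 requires. The following minimal Python sketch illustrates that reading. It is an interpretation of the table, not a description of the hardware mechanism, and the names INITIAL_MHZ, scale_clocks, and ratios are hypothetical. Exact fractions are used, so the divide-by-3 row comes out as 1600/3 and 800/3 MHz, which the table shows rounded.

```python
from fractions import Fraction

# Initial clocks in MHz, taken from the first row of Table 1-3.
INITIAL_MHZ = {"CPU": 1600, "L2": 800, "PLB6": 800}

def scale_clocks(divisor: int) -> dict:
    """Divide every clock by the same integer so the ratios stay constant."""
    if divisor < 1:
        raise ValueError("divisor must be a positive integer")
    return {name: Fraction(mhz, divisor) for name, mhz in INITIAL_MHZ.items()}

def ratios(clocks: dict) -> tuple:
    """Express each clock relative to the PLB6 clock."""
    return tuple(clocks[name] / clocks["PLB6"] for name in ("CPU", "L2", "PLB6"))

# Print each scaled configuration with its CPU:L2:PLB6 ratio.
for divisor in (1, 2, 3, 4):
    clocks = scale_clocks(divisor)
    rounded = {name: round(float(mhz), 1) for name, mhz in clocks.items()}
    print(divisor, rounded, ratios(clocks))
```

Under this assumption the CPU:L2:PLB6 ratio is 2:1:1 in every row.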
1.2.2 Power Control Procedures
Power control procedures consist of putting the PowerPC 476FP core into sleep mode or doze mode and
waking up the processor from those modes.
1.2.2.1 CPU Sleep Mode
To put the PowerPC 476FP core into sleep mode, perform the following steps.
Note: Ensure the core is not processing storage instructions.
1. Issue the system call (sc) instruction to put the processor in privileged mode.
2. Ensure the PowerPC 476FP core is not processing storage instructions, then issue the instruction synchronize (isync) instruction followed by the memory synchronize (msync) instruction.
3. Set up the L2 cache for sleep mode by setting L2 Sleep Request Register (L2SLEEPREQ)[31] = ‘1’, then issue the isync and move to
Machine State Register (mtmsr) instructions to set MSR[WE] = ‘1’. This puts
the processor in wait mode (similar to doze mode).
4. The processor asserts the core sleep request signal, the clock and power management (CPM) interface
deasserts the CPU clock enable signal, and the processor goes into sleep mode.
1.2.2.2 CPU Doze Mode
To put the processor in CPU doze mode, perform the following steps:
1. Set up MSR[WE] = ‘1’ either by calling an executive routine that sets up the MSR by issuing the
mtmsr instruction or by setting Save/Restore Register 1 (SRR1)[WE] = ‘1’ in an interrupt handler.
When in doze mode, the processor stops instruction fetching and execution and goes into stop state.
Any exception or interrupt will wake up the processor.
2. To wake up the processor, reset SRR1[WE] to ‘0’. Otherwise, the processor remains in doze mode after
the rfi instruction is issued.
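The WE manipulation in the steps above is a read-modify-write of a single bit in an MSR or SRR1 image. The following minimal Python sketch models only the bit arithmetic; a real handler would read and write the registers with the processor's move-from and move-to instructions. The position of WE is an assumption here: bit 45 in the [32:63] numbering, which is the usual Book E placement and is not stated in this section, so it should be checked against the MSR description. The names MSR_WE, set_we, and clear_we are hypothetical.

```python
# Assumption: WE is bit 45 in the [32:63] numbering, giving mask 0x00040000.
# This is not stated in this section; verify it against the MSR description.
MSR_WE = 1 << (63 - 45)

def set_we(image: int) -> int:
    """Return the 32-bit register image with WE set (request doze or wait)."""
    return (image | MSR_WE) & 0xFFFFFFFF

def clear_we(image: int) -> int:
    """Return the 32-bit register image with WE cleared (stay awake after rfi)."""
    return image & ~MSR_WE & 0xFFFFFFFF
```

An interrupt handler that wants the processor to stay awake would apply the equivalent of clear_we to the saved SRR1 value before issuing rfi; otherwise the restored MSR[WE] puts the processor back into doze mode.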
1.2.2.3 Waking up the Processor
To wake up the processor, perform the following steps:
1. Generate one of the following interrupts: external, DEC, FIT, or watchdog timer.
2. Enable the CPU clock by asserting the clock enable (CPMC476CLKEN) signal. The processor then enters the
interrupt handler.
3. Set SRR1[WE] = ‘0’.
4. Invalidate MMU entries, except for entries required to handle interrupts and exceptions.
5. Invalidate the L1 instruction cache (I-cache) by issuing the ici instruction.
6. Invalidate the L1 data cache (D-cache) by issuing the dci instruction (note the CT field of instructions).
7. Set up the L2 cache to wake up by setting L2 Sleep Request Register (L2SLEEPREQ)[31] = ‘0’ (see the
L2 Cache Core Controller Databook), then issue the isync instruction.
8. Process the interrupt, and issue the rfi instruction.
The processor will now be awakened.
Consult with IBM PowerPC support for further details about the implementation and coding of sleep mode,
doze mode, or waking up the processor.
1.3 Implemented Instruction Set
All Power ISA Version 2.05 instructions in Book I, Book II, and Book III-E are implemented in the PowerPC
476FP processor core. The following categories of instructions are supported:
• Base
• Embedded
• Embedded cache debug
• Embedded cache initialization
• Embedded little-endian
• Embedded cache locking
• Memory coherence
See Appendix B Instruction Summary on page 271 for a list of the implemented instructions.
1.4 Test and Debug Facilities
Like the previous PowerPC 4xx processor cores, the PowerPC 476FP processor core provides RISCWatch
and Joint Test Action Group (JTAG) interfaces that enable the following functions:
• Reset the processor core
• Stop, halt, and start the processor core
• Perform debug operations
• Trace the status and operation of the processor core and other cores and devices in the ASIC
Additional debug and status observing capabilities facilitate debugging and monitoring the processor cores
and other cores in a multiprocessor system-on-a-chip (SoC) environment. The following features defined by
the Power.org Common Debug Interface Technical Committee are implemented:
• JTAG-controlled multiprocessor Debug Status Register (DBSR) monitor capability
• JTAG stop and run controls
These capabilities provide the means to observe DBSR bits individually under JTAG control. The processor
can also be started and stopped through JTAG control. See Section 8 Debug Facilities on page 217 for more
information about the use of the JTAG features.
1.5 Floating-Point Unit Overview
The FPU is a pipelined, double-precision math computation processing unit that is attached to the processor
core. The FPU conforms to the ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point
Arithmetic.
The following key design features are included in the PowerPC 476FP FPU:
• Complies with the ANSI/IEEE 754-1985 floating-point standard:
– Single-precision floating-point standard
– Double-precision floating-point standard
• PowerPC floating-point instruction set
• Compliance with Book E: Enhanced PowerPC Architecture
• Superscalar operation with independent floating-point load-and-store and execution units
• Six-stage super-pipelined floating-point arithmetic execution
– Extended division stages
– Extended operation stages for denormalized operation
See Section 3 Floating-Point Unit Programming Model on page 85 for more information.
1.6 Instruction Cache Overview
The instruction cache unit (ICU) is divided into several subunits: the instruction cache array, the instruction
cache control unit, the instruction-side translation lookaside buffer (ITLB), the branch history table, and the
instruction fetch unit.
The instruction cache array consists of standard SRAMs for instruction data and tag information, arranged as
a 4-way, set-associative cache. The instruction data bus that enters the core is either 128 or 256 bits wide
and is loaded into an 8-word instruction-line fill buffer. The replacement method is a 6-bit least recently used
(LRU) algorithm, with provisions for cache locking or partitioning.
The ITLB, instruction-cache control unit, and instruction fetch unit are completely synthesizable logic blocks.
They contain both data path and control logic.
The ITLB is an 8-entry, fully-associative array. Its main purpose is to enhance performance and reduce TLB
contention between instruction accesses and load-and-store operations. Each ITLB entry contains the translation information for a page. The processor uses this information to translate the address of instruction
accesses when the MSR[IS] = ‘1’.
The branch history module improves branch prediction by tracking the most likely outcome of recently taken
program branches.
See Section 5 Instruction and Data Caches on page 133 for more information about the instruction cache.
1.7 Data Cache Unit Overview
The data cache unit (DCU) primarily consists of the following three subunits: data cache arrays, data cache
control, and the data TLB (DTLB).
The data cache array subunit contains three arrays: the LRU, the tag, and the data arrays. The tag and data
arrays are standard SRAMs. The LRU array is a smaller, dual-port register array. Both the tag and data
arrays are 4-way set associative and have a pipelined, 2-cycle access. The DCU incorporates an LRU
replacement algorithm that uses a 6-bit age vector in combination with way locking to determine the best
candidate for a replacement. The DCU can receive 256 bits of read data simultaneously from the bus and can
send up to 128 bits of write data. The data cache is nonblocking. Cache coherency is supported through the
level 2 (L2) cache interface by using write-through mode.
The data cache control subunit includes all of the DCU pipeline controls, cache arbitration, the Special
Purpose Registers (SPRs), and the snoop pipeline. It drives most of the data path flow and operation.
The DTLB is an 8-entry, fully-associative cache that uses the effective address to quickly calculate the real
address. It is accessed in parallel with the other three arrays. If a DTLB miss occurs, a request is made to the
UTLB to calculate the real address.
See Section 5 Instruction and Data Caches on page 133 for more information.
1.8 Memory Management Unit Overview
The memory management unit (MMU) provides cache control, access protection, and address translation.
The MMU contains the UTLB, control logic, and registers that support the UTLB. The MMU interfaces with the
execution unit (EU), the instruction unit (IU), the ICU, the DCU, and the TLB snoop interface. The EU interface performs TLB instructions such as translation lookaside buffer read entry (tlbre), translation lookaside buffer write entry (tlbwe), translation
lookaside buffer search index (tlbsx), and translation lookaside buffer invalidate bus transaction (tlbivax).
The IU interface provides the translation space (TS) and data space (DS) selector bits for a lookup request
from the DCU, ICU, or TLB snoop. The ICU generates a lookup request to the UTLB on an ITLB miss. Similarly, the DCU generates a lookup request to the UTLB on a DTLB miss. The MMU arbitrates requests from
the ICU, DCU, EU, and snoop interface, and provides the data for each request.
Software manages the MMU, but hardware-assisting logic is provided for replacing entries. Software writes
entries into the UTLB so that they can be read by using the hash function described in Section 4.3.2 UTLB
Index Address Hash on page 106. Freescale-style MMU enhanced operation is not supported.
See Section 4 Memory Management Unit on page 103 for more information.
1.9 Timers
The PowerPC 476FP processor core contains a time base and three timers: a decrementer (DEC), a fixed-interval timer (FIT), and a watchdog timer.
The time base is a 64-bit counter that is incremented at a frequency either equal to the processor core clock
rate or controlled by a separate asynchronous timer clock supplied to the core. No interrupt is generated if the
time base wraps back to zero.
The DEC is a 32-bit register that is decremented at the rate at which the time base is incremented. The user
loads the DEC register with a value to create the required interval. When the register is decremented to zero,
a status bit is set and an exception is generated that can notify software. Optionally, the DEC can be
programmed to automatically reload the value contained in the Decrementer Auto-Reload Register (DECAR),
after which the DEC resumes decrementing.
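The interval arithmetic for the DEC can be sketched as follows. This is a minimal illustration, assuming software knows the rate at which the time base is clocked; the function name and the `timebase_hz` parameter are hypothetical and do not come from this manual.

```c
#include <stdint.h>

/* Hypothetical helper: convert an interval in microseconds into a DEC load
 * value, given the rate (in Hz) at which the time base, and therefore the
 * DEC, is clocked. */
uint32_t dec_value_for_interval(uint64_t timebase_hz, uint32_t interval_us)
{
    uint64_t ticks = (timebase_hz * interval_us) / 1000000u;

    /* The DEC is a 32-bit register, so clamp intervals that do not fit. */
    if (ticks > UINT32_MAX) {
        ticks = UINT32_MAX;
    }
    return (uint32_t)ticks;
}
```

With a 400 MHz time base, for example, a 1 ms interval corresponds to a load value of 400000. The same value would be written to the DECAR if auto-reload is used.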
The FIT can generate periodic interrupts based on a transition of 1 user-selected bit from 4 time base bits.
When the selected bit changes from ‘0’ to ‘1’, a status bit is set and an exception is generated that can notify
software.
The watchdog timer also generates a periodic interrupt based on the transition of a selected bit from the time
base. The user can choose one of four intervals for the watchdog timer. Upon the first transition from ‘0’ to ‘1’
of the selected time base bit, the watchdog timer generates an exception that can notify software. The
watchdog timer can also be configured to initiate a hardware reset if a second transition of the selected time
base bit occurs before the first watchdog exception is serviced. This capability provides an extra measure of
recoverability from potential system lockups.
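Both the FIT and the watchdog timer are driven by ‘0’ to ‘1’ transitions of a selected time base bit, so the resulting period follows directly from the weight of that bit. The sketch below is illustrative only: it identifies the bit by its weight exponent k (the bit worth 2^k counts, with k = 0 for the least significant bit) rather than by the manual's bit numbering, and the function names are hypothetical.

```c
#include <stdint.h>

/* A time base bit of weight 2^k toggles every 2^k increments, so it makes a
 * 0-to-1 transition once every 2^(k+1) increments. Valid for k <= 62. */
uint64_t ticks_between_rising_edges(unsigned k)
{
    return (uint64_t)1 << (k + 1u);
}

/* Period, in seconds, between successive 0-to-1 transitions of that bit. */
double transition_period_seconds(unsigned k, double timebase_hz)
{
    return (double)ticks_between_rising_edges(k) / timebase_hz;
}
```

For the watchdog timer, a reset configured on the second transition would therefore occur at most two such periods after the timer was last serviced.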
The timer functions of the PowerPC 476FP processor core are more fully described in Section 6 Timer Facilities on page 157.
2. Programming Model
The programming model of the PowerPC 476FP core describes the following features and operations of the
processor from a programmer’s perspective:
• Storage Addressing (including data types and byte ordering), starting on page 33
• Registers, starting on page 40
• Instruction Classes, starting on page 47
• Instruction Set, starting on page 49
• Branch Processing, starting on page 56
• Integer Processing, starting on page 63
• Processor Control, starting on page 66
• User and Supervisor Modes, starting on page 75
• Speculative Accesses, starting on page 76
• Synchronization, starting on page 76
2.1 Storage Addressing
As a 32-bit implementation of the Power Instruction Set Architecture (ISA) Version 2.05, the PowerPC 476FP
core implements a uniform 32-bit effective address (EA) space. Effective addresses are expanded into virtual
addresses and are then translated to 42-bit (4 TB) real addresses by the memory management unit (see
Section 4 Memory Management Unit on page 103 for more information about the translation process). The
organization of the real address space into a physical address space is system-dependent and is described in
the user’s manuals for chip-level products that incorporate a PowerPC 476FP core.
The PowerPC 476FP core generates an effective address whenever it executes a storage access, branch,
cache management, or translation lookaside buffer (TLB) management instruction, or when it fetches the next
sequential instruction.
2.1.1 Storage Operands
Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.
Data storage operands accessed by the integer load/store instructions can be bytes (8-bit), halfwords (16-bit),
words (32-bit), and doublewords (64-bit); or, for load/store multiple and string instructions, a sequence of
words or bytes. Data storage operands accessed by floating-point (FP) load or store instructions can be
bytes, halfwords, words, doublewords, or quadwords. The address of a storage operand is the address of its
first byte (that is, of its lowest-numbered byte). Byte ordering can be either big-endian or little-endian, as
controlled by the endian storage attribute (see Section 2.1.3 Byte Ordering on page 36; also see
Section 4.5.6 Endian (E) on page 118 for more information about the endian storage attribute).
Operand length is implicit for each scalar storage access instruction type (that is, each storage access
instruction type other than the load/store multiple and string instructions). The operand of such a scalar
storage access instruction has a natural alignment boundary equal to the operand length. Therefore, the
natural address of an operand is an integral multiple of the operand length. A storage operand is said to be
aligned if it is aligned at its natural boundary; otherwise, it is said to be unaligned.
Table 2-1 on page 34 lists the storage access instructions for the data storage operands.
Table 2-1. Data Operand Definitions
Storage Access Instruction Type | Operand Length | Addr[28:31] if aligned
Byte (or String)                | 8 bits         | ‘xxxx’
Halfword                        | 2 bytes        | ‘xxx0’
Word (or Multiple)              | 4 bytes        | ‘xx00’
Doubleword (FP only)            | 8 bytes        | ‘x000’
Note: An x in an address bit position indicates that the bit can be ‘0’ or ‘1’ independently of the state of other bits in the address.
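The alignment patterns in Table 2-1 amount to requiring that the low-order address bits below the operand length are zero. A minimal sketch of that check, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if ea is an integral multiple of operand_len.
 * operand_len must be a power of two (1, 2, 4, or 8 bytes). */
bool is_naturally_aligned(uint32_t ea, uint32_t operand_len)
{
    return (ea & (operand_len - 1u)) == 0u;
}
```

A byte operand is always aligned, because the mask is zero in that case.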
The alignment of the operand effective address of some storage access instructions can affect performance,
and in some cases, can cause an alignment exception to occur. For such storage access instructions, the
best performance is obtained when the storage operands are aligned. Table 2-2 summarizes the effects of
alignment on those storage access instruction types for which such effects exist. If an instruction type is not
shown in the table, there are no alignment effects for that instruction type.
Table 2-2. Alignment Effects for Storage Access Instructions
Storage Access Instruction Type | Alignment Effects
Integer load/store halfword | Broken into two byte accesses if it crosses the 8-byte boundary; otherwise no effect.
Integer load/store word | Broken into two accesses if it crosses the 8-byte boundary; otherwise no effect.
Integer load/store multiple or string | Broken into a series of 4-byte accesses until the last byte is accessed or an 8-byte boundary is reached, whichever occurs first. If bytes remain past an 8-byte boundary, resume accessing 4 bytes at a time until the last byte is accessed or the next 8-byte boundary is reached, whichever occurs first; repeat.
FP load/store word | Alignment exception if it crosses the word boundary; otherwise no effect (see note).
FP load/store doubleword | Alignment exception if it crosses the doubleword boundary; otherwise no effect (see note).
Note: The floating-point unit can specify that the EA for a particular FP load or store instruction must be aligned at the operand-size
boundary, or alternatively, at a word boundary. If the FPU indicates this requirement and the calculated EA fails to meet it, the PowerPC
476FP core generates an alignment exception. Alternatively, the FPU can specify that the EA for a particular FP load or store instruction
should be forced to be aligned by ignoring the appropriate number of low-order EA bits and processing the FP load or store as if those
bits were ‘0’. Byte, halfword, word, doubleword, and quadword FP load or store instructions ignore 0, 1, 2, 3, and 4 low-order EA bits, respectively.
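The two conditions that Table 2-2 and its note rely on can be modeled in a few lines. This is a sketch with hypothetical function names: it only shows the address arithmetic and does not model how the core or the FPU selects between the two behaviors.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an operand of len bytes (len >= 1) starting at ea spans an
 * 8-byte boundary, the condition under which integer accesses are split. */
bool crosses_8byte_boundary(uint32_t ea, uint32_t len)
{
    return (ea & 7u) + len > 8u;
}

/* Forced alignment as described in the note: ignore the low-order EA bits.
 * log2_len is 0, 1, 2, 3, or 4 for byte through quadword operands. */
uint32_t force_align(uint32_t ea, unsigned log2_len)
{
    return ea & ~((1u << log2_len) - 1u);
}
```

For example, a word at an address ending in 4 stays within one 8-byte unit, while a word at an address ending in 6 spans two.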
Cache management instructions access cache block operands; for the PowerPC 476FP core the cache block
size is 32 bytes. However, the effective addresses calculated by cache management instructions are not
required to be aligned on cache block boundaries. Instead, the architecture specifies that the associated low-order effective address bits (bits 27:31 for the PowerPC 476FP core) are ignored during the execution of these
instructions.
Similarly, the TLB management instructions access page operands, and, as determined by the page size, the
associated low-order effective address bits are ignored during the execution of these instructions.
Instruction storage operands, however, are always 4 bytes long, and the effective addresses calculated by
branch instructions are therefore always word-aligned.
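Ignoring low-order effective address bits, as described above for cache management instructions, is equivalent to masking the address down to the start of the block. A minimal sketch, assuming the 32-byte cache block size stated above; the helper names are hypothetical:

```c
#include <stdint.h>

/* Clear the nbits low-order bits of an effective address (nbits < 32). */
uint32_t ignore_low_order_bits(uint32_t ea, unsigned nbits)
{
    return ea & ~((1u << nbits) - 1u);
}

/* A 32-byte cache block corresponds to five ignored bits (bits 27:31). */
uint32_t cache_block_base(uint32_t ea)
{
    return ignore_low_order_bits(ea, 5u);
}
```

The same masking applies to TLB management instructions, with the number of ignored bits determined by the page size.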
2.1.2 Effective Address Calculation
For a storage access instruction, if the sum of the effective address and the operand length exceeds the
maximum effective address of 2^32 - 1 (that is, the storage operand itself crosses the maximum address
boundary), the result of the operation is undefined, as specified by the architecture. The PowerPC 476FP
core performs the operation as if the storage operand wrapped around from the maximum effective address
to effective address 0. However, software should not depend upon this behavior, so that it can be ported to
other implementations that do not handle this scenario in the same fashion. Accordingly, software should
ensure that no data storage operands cross the maximum address boundary.
Note: Because instructions are words and the effective addresses of instructions are always implicitly on
word boundaries, it is not possible for an instruction storage operand to cross any word boundary, including
the maximum address boundary.
Effective address arithmetic, which calculates the starting address for storage operands, wraps around from
the maximum address to address 0 for all effective address computations except next sequential instruction
fetching. See Section 2.1.2.2 for more information about next sequential instruction fetching at the maximum
address boundary.
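The wraparound of effective address arithmetic matches unsigned 32-bit arithmetic in C, which makes it easy to model. The sketch below uses hypothetical function names; the second function is the kind of check software could use to keep data operands away from the maximum address boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit effective address arithmetic: the sum wraps modulo 2^32. */
uint32_t ea_add(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* True if an operand of len bytes (len >= 1) starting at ea would extend
 * past the maximum effective address x'FFFF FFFF'. */
bool operand_crosses_max_address(uint32_t ea, uint32_t len)
{
    return (uint64_t)ea + len > 0x100000000ull;
}
```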
2.1.2.1 Data Storage Addressing Modes
There are two data storage addressing modes supported by the PowerPC 476FP core:
• Base + displacement (D-mode) addressing mode:
The 16-bit D field is sign-extended and added to the contents of the General Purpose Register (GPR)
designated by RA, or to zero if RA = ‘0’; the low-order 32 bits of the sum form the effective address of the
data storage operand.
• Base + index (X-mode) addressing mode:
The contents of the GPR designated by RB (or the value 0 for load string word immediate [lswi] and store
string word immediate [stswi]) are added to the contents of the GPR designated by RA, or to 0 if
RA = ‘0’; the low-order 32 bits of the sum form the effective address of the data storage operand.
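The two data storage addressing modes can be sketched as follows. This is an illustrative model only: the `gpr` array stands in for the General Purpose Registers, and the function names are hypothetical. The lswi and stswi case, where the value 0 is used in place of RB, is not modeled.

```c
#include <stdint.h>

/* D-mode: the sign-extended 16-bit D field is added to the contents of the
 * GPR designated by RA, or to zero if the RA field is 0. */
uint32_t ea_d_mode(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    uint32_t base = (ra == 0u) ? 0u : gpr[ra];
    return base + (uint32_t)(int32_t)d;   /* low-order 32 bits of the sum */
}

/* X-mode: the contents of the GPR designated by RB are added to the
 * contents of the GPR designated by RA, or to zero if the RA field is 0. */
uint32_t ea_x_mode(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    uint32_t base = (ra == 0u) ? 0u : gpr[ra];
    return base + gpr[rb];                /* low-order 32 bits of the sum */
}
```

Note that an RA field of 0 selects the value zero, not the contents of GPR0.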
2.1.2.2 Instruction Storage Addressing Modes
There are four instruction storage addressing modes supported by the PowerPC 476FP core:
• I-form branch instructions (unconditional):
The 24-bit LI field is concatenated on the right with ‘00’, sign-extended, and then added to either the
address of the branch instruction if the absolute address (AA) instruction field equals 0, or to 0 if AA = ‘1’;
the low-order 32 bits of the sum form the effective address of the next instruction.
• Taken B-form branch instructions:
The 14-bit branch displacement (BD) field is concatenated on the right with ‘00’, sign-extended, and then
added to either the address of the branch instruction if AA = ‘0’, or to 0 if AA = ‘1’; the low-order 32 bits of
the sum form the effective address of the next instruction.
• Taken XL-form branch instructions:
The contents of bits 0:29 of the Link Register (LR) or bits 32:61 of the Count Register (CTR) are concatenated on the right with ‘00’ to form the 32-bit effective address of the next instruction.
• Next sequential instruction fetching (including nontaken branch instructions):
The value 4 is added to the address of the current instruction to form the 32-bit effective address of the
next instruction. If the address of the current instruction is x‘FFFF FFFC’, the PowerPC 476FP core
wraps the next sequential instruction address back to address 0. This behavior is not required by the
architecture, which specifies that the next sequential instruction address is undefined under these circumstances. Therefore, software should not depend upon this behavior, so that it can be ported to other
implementations that do not handle this scenario in the same fashion. Accordingly, if software must execute across this maximum address boundary and wrap back to address 0, it should place an unconditional branch at the boundary, with a displacement of 4.
In addition to the four instruction storage addressing modes, the following behavior applies to branch instructions:
• Any branch instruction with the link bit (LK) equal to ‘1’:
The value 4 is added to the address of the current instruction and the low-order 32 bits of the result are
placed into the LR. As for the similar scenario for next sequential instruction fetching, if the address of the
branch instruction is x‘FFFF FFFC’, the result placed into the LR is architecturally undefined, although
once again the PowerPC 476FP core wraps the LR update value back to address 0. Again, however,
software should not depend on this behavior, in order that it can be ported to implementations that do not
handle this scenario in the same fashion.
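The branch target calculations above can be modeled with ordinary 32-bit arithmetic. The following is a sketch under stated assumptions: the function names are hypothetical, `cia` denotes the address of the current (branch) instruction, and the wrap at x‘FFFF FFFC’ reflects the PowerPC 476FP behavior described above rather than anything the architecture guarantees.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of value to 32 bits (1 <= bits <= 31). */
int32_t sign_extend(uint32_t value, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1u);
    value &= (1u << bits) - 1u;
    return (int32_t)((value ^ sign) - sign);
}

/* I-form: the 24-bit LI field, concatenated with '00' and sign-extended,
 * is added to the branch address (AA = 0) or to 0 (AA = 1). */
uint32_t branch_target_i_form(uint32_t cia, uint32_t li, int aa)
{
    uint32_t disp = (uint32_t)sign_extend(li << 2, 26u);
    return aa ? disp : cia + disp;
}

/* B-form (taken): same calculation with the 14-bit BD field. */
uint32_t branch_target_b_form(uint32_t cia, uint32_t bd, int aa)
{
    uint32_t disp = (uint32_t)sign_extend(bd << 2, 16u);
    return aa ? disp : cia + disp;
}

/* Next sequential instruction address, also the LR value when LK = 1. */
uint32_t next_sequential(uint32_t cia)
{
    return cia + 4u;
}
```

An unconditional relative branch with a displacement of 4 placed at x‘FFFF FFFC’ yields a target of 0 in this model, which is the portable way to cross the boundary that the text recommends.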
2.1.3 Byte Ordering
If scalars (individual data items and instructions) were indivisible, there would be no such concept as byte
ordering. It is meaningless to consider the order of bits or groups of bits within the smallest addressable unit
of storage, because nothing can be observed about such order. Only when scalars, which the programmer
and processor regard as indivisible quantities, can comprise more than one addressable unit of storage does
the question of order arise.
For a system in which the smallest addressable unit of storage is the 64-bit doubleword, there is no question
of the ordering of bytes within doublewords. All transfers of individual scalars between registers and storage
are of doublewords, and the address of the byte containing the high-order 8 bits of a scalar is no different
from the address of a byte containing any other part of the scalar.
For the Book III-E Enhanced PowerPC Architecture, as for most current computer architectures, the smallest
addressable unit of storage is the 8-bit byte. Many scalars are halfwords, words, or doublewords, that consist
of groups of bytes. When a word-length scalar is moved from a register to storage, the scalar occupies four
consecutive byte addresses. It thus becomes meaningful to present the order of the byte addresses with
respect to the value of the scalar: which byte contains the highest-order 8 bits of the scalar, which byte
contains the next-highest-order 8 bits, and so on.
Given a scalar that contains multiple bytes, the choice of byte ordering is essentially arbitrary. There are
4! = 24 ways to specify the ordering of 4 bytes within a word, but only two of these orderings are sensible:
• The ordering that assigns the lowest address to the highest-order (left-most) 8 bits of the scalar, the next
sequential address to the next-highest-order 8 bits, and so on.
This ordering is called big-endian because the big end (most significant end) of the scalar, considered as
a binary number, comes first in storage. IBM eServer™ pSeries® and IBM zSeries® are examples of
computer architectures that use this byte ordering.
• The ordering that assigns the lowest address to the lowest-order (right-most) 8 bits of the scalar, the next
sequential address to the next-lowest-order 8 bits, and so on.
This ordering is called little-endian because the little end (least significant end) of the scalar, considered
as a binary number, comes first in storage. The Intel® x86 is an example of a processor architecture that
uses this byte ordering.
Power ISA supports both big-endian and little-endian byte ordering, for both instruction and data storage
accesses. Which byte ordering is used is controlled on a memory page basis by the endian (E) storage attribute, which is a field within the TLB entry for the page. The endian storage attribute is set to ‘0’ for a big-endian page, and is set to ‘1’ for a little-endian page. See Section 4 Memory Management Unit on page 103
for more information about memory pages, the TLB, and storage attributes, including the endian storage attribute.
2.1.3.1 Structure Mapping Examples
The following C language structure, s, contains an assortment of scalars and a character string. The
comments show the value assumed to be in each structure element; these values show how the bytes
comprising each structure element are mapped into storage.
struct {
    int a;          /* x‘1112_1314’ word */
    long long b;    /* x‘2122_2324_2526_2728’ doubleword */
    char *c;        /* x‘3132_3334’ word */
    char d[7];      /* 'A','B','C','D','E','F','G' array of bytes */
    short e;        /* x‘5152’ halfword */
    int f;          /* x‘6162_6364’ word */
} s;
C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries.
Big-Endian Mapping and Little-Endian Mapping show structure-mapping examples where each scalar is
aligned at its natural boundary. This alignment introduces padding of 4 bytes between a and b, 1 byte
between d and e, and 2 bytes between e and f. The same amount of padding is present in both big-endian
and little-endian mappings.
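The padding amounts quoted above follow from rounding each member's offset up to its natural boundary. The sketch below computes the offsets by hand rather than relying on a host compiler's layout rules, which can differ; the helper names are hypothetical, and the member sizes assume the 32-bit model used in this example (a 4-byte pointer).

```c
#include <stddef.h>

/* Round offset up to the next multiple of alignment (a power of two). */
size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1u) & ~(alignment - 1u);
}

/* Place a member of the given size at its natural boundary. Returns the
 * member's offset and advances *offset past the member. */
size_t place_member(size_t *offset, size_t size, size_t alignment)
{
    size_t start = align_up(*offset, alignment);
    *offset = start + size;
    return start;
}
```

Placing the members of structure s in order (a: 4 bytes, b: 8, c: 4, d: 7 with byte alignment, e: 2, f: 4) gives offsets x‘00’, x‘08’, x‘10’, x‘14’, x‘1C’, and x‘20’.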
Big-Endian Mapping
The big-endian mapping of structure s is shown in Table 2-3, with the data highlighted in the structure
mappings. The hexadecimal addresses are shown below the data stored at the address. The contents of
each byte, as defined in structure s, is shown as a (hexadecimal) number or character (for the string
elements). The shaded cells correspond to padded bytes.
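Table 2-3. Big-Endian Mapping of Structure S
Data:     11     12     13     14     --     --     --     --
Address:  x‘00’  x‘01’  x‘02’  x‘03’  x‘04’  x‘05’  x‘06’  x‘07’
Data:     21     22     23     24     25     26     27     28
Address:  x‘08’  x‘09’  x‘0A’  x‘0B’  x‘0C’  x‘0D’  x‘0E’  x‘0F’
Data:     31     32     33     34     'A'    'B'    'C'    'D'
Address:  x‘10’  x‘11’  x‘12’  x‘13’  x‘14’  x‘15’  x‘16’  x‘17’
Data:     'E'    'F'    'G'    --     51     52     --     --
Address:  x‘18’  x‘19’  x‘1A’  x‘1B’  x‘1C’  x‘1D’  x‘1E’  x‘1F’
Data:     61     62     63     64     --     --     --     --
Address:  x‘20’  x‘21’  x‘22’  x‘23’  x‘24’  x‘25’  x‘26’  x‘27’
Note: -- marks a byte that holds no structure data (a shaded cell in the original table).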
Little-Endian Mapping
Table 2-4 shows how structure s is mapped into a little-endian format. The shaded cells correspond to padded
bytes.
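Table 2-4. Little-Endian Mapping of Structure S
Data:     14     13     12     11     --     --     --     --
Address:  x‘00’  x‘01’  x‘02’  x‘03’  x‘04’  x‘05’  x‘06’  x‘07’
Data:     28     27     26     25     24     23     22     21
Address:  x‘08’  x‘09’  x‘0A’  x‘0B’  x‘0C’  x‘0D’  x‘0E’  x‘0F’
Data:     34     33     32     31     'A'    'B'    'C'    'D'
Address:  x‘10’  x‘11’  x‘12’  x‘13’  x‘14’  x‘15’  x‘16’  x‘17’
Data:     'E'    'F'    'G'    --     52     51     --     --
Address:  x‘18’  x‘19’  x‘1A’  x‘1B’  x‘1C’  x‘1D’  x‘1E’  x‘1F’
Data:     64     63     62     61     --     --     --     --
Address:  x‘20’  x‘21’  x‘22’  x‘23’  x‘24’  x‘25’  x‘26’  x‘27’
Note: -- marks a byte that holds no structure data (a shaded cell in the original table).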
2.1.3.2 Instruction Byte Ordering
Power ISA defines instructions as aligned words (4 bytes) in memory. As such, instructions in a big-endian
program image are arranged with the most significant byte (MSB) of the instruction word at the lowest-numbered address.
Consider the big-endian mapping of instruction p at address x‘00’, where, for example, p = add r7, r7, r4:
Byte:     MSB                   LSB
Address:  x‘00’  x‘01’  x‘02’  x‘03’
In a little-endian mapping the same instruction is arranged with the least significant byte (LSB) of the instruction word at the lowest-numbered address:
Byte:     LSB                   MSB
Address:  x‘00’  x‘01’  x‘02’  x‘03’
By the definition of Power ISA bit numbering, the most significant byte of an instruction is the byte containing
bits 0:7 of the instruction. The most significant byte is the one that contains the primary opcode field (bits 0:5).
Because of this difference in byte orderings, the processor must perform whatever byte reversal is required
(depending on the particular byte ordering in use) to correctly deliver the opcode field to the instruction
decoder. In the PowerPC 476FP core, this reversal is performed between the memory interface and the
instruction cache, according to the value of the endian storage attribute for each memory page, such that the
bytes in the instruction cache are always correctly arranged for delivery directly to the instruction decoder.
If the endian storage attribute for a memory page is reprogrammed from one byte ordering to the other, the
contents of the memory page must be reloaded with program and data structures that are in the appropriate
byte ordering. Furthermore, anytime the contents of instruction memory change, the instruction cache must
be made coherent with the updates by invalidating the instruction cache and refetching the updated memory
contents with the new byte ordering.
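The reversal needed to deliver the opcode field correctly can be illustrated with the example instruction add r7, r7, r4. The sketch below is not a description of the core's hardware. The encoding x‘7CE7 2214’ is not given in this section; it is an assumption derived from the Power ISA XO-form layout (primary opcode 31, extended opcode 266), and the function names are hypothetical.

```c
#include <stdint.h>

/* Assemble a 32-bit instruction word from four bytes at consecutive
 * addresses, reversing them when the page is little-endian. */
uint32_t instruction_word(const uint8_t bytes[4], int little_endian)
{
    if (little_endian) {
        return ((uint32_t)bytes[3] << 24) | ((uint32_t)bytes[2] << 16) |
               ((uint32_t)bytes[1] << 8)  |  (uint32_t)bytes[0];
    }
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Primary opcode: instruction bits 0:5, the six most significant bits. */
unsigned primary_opcode(uint32_t insn)
{
    return (unsigned)(insn >> 26);
}
```

Under that assumed encoding, the bytes 7C E7 22 14 on a big-endian page and 14 22 E7 7C on a little-endian page both yield the same instruction word, so the decoder sees primary opcode 31 in either case.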
2.1.3.3 Data Byte Ordering
Unlike instruction fetches, data accesses cannot be byte-reversed between memory and the data cache.
Data byte ordering in memory depends upon the data type (byte, halfword, word, and so on) of a specific data
item. It is only when moving a data item of a specific type from or to an architected register, as directed by the
execution of a particular storage access instruction, that it becomes known what kind of byte reversal can be
required due to the byte ordering of the memory page containing the data item. Therefore, byte reversal
during load or store accesses is performed between data cache (or for a data cache miss, between memory)
and the load register target or store register source, depending on the specific type of load or store instruction
(that is, byte, halfword, word, and so on).
A comparison of the big-endian and little-endian mappings of structure s, as shown in Section 2.1.3.1 Structure
Mapping Examples on page 37, shows that the difference between the byte locations of any data item in the structure
depends upon the size of the particular data item. For example, again referring to the big-endian and little-endian mappings of structure s:
• The word a has its 4 bytes reversed within the word spanning addresses x‘00’ through x‘03’.
• The halfword e has its two bytes reversed within the halfword spanning addresses x‘1C’ through x‘1D’.
The array of bytes d, where each data item is a byte, is not reversed when the big-endian and little-endian
mappings are compared. For example, the character 'A' is located at address x‘14’ in both the big-endian and
little-endian mappings.
The size of the data item being loaded or stored must be known before the processor can determine whether,
and if so how, to reorder the bytes when moving them between a register and the data cache (or memory):
• For byte loads and stores, including strings, no reordering of bytes occurs, regardless of byte ordering.
• For halfword loads and stores, bytes are reversed within the halfword, for one byte order with respect to
the other.
• For word loads and stores (including load/store multiple), bytes are reversed within the word, for one byte
order with respect to the other.
This mechanism applies independently of the alignment of data. That is, when loading a multibyte data
operand with a scalar load instruction, bytes are accessed from the data cache (or memory) starting with the
byte at the calculated effective address and continuing with consecutively higher-numbered bytes until the
required number of bytes have been retrieved. Then, the bytes are arranged such that either the byte from
the highest-numbered address (for big-endian storage regions) or the lowest-numbered address (for little-endian storage regions) is placed into the least significant byte of the register. The rest of the register is filled
in corresponding order with the rest of the accessed bytes. An analogous procedure is followed for scalar
store instructions.
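The scalar load procedure just described can be sketched in C. This is a model of the byte arrangement only, with a hypothetical function name; `mem` points at the byte located at the calculated effective address, and operands of 1, 2, or 4 bytes are assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Gather nbytes consecutive bytes starting at mem and arrange them in a
 * 32-bit register value according to the byte ordering of the page. */
uint32_t load_scalar(const uint8_t *mem, size_t nbytes, int little_endian)
{
    uint32_t reg = 0;

    for (size_t i = 0; i < nbytes; i++) {
        if (little_endian) {
            /* Lowest-numbered address ends up in the least significant byte. */
            reg |= (uint32_t)mem[i] << (8u * i);
        } else {
            /* Highest-numbered address ends up in the least significant byte. */
            reg = (reg << 8) | mem[i];
        }
    }
    return reg;
}
```

Applied to word a of structure s, the big-endian bytes 11 12 13 14 and the little-endian bytes 14 13 12 11 both load as x‘1112_1314’.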
For load/store multiple instructions, each group of 4 bytes is transferred between memory and the register
according to the procedure for a scalar load word instruction.
For load/store string instructions, the most significant byte of the first register is transferred to or from memory
at the starting (lowest-numbered) effective address, regardless of byte ordering. Subsequent register bytes
(from most significant to least significant, and then moving into the next register, starting with the most significant byte, and so on) are transferred to or from memory at sequentially higher-numbered addresses. This
behavior for byte strings ensures that if two strings are loaded into registers and then compared, the first
bytes of the strings are treated as most significant with respect to the comparison.
2.1.3.4 Byte-Reverse Instructions
Power ISA defines load/store byte-reverse instructions which can access storage that is specified as being of
one byte ordering in the same manner that a regular (that is, non-byte-reverse) load/store instruction
accesses storage that is specified as being of the opposite byte ordering. That is, a load/store byte-reverse
instruction to a big-endian memory page transfers data between the data cache (or memory) and the register
in the same manner that a normal load/store transfers the data to or from a little-endian memory page. Similarly, a load/store byte-reverse instruction to a little-endian memory page transfers data between the data
cache (or memory) and the register in the same manner that a normal load/store transfers the data to or from
a big-endian memory page.
The function of the load/store byte-reverse instructions is useful when a particular memory page contains a
combination of data with both big-endian and little-endian byte ordering. In such an environment, the endian
storage attribute for the memory page is set according to the predominant byte ordering for the page and the
normal load/store instructions are used to access data operands that use this predominant byte ordering.
Conversely, the load/store byte-reverse instructions are used to access the data operands that are of the
other (less prevalent) byte ordering.
Software compilers cannot typically make general use of the load/store byte-reverse instructions. Such
instructions are ordinarily used only in special, hand-coded device drivers.
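In terms of register contents, a byte-reverse load returns what a normal load of the same operand would return with its bytes swapped. A minimal sketch of that relationship, using hypothetical function names:

```c
#include <stdint.h>

/* Reverse the four bytes of a word. */
uint32_t byte_reverse32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverse the two bytes of a halfword. */
uint16_t byte_reverse16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}
```

For example, if a little-endian word x‘1112_1314’ sits on a big-endian page, a normal word load returns x‘1413_1211’, and a byte-reverse load recovers the intended value.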
2.2 Registers
This section provides an overview of the register categories and types provided by the PowerPC 476FP core.
Detailed descriptions of each of the registers are provided within the sections covering the functions with
which they are associated (for example, the cache control and cache debug registers are described in
Section 5 Instruction and Data Caches on page 133). An alphabetic summary of all registers, including bit
definitions, is provided in Appendix A Register Summary on page 263.
All registers in the PowerPC 476FP core are architected as 32 bits wide (numbered as bits 32:63; the higher-order bits
0:31 are ignored unless specified otherwise), although certain bits in some registers are reserved and thus
not necessarily implemented. For all registers with fields marked as reserved, these reserved fields should be
written as 0 and read as undefined. The recommended coding practice is to perform the initial write to a
register with reserved fields set to 0 and to perform all subsequent writes to the register by using a read-modify-write strategy: read the register; use logical instructions to alter defined fields, leaving reserved fields
unmodified; and write the register.
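The logical step of the read-modify-write strategy can be sketched as a pure function on the register value. The function name and the mask convention are hypothetical; the actual register read and write use the move instructions appropriate to the register type and are not shown.

```c
#include <stdint.h>

/* Replace only the bits selected by field_mask with the corresponding bits
 * of new_bits. All other bits, including reserved fields, keep the value
 * that was read from the register. */
uint32_t update_field(uint32_t current, uint32_t field_mask, uint32_t new_bits)
{
    return (current & ~field_mask) | (new_bits & field_mask);
}
```

Software would read the register into `current`, call a helper like this for each defined field it needs to change, and write the result back.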
All Floating-Point Registers (FPRs) are 64 bits, and specified as bits 0:63. See the floating-point processor
chapter in Power ISA Version 2.05, Book-I for more information. All of the registers are grouped into categories according to the processor functions with which they are associated. In addition, each register is classified as being of a particular type, as characterized by the specific instructions that are used to read and write
registers of that type. Finally, most of the registers contained within the PowerPC 476FP core are defined by
the Power ISA, although some registers are implementation-specific and unique to the PowerPC 476FP core.
Figure 2-1 on page 41 illustrates the PowerPC 476FP core registers contained in the user programming
model, that is, those registers to which access is nonprivileged and that are available to both user and supervisor programs.
Figure 2-1. User Programming Model Registers
[Figure: user-mode registers grouped by function]
Integer Processing: General Purpose Registers (GPR0 - GPR31); Integer Exception Register (XER)
Branch Control: Condition Register (CR); Count Register (CTR); Link Register (LR)
Processor Control: SPR General 4 - 7 (SPRG4, SPRG5, SPRG6, SPRG7); User SPR General 0 (USPRG0)
Timer: Time Base (TBL, TBU)
Figure 2-2 on page 42 illustrates the PowerPC 476FP core registers contained in the supervisor programming model, to which access is privileged and that are available to supervisor programs only. See Section 2.8
User and Supervisor Modes on page 75 for more information about privileged instructions and register
access and the user and supervisor programming models.
Figure 2-2. Supervisor Programming Model Registers
[Figure: supervisor-mode registers grouped by function]
Processor Control: Machine State Register (MSR); Processor Version Register (PVR); Processor ID Register (PIR); Core Configuration Registers (CCR0, CCR1, CCR2); Reset Configuration (RSTCFG); SPR General (SPRG0 - SPRG8)
Timer: Time Base (TBU, TBL); Timer Control Register (TCR); Timer Status Register (TSR); Decrementer (DEC); Decrementer Auto-Reload (DECAR)
Storage Control: Process ID (PID); MMU Control Register (MMUCR); Real Mode Page Descriptor Register (RMPD); MMU Bolted Entry Specification Registers (MMUBE0, MMUBE1); Supervisor Search Priority Configuration Register (SSPCR); User Search Priority Configuration Register (USPCR); tlbivax, tlbsx Search Priority Configuration Register (ISPCR)
Debug: Debug Status Register (DBSR); Debug Data Register (DBDR); Debug Control Registers (DBCR0, DBCR1, DBCR2); Data Address Compares (DAC1, DAC2); Data Value Compares (DVC1, DVC2); Instruction Address Compares (IAC1, IAC2, IAC3, IAC4); Instruction Opcode Compare Control Register (IOCCR); Instruction Opcode Compare Registers (IOCR1, IOCR2)
Interrupt Processing: Exception Syndrome Register (ESR); Machine Check Syndrome Register (MCSR); Data Exception Address Register (DEAR); Save/Restore Registers (SRR0, SRR1); Critical Save/Restore Registers (CSRR0, CSRR1); Machine Check Save/Restore Registers (MCSRR0, MCSRR1); Interrupt Vector Prefix Register (IVPR); Interrupt Vector Offset Registers (IVOR0 - IVOR15)
Cache Debug: Instruction Cache Debug Data Registers (ICDBDR0, ICDBDR1); Instruction Cache Debug Tag Registers (ICDBTRH, ICDBTRL); Data Cache Debug Tag Registers (DCDBTRH, DCDBTRL)
Also shown: XER; Link Register (LR); Count Register (CTR); User SPR General 0 (USPRG0); DCR Immediate Prefix Register (DCRIPR); Data Cache Exception Syndrome Register (DCESR); Instruction Cache Exception Syndrome Register (ICESR)
Table 2-5 lists the PowerPC 476FP Special Purpose Registers (SPRs), their decimal and binary SPR
numbers (SPRNs), their access, and a cross-reference to the section that describes them more fully. Registers that are not part of Power ISA, and are thus specific to the PowerPC 476FP core, are shown in italics.
Unless otherwise indicated, all registers have read/write access.
Note: See Table A-1 on page 263 for the register categories, the registers that belong to each category, and
their types.
Table 2-5. PowerPC 476FP SPRs
SPR | SPR Name | Decimal SPRN | Binary SPRN | Privileged | Access | Page
CCR0 | Core Configuration Register 0 | 947 | ‘11101 10011’ | Yes | R/W | 69
CCR1 | Core Configuration Register 1 | 888 | ‘11011 11000’ | Yes | R/W | 70
CCR2 | Core Configuration Register 2 | 889 | ‘11011 11001’ | Yes | R/W | 73
CTR | Count Register | 9 | ‘00000 01001’ | No | R/W | 60
CSRR0 | Critical Save/Restore Register 0 | 58 | ‘00001 11010’ | Yes | R/W | 175
CSRR1 | Critical Save/Restore Register 1 | 59 | ‘00001 11011’ | Yes | R/W | 176
DAC1 | Data Address Compare 1 | 316 | ‘01001 11100’ | Yes | R/W | 266
DAC2 | Data Address Compare 2 | 317 | ‘01001 11101’ | Yes | R/W | 266
DCDBTRH | Data Cache Debug Tag Register High | 925 | ‘11100 11101’ | Yes | Read | 153
DCDBTRL | Data Cache Debug Tag Register Low | 924 | ‘11100 11100’ | Yes | Read | 152
DEAR | Data Exception Address Register | 61 | ‘00001 11101’ | Yes | R/W | 177
DVC1 | Data Value Compare 1 | 318 | ‘01001 11110’ | Yes | R/W | 266
DVC2 | Data Value Compare 2 | 319 | ‘01001 11111’ | Yes | R/W | 266
DCESR | D-cache Exception Syndrome Register | 850 | ‘11010 10010’ | Yes | R/W | 268
DCRIPR | DCR Immediate Prefix Register | 891 | ‘11011 11011’ | Yes | R/W | 74
DBCR0 | Debug Control Register 0 | 308 | ‘01001 10100’ | Yes | R/W | 235
DBCR1 | Debug Control Register 1 | 309 | ‘01001 10101’ | Yes | R/W | 236
DBCR2 | Debug Control Register 2 | 310 | ‘01001 10110’ | Yes | R/W | 237
DBDR | Debug Data Register | 1011 | ‘11111 10011’ | Yes | R/W | 266
DBSR | Debug Status Register | 304 | ‘01001 10000’ | Yes | Read/Clear | 239
 | | 816 | ‘11001 10000’ | Yes | Write |
DEC | Decrementer | 22 | ‘00000 10110’ | Yes | R/W | 159
DECAR | Decrementer Autoreload | 54 | ‘00001 10110’ | Yes | R/W | 159
ESR | Exception Syndrome Register | 62 | ‘00001 11110’ | Yes | R/W | 179
ICESR | I-cache Exception Syndrome Register | 851 | ‘11010 10011’ | Yes | R/W | 140
IAC1 | Instruction Address Compare 1 | 312 | ‘01001 11000’ | Yes | R/W | 240
IAC2 | Instruction Address Compare 2 | 313 | ‘01001 11001’ | Yes | R/W | 240
IAC3 | Instruction Address Compare 3 | 314 | ‘01001 11010’ | Yes | R/W | 240
IAC4 | Instruction Address Compare 4 | 315 | ‘01001 11011’ | Yes | R/W | 240
ICDBDR0 | Instruction Cache Debug Data Register 0 | 979 | ‘11110 10011’ | Yes | Read | 137
ICDBDR1 | Instruction Cache Debug Data Register 1 | 980 | ‘11110 10100’ | Yes | Read | 137
ICDBTRH | Instruction Cache Debug Tag Register High | 927 | ‘11100 11111’ | Yes | Read | 138
ICDBTRL | Instruction Cache Debug Tag Register Low | 926 | ‘11100 11110’ | Yes | Read | 137
IOCCR | Instruction Opcode Compare Control Register | 860 | ‘11010 11100’ | Yes | R/W | 269
IOCR1 | Instruction Opcode Compare Register 1 | 861 | ‘11010 11101’ | Yes | R/W | 269
IOCR2 | Instruction Opcode Compare Register 2 | 862 | ‘11010 11110’ | Yes | R/W | 270
XER | Fixed-Point Exception Register | 1 | ‘00000 00001’ | No | R/W | 64
IVOR[0-15] | Interrupt Vector Offset Register | 400 - 415 | ‘01100 1xxxx’ | Yes | R/W | 178
IVPR | Interrupt Vector Prefix Register | 63 | ‘00001 11111’ | Yes | R/W | 179
LR | Link Register | 8 | ‘00000 01000’ | No | R/W | 59
MCSRR0 | Machine Check Save/Restore Register 0 | 570 | ‘10001 11010’ | Yes | R/W | 176
MCSRR1 | Machine Check Save/Restore Register 1 | 571 | ‘10001 11011’ | Yes | R/W | 177
MCSR | Machine Check Syndrome Register | 572 | ‘10001 11100’ | Yes | R/W | 181
 | | 828 | ‘11001 11100’ | Yes | Clear |
MSR | Machine State Register | - | - | Yes | R/W | 173
MMUBE0 | MMU Bolted Entry-0 Spec Register | 820 | ‘11001 10100’ | Yes | R/W | 121
MMUBE1 | MMU Bolted Entry-1 Spec Register | 821 | ‘11001 10101’ | Yes | R/W | 121
MMUCR | MMU Control Register | 946 | ‘11101 10010’ | Yes | R/W | 126
PMUCC0 | PMU Core Control Register | 858 | ‘11010 11010’ | Yes | R/W | 259
PMUCC0 | PMU Core Control Register, User | 842 | ‘11010 01010’ | No | Read | 259
PID | Process ID Register | 48 | ‘00001 10000’ | Yes | R/W | 120
PIR | Processor ID Register | 286 | ‘01000 11110’ | Yes | Read | 68
PVR | Processor Version Register | 287 | ‘01000 11111’ | Yes | Read | 68
PWM | Pulse Width Margin Register | 886 | ‘11011 10110’ | Yes | R/W |
RMPD | Real Mode Page Descriptor Register | 825 | ‘11001 11001’ | Yes | R/W | 120
RSTCFG | Reset Configuration Register | 923 | ‘11100 11011’ | Yes | Read | 126
SRR0 | Save/Restore Register 0 | 26 | ‘00000 11010’ | Yes | R/W | 174
SRR1 | Save/Restore Register 1 | 27 | ‘00000 11011’ | Yes | R/W | 175
SPRG[0-3] | SPR General 0 - 3 | 272 - 275 | ‘01000 100xx’ | Yes | R/W | 67
SPRG3 | SPR General 3 | 259 | ‘01000 00011’ | No | Read | 67
SPRG[4-7] | SPR General 4 - 7 | 260 - 263 | ‘01000 001xx’ | No | Read | 67
SPRG[4-7] | SPR General 4 - 7 | 276 - 279 | ‘01000 101xx’ | Yes | R/W | 67
SPRG8 | SPR General 8 | 604 | ‘10010 11100’ | Yes | R/W | 67
SSPCR | Supervisor Search Priority Configuration Register | 830 | ‘11001 11110’ | Yes | R/W | 122
TBL | Time Base Register Lower | 268 | ‘01000 01100’ | No | Read | 158
TBL | Time Base Register Lower | 284 | ‘01000 11100’ | Yes | Write | 158
TBU | Time Base Register Upper | 269 | ‘01000 01101’ | No | Read | 158
 | | 285 | ‘01000 11101’ | Yes | Write |
TCR | Timer Control Register | 340 | ‘01010 10100’ | Yes | R/W | 163
TSR | Timer Status Register | 336 | ‘01010 10000’ | Yes | Read/Clear | 164
 | | 848 | ‘11010 10000’ | Yes | Write |
ISPCR | tlbivax, tlbsx Search Priority Configuration Register | 829 | ‘11001 11101’ | Yes | R/W | 123
USPCR | User Search Priority Configuration Register | 831 | ‘11001 11111’ | Yes | R/W | 125
USPRG0 | User SPR General 0 | 256 | ‘01000 00000’ | No | R/W | 67
Note: R = read; W = write.
2.2.1 Register Types
There are five register types contained within or supported by the PowerPC 476FP core. Each register type is
characterized by the instructions that are used to read and write the registers of that type. The following
subsections provide an overview of each of the register types and the instructions associated with them.
2.2.1.1 General Purpose Registers
The PowerPC 476FP core contains 32 GPRs; each contains a 32-bit integer. Data from the data cache or
memory can be loaded into GPRs by using integer load instructions; the contents of GPRs can be stored to
the data cache or memory by using integer store instructions. Most of the integer instructions reference
GPRs. The GPRs are also used as targets and sources for most of the instructions that read and write the
other register types.
Section 2.6 Integer Processing on page 63 provides more information about integer operations and the use of
GPRs.
2.2.1.2 Special Purpose Registers
Special Purpose Registers (SPRs) (see Table 2-5 on page 43 and Table A-1 on page 263) are directly
accessed by using the mtspr and mfspr instructions. In addition, certain SPRs can be updated as a side effect of the execution of various instructions. For example, the Fixed-Point Exception Register (XER) (see
Section 2.6.2 Fixed-Point Exception Register (XER) on page 64) is an SPR that is updated with arithmetic
status (such as carry and overflow) upon execution of certain forms of integer arithmetic instructions.
SPRs control the use of the debug facilities, timers, interrupts, memory management, caches, and other
architected processor resources. Table A-1 Register Categories on page 263 shows the name, mnemonic,
and address for each SPR. Each of the SPRs is described in more detail within the section covering the function with which it is associated. See Table 2-5 on page 43 for a list of all SPRs.
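To relate the decimal and binary SPRN columns of Table 2-5 to the mfspr instruction, the following C sketch formats an SPRN as two 5-bit groups and builds an mfspr instruction word. The function names are hypothetical. The encoding details (primary opcode 31, extended opcode 339, and the two 5-bit halves of the SPRN stored in swapped order) come from the Power ISA definition of mfspr rather than from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Format a 10-bit SPRN as two 5-bit groups separated by a space,
 * matching the layout of the Binary SPRN column. */
static void sprn_to_binary(unsigned sprn, char out[12])
{
    assert(sprn < 1024);
    int pos = 0;
    for (int bit = 9; bit >= 0; bit--) {
        out[pos++] = ((sprn >> bit) & 1u) ? '1' : '0';
        if (bit == 5)
            out[pos++] = ' ';
    }
    out[pos] = '\0';
}

/* Encode "mfspr RT,SPRN". The 10-bit spr field of the instruction holds
 * the low-order 5 bits of the SPRN first, then the high-order 5 bits. */
static uint32_t encode_mfspr(unsigned rt, unsigned sprn)
{
    assert(rt < 32 && sprn < 1024);
    uint32_t low  = sprn & 0x1Fu;
    uint32_t high = (sprn >> 5) & 0x1Fu;
    return (31u << 26) | ((uint32_t)rt << 21) |
           (low << 16) | (high << 11) | (339u << 1);
}
```

With these helpers, SPRN 947 (CCR0) formats as the string 11101 10011, and encode_mfspr(0, 8) yields the word for reading the Link Register into GPR0.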
2.2.1.3 Condition Register
The Condition Register (CR) is a unique type of 32-bit register and is divided into eight independent 4-bit
fields (CR0 - CR7). The CR can be used to record certain conditional results of various arithmetic and logical
operations. Subsequently, conditional branch instructions can designate a bit of the CR as one of the branch
conditions (see Section 2.5 Branch Processing on page 56). Instructions are also provided for performing
logical bit operations and for moving fields within the CR.
See Section 2.5.5.3 Condition Register (CR) on page 60 and the condition register section of the branch
chapter in Power ISA Version 2.05, Book-I for more information about the various instructions that can update
the CR.
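The division of the CR into eight 4-bit fields can be modeled in C as shown below. This is a sketch with hypothetical helper names; it assumes the Power ISA convention that CR0 occupies the most-significant four bits and that each field holds, from left to right, the LT, GT, EQ, and SO bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values within one 4-bit CR field (Power ISA ordering). */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Return field CRn (n = 0..7) of a 32-bit CR image. */
static unsigned cr_get_field(uint32_t cr, unsigned n)
{
    assert(n < 8);
    return (cr >> (28 - 4 * n)) & 0xFu;
}

/* Return a copy of the CR image with field CRn replaced by value. */
static uint32_t cr_set_field(uint32_t cr, unsigned n, unsigned value)
{
    assert(n < 8 && value <= 0xFu);
    unsigned shift = 28 - 4 * n;
    return (cr & ~(0xFu << shift)) | ((uint32_t)value << shift);
}
```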
2.2.1.4 Machine State Register
The Machine State Register (MSR) is a unique type of register that controls important chip functions, such as
enabling or disabling various interrupt types.
The MSR can be written from a GPR by using the mtmsr instruction. The contents of the MSR can be read
into a GPR by using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically by using the
wrtee or wrteei instructions. The MSR contents are also automatically saved, altered, and restored by the
interrupt-handling mechanism. See Section 7.4.1 Machine State Register (MSR) on page 173 for more
detailed information about the MSR and the function of each of its bits.
2.2.1.5 Device Control Registers
Device Control Registers (DCRs) are on-chip registers that exist architecturally and physically outside the
PowerPC 476FP core, and thus are not specified by the Power ISA, nor by this user’s manual for the
PowerPC 476FP core. Rather, Power ISA defines the existence of the DCR address space and the instructions that access the DCRs and does not define any particular DCRs. The DCR access instructions are move
to device control register (mtdcr) and move from device control register (mfdcr), which move data between
GPRs and the DCRs.
DCRs can be used to control various on-chip system functions, such as the operation of on-chip buses,
peripherals, and certain processor behaviors.
To accommodate additional DCRs and SPRs in a system, the PowerPC 476FP core has added additional
instructions, such as mtdcrx (move to device control register indexed), mtdcrux (move to device control
register user-mode indexed), mfdcrx (move from device control register indexed), and mfdcrux (move from
device control register user-mode indexed).
2.3 Instruction Classes
Power ISA defines all instructions as falling into one of the following three classes, as determined by the
primary opcode (and the extended opcode, if any):
1. Defined
2. Preserved
3. Reserved (reserved-illegal or reserved-no-op)
2.3.1 Defined Instruction Class
This class of instructions consists of all the instructions defined in Power ISA Version 2.05, Book-I, Book-II,
and Book-III E. In general, defined instructions are guaranteed to be supported within a Power ISA system as
specified by the architecture, either within the processor implementation itself or within emulation software
supported by the system operating software.
One exception to this is that, for implementations (such as the PowerPC 476FP core) that only provide the
32-bit subset of Power ISA, it is not expected (and likely not even possible) that emulation of the 64-bit
behavior of the defined instructions will be provided by the system.
As defined by Power ISA, any attempt to execute a defined instruction produces one of the following effects:
• An illegal instruction exception type program interrupt, if the instruction is not recognized by the implementation; or
• An unimplemented instruction exception type program interrupt, if the instruction is recognized by the
implementation and is not a floating-point instruction, but is not supported by the implementation; or
• A floating-point unavailable interrupt if the instruction is recognized as a floating-point instruction, but
floating-point processing is disabled; or
• Performance of the actions described in the rest of this document, if the instruction is recognized and
supported by the implementation. The architected behavior can cause other exceptions.
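The four outcomes listed above can be written as a small decision function. The following C sketch uses hypothetical names and is only a model of the list; the combination of a recognized, enabled, but unsupported floating-point instruction is not covered by the list, so the sketch reports it separately instead of guessing.

```c
#include <assert.h>
#include <stdbool.h>

enum outcome {
    ILLEGAL_INSTRUCTION,        /* program interrupt: illegal instruction */
    UNIMPLEMENTED_INSTRUCTION,  /* program interrupt: unimplemented instruction */
    FP_UNAVAILABLE,             /* floating-point unavailable interrupt */
    EXECUTE,                    /* architected behavior is performed */
    NOT_COVERED_BY_LIST         /* case the list above does not address */
};

/* Classify an attempt to execute a defined instruction. */
static enum outcome classify_defined(bool recognized, bool floating_point,
                                     bool fp_enabled, bool supported)
{
    if (!recognized)
        return ILLEGAL_INSTRUCTION;
    if (floating_point && !fp_enabled)
        return FP_UNAVAILABLE;
    if (!supported)
        return floating_point ? NOT_COVERED_BY_LIST : UNIMPLEMENTED_INSTRUCTION;
    return EXECUTE;
}
```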
The PowerPC 476FP core recognizes and fully supports all of the instructions in the defined class, with a few
exceptions. First, because the PowerPC 476FP core is a 32-bit implementation, those operations that are
defined specifically for 64-bit operation are not supported at all, and always cause an illegal instruction exception type program interrupt.
Second, one defined instruction is not supported within the PowerPC 476FP core. The instruction,
mfapidi (move from auxiliary processor ID indirect), is a special instruction intended to assist with identification of the auxiliary processors that can be attached to a particular processor implementation. Because the
PowerPC 476FP core does not have an auxiliary processor, execution of mfapidi causes an illegal instruction exception type program interrupt.
2.3.2 Preserved Instruction Class
The preserved instruction class is provided to support compatibility with earlier versions of either the
PowerPC Architecture or the Power ISA. This instruction class includes opcodes defined for these previous
architectures, but that are no longer defined for Power ISA.
Any attempt to execute a preserved instruction results in one of the following effects:
• Performance of the actions described in the previous version of the architecture, if the instruction is recognized
• An illegal instruction exception type program interrupt, if the instruction is not recognized
The only preserved instruction recognized and supported by the PowerPC 476FP core is the mftb (move
from time base) opcode. This instruction was used in the PowerPC Architecture to read the Time Base Upper
(TBU) and Time Base Lower (TBL) registers. Power ISA instead defines TBU and TBL as SPRs, and thus the
mfspr (move from special purpose register) instruction is used to read them. To enable previous time base
management software to be run on the PowerPC 476FP core, the core also supports the preserved opcode
of mftb. However, the mftb instruction is not included in the various sections of this document that describe
the implemented instructions, and software should take care to use the currently architected mechanism of
mfspr to read the time base registers, to guarantee portability between the PowerPC 476FP core and future
implementations of Power ISA.
2.3.3 Reserved Instruction Class
This class of instructions consists of all instruction primary opcodes (and associated extended opcodes, if
applicable) that do not belong to either the defined or the preserved instruction class.
Reserved instructions are available for future versions of Power ISA. That is, future versions of Power ISA
can define any of these instructions to perform new functions or make them available for implementation-dependent use as allocated instructions. There are two types of reserved instructions: reserved-illegal and
reserved-no-op.
Any attempt to execute a reserved-illegal instruction causes an illegal instruction exception type program
interrupt on the PowerPC 476FP core. Therefore, reserved-illegal instructions are available for future extensions to Power ISA that might affect the architected state. Such extensions might include new forms of
integer or floating-point arithmetic instructions, or new forms of load or store instructions that affect architected registers or the contents of memory.
However, any attempt to execute a reserved-no-op instruction either has no effect (that is, is treated as a no-operation instruction) or causes an illegal instruction exception type program interrupt on the PowerPC
476FP core. Because implementations are typically expected to treat reserved-no-op instructions as true no-ops, these instruction opcodes are thus available for future extensions to the Power ISA that have no effect
on architected state. Such extensions might include performance-enhancing hints, such as new forms of
cache touch instructions. Software can take advantage of the functions offered by the new instructions, and
still remain backwards-compatible with implementations of previous versions of Power ISA.
The PowerPC 476FP core implements all of the reserved-no-op instruction opcodes as true no-ops.
2.4 Implemented Instruction Set Summary
This section provides an overview of the various types and categories of instructions implemented within the
PowerPC 476FP core. In addition, Appendix B Instruction Summary on page 271 provides a listing of instructions that are unique to the PowerPC 476FP core. Table 2-6 summarizes the PowerPC 476FP core instruction set by category. Instructions within each category are described in subsequent sections.
Table 2-6. Instruction Categories
Category | Subcategory | Instruction Types
Integer | Integer Storage Access | load, store
 | Integer Arithmetic | add, subtract, multiply, divide, negate
 | Integer Logical | and, andc, or, orc, xor, nand, nor, xnor, extend sign, count leading zeros
 | Integer Compare | compare, compare logical
 | Integer Select | select operand
 | Integer Trap | trap
 | Integer Rotate | rotate and insert, rotate and mask
 | Integer Shift | shift left, shift right, shift right algebraic
Branch | | branch, branch conditional, branch to link, branch to count
Processor control | Condition register logical | crand, crandc, cror, crorc, crnand, crnor, crxor, creqv
 | Register management | move to/from SPR, move to/from DCR, move to/from MSR, write to external interrupt enable bit, move to/from CR
 | System linkage | system call, return from interrupt, return from critical interrupt, return from machine check interrupt
 | Processor synchronization | instruction synchronize
Storage control | Cache management | data allocate, data invalidate, data touch, data zero, data flush, data store, instruction invalidate, instruction touch
 | TLB management | read, write, search, synchronize
 | Storage synchronization | memory synchronize, memory barrier
Allocated | Allocated arithmetic | multiply-accumulate, negative multiply-accumulate, multiply halfword
 | Allocated logical | detect left-most zero byte
 | Allocated cache management | data congruence-class invalidate, instruction congruence-class invalidate
 | Allocated cache debug | data read, instruction read
2.4.1 Integer Instructions
Integer instructions transfer data between memory and the GPRs and perform various operations on the
GPRs. This category of instructions is further divided into the eight subcategories that are described in the
subsequent sections.
2.4.1.1 Integer Storage Access Instructions
Integer storage access instructions load and store data between memory and the GPRs. These instructions
operate on bytes, halfwords, and words. Integer storage access instructions also support loading and storing
multiple registers, character strings, and byte-reversed data, and loading data with sign-extension.
Table 2-7 on page 50 shows the integer storage access instructions in the PowerPC 476FP core. In the table,
the syntax [u] indicates that the instruction has both an update form (in which the RA addressing register is
updated with the calculated address) and a nonupdate form. Similarly, the syntax [x] indicates that the
instruction has both an indexed form (in which the address is formed by adding the contents of the RA and
RB GPRs) and a base + displacement form (in which the address is formed by adding a 16-bit signed immediate value, specified as part of the instruction, to the contents of GPR RA). See the detailed instruction
descriptions in Appendix B Instruction Summary on page 271.
Table 2-7. Integer Storage Access Instructions
 | Byte | Halfword | Word | Multiple/String
Loads | lbz[u][x] | lha[u][x], lhbrx, lhz[u][x] | lwarx, lwbrx, lwz[u][x] | lmw, lswi, lswx
Stores | stb[u][x] | sth[u][x], sthbrx | stw[u][x], stwbrx, stwcx. | stmw, stswi, stswx
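The address formation for the base + displacement, indexed, and update forms can be sketched in C as follows. The structure and function names are hypothetical. The sketch also assumes the Power ISA rule that an RA field of 0 selects the value 0 rather than the contents of GPR0 for the nonupdate forms, and that update forms require a nonzero RA field.

```c
#include <assert.h>
#include <stdint.h>

struct gprs { uint32_t r[32]; };   /* model of the 32 GPRs */

/* Base + displacement form: EA = (RA|0) + sign-extended 16-bit D. */
static uint32_t ea_dform(const struct gprs *g, unsigned ra, int16_t d)
{
    assert(ra < 32);
    uint32_t base = (ra == 0) ? 0u : g->r[ra];
    return base + (uint32_t)(int32_t)d;   /* wraps modulo 2^32 */
}

/* Indexed form: EA = (RA|0) + contents of GPR RB. */
static uint32_t ea_xform(const struct gprs *g, unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint32_t base = (ra == 0) ? 0u : g->r[ra];
    return base + g->r[rb];
}

/* Update form of base + displacement: the calculated address is also
 * written back to GPR RA. */
static uint32_t ea_dform_update(struct gprs *g, unsigned ra, int16_t d)
{
    assert(ra != 0 && ra < 32);
    uint32_t ea = g->r[ra] + (uint32_t)(int32_t)d;
    g->r[ra] = ea;
    return ea;
}
```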
2.4.1.2 Integer Arithmetic Instructions
Arithmetic operations are performed on integer or ordinal operands stored in registers. Instructions that
perform operations on two operands are defined in a 3-operand format; an operation is performed on the
operands, which are stored in two registers. The result is placed in a third register. Instructions that perform
operations on one operand are defined in a 2-operand format; the operation is performed on the operand in a
register and the result is placed in another register. Several instructions also have immediate formats in which
one of the source operands is a field in the instruction.
Most integer arithmetic instructions have versions that can update CR[CR0] or XER[SO, OV] based on the
result of the instruction. Some integer arithmetic instructions also update XER[CA] (carry) implicitly. See
Section 2.6 Integer Processing on page 63 for more information about how these instructions update the CR
or the XER.
Table 2-8 lists the integer arithmetic instructions in the PowerPC 476FP core. In the table, the syntax [o] indicates that the instruction has both an o form (that updates the XER[SO,OV] fields) and a non-o form. Similarly, the syntax [.] indicates that the instruction has both a record form (that updates CR[CR0]) and a
nonrecord form.
Table 2-8. Integer Arithmetic Instructions
Add | add[o][.], addc[o][.], adde[o][.], addi, addic[.], addis, addme[o][.], addze[o][.]
Subtract | subf[o][.], subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.]
Multiply | mulhw[.], mulhwu[.], mulli, mullw[o][.]
Divide | divw[o][.], divwu[o][.]
Negate | neg[o][.]
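The effect of the [o] and [.] suffixes can be illustrated with a C model of the add instruction. This is a sketch with hypothetical names, assuming the Power ISA behavior that XER[SO] is sticky once set and that the record form sets CR[CR0] to LT, GT, or EQ from a signed comparison of the result with 0, followed by a copy of XER[SO].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of architected state touched by add[o][.]. */
struct flags {
    bool so;        /* XER[SO], summary overflow (sticky) */
    bool ov;        /* XER[OV], overflow */
    unsigned cr0;   /* CR[CR0] as a 4-bit value: LT GT EQ SO */
};

/* Model of add[o][.]: oe selects the o form, rc selects the record form. */
static uint32_t add_model(uint32_t a, uint32_t b, bool oe, bool rc,
                          struct flags *st)
{
    uint32_t sum = a + b;
    if (oe) {
        /* Signed overflow: both operands differ in sign from the sum. */
        bool overflow = (((a ^ sum) & (b ^ sum)) >> 31) != 0;
        st->ov = overflow;
        st->so = st->so || overflow;
    }
    if (rc) {
        unsigned cr0;
        if (sum & 0x80000000u)
            cr0 = 0x8;          /* LT: result is negative */
        else if (sum != 0)
            cr0 = 0x4;          /* GT: result is positive */
        else
            cr0 = 0x2;          /* EQ: result is zero */
        st->cr0 = cr0 | (st->so ? 0x1u : 0u);
    }
    return sum;
}
```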
2.4.1.3 Integer Logical Instructions
Table 2-9 on page 51 lists the integer logical instructions in the PowerPC 476FP core. See Section 2.4.1.2
Integer Arithmetic Instructions for an explanation of the [.] syntax.
Table 2-9. Integer Logical Instructions
And | and[.], andi., andis.
And with complement | andc[.]
Nand | nand[.]
Or | or[.], ori, oris
Or with complement | orc[.]
Nor | nor[.]
Xor | xor[.], xori, xoris
Equivalence | eqv[.]
Extend sign | extsb[.], extsh[.]
Count leading zeros | cntlzw[.]
2.4.1.4 Integer Compare Instructions
These instructions perform arithmetic or logical comparisons between two operands and update the CR with
the result of the comparison.
Table 2-10 lists the integer compare instructions in the PowerPC 476FP core.
Table 2-10. Integer Compare Instructions
Arithmetic | cmp, cmpi
Logical | cmpl, cmpli
2.4.1.5 Integer Trap Instructions
Table 2-11 lists the integer trap instructions in the PowerPC 476FP core.
Table 2-11. Integer Trap Instructions
Trap
tw
twi
2.4.1.6 Integer Rotate Instructions
These instructions rotate operands stored in the GPRs. Rotate instructions can also mask rotated operands.
Table 2-12 lists the rotate instructions in the PowerPC 476FP core. See Section 2.4.1.2 Integer Arithmetic
Instructions on page 50 for an explanation of the [.] syntax.
Table 2-12. Integer Rotate Instructions
Rotate and Insert | rlwimi[.]
Rotate and Mask | rlwnm[.], rlwinm[.]
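The rotate-and-mask and rotate-and-insert operations can be modeled in C as shown below. The function names are hypothetical. The sketch assumes the Power ISA mask definition, in which the MB and ME instruction fields number the bits of the 32-bit word from 0 (most significant) to 31 (least significant), and a mask with MB greater than ME wraps around.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bit positions (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with 1-bits from position mb through position me, where
 * position 0 is the most-significant bit. Wraps when mb > me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* positions mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);   /* positions 0..me */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of rlwinm: rotate left, then AND with the mask. */
static uint32_t rlwinm_model(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* Model of rlwimi: rotate left, then insert under the mask into ra. */
static uint32_t rlwimi_model(uint32_t ra, uint32_t rs, unsigned sh,
                             unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

For instance, rlwinm_model(x, 8, 24, 31) extracts the most-significant byte of x into the low-order byte of the result.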
2.4.1.7 Integer Shift Instructions
Table 2-13 lists the integer shift instructions in the PowerPC 476FP core. Note that the shift right algebraic
instructions implicitly update the XER[CA] field. See Section 2.4.1.2 Integer Arithmetic Instructions on
page 50 for an explanation of the [.] syntax.
Table 2-13. Integer Shift Instructions
Shift Left | slw[.]
Shift Right | srw[.]
Shift Right Algebraic | sraw[.], srawi[.]
2.4.1.8 Integer Select Instruction
Table 2-14 lists the integer select instruction in the PowerPC 476FP core. The RA operand is 0 if the RA field
of the instruction is 0, or is the contents of GPR[RA] otherwise.
Table 2-14. Integer Select Instruction
Integer Select
isel
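The operand selection performed by isel, including the treatment of an RA field of 0, can be sketched in C as follows. The function name is hypothetical, and the sketch assumes the Power ISA definition in which the BC field selects one of the 32 CR bits (0 being the most-significant bit): the RA operand is chosen when that bit is 1, and the contents of GPR RB otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of isel: returns the value that would be placed in the target GPR. */
static uint32_t isel_model(const uint32_t gpr[32], uint32_t cr,
                           unsigned ra, unsigned rb, unsigned bc)
{
    assert(ra < 32 && rb < 32 && bc < 32);
    /* RA operand is 0 when the RA field is 0, else the contents of GPR RA. */
    uint32_t a = (ra == 0) ? 0u : gpr[ra];
    bool selected = ((cr >> (31 - bc)) & 1u) != 0;
    return selected ? a : gpr[rb];
}
```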
2.4.2 Branch Instructions
These instructions unconditionally or conditionally branch to an address. Conditional branch instructions can
test condition codes set in the CR by a previous instruction and branch accordingly. Conditional branch
instructions can also decrement and test the CTR as part of branch determination and can save the return
address in the Link Register (LR). The target address for a branch can be a displacement from the current
instruction address or an absolute address, or contained in the LR or CTR.
See Section 2.5 Branch Processing on page 56 for more information about branch operations.
Table 2-15 on page 52 lists the branch instructions in the PowerPC 476FP core. In the table, the syntax [l]
indicates that the instruction has both a link update form (that updates LR with the address of the instruction
after the branch) and a nonlink update form. Similarly, the syntax [a] indicates that the instruction has both an
absolute address form (in which the target address is formed directly by using the immediate field specified
as part of the instruction) and a relative form (in which the target address is formed by adding the specified
immediate field to the address of the branch instruction).
Table 2-15. Branch Instructions
Branch
b[l][a]
bc[l][a]
bcctr[l]
bclr[l]
2.4.3 Processor Control Instructions
Processor control instructions manipulate system registers, perform system software linkage, and synchronize processor operations. The following sections describe the instructions in each of the four subcategories of
processor control instructions.
2.4.3.1 Condition Register Logical Instructions
These instructions perform logical operations on a specified pair of bits in the CR, placing the result in another
specified bit. The benefit of these instructions is that they can logically combine the results of several comparison operations without incurring the extra processing time of conditional branching between each one. Software performance can significantly improve if multiple conditions are tested in a group as part of a branch
decision.
Table 2-16 lists the condition register logical instructions in the PowerPC 476FP core.
Table 2-16. Condition Register Logical Instructions
Condition Register Logical
crand
crandc
creqv
crnand
crnor
cror
crorc
crxor
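As an illustration of combining two comparison results without an intervening branch, the following C sketch models crand and cror on a 32-bit CR image. The helper names are hypothetical, and CR bits are numbered 0 (most significant) through 31, as in the BT, BA, and BB instruction fields.

```c
#include <assert.h>
#include <stdint.h>

/* Read CR bit i (0 = most-significant bit). */
static unsigned cr_bit(uint32_t cr, unsigned i)
{
    assert(i < 32);
    return (cr >> (31 - i)) & 1u;
}

/* Return a copy of cr with bit i set to v (0 or 1). */
static uint32_t cr_put(uint32_t cr, unsigned i, unsigned v)
{
    assert(i < 32 && v <= 1u);
    uint32_t m = 1u << (31 - i);
    return v ? (cr | m) : (cr & ~m);
}

/* Model of crand: CR[bt] = CR[ba] AND CR[bb]. */
static uint32_t crand_model(uint32_t cr, unsigned bt, unsigned ba, unsigned bb)
{
    return cr_put(cr, bt, cr_bit(cr, ba) & cr_bit(cr, bb));
}

/* Model of cror: CR[bt] = CR[ba] OR CR[bb]. */
static uint32_t cror_model(uint32_t cr, unsigned bt, unsigned ba, unsigned bb)
{
    return cr_put(cr, bt, cr_bit(cr, ba) | cr_bit(cr, bb));
}
```

If two earlier compares left their EQ results in bits 2 and 6, a single crand into another bit followed by one conditional branch tests both conditions together.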
2.4.3.2 Register Management Instructions
These instructions move data between the GPRs and control registers in the PowerPC 476FP core.
Table 2-17 lists the register management instructions in the PowerPC 476FP core.
Table 2-17. Register Management Instructions
CR | mcrf, mcrxr, mfcr, mtcrf
DCR | mfdcr, mfdcrux, mfdcrx, mtdcr, mtdcrux, mtdcrx
MSR | mfmsr, mtmsr, wrtee, wrteei
SPR | mfspr, mtspr
2.4.3.3 System Linkage Instructions
These instructions invoke supervisor-level software for system services and return from interrupts.
Table 2-18 lists the system linkage instructions in the PowerPC 476FP core.
Table 2-18. System Linkage Instructions
System Linkage
rfi
rfci
rfmci
sc
2.4.3.4 Processor Synchronization Instruction
The processor synchronization instruction, isync, forces the processor to complete all instructions preceding
the isync before allowing any context changes as a result of any instructions that follow the isync. Additionally, all instructions that follow the isync execute within the context established by the completion of all the
instructions that precede the isync. See Section 2.10 Synchronization on page 76 for more information about
the synchronizing effect of isync.
Table 2-19 shows the processor synchronization instruction in the PowerPC 476FP core.
Table 2-19. Processor Synchronization Instruction
Processor Synchronization
isync
2.4.4 Storage Control Instructions
These instructions manage the instruction and data caches and the TLB of the PowerPC 476FP core. Instructions are also provided to synchronize and order storage accesses. The following sections describe the
instructions in these three subcategories of storage control instructions.
2.4.4.1 Cache Management Instructions
These instructions control the operation of the data and instruction caches. Instructions are provided to fill,
flush, invalidate, or zero data cache blocks, where a block is defined as a 32-byte cache line. Instructions are
also provided to fill or invalidate instruction cache blocks.
Table 2-20 lists the cache management instructions in the PowerPC 476FP core.
Table 2-20. Cache Management Instructions
Data Cache | dcba (no-op), dcbf, dcbi, dcbst, dcbt, dcbtst, dcbz, dcbtls, dcbtstls, dcblc
Instruction Cache | icbi, icbt, icbtls, icblc
2.4.4.2 TLB Management Instructions
The TLB management instructions read and write entries of the TLB array and search the TLB array for an
entry that translates to a particular virtual address.
Table 2-21 on page 55 lists the TLB management instructions in the PowerPC 476FP core. See
Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] syntax.
Table 2-21. TLB Management Instructions
TLB Management
tlbre
tlbsx[.]
tlbsync
tlbwe
tlbivax
2.4.4.3 Storage Synchronization Instructions
The storage synchronization instructions allow software to enforce ordering among the storage accesses
caused by load and store instructions, which by default are weakly ordered by the processor. Weakly ordered
means that the processor is architecturally permitted to perform loads and stores generally out-of-order with
respect to their sequence within the instruction stream, with some exceptions. However, if a storage synchronization instruction is executed, all storage accesses prompted by instructions preceding the synchronizing
instruction must be performed before any storage accesses prompted by instructions that come after the
synchronizing instruction. See Section 2.10 Synchronization on page 76 for more information about storage
synchronization.
Table 2-22 shows the storage synchronization instructions in the PowerPC 476FP core.
Table 2-22. Storage Synchronization Instructions
Storage Synchronization
lwsync
msync
mbar
2.4.5 Previous Integer Multiply-Accumulate Instructions
The previous integer multiply-accumulate instructions implemented within the PowerPC 476FP core are
divided into four subcategories and are shown in Table 2-23 on page 56. See Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] and [o] syntax.
Table 2-23. Previous Integer Multiply-Accumulate Instructions
Arithmetic
  Multiply-Accumulate:
    macchw[o][.], macchws[o][.], macchwsu[o][.], macchwu[o][.],
    machhw[o][.], machhws[o][.], machhwsu[o][.], machhwu[o][.],
    maclhw[o][.], maclhws[o][.], maclhwsu[o][.], maclhwu[o][.]
  Negative Multiply-Accumulate:
    nmacchw[o][.], nmacchws[o][.], nmachhw[o][.], nmachhws[o][.],
    nmaclhw[o][.], nmaclhws[o][.]
  Multiply Halfword:
    mulchw[.], mulchwu[.], mulhhw[.], mulhhwu[.], mullhw[.], mullhwu[.]
2.5 Branch Processing
The following sections provide additional information about branch addressing, instruction fields, prediction,
and registers.
2.5.1 Branch Addressing
The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the
24-bit LI field right-extended with ‘00’). This displacement is regarded as a signed 26-bit number covering an
address range of ±32 MB. Similarly, the branch conditional instruction (bc[l][a]) specifies the displacement as
a 16-bit value (the 14-bit BD field right-extended with ‘00’). This displacement covers an address range of
±32 KB.
For the relative form of the branch and branch conditional instructions (b[l] and bc[l], with instruction field
AA = ‘0’), the target address is the address of the branch instruction itself (the current instruction address)
plus the signed displacement. This address calculation is defined to wrap around from the maximum effective
address (x‘FFFF FFFF’) to x‘0000 0000’, and vice-versa.
For the absolute form of the branch and branch conditional instructions (b[l]a and bc[l]a, with instruction field
AA = ‘1’), the target address is the sign-extended displacement. This means that with absolute forms of the
branch instruction, the branch target can be within the first or last 32 MB of the address space. With the absolute form of the branch conditional instructions, the branch target can be within the first or last 32 KB of the
address space.
The other two branch instructions, bclr (branch conditional to LR) and bcctr (branch conditional to CTR), use
neither absolute nor relative addressing. Instead, they use indirect addressing, in which the target of the
branch is specified indirectly as the contents of the LR or CTR.
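The relative and absolute target calculations described above can be sketched as follows. This is a minimal Python model, not a definitive implementation; the function and parameter names are illustrative assumptions. It appends ‘00’ to the LI or BD field, sign-extends the result, and wraps the sum to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(cia, field, field_bits, aa):
    """Target of b (field_bits=24, LI field) or bc (field_bits=14, BD field).

    cia is the address of the branch instruction; aa is the AA bit."""
    disp = sign_extend(field << 2, field_bits + 2)  # append '00', then sign-extend
    base = cia if aa == 0 else 0                    # relative versus absolute form
    return (base + disp) & MASK32                   # wrap within the 32-bit address space
```

For example, a relative b at x‘FFFF FFFC’ with LI = 1 wraps to x‘0000 0000’, and an absolute bc with BD = x‘2000’ targets x‘FFFF 8000’, the start of the last 32 KB of the address space.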
2.5.2 Branch Instruction BI Field
Conditional branch instructions can optionally test one bit of the CR, as indicated by the branch option (BO)
instruction field bit 0 (see the BO field description in Section 2.5.3). The value of the branch index (BI) instruction field specifies the CR bit to be tested (0 - 31). The BI field is ignored if BO[0] = ‘1’. The branch (b[l][a])
instruction is by definition unconditional, and hence does not have a BI instruction field. Instead, the position
of this field is part of the LI displacement field.
2.5.3 Branch Instruction BO Field
The BO field specifies the condition under which a conditional branch is taken and whether the branch decrements the CTR. The branch (b[l][a]) instruction is by definition unconditional and hence does not have a BO
instruction field. Instead, the position of this field is part of the LI displacement field.
Conditional branch instructions can optionally test one bit in the CR. This option is selected when BO[0] = ‘0’;
if BO[0] = ‘1’, the CR does not participate in the branch condition test. If the CR condition option is selected,
the condition is satisfied (branch can occur) if the CR bit selected by the BI instruction field matches BO[1].
Conditional branch instructions can also optionally decrement the CTR by one and test whether the decremented value is 0. This option is selected when BO[2] = ‘0’; if BO[2] = ‘1’, the CTR is not decremented and
does not participate in the branch condition test. If the CTR decrement option is selected, BO[3] specifies the
condition that must be satisfied to allow the branch to be taken. If BO[3] = ‘0’, CTR ≠ ‘0’ is required for the
branch to occur. If BO[3] = ‘1’, CTR = ‘0’ is required for the branch to occur.
Table 2-24 summarizes the use of the bits of the BO field. BO[4] is further discussed in Section 2.5.4 Branch
Prediction on page 58.
Table 2-24. BO Field Definition
BO Bit   Description
BO[0]    CR test control.
         0  Test CR bit specified by BI field for value specified by BO[1].
         1  Do not test CR.
BO[1]    CR test value.
         0  If BO[0] = ‘0’, test for CR[BI] = ‘0’.
         1  If BO[0] = ‘0’, test for CR[BI] = ‘1’.
BO[2]    CTR decrement and test control.
         0  Decrement CTR by one and test whether the decremented CTR satisfies the condition specified by BO[3].
         1  Do not decrement CTR; do not test CTR.
BO[3]    CTR test value.
         0  If BO[2] = ‘0’, test for decremented CTR ≠ ‘0’.
         1  If BO[2] = ‘0’, test for decremented CTR = ‘0’.
BO[4]    Branch prediction reversal.
         0  Apply standard branch prediction.
         1  Reverse the standard branch prediction.
Table 2-25 on page 58 lists specific BO field contents and the resulting actions; z represents a mandatory
value of zero and y is a branch prediction option discussed in Section 2.5.4 on page 58.
Table 2-25. BO Field Examples
BO Value   Description
0000y      Decrement the CTR, then branch if the decremented CTR ≠ ‘0’ and CR[BI] = ‘0’.
0001y      Decrement the CTR, then branch if the decremented CTR = ‘0’ and CR[BI] = ‘0’.
001zy      Branch if CR[BI] = ‘0’.
0100y      Decrement the CTR, then branch if the decremented CTR ≠ ‘0’ and CR[BI] = ‘1’.
0101y      Decrement the CTR, then branch if the decremented CTR = ‘0’ and CR[BI] = ‘1’.
011zy      Branch if CR[BI] = ‘1’.
1z00y      Decrement the CTR, then branch if the decremented CTR ≠ ‘0’.
1z01y      Decrement the CTR, then branch if the decremented CTR = ‘0’.
1z1zz      Branch always.
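The BO and BI rules can be modeled in a few lines. The following Python sketch is illustrative only; the function names are assumptions, and the CR is passed as a 32-bit value with CR bit 0 as the most significant bit. It returns whether the branch condition is satisfied together with the CTR value after the instruction.

```python
def bo_bit(bo, i):
    """Return BO[i], where BO[0] is the most significant of the five bits."""
    return (bo >> (4 - i)) & 1

def evaluate_branch(bo, bi, cr, ctr):
    """Return (taken, new_ctr) for a conditional branch."""
    if bo_bit(bo, 2) == 0:                     # CTR option selected
        ctr = (ctr - 1) & 0xFFFFFFFF           # decrement first, then test
        ctr_ok = (ctr == 0) == bool(bo_bit(bo, 3))
    else:
        ctr_ok = True                          # CTR is neither changed nor tested
    if bo_bit(bo, 0) == 0:                     # CR option selected
        cr_bit = (cr >> (31 - bi)) & 1         # CR bit 0 is the most significant bit
        cr_ok = cr_bit == bo_bit(bo, 1)
    else:
        cr_ok = True                           # CR does not participate
    return (ctr_ok and cr_ok, ctr)
```

With BO = 10000 (the 1z00y row) and CTR = 3, the CTR becomes 2 and the branch is taken; with CTR = 1, the CTR becomes 0 and the branch falls through.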
The “a” and “t” bits of the BO field can be used by software to provide a hint about whether the branch is likely
to be taken or is likely not to be taken, as follows:
• BO = 0z1at (“at” bits = ‘11’): The branch is very likely to be taken.
• BO = 0z1at (“at” bits = ‘10’): The branch is very likely not to be taken.
This branch hint is enabled when CCR2[SPC5C1] is set; otherwise, the hint is ignored.
2.5.4 Branch Prediction
Conditional branches might be taken or not taken; if taken, instruction fetching is redirected to the target
address. If the branch is not taken, instruction fetching falls through to the next sequential instruction. The
PowerPC 476FP core attempts to predict whether a branch is taken before all information necessary to determine the branch direction is available. This action is called branch prediction. The core can then prefetch
instructions down the predicted path. If the prediction is correct, performance is improved because the branch
target instruction is available immediately, instead of having to wait until the branch conditions are resolved. If
the prediction is incorrect, the prefetched instructions (that were fetched from addresses down the wrong path
of the branch) must be discarded and new instructions fetched from the correct path.
The PowerPC 476FP core combines the static prediction mechanism defined by Power ISA and a dynamic
branch prediction mechanism to provide correct branch prediction as often as possible. The dynamic branch
prediction mechanism is an implementation optimization and is not part of the architecture, nor is it visible to
the programming model.
The static branch prediction mechanism enables software to designate the preferred branch prediction
through bits in the instruction encoding. The default static branch prediction for conditional branches is as
follows:
Predict that the branch is to be taken if ((BO[0] ∧ BO[2]) ∨ s) = 1
where s is bit 16 of the instruction (the sign bit of the displacement for all branch conditional [bc] forms and
zero for all branch conditional to link register [bclr] and branch conditional to count register [bcctr] forms).
That is, conditional branches are predicted taken if their branch displacement is negative (that is, the branch
is branching backwards from the current instruction address). The standard prediction for this case derives
from considering the relative form of bc, often used at the end of loops to control the number of times that a
loop is executed. Because the branch is taken each time the loop is executed except the last, it is best if the
branch is predicted taken. The branch target is the beginning of the loop, so the branch displacement is negative and s = ‘1’. Because this situation is most common, a branch is predicted taken if s = ‘1’.
If the branch displacement is positive, s = ‘0’ and the branch is predicted not taken. Also, if the branch instruction
is any form of bclr or bcctr except the unconditional form, s = ‘0’, and the branch is predicted not taken.
There is a peculiar consequence of this prediction algorithm for the absolute forms of bc (bca and bcla). As
described in Section 2.5.1 Branch Addressing on page 56, if s = ‘1’, the branch target is in high memory. If
s = ‘0’, the branch target is in low memory. Because these are absolute-addressing forms, there is no reason
to treat high and low memory differently. Nevertheless, for the high memory case, the standard prediction is
taken, and for the low memory case the standard prediction is not taken.
Another bit in the BO field allows software further control over branch prediction. Specifically, BO[4] is the
prediction reversal bit. If BO[4] = ‘0’, the default prediction is applied. If BO[4] = ‘1’, the reverse of the default
prediction is applied. For the cases in Table 2-25 BO Field Examples on page 58 where BO[4] = y, software
can reverse the default prediction by setting y to ‘1’. This should only be done when the default prediction is
likely to be wrong. Note that for the branch always condition, reversal of the default prediction is not allowed,
as BO[4] is designated as z for this case, meaning the bit must be set to 0 or the instruction form is not valid.
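The static prediction rule, including the BO[4] reversal, can be expressed compactly. The following Python sketch is a model under stated assumptions: the function name is illustrative, only the static y-bit scheme described in this section is covered, and neither the dynamic predictor nor the “at” hint encoding is modeled.

```python
def predict_taken(bo, s):
    """Static prediction; True means 'predict taken'.

    bo is the 5-bit BO field (BO[0] most significant).
    s is bit 16 of the instruction (0 for bclr and bcctr forms)."""
    bo0 = (bo >> 4) & 1
    bo2 = (bo >> 2) & 1
    bo4 = bo & 1
    default = bool((bo0 & bo2) | s)   # ((BO[0] AND BO[2]) OR s) = 1
    return default != bool(bo4)       # BO[4] = '1' reverses the default
```

For a loop-closing bc with BO = 10000 and a negative displacement (s = 1), the model predicts taken. The same branch with a positive displacement is predicted not taken unless BO[4] is set.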
2.5.5 Branch Control Registers
There are three registers in the PowerPC 476FP core that are associated with branch processing, and they
are described in the following sections.
2.5.5.1 Link Register (LR)
The LR is written from a GPR by using mtspr and can be read into a GPR by using mfspr. The LR can also
be updated by the link update form of branch instructions (instruction field LK = ‘1’). Such branch instructions
load the LR with the address of the instruction that follows the branch instruction (4 + address of the branch
instruction). Thus, the LR contents can be used as a return address for a subroutine that was entered by
using a link update form of branch. The bclr instruction uses the LR in this fashion, enabling indirect
branching to any address.
When being used as a return address by a bclr instruction, bits 30:31 of the LR are ignored, because all
instruction addresses are on word boundaries.
Access to the LR is nonprivileged.
LR register layout (bits 0:31)
Bits    Field Name   Description
0:31    LR           Link Register contents.
                     Target address of bclr instruction.
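The two uses of the LR described in this section can be sketched as follows. This is an illustrative Python model with assumed function names; the 32-bit wrap of the link value is an assumption made for consistency with other effective-address arithmetic.

```python
MASK32 = 0xFFFFFFFF

def link_value(branch_address):
    """LR contents written by a branch with LK = '1': the address of the
    instruction that follows the branch (wrap to 32 bits is assumed)."""
    return (branch_address + 4) & MASK32

def bclr_target(lr):
    """Target of bclr: LR with bits 30:31 (the two low-order bits) ignored."""
    return lr & 0xFFFFFFFC
```

A subroutine entered by bl at address x‘0000 1000’ therefore returns to x‘0000 1004’ when it executes bclr.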
2.5.5.2 Count Register (CTR)
The CTR is written from a GPR by using mtspr and can be read into a GPR by using mfspr. The CTR
contents can be used as a loop count that gets decremented and tested by conditional branch instructions
that specify count decrement as one of their branch conditions (instruction field BO[2] = ‘0’). Alternatively, the
CTR contents can specify a target address for the bcctr instruction, enabling indirect branching to any
address.
Access to the CTR is nonprivileged.
CTR register layout (bits 32:63)
Bits    Field Name   Description
32:63   Count        Used as the count for branch conditional with decrement instructions, or as the
                     target address for bcctr instructions.
2.5.5.3 Condition Register (CR)
The CR is used to record certain information (conditions) related to the results of the various instructions that
are enabled to update the CR. A bit in the CR can also be selected to be tested as part of the condition of a
conditional branch instruction.
The CR is organized into eight 4-bit fields (CR0 - CR7). Table 2-26 on page 61 lists the instructions that
update the CR.
Access to the CR is nonprivileged.
CR register layout (bits 32:63)
Bits    Field Name   Description
32:35   CR0          Condition register field 0.
36:39   CR1          Condition register field 1.
40:43   CR2          Condition register field 2.
44:47   CR3          Condition register field 3.
48:51   CR4          Condition register field 4.
52:55   CR5          Condition register field 5.
56:59   CR6          Condition register field 6.
60:63   CR7          Condition register field 7.
Table 2-26. CR Updating Instructions
Integer
  Storage Access:
    stwcx.
  Arithmetic:
    add.[o], addc.[o], adde.[o], addic., addme.[o], addze.[o],
    subf.[o], subfc.[o], subfe.[o], subfme.[o], subfze.[o],
    mulhw., mulhwu., mullw.[o], divw.[o], divwu.[o], neg.[o]
  Logical:
    and., andi., andis., andc., nand., or., orc., nor., xor., eqv.,
    extsb., extsh., cntlzw.
  Compare:
    cmp, cmpi, cmpl, cmpli
  Rotate:
    rlwimi., rlwinm., rlwnm.
  Shift:
    slw., srw., sraw., srawi.
Processor Control
  CR-Logical and Register Management:
    crand, crandc, creqv, crnand, crnor, cror, crorc, crxor,
    mcrf, mcrxr, mtcrf
Storage Control
  TLB Management:
    tlbsx.
Auxiliary Processor
  Arithmetic and Logical:
    macchw.[o], macchws.[o], macchwsu.[o], macchwu.[o],
    machhw.[o], machhws.[o], machhwsu.[o], machhwu.[o],
    maclhw.[o], maclhws.[o], maclhwsu.[o], maclhwu.[o],
    nmacchw.[o], nmacchws.[o], nmachhw.[o], nmachhws.[o],
    nmaclhw.[o], nmaclhws.[o],
    mulchw., mulchwu., mulhhw., mulhhwu., mullhw., mullhwu.,
    dlmzb.
The Power ISA provides detailed information about how each of these instructions updates the CR. To
summarize, the CR can be accessed in any of the following ways:
• mfcr (move from Condition Register) reads the CR into a GPR. Note that this instruction does not update
the CR and is therefore not listed in Table 2-26.
• Conditional branch instructions can designate a CR bit to be used as a branch condition. Note that these
instructions do not update the CR and are therefore not listed in Table 2-26.
• mtcrf (move to Condition Register fields) sets specified CR fields by writing to the CR from a GPR, under
control of a mask field specified as part of the instruction.
• mcrf (move Condition Register field) updates a specified CR field by copying another specified CR field
into it.
• mcrxr (move to Condition Register from Integer Exception Register [XER]) copies certain bits of the XER
into a specified CR field and clears the corresponding XER bits.
• Integer compare instructions update a specified CR field.
• CR-logical instructions update a specified CR bit with the result of any one of eight logical operations on a
specified pair of CR bits.
• Certain forms of various integer instructions (the “.” forms) implicitly update CR[CR0], as do certain forms
of the auxiliary processor instructions implemented within the PowerPC 476FP core.
CR[CR0] Implicit Update By Integer Instructions
Most of the CR-updating instructions listed in Table 2-26 on page 61 implicitly update the CR0 field. These
are the various dot-form instructions, indicated by a “.” in the instruction mnemonic. Most of these instructions
update CR[CR0] according to an arithmetic comparison of 0 with the 32-bit result that the instruction writes to
the GPR file. That is, after performing the operation defined for the instruction, the 32-bit result that is written
to the GPR file is compared to 0 by using a signed comparison, independent of whether the actual operation
being performed by the instruction is considered signed. For example, logical instructions such as and., or.,
and nor. update CR[CR0] according to this signed comparison to 0, even though the result of such a logical
operation is not typically interpreted as a signed value. For each of these dot-form instructions, the individual
bits in CR[CR0] are updated as follows:
CR[CR0[0]]  LT  Less than 0; set if the most significant bit of the 32-bit result is ‘1’.
CR[CR0[1]]  GT  Greater than 0; set if the 32-bit result is nonzero and the most significant bit of the
                result is ‘0’.
CR[CR0[2]]  EQ  Equal to 0; set if the 32-bit result is 0.
CR[CR0[3]]  SO  Summary overflow; a copy of XER[SO] at the completion of the instruction (including
                any XER[SO] update being performed by the instruction itself).
Note: If an arithmetic overflow occurs, the sign of an instruction result indicated in CR[CR0] might not represent the true (infinitely precise) algebraic result of the instruction that set CR0. For example, if an add.
instruction adds two large positive numbers and the magnitude of the result cannot be represented as a twos-complement number in a 32-bit register, an overflow occurs and CR[CR0[0]] is set, even though the infinitely
precise result of the add is positive.
Similarly, adding the largest 32-bit twos-complement negative number (x‘8000 0000’) to itself results in an
arithmetic overflow and x‘0000 0000’ is recorded in the target register. CR[CR0[2]] is set, indicating a result of
0, but the infinitely precise result is negative.
CR[CR0[3]] is a copy of XER[SO] at the completion of the instruction, whether or not the instruction that is
updating CR[CR0] is also updating XER[SO].
Note: If an instruction causes an arithmetic overflow but is not of the form that actually updates XER[SO], the
value placed in CR[CR0[3]] does not reflect the arithmetic overflow that occurred on the instruction; it is
merely a copy of the value of XER[SO] that was already in the XER before the execution of the instruction
updating CR[CR0].
There are a few dot-form instructions that do not update CR[CR0] in the fashion described previously. These
instructions are: store word conditional indexed (stwcx.), TLB search indexed (tlbsx.), and determine leftmost
zero byte (dlmzb.). See the instruction descriptions in Power ISA for details on how these instructions
update CR[CR0].
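The implicit CR[CR0] update for ordinary dot-form instructions can be modeled as a signed comparison of the 32-bit result with 0. The following Python sketch uses an assumed function name and returns the four bits as a dictionary for readability; it does not cover stwcx., tlbsx., or dlmzb., which set CR[CR0] differently.

```python
def cr0_from_result(result, xer_so):
    """Return the CR0 bits for a dot-form instruction that wrote `result`
    to a GPR. xer_so is XER[SO] at the completion of the instruction."""
    result &= 0xFFFFFFFF              # only the 32-bit register value is compared
    msb = result >> 31
    return {
        "LT": msb,                            # most significant bit is '1'
        "GT": int(result != 0 and msb == 0),
        "EQ": int(result == 0),
        "SO": xer_so,                         # copy of XER[SO]
    }
```

The two overflow cases in the notes above follow directly: x‘7FFF FFFF’ + 1 sets LT although the infinitely precise sum is positive, and x‘8000 0000’ + x‘8000 0000’ sets EQ although the infinitely precise sum is negative.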
CR Update By Integer Compare Instructions
Integer compare instructions update a specified CR field with the result of a comparison of two 32-bit
numbers, the first of which is from a GPR and the second of which is either an immediate value or from
another GPR. There are two types of integer compare instructions, arithmetic and logical, and they are distinguished by the interpretation given to the 32-bit numbers being compared. For arithmetic compares, the
numbers are considered to be signed, whereas for logical compares, the numbers are considered to be
unsigned. For example, consider the comparison of 0 with x‘FFFF FFFF’. In an arithmetic compare, 0 is
larger; in a logical compare, x‘FFFF FFFF’ is larger.
A compare instruction can direct its result to any CR field. The BF field (bits 6:8) of the instruction specifies
the CR field to be updated. After a compare, the specified CR field is interpreted as follows:
CR[BF]0  LT  The first operand is less than the second operand.
CR[BF]1  GT  The first operand is greater than the second operand.
CR[BF]2  EQ  The first operand is equal to the second operand.
CR[BF]3  SO  Summary overflow; a copy of XER[SO].
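The difference between arithmetic and logical compares can be illustrated with a short model. The following Python sketch uses assumed function names and returns the 4-bit CR field value with LT as the most significant bit, followed by GT, EQ, and SO.

```python
def to_signed32(x):
    """Interpret a 32-bit value as a twos-complement signed number."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def compare(a, b, logical, xer_so=0):
    """Return the 4-bit CR field (LT, GT, EQ, SO) produced by a compare.

    logical=False models cmp/cmpi (signed operands);
    logical=True models cmpl/cmpli (unsigned operands)."""
    if logical:
        a, b = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    else:
        a, b = to_signed32(a), to_signed32(b)
    lt, gt, eq = int(a < b), int(a > b), int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | xer_so
```

Comparing 0 with x‘FFFF FFFF’ yields GT for the arithmetic compare (0 is larger than -1) and LT for the logical compare.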
2.6 Integer Processing
Integer processing includes loading and storing data between memory and GPRs and performing various
operations on the values in GPRs and other registers (the categories of integer instructions are summarized
in Table 2-6 on page 49). The sections that follow describe the registers that are used for integer processing
and how they are updated by various instructions. In addition, Section 2.5.5.3 Condition Register (CR) on
page 60 provides more information about the CR updates caused by integer instructions. Finally, Power ISA
also provides details on the various register updates performed by integer instructions.
2.6.1 General Purpose Registers (GPRs)
The PowerPC 476FP core contains 32 GPRs. The contents of these registers can be transferred to and from
memory by using integer storage access instructions. Operations are performed on GPRs by most other
instructions.
Access to the GPRs is nonprivileged.
GPR register layout (bits 32:63)
Bits    Field Name   Description
32:63   Data         General Purpose Register data.
2.6.2 Fixed-Point Exception Register (XER)
The XER records overflow and carry indications from integer arithmetic and shift instructions. It also provides
a byte count for string indexed integer storage access instructions (lswx and stswx). Note that the term
exception in the name of this register does not refer to exceptions as they relate to interrupts, but rather to the
arithmetic exceptions of carry and overflow.
The fields of the XER are shown here; Table 2-27 and Table 2-28 list the instructions that update
XER[SO,OV] and the XER[CA] fields. The sections that follow the figure and tables describe the fields of the
XER in more detail.
Access to the XER is nonprivileged.
XER register layout (bits 32:63)
Bits    Field Name   Description
32      SO           Summary overflow.
                     0  No overflow has occurred.
                     1  Overflow has occurred.
                     Can be set by mtspr or by integer or auxiliary processor instructions with the [o]
                     option; can be reset by mtspr or by mcrxr.
33      OV           Overflow.
                     0  No overflow has occurred.
                     1  Overflow has occurred.
                     Can be set by mtspr or by integer or allocated instructions with the [o] option; can
                     be reset by mtspr, by mcrxr, or by integer or allocated instructions with the [o] option.
34      CA           Carry.
                     0  Carry has not occurred.
                     1  Carry has occurred.
                     Can be set by mtspr or by certain integer arithmetic and shift instructions; can be
                     reset by mtspr, by mcrxr, or by certain integer arithmetic and shift instructions.
35:56   Reserved
57:63   TBC          Transfer byte count.
                     Used as a byte count by lswx and stswx; written by dlmzb[.] and by mtspr.
Table 2-27. XER[SO,OV] Updating Instructions
Integer Arithmetic
  Add:
    addo[.], addco[.], addeo[.], addmeo[.], addzeo[.]
  Subtract:
    subfo[.], subfco[.], subfeo[.], subfmeo[.], subfzeo[.]
  Multiply:
    mullwo[.]
  Divide:
    divwo[.], divwuo[.]
  Negate:
    nego[.]
Auxiliary Processor
  Multiply-Accumulate:
    macchwo[.], macchwso[.], macchwsuo[.], macchwuo[.],
    machhwo[.], machhwso[.], machhwsuo[.], machhwuo[.],
    maclhwo[.], maclhwso[.], maclhwsuo[.], maclhwuo[.]
  Negative Multiply-Accumulate:
    nmacchwo[.], nmacchwso[.], nmachhwo[.], nmachhwso[.],
    nmaclhwo[.], nmaclhwso[.]
Processor Control
  Register Management:
    mtspr, mcrxr
Table 2-28. XER[CA] Updating Instructions
Integer Arithmetic
  Add:
    addc[o][.], adde[o][.], addic[.], addme[o][.], addze[o][.]
  Subtract:
    subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.]
Integer Shift
  Shift Right Algebraic:
    sraw[.], srawi[.]
Processor Control
  Register Management:
    mtspr, mcrxr
2.6.2.1 Summary Overflow (SO) Field
This field is set to ‘1’ when an instruction is executed that causes XER[OV] to be set to ‘1’, except for the case
of mtspr(XER), which writes XER[SO] with the value in (RS[32]) and writes XER[OV] with the value in
(RS[33]). After it is set, XER[SO] is not reset until either an mtspr(XER) is executed with data that explicitly
writes ‘0’ to XER[SO], or an mcrxr instruction is executed. The mcrxr instruction sets XER[SO] (and
XER[OV,CA]) to ‘0’ after copying all three fields into bits 0:2 of the specified CR field (and setting bit 3 of that field to ‘0’).
Given this behavior, XER[SO] does not necessarily indicate that an overflow occurred on the most recent
integer arithmetic operation, but rather that one occurred at some time subsequent to the last clearing of
XER[SO] by mtspr(XER) or mcrxr.
XER[SO] is read (with the rest of the XER) into a GPR by mfspr(XER). In addition, various integer instructions copy XER[SO] into CR[CR0[3]] (see Section 2.5.5.3 Condition Register (CR) on page 60).
2.6.2.2 Overflow (OV) Field
This field is updated by certain integer arithmetic instructions to indicate whether the infinitely precise result of
the operation can be represented in 32 bits. For those integer arithmetic instructions that update XER[OV]
and produce signed results, XER[OV] = ‘1’ if the result is greater than 2^31 – 1 or less than –2^31; otherwise,
XER[OV] = ‘0’. For those integer arithmetic instructions that update XER[OV] and produce unsigned results
(certain integer divide instructions and multiply-accumulate auxiliary processor instructions), XER[OV] = ‘1’ if
the result is greater than 2^32 – 1; otherwise, XER[OV] = ‘0’. See the instruction descriptions in the Power ISA
for more details on the conditions under which the integer divide instructions set XER[OV] to ‘1’.
The mtspr(XER) and mcrxr instructions also update XER[OV]. Specifically, mcrxr sets XER[OV] (and
XER[SO,CA]) to ‘0’ after copying all three fields into bits 0:2 of the specified CR field (and setting bit 3 of that field to ‘0’).
mtspr(XER) writes XER[OV] with the value in (RS[33]).
XER[OV] is read (along with the rest of the XER) into a GPR by mfspr(XER).
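The relationship between XER[OV] and the sticky XER[SO] bit can be sketched for a signed add with the [o] option. The following Python model is illustrative only: the function names are assumptions, and only the addo case is covered.

```python
def to_signed32(x):
    """Interpret a 32-bit value as a twos-complement signed number."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def addo(ra, rb, xer_so):
    """Model addo: return (result, ov, so).

    ov reflects only this operation; so is sticky and stays set
    once any earlier operation has set it."""
    exact = to_signed32(ra) + to_signed32(rb)        # infinitely precise signed sum
    ov = int(exact > 2**31 - 1 or exact < -2**31)    # not representable in 32 bits
    so = xer_so | ov
    return exact & 0xFFFFFFFF, ov, so
```

After an overflowing add, a later non-overflowing addo clears OV but leaves SO set, which is why SO cannot be used to test the most recent operation alone.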
2.6.2.3 Carry (CA) Field
This field is updated by certain integer arithmetic instructions (the carrying and extended versions of add and
subtract) to indicate whether there is a carry-out of the most significant bit of the 32-bit result. XER[CA] = ‘1’
indicates a carry. The integer shift right algebraic instructions update XER[CA] to indicate whether any 1-bits
were shifted out of the least significant bit of the result, if the source operand was negative.
The mtspr(XER) and mcrxr instructions also update XER[CA]. Specifically, mcrxr sets XER[CA] (and
XER[SO,OV]) to ‘0’ after copying all three fields into bits 0:2 of the specified CR field (and setting bit 3 of that field to ‘0’).
mtspr(XER) writes XER[CA] with the value in (RS[34]).
XER[CA] is read (with the rest of the XER) into a GPR by mfspr(XER). In addition, the extended versions of
the add and subtract integer arithmetic instructions use XER[CA] as a source operand for their arithmetic
operations.
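A common use of XER[CA] is multiword arithmetic, where a carrying add on the low words feeds an extended add on the high words. The following Python sketch models that sequence and the carry rule for algebraic right shifts. The function names are assumptions, the subtract forms are not modeled, and the shift model covers shift amounts of 0 to 31 only.

```python
MASK32 = 0xFFFFFFFF

def addc(ra, rb):
    """Model addc: return (result, ca), where ca is the carry-out of bit 32."""
    total = (ra & MASK32) + (rb & MASK32)
    return total & MASK32, total >> 32

def adde(ra, rb, ca):
    """Model adde: add with XER[CA] as a source operand; return (result, ca)."""
    total = (ra & MASK32) + (rb & MASK32) + ca
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit add built from addc (low words) followed by adde (high words)."""
    lo, ca = addc(a_lo, b_lo)
    hi, _ = adde(a_hi, b_hi, ca)
    return hi, lo

def sraw_ca(rs, shift):
    """XER[CA] after an algebraic right shift of rs by 0..31 bits:
    set only if rs is negative and at least one 1-bit was shifted out."""
    rs &= MASK32
    shifted_out = rs & ((1 << shift) - 1)
    return int((rs >> 31) == 1 and shifted_out != 0)
```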
2.6.2.4 Transfer Byte Count (TBC) Field
The TBC field is used by the string indexed integer storage access instructions (lswx and stswx) as a byte
count. The TBC field is updated by the dlmzb[.] instruction with a value indicating the number of bytes up to
and including the zero byte detected by the instruction. The TBC field is also written by mtspr(XER) with the
value in (RS[57:63]).
XER[TBC] is read (with the rest of the XER) into a GPR by mfspr(XER).
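Software that reads the XER with mfspr usually needs to separate its fields. The following Python sketch shows one way to do that using this manual's bit numbering, in which bit 32 is the most significant bit of a 32-bit register and bit 63 is the least significant. The helper names are illustrative assumptions.

```python
def field(reg, first, last):
    """Extract bits first:last of a 32-bit register whose bits are
    numbered 32 (most significant) through 63 (least significant)."""
    width = last - first + 1
    return (reg >> (63 - last)) & ((1 << width) - 1)

def decode_xer(xer):
    """Split an XER value, as read by mfspr, into its defined fields."""
    return {
        "SO": field(xer, 32, 32),
        "OV": field(xer, 33, 33),
        "CA": field(xer, 34, 34),
        "TBC": field(xer, 57, 63),
    }
```

For example, an XER value of x‘A000 0010’ decodes to SO = 1, OV = 0, CA = 1, and a transfer byte count of 16.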
2.7 Processor Control
The PowerPC 476FP core provides several registers for general processor control and status. It includes the
following registers:
• Machine State Register (MSR)
Controls interrupts and other processor functions.
• Special Purpose Registers General (SPRGs)
SPRs for general purpose software use.
• Processor Version Register (PVR)
Indicates the specific implementation of a processor.
• Processor Identification Register (PIR)
Indicates the specific instance of a processor in a multiprocessor system.
• Core Configuration Register 0 (CCR0)
Controls specific processor functions, such as instruction prefetch.
CCR1 can cause all possible parity error exceptions to verify correct machine check exception handler
operation. Other CCR1 bits can force a full-line data cache flush and select a processor timer clock input
other than CPUCLOCK.
CCR2 defines additional cache parameters.
• Reset Configuration (RSTCFG)
Reports the values of certain fields of the TLB as supplied at reset.
• Device Control Register Immediate Prefix Register (DCRIPR)
The DCRIPR provides the upper 22 bits of the DCR address used by the mtdcr and mfdcr instructions.
This SPR has hex address x‘37B’, can be read and written, and is privileged.
Except for the MSR, each of these registers is described in more detail in the following sections. The MSR is
described in more detail in Section 7 Processor Interrupts and Exceptions on page 167.
2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8)
USPRG0 and SPRG0 - SPRG8 are provided for general purpose, system-dependent software use. One
common system use of these registers is as temporary storage locations. For example, a routine might save
the contents of a GPR to an SPRG and later restore the GPR from it. This is faster than a save/restore to a
memory location. These registers are written by using mtspr and read by using mfspr.
Access to USPRG0 is nonprivileged for both read and write.
Access to SPRG4 - SPRG7 is nonprivileged for read but privileged for write; hence, different SPR
numbers are used for reading than for writing. See Table 2-5 on page 43 for their accesses.
Access to SPRG0 - SPRG3 and SPRG8 is privileged for both read and write.
SPRG register layout (bits 32:63)
Bits    Field Name     Description
32:63   General data   Software value; hardware does not use the value.
2.7.2 Processor Version Register (PVR)
The PVR is a read-only register typically used to identify a specific processor core and chip implementation.
Software can read the PVR to determine processor core and chip hardware features. The PVR can be read
into a GPR by using mfspr.
Access to the PVR is privileged.
PVR register layout (bits 32:63)
Bits    Field Name   Description
32:43   OWN          Owner identifier.
                     Identifies the owner of a core. This implementation-specific value (after reset and
                     otherwise) is specified by core input signals.
44:63   PVN          Processor version number.
                     This implementation-specific value identifies the specific version and use of a
                     processor core within a chip. This value (after reset and otherwise) is specified by
                     core input signals.
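Privileged software that reads the PVR with mfspr can separate the two fields with a shift and a mask. The following Python sketch is illustrative; the function name is an assumption, and the sample value in the usage note is hypothetical, not the PVR of any actual 476FP implementation.

```python
def decode_pvr(pvr):
    """Split a 32-bit PVR value into (OWN, PVN)."""
    own = (pvr >> 20) & 0xFFF    # bits 32:43, the upper 12 bits
    pvn = pvr & 0xFFFFF          # bits 44:63, the lower 20 bits
    return own, pvn
```

With the hypothetical value x‘1234 5678’, the model returns OWN = x‘123’ and PVN = x‘45678’.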
2.7.3 Processor Identification Register (PIR)
The PIR is a read-only register that uniquely identifies a specific instance of a processor core, within a multiprocessor configuration, enabling software to determine exactly which processor it is running on. This capability is important for operating system software within multiprocessor configurations. The PIR can be read
into a GPR by using mfspr.
Access to the PIR is privileged.
PIR register layout (bits 32:63)
Bits    Field Name   Description
32:59   Reserved
60:63   PIN          Processor identification number (PIN).
2.7.4 Core Configuration Register 0 (CCR0)
The CCR0 controls a number of special chip functions, including data cache and auxiliary processor operation, speculative instruction fetching, trace, and the operation of the cache block touch instructions. The
CCR0 is written from a GPR by using mtspr, and can be read into a GPR by using mfspr. A cross reference
after the bit-field description indicates the section of this document that describes each field in more detail.
Access to the CCR0 is privileged.
CCR0 register layout (bits 0:31): ITE (0), PRE (1), Reserved (2:3), CRPE (4), Reserved (5:9), ICS (10),
DAPUIB (11), ICWRIDX[0:3] (12:15), DTB (16), Reserved (17:22), FLSTA (23), Reserved (24), DBTAC (25),
Reserved (26:27), DQWPM[0:1] (28:29), IQWPM (30), Reserved (31)
Bits    Field Name     Description
0       ITE            Internal trace enable.
                       0  Disable internal trace.
                       1  Enable internal trace.
                       The debugger tool or debug software must turn this bit on to enable instruction
                       trace. If the user software clears ITE, any currently running debugger trace
                       operation is terminated.
1       PRE            Parity recoverability enable.
                       0  Semirecoverable parity mode enabled for data cache.
                       1  Fully recoverable parity mode enabled for data cache.
                       Must be set to ‘1’ to guarantee full recoverability from memory management unit
                       (MMU) and data cache parity errors.
2:3     Reserved
4       CRPE           Cache read parity enable.
                       0  Disable parity information reads.
                       1  Enable parity information reads.
                       When enabled, execution of the following instructions loads parity information
                       into the associated register:
                       Instruction   Register
                       icread        ICDBTRH, ICDBTRL, ICDBDR1
                       dcread        DCDBTRH, DCDBTRL
                       tlbre         GPR (see tlbre operation)
5:9     Reserved
10      ICS            icbi request size.
                       0  32-byte icbi request.
                       1  128-byte icbi request.
11      DAPUIB         Disable APU instruction broadcast.
                       0  Enabled.
                       1  Disabled. Instructions are not broadcast to the APU for decoding.
12:15   ICWRIDX[0:3]   Instruction cache write index (for JTAG).
                       Specifies the index value to write to the instruction cache.
16      DTB            Disable trace broadcast.
                       0  Enabled.
                       1  Disabled; no trace information is broadcast.
                       This mechanism is provided as a means of reducing power consumption when
                       instruction tracing is not needed. See Initialization on page 243.
17:22   Reserved
23      FLSTA          Force load/store alignment.
                       0  No alignment exception occurs on integer storage access instructions,
                          regardless of alignment.
                       1  An alignment exception occurs on integer storage access instructions if the
                          data address is not on an operand boundary.
24      Reserved
25      DBTAC
26:27   Reserved
28:29   DQWPM[0:1]     Data cache quadword prediction mode.
                       00  No prediction, cause a hold.
                       01  Use EA[19].
                       10  Use last value for quadword EA[19].
                       11  Use NOT EA[19].
30      IQWPM          Instruction cache quadword prediction mode.
0
Use last value for quadword EA[19].
1
Use EA[19].
31
Reserved
Disable the branch target address CAM (BTAC).
0
Use the BTAC in the branch prediction unit.
1
Disable the BTAC in the branch prediction unit.
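Because the CCR0 bit positions are given in IBM numbering (bit 0 is the most significant bit), it can help to see how they map to conventional masks. The following C sketch derives masks for a few of the CCR0 fields listed in this section. The macro and function names are hypothetical, and the code only manipulates a register image held in a variable; moving the value to and from the SPR with mfspr and mtspr is not shown.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 32-bit register. */
#define IBM_BIT32(n)        (UINT32_C(1) << (31 - (n)))

/* Single-bit CCR0 fields, using the bit positions listed in this section. */
#define CCR0_ITE            IBM_BIT32(0)
#define CCR0_PRE            IBM_BIT32(1)
#define CCR0_FLSTA          IBM_BIT32(23)
#define CCR0_DBTAC          IBM_BIT32(25)

/* DQWPM[0:1] occupies bits 28:29, so its low-order bit sits 2 bits above the LSB. */
#define CCR0_DQWPM_SHIFT    2
#define CCR0_DQWPM_MASK     (UINT32_C(3) << CCR0_DQWPM_SHIFT)

/* Return a copy of the CCR0 image with DQWPM replaced by mode (0 to 3). */
static inline uint32_t ccr0_set_dqwpm(uint32_t ccr0, uint32_t mode)
{
    return (ccr0 & ~CCR0_DQWPM_MASK) |
           ((mode << CCR0_DQWPM_SHIFT) & CCR0_DQWPM_MASK);
}

/* Extract the DQWPM field from a CCR0 image. */
static inline uint32_t ccr0_get_dqwpm(uint32_t ccr0)
{
    return (ccr0 & CCR0_DQWPM_MASK) >> CCR0_DQWPM_SHIFT;
}
```

Only the DQWPM bits change in ccr0_set_dqwpm; all other bits, including reserved ones, are carried through unmodified.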
2.7.5 Core Configuration Register 1 (CCR1)
Bits 0:17 of CCR1 can cause all possible parity error exceptions to verify correct machine check exception
handler operation. Other CCR1 bits can force a full-line data cache flush, or select a processor timer clock
input other than CPUClock. The CCR1 is written from a GPR by using mtspr, and can be read into a GPR by
using mfspr. Access to the CCR1 is privileged.
Bits    Field Name  Description
0:1     GPRPEI      GPR parity error insert.
                    GPRPEI[0]: Records parity in the I-pipe of the GPR file if set.
                    GPRPEI[1]: Records parity in the L-pipe of the GPR file if set.
2:3     FPRPEI      Floating-Point Register (FPR) parity error insert.
                    FPRPEI[2]: Records parity in the first FPR if set.
                    FPRPEI[3]: Records parity in the second FPR if set.
4:5     ICDPEI      Instruction cache data parity error insert.
                    0  Record odd parity (normal).
                    1  Record even parity (simulate parity errors).
                    Controls inversion of parity bits that are recorded when the instruction cache
                    is filled.
                    ICDPEI[4]: Records parity in the left array.
                    ICDPEI[5]: Records parity in the right array.
6:7     ICLPEI      Instruction cache LRU parity error insert.
                    1  Record even parity (simulate parity error).
                    ICLPEI[6]: Records in the left array.
                    ICLPEI[7]: Records in the right array.
8:9     ICTPEI      Instruction cache tag parity error insert.
                    Controls inversion of parity bits that are recorded for the tag field in the
                    instruction cache.
                    ICTPEI[8]: Records parity in the left array.
                    ICTPEI[9]: Records parity in the right array.
10:11   DCDPEI      Data cache data parity error insert.
                    Controls inversion of parity bits recorded for the data field in the data cache.
                    DCDPEI[10]: Records parity in the even array.
                    DCDPEI[11]: Records data parity in the odd array.
12:13   DCLPEI      Data cache LRU parity error insert (even array).
                    Controls inversion of parity bits recorded for the LRU field in the data cache.
                    DCLPEI[12]: Records data cache LRU parity in the even array.
                    DCLPEI[13]: Records data cache LRU parity in the odd array.
14:15   DCTPEI      Data cache Tag parity error insert (even array).
                    Controls inversion of parity bits recorded for the Tag field in the data cache.
                    DCTPEI[14]: Records parity in the even array.
                    DCTPEI[15]: Records parity in the odd array.
16      MMUTPEI     Memory management unit tag parity error insert.
                    Controls inversion of parity bits recorded for the tag field in the MMU.
17      MMUDPEI     Memory management unit data parity error insert.
                    Controls inversion of parity bits recorded for the data field in the MMU.
18      Reserved
19      TSS         Timer clock source select.
                    0  CPU timer source is the CPU clock.
                    1  CPU timer source is an alternate timer clock.
20      Reserved
21      DPC         Disable parity checking (at reset, this bit is set to ‘1’).
                    0  Parity checking is enabled in the L1 cache core.
                    1  Disable all parity checking in the L1 cache core.
22:23   TCS         Timer clock select, watchdog timer select.
                    00  The CPU timer advances by one at each rising edge of the CPU input clock.
                    01  The CPU timer advances by every fourth rising edge of the CPU input clock.
                    10  The CPU timer advances by every eighth rising edge of the CPU input clock.
                    11  The CPU timer advances by every sixteenth rising edge of the CPU input clock.
24:31   Reserved
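The TCS encodings can be summarized as a clock divisor. The following C sketch, which uses hypothetical helper names, extracts TCS from a CCR1 image and returns the number of CPU input clock rising edges per timer advance. It assumes the CCR1 value has already been read into a 32-bit variable and that bit 31 is the least significant bit.

```c
#include <stdint.h>

/* TCS occupies CCR1 bits 22:23 (IBM numbering), which places its
 * low-order bit 8 positions above the least significant bit. */
static inline unsigned ccr1_tcs(uint32_t ccr1)
{
    return (ccr1 >> 8) & 0x3u;
}

/* Number of CPU input clock rising edges per timer advance,
 * following the TCS encodings 00, 01, 10, and 11. */
static inline unsigned ccr1_timer_divisor(uint32_t ccr1)
{
    static const unsigned divisor[4] = { 1, 4, 8, 16 };
    return divisor[ccr1_tcs(ccr1)];
}
```

With TCS = ‘10’, for example, the helper reports a divisor of 8, so the timer advances at one eighth of the CPU input clock rate.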
2.7.6 Core Configuration Register 2 (CCR2)
Bits    Field Name  Description
0:1     DSTG        Disable store gathering.
                    00  When enabled, stores to all bytes within an L1 halfline can be gathered into
                        a single transfer.
                    01  Only contiguous, overlapping store gathering is permitted. Noncontiguous
                        store gathering is disabled.
                    10  Reserved
                    11  All store gathering is disabled.
2       DLFPD       Data cache line fill prediction disable.
                    0  Line fill match prediction is enabled.
                    1  Line fill match prediction is disabled.
3       Reserved
4       DSTI        Disable shadow TLB invalidate.
                    0  When context synchronization occurs, invalidate shadow TLBs (ITLB, DTLB).
                    1  If set, do not invalidate shadow TLBs upon isync context synchronization.
5:8     Reserved
9       PMUD        Performance Monitor Unit Disable
                    0  Enable PMU counting
                    1  Disable PMU counting of various events
10      Reserved
11      DCSTGW      Disable Cacheable Store Gathering Write-Through
                    0  Cacheable stores with W = 1 can gather, but must be contiguous.
                    1  Cacheable stores with W = 1 cannot gather.
12:15   STGCTR      Store Gathering Counter
                    This field describes how long a store request remains in the SBQ before a write
                    request is sent. This counter is initialized to STGCTR × 2 whenever SBQ0 is
                    loaded for a store. It decrements by one each cycle until it reaches zero or is
                    initialized again. When it reaches zero, it forces the store in SBQ0 to be
                    transmitted. It is gathered only by gatherable stores.
16      DISTG       Disable Cache Inhibited Store Gathering
                    0  Inhibited stores can gather if they are on the same half of L1 cache line,
                       and the cache line is contiguous and not guarded.
                    1  Inhibited stores do not gather.
17:19   Reserved
20      SPC5C1      ICU ‘AT’ Field Static Branch Predict on Code C5 and C1
                    0  No ‘AT’ field static branch predict.
                    1  Use ‘AT’ field static branch predict.
21      MCDTO       Machine Check on DCR Timeout Enable
                    0  No DCR timeout
                    1  DCR timeout machine check enabled.
22:31   Reserved
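The STGCTR description implies a simple relationship between the field value and the starting value of the store gathering counter. The following C sketch illustrates that arithmetic only; the helper names are hypothetical, and the sketch does not model the store buffer queue itself.

```c
#include <stdint.h>

/* STGCTR occupies CCR2 bits 12:15 (IBM numbering), which places its
 * low-order bit 16 positions above the least significant bit. */
static inline unsigned ccr2_stgctr(uint32_t ccr2)
{
    return (ccr2 >> 16) & 0xFu;
}

/* Value the store gathering counter is initialized to (STGCTR x 2)
 * whenever SBQ0 is loaded for a store. The counter then decrements
 * by one each cycle unless it is initialized again. */
static inline unsigned stg_initial_count(uint32_t ccr2)
{
    return 2u * ccr2_stgctr(ccr2);
}
```

With the maximum field value of 15, the counter starts at 30; with a field value of 0, it starts at zero.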
2.7.7 Reset Configuration (RSTCFG)
The read-only RSTCFG Register reports the values of certain fields of TLB as supplied at reset. Access to
RSTCFG is privileged.
Bits    Field Name  Description
0:1     Reserved
2:11    ERPN        Extended real page number read only.
                    Set to the value strapped by core inputs (chip implementation-specific
                    configuration values).
12:16   Reserved
17      E           Endian read only.
                    Set to the value strapped by core input (chip implementation-specific
                    configuration values).
18:27   Reserved
28:31   U           U0 - U3 read only.
                    Set to the value strapped by core inputs (chip implementation-specific
                    configuration values).
2.7.8 Device Control Register Immediate Prefix Register (DCRIPR)
The Device Control Register Immediate Prefix Register (DCRIPR) provides the upper order 22 bits of DCR
address to be used by the mtdcr and mfdcr instructions. This SPR has hexadecimal address x‘37B’, can be
read and written, and is privileged. It is implementation dependent; that is, it is not part of the Book E-III Architecture specification.
To support the mtdcr[u]x and mfdcr[u]x instructions, the DCR interface adds 22 output pins for the upper
order address bits and one output pin for the privileged/nonprivileged indicator. Note that the privileged signal
indicates which type of opcode caused the DCR operation to be presented on the DCR interface and is not
directly related to the MSR[PR] bit. Privileged (also known as supervisor-mode) code can execute any of the
six DCR opcodes, and hence can produce DCR operations on the interface with either value indicated on the
privileged signal. Nonprivileged (user-mode) code only generates DCR traffic with a nonprivileged indication
on the interface. If user-mode code attempts to execute a privileged opcode, an exception is signaled due to
the privilege violation.
Access to the DCRIPR is privileged.
Bits    Field Name  Description
0:21    UOA         Upper order address.
                    Implementation-specific. Used for the upper order address bits of DCR address.
22:31   Reserved
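The following C sketch shows one way software might split a full 32-bit DCR address into the value to be written to DCRIPR and the remaining low-order part. The helper names are hypothetical. The sketch assumes that the 10 low-order address bits not covered by UOA are the DCR number encoded in the mtdcr or mfdcr instruction itself.

```c
#include <stdint.h>

/* UOA is DCRIPR bits 0:21, the 22 most significant bits of the register. */
#define DCRIPR_UOA_MASK  UINT32_C(0xFFFFFC00)

/* Value to write to DCRIPR for a given full 32-bit DCR address. */
static inline uint32_t dcripr_for(uint32_t dcr_addr)
{
    return dcr_addr & DCRIPR_UOA_MASK;
}

/* Assumption: the remaining 10 low-order bits are supplied by the
 * DCR number field of the mtdcr or mfdcr instruction. */
static inline uint32_t dcrn_for(uint32_t dcr_addr)
{
    return dcr_addr & ~DCRIPR_UOA_MASK;
}
```

For an arbitrary address of 0x12345678, the DCRIPR value is 0x12345400 and the low-order part is 0x278.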
2.8 User and Supervisor Modes
Power ISA defines two operating states or modes: supervisor (privileged), and user (nonprivileged). Which
mode the processor is operating in is controlled by MSR[PR]. When MSR[PR] is ‘0’, the processor is in supervisor mode, and can execute all instructions and access all registers, including privileged ones. When
MSR[PR] is ‘1’, the processor is in user mode, and can only execute nonprivileged instructions and access
nonprivileged registers. An attempt to execute a privileged instruction or to access a privileged register while
in user mode causes a privileged instruction exception type program interrupt to occur.
Note that the name PR for the MSR field refers to an historical alternative name for user mode, which is
problem state. Hence, the value ‘1’ in the field indicates problem state and not privileged as one might expect.
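The inverted sense of MSR[PR] is easy to get wrong in software, so a small C sketch may help. The helper name is hypothetical, and the mask value is an assumption not stated in this section: it places PR at bit 49 of the 32:63 numbering, which corresponds to 0x00004000 in a 32-bit MSR image.

```c
#include <stdint.h>

/* Assumption: MSR[PR] is bit 49 in the 32:63 numbering of the MSR,
 * which is 14 bit positions above the least significant bit. */
#define MSR_PR  UINT32_C(0x00004000)

/* Returns 1 when the MSR image indicates user mode (problem state),
 * and 0 when it indicates supervisor (privileged) mode. */
static inline int msr_is_user_mode(uint32_t msr)
{
    return (msr & MSR_PR) != 0;
}
```

Note that a set PR bit means user mode, so a test for supervisor mode is the logical negation of this helper.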
2.8.1 Privileged Instructions
The following instructions are privileged and cannot be executed in user mode:
Table 2-29. Privileged Instructions
Instruction   Comments
dcbi
dci
dcread
ici
icread
mfdcr
mfmsr
mfspr         For any SPR number with SPRN[5] = ‘1’. See Section 2.8.2 Privileged SPRs on page 75.
mtdcr
mtmsr
mtspr         For any SPR number with SPRN[5] = ‘1’. See Section 2.8.2 Privileged SPRs on page 75.
rfci
rfi
rfmci
tlbre
tlbsx
tlbsync
tlbwe
wrtee
wrteei
2.8.2 Privileged SPRs
Most SPRs are privileged. The only defined nonprivileged SPRs are the LR, CTR, XER, USPRG0,
SPRG 3 - 7 (read access only), TBU (read access only), and TBL (read access only). The PowerPC 476FP
core also treats all SPR numbers with a ‘1’ in bit 5 of the SPRN field as privileged, whether the particular SPR
number is defined or not. Therefore, the core causes a privileged instruction exception type program interrupt
on any attempt to access such an SPR number while in user mode. In addition, the core causes an illegal
instruction exception type program interrupt on any attempt to access an undefined SPR number with a ‘0’ in
SPRN[5] while in user mode. However, the result of attempting to access an undefined SPR number in supervisor mode is undefined, regardless of the value in SPRN[5].
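The user-mode rules above can be sketched as a small decision function in C. The names are hypothetical. The sketch assumes that SPRN is a 10-bit field numbered 0:9 with bit 0 as the most significant bit, so that SPRN[5] has the weight 0x10. It does not model the read-only restrictions on SPRG 3 - 7, TBU, and TBL.

```c
/* Possible outcomes of an mfspr or mtspr attempted in user mode. */
typedef enum {
    SPR_ACCESS_OK,             /* defined, nonprivileged SPR */
    SPR_PRIVILEGED_EXCEPTION,  /* privileged instruction exception */
    SPR_ILLEGAL_EXCEPTION      /* illegal instruction exception */
} spr_user_result;

/* Assumption: SPRN is a 10-bit field numbered 0:9, so SPRN[5]
 * is 4 bit positions above the least significant bit. */
static inline int sprn_bit5(unsigned sprn)
{
    return (int)((sprn >> 4) & 1u);
}

/* Outcome of a user-mode access, given whether the SPR number is defined. */
static inline spr_user_result spr_user_access(unsigned sprn, int is_defined)
{
    if (sprn_bit5(sprn))
        return SPR_PRIVILEGED_EXCEPTION;   /* applies whether defined or not */
    return is_defined ? SPR_ACCESS_OK : SPR_ILLEGAL_EXCEPTION;
}
```

For example, DCRIPR at SPR number x‘37B’ has SPRN[5] = ‘1’ and is therefore reported as privileged, whereas the LR (SPR number 8) is accessible from user mode.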
2.9 Speculative Accesses
The Power ISA permits implementations to perform speculative accesses to memory, either for instruction
fetching, or for data loads. A speculative access is defined as any access that is not required by the sequential execution model (SEM).
For example, the PowerPC 476FP core speculatively prefetches instructions down the predicted path of a
conditional branch; if the branch is later determined to not go in the predicted direction, the fetching of the
instructions from the predicted path is not required by the SEM and thus is speculative. Similarly, the
PowerPC 476FP core executes load instructions out-of-order and can read data from memory for a load
instruction that is past an undetermined branch.
However, sometimes speculative accesses are inappropriate. For example, attempting to access data at
addresses to which I/O devices are mapped can cause problems. If the I/O device is a serial port, reading it
speculatively can cause data to be lost.
The architecture provides two mechanisms for protecting against errant accesses to such non-well-behaved
memory addresses. The first is the guarded (G) storage attribute, and protects against speculative data
accesses. The second is the execute permission mechanism, and protects against speculative instruction
fetches. Both of these mechanisms are described in Section 4 Memory Management Unit on page 103.
2.10 Synchronization
The PowerPC 476FP core supports the synchronization operations of the Power ISA. There are three kinds of
synchronization defined by the architecture, each of which is described in the following sections.
2.10.1 Context Synchronization
The context of a program is the environment in which the program executes. For example, the mode (user or
supervisor) is part of the context, as are the address translation space and storage attributes of the memory
pages being accessed by the program. Context is controlled by the contents of certain registers and other
resources, such as the MSR and the TLB.
Under certain circumstances, it is necessary for the hardware or software to force the synchronization of a
program’s context. Context synchronizing operations include all interrupts except machine check, and the
isync, sc, rfi, rfci, mtmsr, and rfmci instructions. Context synchronizing operations satisfy the following
requirements:
1. The operation is not initiated until all instructions preceding the operation have completed to the point at
which they have reported any and all exceptions that they will cause.
2. All instructions preceding the operation must complete in the context in which they were initiated. That is,
they must not be affected by any context changes caused by the context synchronizing operation, or any
instructions after the context synchronizing operation.
3. If the operation is the sc instruction (which causes a system call interrupt) or is itself an interrupt, the
operation is not initiated until no higher priority interrupt is pending (see Section 7 Processor Interrupts
and Exceptions on page 167).
4. All instructions that follow the operation must be refetched and executed in the context that is established
by the completion of the context synchronizing operation and all of the instructions that preceded it.
5. If the operation is an mtmsr instruction, the operation is not initiated until all instructions preceding the
operation have completed to the point at which they have reported any exceptions that they will cause.
Then, MSR is updated with the contents of GPR bits 32:63, and the context synchronizing operation is
performed.
Context synchronizing operations do not force the completion of storage accesses, nor do they enforce any
ordering among accesses before or after the context synchronizing operation. If such behavior is required, a
storage synchronizing instruction must be used (see Section 2.10.3 Storage Ordering and Synchronization
on page 78).
Also, architecturally, machine check interrupts are not context synchronizing. Therefore, an instruction that
precedes a context synchronizing operation can cause a machine check interrupt after the context synchronizing operation occurs and additional instructions have completed. For the PowerPC 476FP core, this can
only occur with data machine check exceptions, and not instruction machine check exceptions.
The following scenarios use pseudocode examples to illustrate the effects of context synchronization. Subsequent text explains how software can further guarantee storage ordering.
1. Consider the following self-modifying code instruction sequence:
   stw XYZ      Store to caching inhibited address XYZ.
   isync
                Fetch and execute the instruction at address XYZ.
In this sequence, the isync instruction does not guarantee that the XYZ instruction is fetched after the
store has occurred to memory. There is no guarantee which XYZ instruction will execute; either the old
version or the new (stored) version might.
2. Now consider the required self-modifying code sequence:
   stw          Write new instruction to data cache.
   dcbst        Push the new instruction from the data cache to memory.
   msync        Order copy before invalidating the old instruction in the instruction cache.
   icbi         Invalidate the copy in the instruction cache.
   isync        Discard prefetched instructions and refetch of new instruction, context switch.
3. This example illustrates the use of isync with context changes to the debug facilities.
   mtdbcr0      Enable the instruction address compare (IAC) debug event.
   isync        Wait for the new Debug Control Register 0 (DBCR0) context to be established.
   XYZ          This instruction is at the IAC address; an isync is necessary to guarantee that the
                IAC event is recognized on the execution of this instruction; without the isync,
                the XYZ instruction can be prefetched and dispatched to execution before
                recognizing that the IAC event has been enabled.
4. The last example uses isync when accessing DCRs with mtdcr or mfdcr instructions that depend on the
   DCRIPR register:
   mtspr DCRIPR Set up DCRIPR value for DCRn.
   isync        Ensures new DCRn by context synchronization.
   mtdcr        Access new DCR with new value.
2.10.2 Execution Synchronization
Execution synchronization is a subset of context synchronization. An execution synchronizing operation satisfies the first two requirements of context synchronizing operations, but not the latter two. That is, execution
synchronizing operations guarantee that preceding instructions execute in the old context, but do not guarantee that subsequent instructions operate in the new context. For example, execution synchronization is
required just before the execution of a TLB-updating instruction (such as tlbwe). An execution
synchronizing instruction should be executed to guarantee that all preceding storage access instructions
have performed their address translations before executing tlbwe to invalidate an entry that might be used by
those preceding instructions.
There are five execution synchronizing instructions: wrtee, wrteei, msync, mbar, and lwsync. All context
synchronizing instructions are also implicitly execution synchronizing, because context synchronization is a
superset of execution synchronization.
The Power ISA imposes additional requirements on updates to MSR[EE] (the external interrupt enable bit).
Specifically, if a wrtee or wrteei instruction sets MSR[EE] = ‘1’, and an external input, decrementer, or fixed-interval timer exception is pending, the interrupt must be taken before the instruction that follows the
MSR[EE]-updating instruction is executed. In this sense, these MSR[EE]-updating instructions can be thought of as
being context synchronizing with respect to the MSR[EE] bit, in that they guarantee that subsequent instructions execute (or are prevented from executing and an interrupt taken) according to the new context of
MSR[EE].
2.10.3 Storage Ordering and Synchronization
Storage synchronization enforces ordering between storage access instructions executed by the PowerPC
476FP core. There are three storage synchronizing instructions: msync, mbar, and lwsync. The Power ISA
defines different ordering requirements for these three instructions, but the PowerPC 476FP core implements
msync and mbar in an identical fashion. Architecturally, msync is the stronger of the two, and is also execution synchronizing, whereas mbar is intended to be an equivalent of eieio. For future compatibility, use
mbar wherever eieio or a storage barrier operation would otherwise be used.
The lwsync instruction is a lighter version of msync. For more information, see the lightweight sync information in the storage control instructions chapter of Book II of Power ISA Version 2.05. In contrast, msync guarantees that all preceding storage accesses have actually been performed with respect to the memory
subsystem before the execution of any instruction after the msync.
Note: This requirement goes beyond the requirements of mere execution synchronization, in that execution
synchronization does not require the completion of preceding storage accesses.
The following two examples illustrate the distinctive use of mbar versus msync.
   stw          Store data to an I/O device.
   lwz          Dummy load from the same 32-byte line to ensure that the store takes place before msync.
   msync        Wait for store to actually complete.
   mtdcr        Reconfigure the I/O device.
In this example, the mtdcr is reconfiguring the I/O device in a manner that would cause the preceding store
instruction to fail, if the mtdcr changed the device before the completion of the store. Because mtdcr is not a
storage access instruction, the use of mbar instead of msync does not guarantee that the store is performed
before letting the mtdcr reconfigure the device. It only guarantees that subsequent storage accesses are not
performed to memory or any device before the earlier store.
Another example follows:
   stb X        Store data to an I/O device at address X, causing a status bit at address Y to be reset.
   mbar         Guarantee preceding store is performed to the device before any subsequent storage
                accesses are performed.
   lbz Y        Load status from the I/O device at address Y.
Here, mbar is appropriate instead of msync because all that is required is that the store to the I/O device
happens before the load does. Other instructions subsequent to the mbar are not executed before the store.
2.10.4 SPRs Requiring Context Synchronization
The following is a list of SPRs that may require context synchronization when written by the mtspr instruction:
• Process ID (PID)
• Debug Control Register 0 - 2 (DBCR0 - DBCR2)
• Instruction Address Compare 1 - 4 (IAC1 - IAC4)
• Data Cache Address Compare 1 - 2 (DAC1 - DAC2)
• Data Cache Value Compare 1 - 2 (DVC1 - DVC2)
• MMU Configuration Register (MMUCR)
• Real Mode Page Description Register (RMPD)
• Supervisor Search Priority Configuration Register (SSPCR)
• User Search Priority Configuration Register (USPCR)
• Invalidate Search Priority Configuration Register (ISPCR)
• Core Configuration Register 0 - 2 (CCR0 - CCR2)
• Instruction Opcode Compare Control Register (IOCCR)
• Instruction Opcode Compare Register 1 - 2 (IOCR1 - IOCR2)
The following examples demonstrate the effects of the context synchronization.
Example 1:
   mtPID        Change PID or virtual addressing.
   isync        Context switch to wait for and ensure the new PID to use next.
   XYZ          XYZ instruction is based on the new PID.
Example 2:
   mtIAC1       IAC1 setup.
   mtIAC2
   mtIAC3
   mtIAC4
   mtDAC1       DAC1 setup.
   mtDAC2
   mtDVC1       DVC setup.
   mtDVC2
   mtDBCR1      IAC debug control setup.
   mtDBCR2      DAC, DVC debug control setup.
   mtDBCR0      Enable debug events.
   isync        Ensure the context of all debug controls is established.
   XYZ          All debug events set up in the previous code are now in effect.
Example 3:
   mtCCR0       CCR0 change.
   mtCCR1
   mtCCR2
   isync        Ensure the new configuration context is established.
   XYZ          New configuration is in effect.
Example 4:
   mtMMUCR      MMU configuration register is updated.
   mtRMPD
   mtSSPCR
   mtUSPCR
   mtISPCR
   isync        Ensure the new MMU configuration context is established.
   XYZ          The new MMU environment is in effect.
Example 5:
   mtIOCR1      IOCR1 update.
   mtIOCR2      IOCR2 update.
   mtIOCCR      IOCCR update.
   isync        Ensure the new instruction trap control is established.
   XYZ          The new instruction trap is in effect.
2.10.5 Instructions Requiring a Context Synchronization Instruction
The following instructions require a context synchronization instruction (CSI) to ensure that their effects apply to
subsequent instruction operations:
tlbwe:
• Instruction fetch:
A CSI (isync) is required after a tlbwe is executed.
• Operand (data) access:
A CSI (isync) is required before and after a tlbwe is executed.
The recommended sequence for the tlbwe instruction is as follows:
1. isync        (If operand access is concerned).
2. tlbwe        Write all or necessary words.
3. isync        Ensure the new TLB mapping.
tlbivax:
Note: See Section 4.9 UTLB Coherency on page 130 for more information about tlbivax.
• Instruction fetch:
A CSI (isync) is required after a tlbivax is executed.
• Operand (data) access:
A CSI (isync) is required before and after a tlbivax is executed.
The recommended sequence for the tlbivax instruction is as follows:
1. isync        (If operand access is concerned).
2. tlbivax
3. tlbivax      (Multiple tlbivax instructions if needed).
4. tlbsync
5. msync
6. isync
2.11 Storage Model
The PowerPC 476FP core and PowerPC subsystem maintain memory coherency at all times, whether or not
the page attribute M is set.
Also, the PowerPC 476FP storage model is weakly consistent; therefore, loads can generally be
performed out of order, except in the following cases:
• lwarx operands are always accessed in order.
• Cache-inhibited operand accesses are performed in order, including the case in which G = I = ‘1’.
• Operand accesses within the coherency granule, which is the L2 cache line, are in order.
See the storage model chapter in Power ISA Version 2.05, Book I for further details about the following
topics:
• Atomicity
• Cache model
• Storage Control Attributes
• Shared Storage
The PowerPC 476FP weakly-consistent model has the following characteristics:
• Allows load misses to bypass or return out of order as long as they are not in the same cache line (the L1
cache line granule).
• Keeps stores in order (though some Power ISA designs can allow out-of-order stores).
• Allows store data to be forwarded to subsequent loads on the same processor.
• Allows the first operand use even if the line is snoop invalidated. Generally the L2 sends the newest data.
Because of this storage model design, there might be an issue in the following data dependency scenario:
   CPU 0            CPU 1
   lwz R3,X         st addrY
   sth R3,Y         msync
   lwz R31,Y        st addrX
In the previous code example, CPU 0 might miss on X but hit on Y, and therefore get the new data for X.
However, R31 is then loaded with a mix of new and old data. If CPU 0 instead has a cache miss on sth Y (store
halfword or store byte), R31 receives the new data for Y, and there is no issue.
Consult IBM PowerPC support for further details if you encounter such issues in your applications.
The following examples demonstrate the Power ISA specification and recommendations for store operations
ordering. These examples avoid the data dependency issues described previously.
Note: msync is used to ensure store operations ordering.
Example 1: Ordering Type #1, Operand Boundary Matches
   CPU 0            CPU 1
   lwz R3,X         st addrY
   sync             msync
   stw R3,Y         st addrX
   lwz R31,Y        ..
Example 2: Ordering Type #2, Memory Barrier Use
   CPU 0            CPU 1
   lwz R3,X         st addrY
   sync             msync
   sth R3,Y         st addrX
   lwz R31,Y        ..
or
   CPU 0            CPU 1
   lwz R3,X         st addrY
   sync             msync
   lwz R31,Y        st addrX
Example 3: Ordering Type #3, Test Flag for Store
   CPU 0            CPU 1
   lwz R3,X         st addrY
   cmp X (test X)   msync
   bne ..           st addrX
   st R31,Z
In the previous example, CPU 0 st Z is not an issue because Z is updated only if X is updated.
Example 4: Ordering Type #4, Test Flag (See Example 5: Server Practice)
   CPU 0            CPU 1
   lwz R3,X         st addrY
   cmp X (test X)   msync
   bne loop A       st addrx
   sth R3,Y         ..
   lwz R31,Y
   ..
   loop A
Example 5: Server Practice
A barrier operation is needed between the following two loads (with or without compare and branch instructions).
           CPU 0               CPU 1
   loop1   lw X                st Y
           cmp                 msync
           bne loop1           st X
           msync/mbar/isync
           lw Y
3. Floating-Point Unit Programming Model
The programming model of the PowerPC 476FP core describes how the features and operations appear to
programmers.
The floating-point processor chapter in Book-I of Power ISA Version 2.05 specifies that the floating-point unit
(FPU) implements a floating-point system as defined in ANSI/IEEE Standard 754-1985, IEEE Standard for
Binary Floating-Point Arithmetic (referred to as IEEE 754), but the architecture requires software support to
conform fully with the standard. IEEE 754 defines certain required operations (addition, subtraction, and so
on); the term floating-point operation is used to refer to one of these required operations, or to the operation
performed by one of the multiply-add or reciprocal estimate instructions. In the PowerPC 476FP core, all
floating-point operations conform to the IEEE standard.
3.1 Floating-Point Exceptions
Each floating-point exception, and each category of invalid operation exception, is associated with an exception bit in the FPSCR. The following floating-point exceptions are detected by the processor; the associated
FPSCR fields are listed with each exception and invalid operation exception category:
• Invalid operation exception (VX) (see Table 3-1)
Table 3-1. Invalid Operation Exception Categories
   Category                  FPSCR Field
   SNaN                      VXSNAN
   Infinity – Infinity       VXISI
   Infinity ÷ Infinity       VXIDI
   Zero ÷ Zero               VXZDZ
   Infinity × Zero           VXIMZ
   Invalid Compare           VXVC
   Software Request          VXSOFT
   Invalid Square Root       VXSQRT
   Invalid Integer Convert   VXCVI
• Zero divide exception (ZX)
• Overflow exception (OX)
• Underflow exception (UX)
• Inexact exception (XX)
Each floating-point exception also has a corresponding enable bit in the FPSCR. See Section 3.4.8 Floating-Point Status and Control Register Instructions on page 102 for descriptions of these exception and enable
bits.
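To illustrate how the arithmetic categories in Table 3-1 relate to operand values, the following C sketch classifies the operands of a subtract, divide, or multiply. The function and type names are hypothetical, and the operand conditions follow general IEEE 754 rules rather than anything specific to this core. SNaN detection and the compare, square root, convert, and software-request categories are not modelled.

```c
#include <math.h>

/* Subset of the invalid operation categories in Table 3-1. */
typedef enum { VX_NONE, VXISI, VXIDI, VXZDZ, VXIMZ } vx_category;

/* a - b is invalid when both operands are infinities of the same sign. */
static inline vx_category vx_for_subtract(double a, double b)
{
    return (isinf(a) && a == b) ? VXISI : VX_NONE;
}

/* a / b is invalid for infinity / infinity and for zero / zero. */
static inline vx_category vx_for_divide(double a, double b)
{
    if (isinf(a) && isinf(b))
        return VXIDI;
    if (a == 0.0 && b == 0.0)
        return VXZDZ;
    return VX_NONE;
}

/* a * b is invalid when one operand is an infinity and the other is zero. */
static inline vx_category vx_for_multiply(double a, double b)
{
    if ((isinf(a) && b == 0.0) || (a == 0.0 && isinf(b)))
        return VXIMZ;
    return VX_NONE;
}
```

A nonzero finite value divided by zero returns VX_NONE here because that case is a zero divide exception (ZX), not an invalid operation.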
3.2 Floating-Point Registers
This section provides an overview of the register types implemented in the PowerPC 476FP core. Detailed
descriptions of the floating-point registers are provided within the sections covering the functions with which
they are associated. An alphabetical summary of all registers, including bit definitions, is provided in
Appendix A Register Summary on page 263.
Certain bits in some registers are reserved and are not necessarily implemented. For all registers with fields
marked as reserved, these reserved fields should be written as ‘0’ and read as undefined. The recommended
coding practice is to perform the initial write to a register with reserved fields set to ‘0’, and to perform all
subsequent writes to the register using a read-modify-write strategy: read the register; use logical instructions
to alter defined fields, leaving reserved fields unmodified; and write the register.
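
The read-modify-write practice described above can be sketched in portable C. This is only a model: the register is a plain 32-bit value, and the function names and the `defined_mask` parameter (a 1 for every non-reserved bit) are assumptions made for illustration, not part of the core's programming interface.

```c
#include <stdint.h>

/* Initial write: every reserved bit (a 0 in defined_mask) is written as 0.
 * The mask layout is a hypothetical example, not a real register map. */
uint32_t reg_initial_write(uint32_t value, uint32_t defined_mask)
{
    return value & defined_mask;
}

/* Subsequent writes: start from the value just read, change only the
 * defined bits inside field_mask, and leave all other bits as read. */
uint32_t reg_modify_field(uint32_t current, uint32_t field_mask,
                          uint32_t new_bits, uint32_t defined_mask)
{
    uint32_t change = field_mask & defined_mask;   /* bits allowed to change */
    return (current & ~change) | (new_bits & change);
}
```

On the core itself, the read and the write would be performed by the instructions that access the register in question; the masking logic in between is the part illustrated here.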
Each register is classified as being of a particular type, as characterized by the specific instructions used to
read and write registers of that type. The registers contained within the PowerPC 476FP processor are
defined by the floating-point processor chapter in Book-I of Power ISA Version 2.05.
3.2.1 Register Types
The PowerPC 476FP processor provides two types of floating-point registers: Floating-Point Registers
(FPRs) and the FPSCR. Each type is characterized by the instructions that are used to read and write the
registers. The following subsections provide an overview of each register type and the instructions that are
associated with them.
3.2.1.1 Floating-Point Registers (FPR0 - FPR31)
The PowerPC 476FP processor provides 32 FPRs, each 64 bits wide. In any cycle, the FPR file can read the
operands for a store instruction and an arithmetic instruction, or write the data from a load instruction and the
result of an arithmetic instruction.
Bits   Field Name   Description
0:63   Data         Floating-point register data.
The FPRs are numbered FPR0 - FPR31. The floating-point instruction formats provide 5-bit fields to specify
the FPRs used as operands in the execution of the associated instructions.
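
As an illustration of the 5-bit register fields, the following C sketch extracts the FPR numbers from a 32-bit instruction word. The field positions used (FRT in bits 6:10, FRA in 11:15, FRB in 16:20, FRC in 21:25, with bit 0 as the most-significant bit) are those of the Power ISA A-form and are not stated in this section; the function and type names are hypothetical.

```c
#include <stdint.h>

/* Extract bits [first:last] of an instruction word using big-endian bit
 * numbering (bit 0 is the most-significant bit). Field width must be < 32. */
uint32_t insn_field(uint32_t insn, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    return (insn >> (31u - last)) & ((UINT32_C(1) << width) - 1u);
}

/* FPR numbers named by an A-form instruction such as fmadd. */
typedef struct {
    unsigned frt, fra, frb, frc;
} fpr_operands;

fpr_operands decode_a_form(uint32_t insn)
{
    fpr_operands ops;
    ops.frt = insn_field(insn, 6, 10);    /* target register        */
    ops.fra = insn_field(insn, 11, 15);   /* first source register  */
    ops.frb = insn_field(insn, 16, 20);   /* second source register */
    ops.frc = insn_field(insn, 21, 25);   /* third source register  */
    return ops;
}
```

Because each field is 5 bits wide, every extracted value lies in the range 0 to 31, matching FPR0 - FPR31.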
Each FPR contains 64 bits that support the floating-point double format (see the floating-point processor
chapter in Book-I of Power ISA Version 2.05 for details). All instructions that interpret the contents of an FPR
as a floating-point value use the floating-point double format for this interpretation. Although the FPRs are
architecturally 64 bits wide, each FPR is implemented as 66 bits of encoded data plus an additional 8 bits of
parity protection (74 bits total).
The computational instructions, and the move and select instructions, operate on data located in FPRs and,
with the exception of the compare instructions, place the result value into an FPR and optionally place status
information into the Condition Register (CR).
Load and store double instructions are provided that transfer 64 bits of data between storage and the FPRs
with no conversion. Load single instructions transfer and convert floating-point values in floating-point single
format from storage to the same value in floating-point double format in the FPRs. Store single instructions
are provided to transfer and convert floating-point values in floating-point double format from the FPRs to the
same value in floating-point single format in storage.
Some floating-point instructions update the FPSCR and CR explicitly. Some of these instructions move data
from an FPR to the FPSCR, or from the FPSCR to an FPR.
The computational instructions and the select instruction accept values from the FPRs in double format. For
single-precision arithmetic instructions, all input values must be representable in single format; if not, the
result placed into the target FPR and the setting of status bits in the FPSCR are undefined.
3.2.1.2 Floating-Point Status and Control Register (FPSCR)
The FPSCR controls the handling of floating-point exceptions and records status resulting from the floating-point operations. FPSCR bits 0:31 are reserved.

[Figure: FPSCR bit layout, bits 32:63. Fields in bit order: FX, FEX, VX, OX, UX, ZX, XX, VXSNAN, VXISI, VXIDI,
VXZDZ, VXIMZ, VXVC, FR, FI, FPRF, FL, FG, FE, FU, Reserved, VXSOFT, VXSQRT, VXCVI, VE, OE, UE, ZE, XE,
Reserved, RN.]

Bits   Field Name  Description
32     FX          Floating-point exception summary.
                   0   No FPSCR exception bits changed from 0 to 1.
                   1   At least one FPSCR exception bit changed from 0 to 1.
                   All floating-point instructions, except mtfsfi and mtfsf, implicitly set this field to 1 if the
                   instruction causes any floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs,
                   mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter this field explicitly.
33     FEX         Floating-point enabled exception summary.
                   The OR of all the floating-point exception fields masked by their respective enable fields.
                   mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter this field explicitly.
34     VX          Floating-point invalid operation exception summary.
                   The OR of all the invalid operation exception fields. The mcrfs, mtfsfi, mtfsf, mtfsb0, and
                   mtfsb1 instructions cannot alter this field explicitly.
35     OX          Floating-point overflow exception.
                   0   A floating-point overflow exception did not occur.
                   1   A floating-point overflow exception occurred.
36     UX          Floating-point underflow exception.
                   0   A floating-point underflow exception did not occur.
                   1   A floating-point underflow exception occurred.
37     ZX          Floating-point zero divide exception.
                   0   A floating-point zero divide exception did not occur.
                   1   A floating-point zero divide exception occurred.
38     XX          Floating-point inexact exception.
                   0   A floating-point inexact exception did not occur.
                   1   A floating-point inexact exception occurred.
                   This field is a sticky version of FPSCR[FI]. The following rules describe how a given
                   instruction sets this field:
                   • If the instruction affects FPSCR[FI], the new value of this field is obtained by ORing the
                     old value of this field with the new value of FPSCR[FI].
                   • If the instruction does not affect FPSCR[FI], the value of this field is unchanged.
39     VXSNAN      Floating-point invalid operation exception (SNaN).
                   0   A floating-point invalid operation exception (VXSNAN) did not occur.
                   1   A floating-point invalid operation exception (VXSNAN) occurred.
40     VXISI       Floating-point invalid operation exception (∞ – ∞).
                   0   A floating-point invalid operation exception (VXISI) did not occur.
                   1   A floating-point invalid operation exception (VXISI) occurred.
41     VXIDI       Floating-point invalid operation exception (∞ ÷ ∞).
                   0   A floating-point invalid operation exception (VXIDI) did not occur.
                   1   A floating-point invalid operation exception (VXIDI) occurred.
42     VXZDZ       Floating-point invalid operation exception (0 ÷ 0).
                   0   A floating-point invalid operation exception (VXZDZ) did not occur.
                   1   A floating-point invalid operation exception (VXZDZ) occurred.
43     VXIMZ       Floating-point invalid operation exception (∞ × 0).
                   0   A floating-point invalid operation exception (VXIMZ) did not occur.
                   1   A floating-point invalid operation exception (VXIMZ) occurred.
44     VXVC        Floating-point invalid operation exception (invalid compare).
                   0   A floating-point invalid operation exception (VXVC) did not occur.
                   1   A floating-point invalid operation exception (VXVC) occurred.
45     FR          Floating-point fraction rounded.
                   The last arithmetic or rounding and conversion instruction incremented the fraction during
                   rounding. See Section 3.3.6 Rounding on page 94. This bit is not sticky.
46     FI          Floating-point fraction inexact.
                   The last arithmetic or rounding and conversion instruction either produced an inexact result
                   during rounding or caused a disabled overflow exception. See Section 3.3.6 Rounding. This bit
                   is not sticky.
                   See the definition of FPSCR[XX] regarding the relationship between FPSCR[FI] and FPSCR[XX].
47     FPRF        Floating-point result flag (FPRF).
48     FL          Floating-point less than or negative.
49     FG          Floating-point greater than or positive.
50     FE          Floating-point equal to zero.
51     FU          Floating-point unordered or not-a-number (NaN).
52     Reserved    Reserved.
53     VXSOFT      Floating-point invalid operation exception (software request).
                   0   A floating-point invalid operation exception (software request) did not occur.
                   1   A floating-point invalid operation exception (software request) occurred.
54     VXSQRT      Floating-point invalid operation exception (invalid square root).
                   0   A floating-point invalid operation exception (invalid square root) did not occur.
                   1   A floating-point invalid operation exception (invalid square root) occurred.
55     VXCVI       Floating-point invalid operation exception (invalid integer convert).
                   0   A floating-point invalid operation exception (invalid integer convert) did not occur.
                   1   A floating-point invalid operation exception (invalid integer convert) occurred.
56     VE          Floating-point invalid operation exception enable.
                   0   Floating-point invalid operation exceptions are disabled.
                   1   Floating-point invalid operation exceptions are enabled.
57     OE          Floating-point overflow exception enable.
                   0   Floating-point overflow exceptions are disabled.
                   1   Floating-point overflow exceptions are enabled.
58     UE          Floating-point underflow exception enable.
                   0   Floating-point underflow exceptions are disabled.
                   1   Floating-point underflow exceptions are enabled.
59     ZE          Floating-point zero divide exception enable.
                   0   Floating-point zero divide exceptions are disabled.
                   1   Floating-point zero divide exceptions are enabled.
60     XE          Floating-point inexact exception enable.
                   0   Floating-point inexact exceptions are disabled.
                   1   Floating-point inexact exceptions are enabled.
61     Reserved
62:63  RN          Floating-point rounding control.
                   00  Round to nearest.
                   01  Round toward zero.
                   10  Round toward +∞.
                   11  Round toward –∞.
                   See Section 3.3.6 Rounding on page 94.
Note: Setting FPSCR[NI] = ‘1’ is intended to permit results to be approximate and to cause performance to
be more predictable and less data-dependent than when FPSCR[NI] = ‘0’. For example, in non-IEEE mode, 0
is returned instead of a denormalized number, and non-IEEE mode may return a large number instead of an
infinity.
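
The FPSCR field layout can be captured as bit masks, and the two derived summary bits (VX and FEX) recomputed from the other fields. The following C sketch models this on a 32-bit image of FPSCR bits 32:63. The macro and function names are hypothetical, and the code is a software model of the definitions in the field descriptions, not a description of how the hardware computes these bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mask for FPSCR bit n (32..63, big-endian numbering) within a 32-bit
 * image of the low-order word of the register. */
#define FPSCR_BIT(n)   (UINT32_C(1) << (63 - (n)))

#define FPSCR_FX       FPSCR_BIT(32)
#define FPSCR_FEX      FPSCR_BIT(33)
#define FPSCR_VX       FPSCR_BIT(34)
#define FPSCR_OX       FPSCR_BIT(35)
#define FPSCR_UX       FPSCR_BIT(36)
#define FPSCR_ZX       FPSCR_BIT(37)
#define FPSCR_XX       FPSCR_BIT(38)
#define FPSCR_VE       FPSCR_BIT(56)
#define FPSCR_OE       FPSCR_BIT(57)
#define FPSCR_UE       FPSCR_BIT(58)
#define FPSCR_ZE       FPSCR_BIT(59)
#define FPSCR_XE       FPSCR_BIT(60)
#define FPSCR_RN_MASK  UINT32_C(0x00000003)            /* bits 62:63 */

/* VXSNAN..VXVC (bits 39:44) plus VXSOFT, VXSQRT, VXCVI (bits 53:55). */
#define FPSCR_VX_ALL   (UINT32_C(0x01F80000) | UINT32_C(0x00000700))

/* Recompute the derived summary bits from the other fields. */
uint32_t fpscr_refresh_summaries(uint32_t fpscr)
{
    fpscr &= ~(FPSCR_VX | FPSCR_FEX);

    /* VX: OR of all invalid operation exception fields. */
    if (fpscr & FPSCR_VX_ALL)
        fpscr |= FPSCR_VX;

    /* FEX: OR of each exception field masked by its enable field. */
    bool enabled =
        ((fpscr & FPSCR_VX) && (fpscr & FPSCR_VE)) ||
        ((fpscr & FPSCR_OX) && (fpscr & FPSCR_OE)) ||
        ((fpscr & FPSCR_UX) && (fpscr & FPSCR_UE)) ||
        ((fpscr & FPSCR_ZX) && (fpscr & FPSCR_ZE)) ||
        ((fpscr & FPSCR_XX) && (fpscr & FPSCR_XE));
    if (enabled)
        fpscr |= FPSCR_FEX;

    return fpscr;
}
```

For example, an image with only ZX and ZE set gains FEX, whereas an image with ZX set and ZE clear does not.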
The following section describes floating-point data formats, representation of floating-point values, data
handling and precision, and rounding.
3.3 Floating-Point Data Formats
Floating-point values are represented in two binary fixed-length formats. Single-precision values are represented in the 32-bit single format. Double-precision values are represented in the 64-bit double format. The
single format can be used for data in storage, but cannot be stored in the FPRs. The double format can be
used for data in storage and for data in the FPRs. When a floating-point value is loaded from storage using a
load single instruction, it is converted to double format and placed in the target FPR. Conversely, a floating-point value stored from an FPR into storage using a store single instruction is converted to single format
before being placed in storage. See the FP load instructions and FP store instructions in the floating-point
processor chapter of Book-I in Power ISA Version 2.05.
Values in floating-point format are composed of three fields, as shown in Table 3-2.
Table 3-2. Format Fields

Field      Description
S          Sign Bit
EXP        Exponent + bias
FRACTION   Fraction

The lengths of the exponent and the fraction fields differ between the single and double formats. See
Table 3-3 for more information.
3.3.1 Value Representation
Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent
(EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied
bit concatenated on the right with the FRACTION. This leading implied bit is ‘1’ for normalized numbers and
‘0’ for denormalized numbers and is located in the unit bit position (that is, the first bit to the left of the binary
point). Values representable within the two floating-point formats can be specified by the parameters listed in
Table 3-3.
Table 3-3. IEEE 754 Floating-Point Fields

Parameter              Single   Double
Exponent Bias          +127     +1023
Maximum Exponent       +127     +1023
Minimum Exponent       –126     –1022
Field Widths (Bits)
  Sign                 1        1
  Exponent             8        11
  Fraction             23       52
  Significand          24       53

The FPRs support the floating-point double format only.
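
The double-format field widths in Table 3-3 can be illustrated by splitting a 64-bit value into its S, EXP, and FRACTION fields. The following C sketch assumes that the host represents `double` in the IEEE 754 64-bit format; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;        /* S: 1 bit                        */
    unsigned biased_exp;  /* EXP: 11 bits, exponent + 1023   */
    uint64_t fraction;    /* FRACTION: 52 bits               */
} double_fields;

double_fields split_double(double value)
{
    uint64_t bits;
    double_fields f;

    /* Copy the object representation; avoids pointer-aliasing problems. */
    memcpy(&bits, &value, sizeof bits);

    f.sign       = (unsigned)(bits >> 63);
    f.biased_exp = (unsigned)((bits >> 52) & 0x7FFu);
    f.fraction   = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return f;
}
```

For the value 1.0, the sketch yields a sign of 0, a biased exponent of 1023 (unbiased exponent 0), and a fraction of zero.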
The numeric values representable within each of the two supported formats are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values. The
nonnumeric values that are representable are the infinities and the Not a Numbers (NaNs). The infinities are
adjoined to the real numbers, but are not numbers themselves, and the standard rules of arithmetic do not
hold when they are used in an operation. They are related to the real numbers by order alone. It is possible,
however, to define restricted operations among numbers and infinities. The relative location on the real
number line for each of the defined entities is shown in Figure 3-1.
Figure 3-1. Approximation to Real Numbers

–INF   –NOR   –DEN   –0   +0   +DEN   +NOR   +INF

The NaNs are not related to the numeric values or infinities by order or value, but are encodings used to
convey diagnostic information such as the representation of uninitialized variables.
The different floating-point values defined in the architecture are described in the following sections.
3.3.2 Binary Floating-Point Numbers
Binary floating-point numbers are machine-representable values used as approximations to real numbers. Three categories of numbers are
supported: normalized numbers, denormalized numbers, and zero values.
3.3.2.1 Normalized Numbers
Normalized numbers (±NOR) have an unbiased exponent value in the range:
• –126 to 127 in single format
• –1022 to 1023 in double format
They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:
$\mathrm{NOR} = (-1)^{s} \times 2^{E} \times (1.\mathrm{fraction})$
where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a
leading unit bit (implied bit) and a fraction part.
The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to:
• Single format:
$1.2 \times 10^{-38} \le M \le 3.4 \times 10^{38}$
• Double format:
$2.2 \times 10^{-308} \le M \le 1.8 \times 10^{308}$
3.3.2.2 Denormalized Numbers
Denormalized numbers (±DEN) are values that have a biased exponent value of zero and a nonzero fraction
value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They
are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:
$\mathrm{DEN} = (-1)^{s} \times 2^{E_{\min}} \times (0.\mathrm{fraction})$
where $E_{\min}$ is the minimum representable exponent value (–126 for single-precision, –1022 for double-precision).
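
The interpretation rules for normalized and denormalized numbers can be checked with a small C sketch that evaluates a single-format word directly from its fields. The function names are hypothetical, and the sketch assumes that the host `double` type can represent every single-format value exactly, which holds for IEEE 754 64-bit doubles. Infinities and NaNs (biased exponent 255) are outside its scope.

```c
#include <stdint.h>

/* Multiply x by 2^e using exact power-of-two scaling steps. */
static double scale_pow2(double x, int e)
{
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x *= 0.5; e++; }
    return x;
}

/* Value of a single-format word whose biased exponent is not 255. */
double single_word_value(uint32_t word)
{
    int      sign       = (int)(word >> 31);
    int      biased_exp = (int)((word >> 23) & 0xFFu);
    uint32_t fraction   = word & UINT32_C(0x007FFFFF);
    double   significand;
    int      e;

    if (biased_exp != 0) {
        /* Normalized: implied unit bit is 1, E = EXP - bias. */
        significand = 1.0 + (double)fraction / 8388608.0;   /* 2^23 */
        e = biased_exp - 127;
    } else {
        /* Denormalized or zero: implied unit bit is 0, E = Emin. */
        significand = (double)fraction / 8388608.0;
        e = -126;
    }

    double magnitude = scale_pow2(significand, e);
    return sign ? -magnitude : magnitude;
}
```

The word x‘00000001’ evaluates to 2^–149, the smallest single-format denormalized magnitude, and x‘00800000’ evaluates to 2^–126, the smallest normalized magnitude.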
3.3.2.3 Zero Values
Zero values (±0) have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive
or negative sign. The sign of zero is ignored by comparison operations (comparison treats +0 as equal to –0).
3.3.3 Infinities
Infinities (±∞) are values that have the maximum biased exponent value:
• 255 in single format
• 2047 in double format
and a zero fraction value. They are used to approximate values greater in magnitude than the maximum
normalized value.
Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among
numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense:
–∞ < every finite number < +∞
Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs
due to the invalid operations.
3.3.3.1 Not a Numbers
Not a numbers (NaNs) are values that have the maximum biased exponent value and a nonzero fraction
value. The sign bit is ignored, that is, NaNs are neither positive nor negative. If the high-order bit of the fraction field is ‘0’, the NaN is a signaling NaN (SNaN); otherwise, it is a quiet NaN (QNaN).
Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.
Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when the invalid operation exception is disabled (FPSCR[VE] = ‘0’). Quiet NaNs
propagate through all floating-point instructions except fcmpo, frsp, and fctiw. Quiet NaNs do not signal
exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in
QNaNs can thus be preserved through a sequence of floating-point operations, and used to convey diagnostic information to help identify results from invalid operations.
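
The encodings described in Sections 3.3.2 and 3.3.3 can be summarized as a classifier over the 64-bit double format. The following C sketch is illustrative only; the enumeration and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    DCLASS_ZERO,   /* biased exponent 0, fraction 0               */
    DCLASS_DEN,    /* biased exponent 0, fraction nonzero         */
    DCLASS_NOR,    /* biased exponent 1..2046                     */
    DCLASS_INF,    /* biased exponent 2047, fraction 0            */
    DCLASS_SNAN,   /* biased exponent 2047, high fraction bit 0   */
    DCLASS_QNAN    /* biased exponent 2047, high fraction bit 1   */
} dclass;

dclass classify_double_bits(uint64_t bits)
{
    unsigned biased_exp = (unsigned)((bits >> 52) & 0x7FFu);
    uint64_t fraction   = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    if (biased_exp == 0)
        return fraction == 0 ? DCLASS_ZERO : DCLASS_DEN;

    if (biased_exp == 0x7FFu) {
        if (fraction == 0)
            return DCLASS_INF;
        /* The high-order fraction bit separates QNaN from SNaN. */
        return (fraction >> 51) ? DCLASS_QNAN : DCLASS_SNAN;
    }

    return DCLASS_NOR;
}
```

The sign bit does not take part in the classification, which is consistent with the statement that NaNs are neither positive nor negative.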
When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a
QNaN was generated due to a disabled invalid operation exception, the following rule is applied to determine
the NaN with the high-order fraction bit set to 1 that is to be stored as the result.
if FPR(FRA) is a NaN
    then FPR(FRT) ← FPR(FRA)
    else if FPR(FRB) is a NaN
        then if instruction is frsp
            then FPR(FRT) ← FPR(FRB)[0:34] || ²⁹0
            else FPR(FRT) ← FPR(FRB)
        else if FPR(FRC) is a NaN
            then FPR(FRT) ← FPR(FRC)
            else if generated QNaN
                then FPR(FRT) ← generated QNaN
If the operand specified by FRA is a NaN, that NaN is stored as the result. Otherwise, if the operand specified
by FRB is a NaN (if the instruction specifies an FRB operand), that NaN is stored as the result, with the low-order 29 bits of the result set to ‘0’ if the instruction is frsp. Otherwise, if the operand specified by FRC is a
NaN (if the instruction specifies an FRC operand), that NaN is stored as the result. Otherwise, if a QNaN was
generated due to a disabled invalid operation exception, that QNaN is stored as the result. If a QNaN is to be
generated as a result, the QNaN generated has a sign bit of ‘0’, an exponent field of all ‘1’s, and a high-order
fraction bit of ‘1’ with all other fraction bits 0. Any instruction that generates a QNaN as the result of a disabled
invalid operation must generate this QNaN (that is, x‘7FF8 0000 0000 0000’).
A double-precision NaN is representable in single format if and only if the low-order 29 bits of the double-precision NaN’s fraction are zero.
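
The NaN selection rule above can be modeled in C as follows. The sketch makes two assumptions that go beyond the pseudocode: an operand that the instruction does not specify is passed as a null pointer, and the selected operand has its high-order fraction bit forced to 1, which is this sketch's reading of "the NaN with the high-order fraction bit set to 1". The function is meant to be called only when the result is known to be a QNaN. All names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define GENERATED_QNAN  UINT64_C(0x7FF8000000000000)
#define QUIET_BIT       UINT64_C(0x0008000000000000)
#define LOW_29_BITS     UINT64_C(0x000000001FFFFFFF)

static bool is_nan_bits(uint64_t bits)
{
    return ((bits >> 52) & 0x7FFu) == 0x7FFu &&
           (bits & UINT64_C(0x000FFFFFFFFFFFFF)) != 0;
}

/* Priority order: FRA, then FRB, then FRC, then the generated QNaN. */
uint64_t select_nan_result(const uint64_t *fra, const uint64_t *frb,
                           const uint64_t *frc, bool is_frsp)
{
    uint64_t result;

    if (fra != NULL && is_nan_bits(*fra)) {
        result = *fra;
    } else if (frb != NULL && is_nan_bits(*frb)) {
        /* frsp keeps bits 0:34 and clears the low-order 29 bits. */
        result = is_frsp ? (*frb & ~LOW_29_BITS) : *frb;
    } else if (frc != NULL && is_nan_bits(*frc)) {
        result = *frc;
    } else {
        return GENERATED_QNAN;   /* disabled invalid operation exception */
    }

    return result | QUIET_BIT;   /* assumption: stored result is quiet */
}
```

Because the low-order 29 bits are cleared for frsp, the NaN produced in that case is always representable in single format.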
3.3.4 Sign of Result
The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation, when the
operation does not yield an exception. They apply even when the operands or results are zeros or infinities.
• The sign of the result of an add operation is the sign of the operand having the larger absolute value. The
sign of the result of the subtract operation x – y is the same as the sign of the result of the add operation
x + (–y).
When the sum of two operands with opposite sign, or the difference of two operands with the same sign, is
exactly zero, the sign of the result is positive in all rounding modes except round toward -Infinity, in which
mode the sign is negative.
• The sign of the result of a multiply or divide operation is the exclusive OR of the signs of the operands.
• The sign of the result of an frsqrte instruction is always positive, except that the reciprocal square root of
–0 is –Infinity.
• The sign of the result of an frsp[.] or fctiw operation is the sign of the operand being converted.
For the multiply-add instructions, the preceding rules are applied first to the multiply operation and then to the
add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).
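
Two of the sign rules lend themselves to a compact illustration: the exclusive OR rule for multiply and divide, and the rounding-mode dependence of an exactly zero sum. In the following C sketch a sign is represented as a boolean (true meaning negative) and the rounding mode uses the FPSCR[RN] encoding. The names are hypothetical, and the same-sign case of the zero sum follows from the first rule (both operands are then zeros of that sign) rather than from an explicit statement in the text.

```c
#include <stdbool.h>

/* FPSCR[RN] encodings. */
enum {
    RN_NEAREST   = 0,
    RN_ZERO      = 1,
    RN_PLUS_INF  = 2,
    RN_MINUS_INF = 3
};

/* Multiply or divide: exclusive OR of the operand signs. */
bool sign_of_product(bool sign_a, bool sign_b)
{
    return sign_a != sign_b;
}

/* Sign of a sum x + y that is exactly zero. */
bool sign_of_exact_zero_sum(bool sign_x, bool sign_y, int rn)
{
    if (sign_x == sign_y)
        return sign_x;             /* both operands are zeros of this sign */
    return rn == RN_MINUS_INF;     /* negative only when rounding toward -infinity */
}
```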
3.3.5 Data Handling and Precision
Instructions are defined to move floating-point data between the FPRs and storage. For double format data,
the data are not altered during the move. For single format data, a format conversion from single to double is
performed when loading from storage into an FPR. A format conversion from double to single is performed
when storing from an FPR to storage. The load/store instructions do not cause floating-point exceptions.
All computational, move, and fsel instructions use the floating-point double format.
Floating-point single-precision values are obtained with the following types of instruction.
• Load floating-point single.
This form of instruction accesses a single-precision operand in single format in storage, converts it to
double format, and loads it into an FPR. No floating-point exceptions are caused by these instructions.
• Round to floating-point single-precision.
The frsp instruction rounds a double-precision operand to single-precision, checking the exponent for
single-precision range and handling any exceptions according to respective enable bits, and places that
operand into an FPR as a double-precision operand. For results produced by single-precision arithmetic
instructions, single-precision loads, and other instances of the frsp instruction, this operation does not
alter the value.
Note: The frsp instruction enables value conversion from double-precision to single-precision with appropriate
exception checking and rounding. This instruction should be used to convert double-precision floating-point
values (produced by double-precision load and arithmetic instructions) to single-precision values before
storing them into single format storage elements or using them as operands for single-precision arithmetic
instructions. Values produced by single-precision load and arithmetic instructions are already single-precision
values and can be stored directly into single format storage elements, or used directly as operands for
single-precision arithmetic instructions, without preceding the store, or the arithmetic instruction, by an frsp
instruction.
• Single-precision arithmetic instructions.
This form of instruction takes operands from the FPRs in double format, performs the operation as if it
produced an intermediate result having infinite precision and unbounded exponent range, and then
coerces this intermediate result to fit in single format. Status bits in the FPSCR are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR. The result lies
in the range supported by the single format.
All input values must be representable in single format. If they are not, the result placed into the target
FPR, and the setting of status bits in the FPSCR, are undefined.
• Store floating-point single.
This form of instruction converts a double-precision operand to single format and stores that operand into
storage. No floating-point exceptions are caused by these instructions. (The value being stored is effectively assumed to be the result of an instruction of one of the preceding three types.)
When the result of a load floating-point single, frsp, or single-precision arithmetic instruction is stored in an
FPR, the low-order 29 fraction bits are zero.
Note: A single-precision value can be used in double-precision arithmetic operations. The reverse is true
only if the double-precision value is representable in single format.
3.3.6 Rounding
Rounding applies to operations that have numeric operands (operands that are not infinities or NaNs).
Rounding the intermediate result of such operations might cause an overflow exception, an underflow exception, or an inexact exception. The following description assumes that the operations cause no exceptions and
that the result is numeric. See Section 3.3.1 Value Representation on page 90 for the cases not covered
here.
The arithmetic and rounding and conversion instructions produce intermediate results that can be regarded
as having infinite precision and unbounded exponent range. Such intermediate results are normalized or
denormalized if required, then rounded to the target format. The final result is then placed into the target FPR
in double format or in integer format, depending on the instruction.
The arithmetic and rounding and conversion instructions, which round intermediate results, set FPSCR[FR,
FI]. If the fraction was incremented during rounding, FPSCR[FR] = ‘1’; otherwise, FPSCR[FR] = ‘0’. If the
rounded result is inexact, FPSCR[FI] = ‘1’; otherwise, FPSCR[FI] = ‘0’.
The estimate instructions set FPSCR[FR, FI] to undefined values. The remaining floating-point instructions do
not alter FPSCR[FR, FI].
FPSCR[RN] specifies one of four programmable rounding modes.
Let z be the intermediate arithmetic result or the operand of a convert operation. If z can be represented
exactly in the target format, then the result in all rounding modes is z as represented in the target format. If z
cannot be represented exactly in the target format, let z1 and z2 bound z as the next larger and next smaller
numbers representable in the target format. Then, z1 or z2 can be used to approximate the result in the target
format.
Figure 3-2 shows the relation of z, z1, and z2 in this case. The following rules specify the rounding in the four
modes. LSb means least-significant bit.
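Figure 3-2. Selection of z1 and z2

[Figure: number line through 0 showing a negative-value case and a positive-value case. In each case the
infinitely precise value z lies between z2 and z1; one bound is obtained by truncating after the LSb and the
other by incrementing the LSb of z.]
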
Table 3-4 describes the rounding modes.
Table 3-4. Rounding Modes

FPSCR[RN]   Rounding Mode             Description
00          Round to nearest.         Choose the value that is closest to z, either z1 or z2. In case of a tie,
                                      choose the one that is even (the LSb is 0).
01          Round toward zero.        Choose the smaller in magnitude (z1 or z2).
10          Round toward +infinity.   Choose z1.
11          Round toward –infinity.   Choose z2.

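
The four rounding modes, together with the FPSCR[FR] and FPSCR[FI] settings described in this section, can be modeled on a sign-and-magnitude integer. In the following C sketch the intermediate result is a magnitude from which the low-order `drop` bits are discarded; this representation, the names, and the omission of renormalization after a carry out of the significand are simplifying assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t magnitude;  /* rounded magnitude in units of the target LSb */
    bool     fr;         /* fraction was incremented  (FPSCR[FR])        */
    bool     fi;         /* rounded result is inexact (FPSCR[FI])        */
} round_result;

/* Discard the low 'drop' bits (1 <= drop <= 63) of 'mag' under the
 * rounding mode 'rn' (FPSCR[RN] encoding). */
round_result round_magnitude(bool negative, uint64_t mag,
                             unsigned drop, unsigned rn)
{
    uint64_t kept      = mag >> drop;
    uint64_t lost      = mag & ((UINT64_C(1) << drop) - 1);
    uint64_t half      = UINT64_C(1) << (drop - 1);
    bool     inexact   = lost != 0;
    bool     increment = false;

    switch (rn & 3u) {
    case 0:  /* round to nearest; ties go to the even value */
        increment = lost > half || (lost == half && (kept & 1u) != 0);
        break;
    case 1:  /* round toward zero: truncate the magnitude */
        break;
    case 2:  /* round toward +infinity */
        increment = inexact && !negative;
        break;
    case 3:  /* round toward -infinity */
        increment = inexact && negative;
        break;
    }

    round_result r = { kept + (increment ? 1u : 0u), increment, inexact };
    return r;
}
```

A model of FPSCR[XX] would then OR the returned `fi` value into the existing XX bit, following the sticky rule given in the FPSCR field description.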
3.4 Floating-Point Instructions
Primary opcode 63 is used for the double-precision arithmetic instructions and miscellaneous instructions,
such as the floating-point status and control register manipulation instructions. Primary opcode 59 is used for
the single-precision arithmetic instructions.
The single-precision instructions for which there is a corresponding double-precision instruction have the
same format and extended opcode as the corresponding double-precision instruction.
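
Because a single-precision instruction shares its format and extended opcode with the corresponding double-precision instruction, the two encodings differ only in the primary opcode. The following C sketch illustrates this on raw instruction words. It assumes that the primary opcode occupies the six most-significant bits of the word (as in the Power ISA instruction formats); the names are hypothetical, and the conversion is meaningful only for instructions that actually have a single-precision counterpart.

```c
#include <stdint.h>

#define OPCD_FP_SINGLE  59u   /* single-precision arithmetic            */
#define OPCD_FP_DOUBLE  63u   /* double-precision and miscellaneous     */

/* Primary opcode: instruction bits 0:5 (the six high-order bits). */
unsigned primary_opcode(uint32_t insn)
{
    return (unsigned)(insn >> 26);
}

/* Replace the primary opcode with 59, keeping every other field. */
uint32_t to_single_precision_form(uint32_t insn)
{
    return (insn & UINT32_C(0x03FFFFFF)) | ((uint32_t)OPCD_FP_SINGLE << 26);
}
```

For example, applying the conversion to the encoding of fadd yields the encoding of fadds with the same register fields.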
Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in
floating-point registers; to move floating-point data between storage and these registers; and to manipulate
the FPSCR explicitly.
These instructions are divided into two categories.
• Computational instructions
The computational instructions are those that perform addition, subtraction, multiplication, division,
extracting the square root, rounding, conversion, comparison, and combinations of these operations.
These instructions provide the floating-point operations. They place status information into the FPSCR.
They are the instructions described in Section 3.4.5 Floating-Point Arithmetic Instructions on page 100,
Section 3.4.6 Floating-Point Rounding and Conversion Instructions on page 101, and Section 3.4.7
Floating-Point Compare Instructions on page 101.
• Noncomputational instructions
The noncomputational instructions are those that perform loads and stores, move the contents of a floating-point
register to another floating-point register possibly altering the sign, manipulate the FPSCR explicitly, and
select a value from one of two floating-point registers based on the value in a third floating-point register.
These operations are not considered floating-point operations. With the exception of the instructions that
manipulate the FPSCR explicitly, they do not alter the FPSCR. Those instructions are described in
Section 3.4.8 Floating-Point Status and Control Register Instructions on page 102.
A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by
this number is the product of the significand and the number $2^{\mathrm{exponent}}$. Encodings are provided in the data
format to represent finite numeric values, ±infinity, and values that are not a number (NaN). Operations
involving infinities produce results following traditional mathematical conventions. NaNs have no mathematical interpretation, but their encoding supports a variable diagnostic information field. NaNs may be used to
indicate such things as uninitialized variables, and can be produced by certain invalid operations.
One class of exceptions that occur during floating-point instruction execution is unique to floating-point operations: the floating-point exception. Bits set in the FPSCR indicate floating-point exceptions. They can cause
an enabled exception type program interrupt to be taken, precisely or imprecisely, if the proper control bits
are set.
3.4.1 Instructions By Category
The floating-point instructions can be classified into computational and noncomputational categories. The
computational instructions include those that perform arithmetic operations or conversions on operands.
Noncomputational instructions perform loads/stores and moves (with possible sign changes), or select data.
Additionally, some noncomputational instructions can write directly to the FPSCR. All instructions executed in
the load/store pipeline are noncomputational, while most executed in the arithmetic pipe are computational.
All floating-point operands are stored internally in double-precision format. Arithmetic operations specified as
single require that the internal data be representable as single (that is, having an unbiased exponent between
–126 and 127 and a significand accurately representable in 24 bits). If the data cannot be represented in this
way, the results stored in the FPR, and the status bits set in the FPSCR and CR (as appropriate), are undefined.
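
The representability condition stated above can be expressed as a check on the double-format bit pattern. The following C sketch covers only the case described in the paragraph, a normalized value with an unbiased exponent between –126 and 127 and a 24-bit significand. Zeros, denormalized values, infinities, and NaNs are not addressed by this check and are reported as not fitting. The function name is hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if 'bits' (double format) encodes a normalized value that is
 * exactly representable as a normalized single-format value. */
bool normalized_fits_single(uint64_t bits)
{
    unsigned biased_exp = (unsigned)((bits >> 52) & 0x7FFu);
    uint64_t fraction   = bits & UINT64_C(0x000FFFFFFFFFFFFF);

    /* Unbiased exponent -126..127 corresponds to biased 897..1150. */
    if (biased_exp < 1023u - 126u || biased_exp > 1023u + 127u)
        return false;

    /* A 24-bit significand leaves the low-order 29 fraction bits zero. */
    return (fraction & UINT64_C(0x1FFFFFFF)) == 0;
}
```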
For consistency, to reduce the likelihood of causing a serious malfunction resulting from user error, and to
enable random testing, single-precision operations are performed on double-precision operands. For all
cases except for fdivs, the operation is performed as if it were double-precision; the result is then rounded to
single-precision. For fdivs, the appropriate number of iterations is performed to produce a single-precision result (potentially with early out); the quotient is then properly rounded.
In all cases, result exceptions (overflow, underflow, and inexact) are detected and reported based on the
result, not on the source operands. Default (masked exception) results are the same as for the single-precision instructions. In the case of masked overflow or underflow exceptions, the least significant 11 bits of the
adjusted true exponent are returned.
The results of all single-precision operations are rounded to single-precision. These results are stored in
double-precision format, but are restricted to single-precision range (exponent and fraction). All status bits are
set based upon the single-precision result.
3.4.2 Load and Store Instructions
The PowerPC 476FP processor instruction set includes instructions to load from memory to an FPR, and to
store from an FPR to memory.
Data received from the PowerPC 476FP core can be single-precision or double-precision, and in big-endian or
little-endian format. Also, the data received is word aligned. Data to the FPR must be in the big-endian,
double-precision format.
There are two basic forms of load instruction: single-precision and double-precision. Because the FPRs
support only floating-point double format, single-precision load floating-point instructions convert single-precision data to double format before loading the operand into the target FPR. The conversion and loading steps
are as follows.
Let WORD[0:31] be the floating-point single-precision operand accessed from storage.
Normalized Operand
if WORD[1:8] > 0 and WORD[1:8] < 255 then
    FPR(FRT)[0:1] ← WORD[0:1]
    FPR(FRT)[2] ← ¬WORD[1]
    FPR(FRT)[3] ← ¬WORD[1]
    FPR(FRT)[4] ← ¬WORD[1]
    FPR(FRT)[5:63] ← WORD[2:31] || ²⁹0
Denormalized Operand
if WORD[1:8] = 0 and WORD[9:31] ≠ 0 then
    sign ← WORD[0]
    exp ← -126
    frac[0:52] ← 0b0 || WORD[9:31] || ²⁹0
    normalize the operand
    do while frac[0] = 0
        frac ← frac[1:52] || 0b0
        exp ← exp - 1
    FPR(FRT)[0] ← sign
    FPR(FRT)[1:11] ← exp + 1023
    FPR(FRT)[12:63] ← frac[1:52]
Zero / Infinity / NaN
if WORD[1:8] = 255 or WORD[1:31] = 0 then
    FPR(FRT)[0:1] ← WORD[0:1]
    FPR(FRT)[2] ← WORD[1]
    FPR(FRT)[3] ← WORD[1]
    FPR(FRT)[4] ← WORD[1]
    FPR(FRT)[5:63] ← WORD[2:31] || ²⁹0
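
The single-to-double conversion steps above can be modeled in C on raw bit patterns. In this sketch, FPR bit 0 corresponds to bit 63 of the `uint64_t` value and WORD bit 0 to bit 31 of the `uint32_t` value. The exponent bits 2:4 of the result are filled with ¬WORD[1] for a normalized operand and with WORD[1] for zero, infinity, or NaN. The function name is hypothetical, and the code is a software model, not a description of the hardware data path.

```c
#include <stdint.h>

/* Convert a single-format word to the double-format FPR image. */
uint64_t lfs_convert_word(uint32_t word)
{
    uint64_t sign   = (uint64_t)(word >> 31);
    uint32_t exp8   = (word >> 23) & 0xFFu;          /* WORD[1:8]  */
    uint32_t frac23 = word & UINT32_C(0x007FFFFF);   /* WORD[9:31] */

    if (exp8 == 0 && frac23 != 0) {
        /* Denormalized operand: shift left until the unit bit is 1. */
        int      exp  = -126;
        uint64_t frac = (uint64_t)frac23 << 29;      /* 0b0 || WORD[9:31] || 29 zeros */
        while ((frac & (UINT64_C(1) << 52)) == 0) {
            frac <<= 1;
            exp--;
        }
        return (sign << 63)
             | ((uint64_t)(exp + 1023) << 52)
             | (frac & UINT64_C(0x000FFFFFFFFFFFFF));
    }

    /* Normalized operand, zero, infinity, or NaN. */
    uint64_t w1   = (word >> 30) & 1u;               /* WORD[1] */
    uint64_t fill = (exp8 > 0 && exp8 < 255) ? (w1 ^ 1u) : w1;
    return (sign << 63)
         | (w1 << 62)                                /* FPR bit 1     */
         | ((fill ? UINT64_C(7) : UINT64_C(0)) << 59) /* FPR bits 2:4  */
         | ((uint64_t)(word & UINT32_C(0x3FFFFFFF)) << 29);  /* FPR bits 5:63 */
}
```

For example, the single-format word x‘3F800000’ (1.0) converts to x‘3FF0 0000 0000 0000’, and the smallest denormalized word x‘00000001’ converts to a normalized double with a biased exponent of 874.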
For double-precision load floating-point instructions, no conversion is required because the data from storage
are copied directly into the FPR.
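The single-precision load conversion can be modeled in software to check the three cases. The following is a minimal Python sketch, not the hardware implementation; the function names (`lfs_convert`, `single_bits`, `double_bits`) are hypothetical, and the sketch assumes that FPR(FRT)[2:4] are all filled from WORD[1] (inverted for normalized operands), as in the Power ISA load conversion.

```python
import struct

MASK52 = (1 << 52) - 1

def lfs_convert(word: int) -> int:
    """Return the 64-bit FPR image for a 32-bit single-precision WORD."""
    sign = (word >> 31) & 1              # WORD[0]
    w1 = (word >> 30) & 1                # WORD[1], most significant exponent bit
    exp8 = (word >> 23) & 0xFF           # WORD[1:8]
    frac23 = word & 0x7FFFFF             # WORD[9:31]

    if exp8 == 0 and frac23 != 0:        # denormalized operand
        exp = -126
        frac = frac23 << 29              # 53 bits: 0b0 || WORD[9:31] || 29 zero bits
        while not (frac >> 52) & 1:      # shift left until frac[0] = 1
            frac = (frac << 1) & ((1 << 53) - 1)
            exp -= 1
        return (sign << 63) | ((exp + 1023) << 52) | (frac & MASK52)

    # Normalized operands place NOT WORD[1] in FPR[2:4];
    # zero, infinity, and NaN place WORD[1] itself there.
    fill = (w1 ^ 1) if 0 < exp8 < 255 else w1
    top5 = (sign << 4) | (w1 << 3) | (fill << 2) | (fill << 1) | fill
    return (top5 << 59) | ((word & 0x3FFFFFFF) << 29)   # WORD[2:31] || 29 zero bits

def single_bits(x: float) -> int:
    """Bit image of x rounded to single precision (helper for checking)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def double_bits(x: float) -> int:
    """Bit image of x in double precision (helper for checking)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

For any value that is exactly representable in single format, `lfs_convert(single_bits(x))` should equal `double_bits(x)`, which gives a simple way to exercise the normalized, denormalized, and zero or infinity paths.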
Some of the floating-point load instructions update GPR(RA) with the effective address. For these forms, if
RA ≠ 0, the effective address is placed into GPR(RA) and the storage element (word or doubleword)
addressed by EA is loaded into FPR(FRT). If RA = 0, the instruction form is invalid.
Floating-point load storage accesses cause data storage exceptions if the program is not allowed to read the
storage location. Floating-point load storage accesses cause data TLB error exceptions if the program
attempts to access storage that is unavailable.
Note: RA and RB denote GPRs, while FRT denotes an FPR.
Both big-endian and little-endian byte orderings are supported.
Table 3-5. Floating-Point Load Instructions

Mnemonic   Operands       Instruction
lfd        FRT, D(RA)     Load floating-point double.
lfdu       FRT, D(RA)     Load floating-point double with update.
lfdux      FRT, RA, RB    Load floating-point double with update indexed.
lfdx       FRT, RA, RB    Load floating-point double indexed.
lfs        FRT, D(RA)     Load floating-point single.
lfsu       FRT, D(RA)     Load floating-point single with update.
lfsux      FRT, RA, RB    Load floating-point single with update indexed.
lfsx       FRT, RA, RB    Load floating-point single indexed.
lfiwax     FRT, RA, RB    Load floating-point as integer word algebraic indexed.
3.4.3 Floating-Point Store Instructions
There are three basic forms of store instruction: single-precision, double-precision, and integer. The integer
form is provided by the stfiwx instruction. Because the FPRs support only floating-point double format for
floating-point data, single-precision store floating-point instructions convert double-precision data to single
format before storing the operand in storage. The conversion steps are as follows.
Let WORD[0:31] be the word in storage written to.

No Denormalization Required (includes Zero / Infinity / NaN)
    if FPR(FRS)[1:11] > 896 or FPR(FRS)[1:63] = 0 then
        WORD[0:1] ← FPR(FRS)[0:1]
        WORD[2:31] ← FPR(FRS)[5:34]

Denormalization Required
    if 874 ≤ FPR(FRS)[1:11] ≤ 896 then
        sign ← FPR(FRS)[0]
        exp ← FPR(FRS)[1:11] – 1023
        frac ← 0b1 || FPR(FRS)[12:63]
        denormalize operand
            do while exp < –126
                frac ← 0b0 || frac[0:62]
                exp ← exp + 1
        WORD[0] ← sign
        WORD[1:8] ← 0x00
        WORD[9:31] ← frac[1:23]

    else WORD ← undefined
Notice that if the value to be stored by a single-precision store floating-point instruction is larger in magnitude
than the maximum number representable in single format, the first case (no denormalization required)
applies. The result stored in WORD is then a well-defined value, but is not numerically equal to the value in
the source register. The result of a single-precision load floating-point from WORD will not compare equal to
the contents of the original source register.
For double-precision store floating-point instructions and for the store floating-point as integer word instruction, no conversion is required because the data from the FPR are copied directly into storage.
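The single-precision store conversion can be modeled the same way. This is a minimal Python sketch under stated assumptions, not the hardware implementation: the function names are hypothetical, `frac` is treated as a 64-bit quantity with the implicit 1 in frac[0], and the "undefined" case is represented by returning `None`.

```python
import struct

def stfs_convert(fpr: int):
    """Return the 32-bit WORD stored for a 64-bit FPR image, or None if undefined."""
    exp11 = (fpr >> 52) & 0x7FF                        # FPR(FRS)[1:11]
    if exp11 > 896 or (fpr & ((1 << 63) - 1)) == 0:    # no denormalization required
        # WORD[0:1] <- FPR[0:1]; WORD[2:31] <- FPR[5:34]
        return (((fpr >> 62) & 0x3) << 30) | ((fpr >> 29) & 0x3FFFFFFF)
    if 874 <= exp11 <= 896:                            # denormalization required
        sign = fpr >> 63
        exp = exp11 - 1023
        # 0b1 || FPR[12:63], left-aligned in a 64-bit value
        frac = (1 << 63) | ((fpr & ((1 << 52) - 1)) << 11)
        while exp < -126:
            frac >>= 1                                 # frac <- 0b0 || frac[0:62]
            exp += 1
        return (sign << 31) | ((frac >> 40) & 0x7FFFFF)  # WORD[9:31] <- frac[1:23]
    return None                                        # WORD is undefined

def single_bits(x: float) -> int:
    """Bit image of x in single precision (helper for checking)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def double_bits(x: float) -> int:
    """Bit image of x in double precision (helper for checking)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

For a source value that is too large for single format, such as 1e300, the first branch still produces a well-defined word, but that word does not represent the original value, which matches the behavior described above.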
Some of the floating-point store instructions update GPR(RA) with the effective address. For these forms, if
RA ≠ 0, the effective address is placed into GPR(RA).
Floating-point store storage accesses cause a data storage interrupt if the program is not allowed to write to
the storage location. Floating-point store storage accesses cause a data TLB error interrupt if the program
attempts to access storage that is unavailable.
Note: RA and RB denote GPRs, and FRS denotes an FPR.
Both big-endian and little-endian byte orderings are supported.
Table 3-6. Floating-Point Store Instructions

Mnemonic   Operands       Instruction
stfd       FRS, D(RA)     Store floating-point double.
stfdu      FRS, D(RA)     Store floating-point double with update.
stfdux     FRS, RA, RB    Store floating-point double with update indexed.
stfdx      FRS, RA, RB    Store floating-point double indexed.
stfiwx     FRS, RA, RB    Store floating-point as integer word indexed.
stfs       FRS, D(RA)     Store floating-point single.
stfsu      FRS, D(RA)     Store floating-point single with update.
stfsux     FRS, RA, RB    Store floating-point single with update indexed.
stfsx      FRS, RA, RB    Store floating-point single indexed.
3.4.4 Floating-Point Move Instructions
These instructions copy data from one floating-point register to another, altering the sign bit (bit 0) as
described in the instruction descriptions in the Power Instruction Set Architecture (ISA) Version 2.05
specification for fneg, fabs, and fnabs. These instructions treat NaNs just like any other kind of value (for
example, the sign bit of a NaN can be altered by fneg, fabs, and fnabs). These instructions do not alter the FPSCR.
Table 3-7. Floating-Point Move Instructions

Mnemonic   Operands    Instruction
fabs[.]    FRT, FRB    Floating absolute value.
fmr[.]     FRT, FRB    Floating move register.
fnabs[.]   FRT, FRB    Floating negative absolute value.
fneg[.]    FRT, FRB    Floating negate.
3.4.5 Floating-Point Arithmetic Instructions
These instructions perform elementary arithmetic operations.
Table 3-8. Floating-Point Elementary Arithmetic Instructions

Mnemonic      Operands         Instruction
fadd[.]       FRT, FRA, FRB    Floating add.
fadds[.]      FRT, FRA, FRB    Floating add single.
fcfid[.]      FRT, FRB         Floating convert from integer doubleword.
fcpsgn[.]     FRT, FRA, FRB    Floating copy sign.
fctid[.]      FRT, FRB         Floating convert to integer doubleword.
fctiw[.]      FRT, FRB         Floating convert to integer word.
fctiwz[.]     FRT, FRB         Floating convert to integer word with round toward zero.
fdiv[.]       FRT, FRA, FRB    Floating divide.
fdivs[.]      FRT, FRA, FRB    Floating divide single.
fmul[.]       FRT, FRA, FRB    Floating multiply.
fmuls[.]      FRT, FRA, FRB    Floating multiply single.
fre[.]        FRT, FRB         Floating reciprocal estimate.
fres[.]       FRT, FRB         Floating reciprocal estimate single.
frsqrte[.]    FRT, FRB         Floating reciprocal square root estimate.
frsqrtes[.]   FRT, FRB         Floating reciprocal square root estimate single.
fsqrt[.]      FRT, FRB         Floating square root.
fsqrts[.]     FRT, FRB         Floating square root single.
fsub[.]       FRT, FRA, FRB    Floating subtract.
fsubs[.]      FRT, FRA, FRB    Floating subtract single.
3.4.5.1 Floating-Point Multiply-Add Instructions
These instructions combine a multiply and an add operation without an intermediate rounding operation. The
fraction part of the intermediate product is 106 bits wide (L bit, FRACTION), and all 106 bits take part in the
add or subtract portion of the instruction.
FPSCR bits are set as follows:
• Overflow, underflow, and inexact exception bits, the FR and FI bits, and the FPRF field are set based on
the final result of the operation, not on the result of the multiplication.
• Invalid operation exception bits are set as if the multiplication and the addition were performed using two
separate instructions (fmul[s], followed by fadd[s] or fsub[s]). That is, multiplication of infinity by 0 or of
anything by an SNaN, and addition of an SNaN, cause the corresponding exception bits to be set.
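The effect of omitting the intermediate rounding step can be seen with a small numerical experiment. The sketch below is illustrative only: it uses exact rational arithmetic from the Python standard library to emulate a single final rounding, and the function names and operand values are chosen for the example rather than taken from this manual.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to double precision."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b rounded to double, then add c and round again."""
    return (a * b) + c

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the low-order product bits
separate = separate_multiply_add(a, b, c)  # loses them in the first rounding
```

With these operands the fused form returns the small nonzero difference, while the two-instruction form returns zero because the product was already rounded before the addition.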
Table 3-9. Floating-Point Multiply-Add Instructions

Mnemonic      Operands              Instruction
fmadd[.]      FRT, FRA, FRB, FRC    Floating multiply-add.
fmadds[.]     FRT, FRA, FRB, FRC    Floating multiply-add single.
fmsub[.]      FRT, FRA, FRB, FRC    Floating multiply-subtract.
fmsubs[.]     FRT, FRA, FRB, FRC    Floating multiply-subtract single.
fnmadd[.]     FRT, FRA, FRB, FRC    Floating negative multiply-add.
fnmadds[.]    FRT, FRA, FRB, FRC    Floating negative multiply-add single.
fnmsub[.]     FRT, FRA, FRB, FRC    Floating negative multiply-subtract.
fnmsubs[.]    FRT, FRA, FRB, FRC    Floating negative multiply-subtract single.
3.4.6 Floating-Point Rounding and Conversion Instructions
The floating-point rounding instructions are shown in Table 3-10.
Table 3-10. Floating-Point Rounding and Conversion Instructions

Mnemonic   Operands    Instruction
frim[.]    FRT, FRB    Floating round to integer minus.
frin[.]    FRT, FRB    Floating round to integer nearest.
frip[.]    FRT, FRB    Floating round to integer plus.
friz[.]    FRT, FRB    Floating round to integer toward zero.
frsp[.]    FRT, FRB    Floating round to single-precision.
3.4.7 Floating-Point Compare Instructions
The floating-point compare instructions compare the contents of two floating-point registers. Comparison
ignores the sign of zero (+0 is treated as equal to –0). The comparison result can be ordered or unordered.
The comparison sets one bit in the designated CR field to ‘1’ and the other three bits to ‘0’. FPSCR[FPCC] is
set in the same way.
The CR field and FPSCR[FPCC] are set as shown in Table 3-11 on page 102.
Table 3-11. Comparison Sets

Bit   Name   Description
0     FL     (FRA) < (FRB)
1     FG     (FRA) > (FRB)
2     FE     (FRA) = (FRB)
3     FU     (FRA) ? (FRB) (unordered)
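The way a comparison sets exactly one of the four bits can be sketched as follows. This is a minimal illustration in Python; the function name `compare_field` is hypothetical, and the sketch models only the 4-bit result (bit 0 as the most significant bit), not the exception behavior that distinguishes fcmpo from fcmpu.

```python
import math

def compare_field(fra: float, frb: float) -> int:
    """Return the 4-bit FL/FG/FE/FU result, with FL as the most significant bit."""
    if math.isnan(fra) or math.isnan(frb):
        return 0b0001      # FU: unordered
    if fra < frb:
        return 0b1000      # FL: less than
    if fra > frb:
        return 0b0100      # FG: greater than
    return 0b0010          # FE: equal (+0 and -0 compare equal)
```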
Table 3-12. Floating-Point Compare and Select Instructions

Mnemonic   Operands              Instruction
fcmpo      BF, FRA, FRB          Floating compare ordered.
fcmpu      BF, FRA, FRB          Floating compare unordered.
fsel[.]    FRT, FRA, FRB, FRC    Floating select.
3.4.8 Floating-Point Status and Control Register Instructions
Every Floating-Point Status and Control Register instruction synchronizes the effects of all floating-point
instructions executed by a given processor. Executing a Floating-Point Status and Control Register instruction
ensures that all floating-point instructions previously initiated by the given processor have completed
before the Floating-Point Status and Control Register instruction is initiated, and that no subsequent
floating-point instructions are initiated by the given processor until the Floating-Point Status and Control
Register instruction has completed. In particular:
• All exceptions that will be caused by the previously initiated instructions are recorded in the FPSCR
before the Floating-Point Status and Control Register instruction is initiated.
• All invocations of the enabled exception type program interrupt that will be caused by the previously initiated instructions have occurred before the Floating-Point Status and Control Register instruction is initiated.
• No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits is initiated until the Floating-Point Status and Control Register instruction has completed.
Floating-point load and floating-point store instructions are not affected.
Table 3-13 lists floating-point status and control register instructions.
Table 3-13. Floating-Point Status and Control Register Instructions

Mnemonic    Operands    Instruction
mcrfs       BF, BFA     Move to condition register from FPSCR.
mffs[.]     FRT         Move from FPSCR.
mtfsb0[.]   BT          Move to FPSCR bit 0.
mtfsb1[.]   BT          Move to FPSCR bit 1.
mtfsf[.]    FLM, FRB    Move to FPSCR fields.
mtfsfi[.]   BF, U       Move to FPSCR field immediate.
4. Memory Management Unit
4.1 Overview
The PowerPC 476FP memory management unit (MMU) provides cache control, access protection, and
address translation. The MMU contains the unified translation lookaside buffer (UTLB), control logic, and
registers that support the UTLB. The MMU interfaces with the execution unit (EU), the instruction cache unit
(ICU), the data cache unit (DCU), and the TLB snoop interface. The EU interface provides the ability to
perform translation lookaside buffer (TLB) operation instructions: tlbre, tlbwe, tlbsx, tlbivax (see Section 4.8
Software Considerations on page 127 for more information about these instructions). The instruction unit (IU)
interface provides the translation space (TS) and DSIZ bits for a lookup request from the DCU, ICU, or TLB
snoop. The ICU interface generates a lookup request to the UTLB on an instruction translation lookaside
buffer (ITLB) miss. Similarly, the DCU interface generates a lookup request to the UTLB on a data translation
lookaside buffer (DTLB) miss.
The MMU is a software managed unit with hardware assistance available for replacing entries. Software is
responsible for writing entries into the UTLB so that they can be read by using the hash function described in
Section 4.3.2 UTLB Index Address Hash on page 106. Freescale-style MMU operation is not supported.
Table 4-1 lists the MMU features of the PowerPC 476FP processor:
Table 4-1. PowerPC 476FP Processor MMU

Function                             PowerPC 476FP
UTLB size                            1024 entries.
Memory array architecture            SRAM1P.
UTLB associativity                   4-way set associative; reads four entries at a time.
Data cache MMU access time           Variable, from 6 to 30 cycles, depending on simultaneous requests and
                                     hashes used.
Page sizes supported                 4 KB, 16 KB, 64 KB, 1 MB, 16 MB, 256 MB, 1 GB.
Page descriptors                     WIMGE, U[0:3], IL1I, IL1D.
                                     See Table 4-5 on page 110 for a definition of these fields.
Translation ID (TID) field           16 bits.
Extended real page number (ERPN)     10 bits.
UTLB search mechanism                Search of up to seven hashes, optimized for page size, in an order set
                                     in a supervisor or user SPR (SSPCR or USPCR).
4.2 Address Translation
Figure 4-1 on page 104 shows the MMU address translation. The 49-bit virtual address (VA) is formed by
prepending a 1-bit address space (AS) and a 16-bit process ID (PID) to the 32-bit effective address (EA).
Using the AS, PID, and EA, the 10-bit extended real page number (ERPN) and 20-bit real page number (RPN)
are obtained from the UTLB. These are concatenated with a 12-bit offset to form the 42-bit real address (RA).
The 4 KB page size requires 20 bits of EA (and RPN), while the 1 GB page size requires only 2 bits of EA (and
RPN), as shown in Figure 4-1 on page 104.
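The concatenations described above can be written out as bit operations. The following Python sketch is illustrative only; the function names and the `offset_bits` parameter (12 for a 4 KB page, 30 for a 1 GB page) are assumptions introduced for the example.

```python
def virtual_address(as_bit: int, pid: int, ea: int) -> int:
    """Form the 49-bit VA: AS (1 bit) || PID (16 bits) || EA (32 bits)."""
    return (as_bit << 48) | (pid << 32) | ea

def real_address(erpn: int, rpn: int, ea: int, offset_bits: int) -> int:
    """Form the 42-bit RA: ERPN (10 bits) || used RPN bits || page offset from EA.

    rpn is the 20-bit RPN field; for pages larger than 4 KB its low-order
    bits are unused and are masked off here.
    """
    offset_mask = (1 << offset_bits) - 1
    page_base = (rpn << 12) & ~offset_mask & 0xFFFFFFFF
    return (erpn << 32) | page_base | (ea & offset_mask)
```

For a 4 KB page all 20 RPN bits replace EA[0:19]; for a 1 GB page only the two most significant RPN bits are used and the remaining 30 bits come from the EA.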
Figure 4-1. Address Mapping for each Page Size

[Figure. The 49-bit VA consists of AS (bit 0), PID (bits 1:16), and EA (bits 17:48). The 42-bit RA consists of
ERPN (bits 0:9), RPN (bits 10:n), and Offset (bits n+1:41), where n depends on the page size.]

Page Size   ERPN (RA bits)   RPN (RA bits)   Offset (RA bits)
4 KB        0:9              10:29           30:41
16 KB       0:9              10:27           28:41
64 KB       0:9              10:25           26:41
1 MB        0:9              10:21           22:41
16 MB       0:9              10:17           18:41
256 MB      0:9              10:13           14:41
1 GB        0:9              10:11           12:41

AS      Address space from MSR[IS] or MSR[DS]
EA      Effective address
ERPN    Extended real page number
PID     Process ID (or Process Identifier)
RA      Real address
RPN     Real page number
VA      Virtual address
4.3 MMU Implementation
Figure 4-2 MMU Block Diagram on page 105 shows the basic design of the MMU. It is a 4-way, set-associative memory structure with hashed addressing. The MMU performs request arbitration, and consists of a
UTLB tag array, compare logic, and a UTLB data array. The PowerPC 476FP implementation supports a
1024-entry UTLB.
Figure 4-2. MMU Block Diagram

[Block diagram showing: request inputs from the DCU, Snoop, EU, and ICU; the UTLB index address hash (tlbwe,
tlbsx, and tlbivax use the hashing function) and the tlbre index; the 1024-entry UTLB, made up of the UTLB tag
array (4-way, 256 sets, W0 to W3) and the UTLB data array (4-way, 256 sets, W0 to W3); compare logic with
MSR[PR], the Supervisor Search Priority Configuration Register, and the User Search Priority Configuration
Register; and the outputs Hit, DSIZ, ERPN, RPN*, Attributes, and Description.]

Two 256 × 95-bit SRAMs are used for tag.
Two 256 × 100-bit SRAMs are used for data.
*The ITLB and DTLB maximum page size is 256 MB. A 1 GB page must be converted into 256 MB granules.

DCU       Data cache unit
DSIZ      Decoded page size
DTLB      Data shadow translation lookaside buffer
ERPN      Extended real page number
EU        Execution unit
ICU       Instruction cache unit
ITLB      Instruction shadow translation lookaside buffer
MSR[PR]   Machine State Register, problem state
RPN       Real page number
SRAM      Static random access memory
UTLB      Unified translation lookaside buffer
4.3.1 Translation Lookaside Buffer
The unified translation lookaside buffer (UTLB) is the hardware resource that controls translation, protection,
and storage attributes. A single unified 1024-entry, 4-way-set-associative TLB is used for both instruction and
data accesses. In addition, the PowerPC 476FP core implements two separate, smaller shadow TLB arrays,
one for instruction fetch accesses and one for data accesses. These shadow TLBs improve performance by
lowering the latency for address translation, and by reducing contention for the main unified TLB between
instruction fetching and data storage accesses.
Maintenance of TLB entries is under software control. System software determines the TLB entry replacement strategy (optionally using hardware-assisted way selection) and the use of any page table information. A TLB entry
contains all of the information required to identify the page, specify the address translation, control the access
permissions, and designate the storage attributes.
A TLB entry is written by copying information from a GPR and the MMUCR[STID] field, using a series of three
tlbwe instructions. A TLB entry is read by copying the information into a GPR and the MMUCR[STID] field,
using a series of three tlbre instructions. Software can also search for specific TLB entries using the tlbsx[.]
instruction. The PowerPC 476FP core also allows software to invalidate each TLB entry using either the tlbivax
or the tlbwe instruction.
The UTLB access method, look-up operation, and the attributes and access control information for ERPN, RPN,
and storage are described in the subsequent sections.
4.3.2 UTLB Index Address Hash
To improve UTLB utilization and to distribute entries more evenly across the array, a hash function is used
to index the UTLB arrays.
This exclusive OR (XOR) based hash function is used when an entry is searched by an instruction such as
the TLB search indexed (tlbsx) instructions, or when it is invalidated by local or remote tlbivax operations.
The hash is bypassed on TLB read entry (tlbre) operations, because the tlbre instruction provides both the
way and the index address.
For example, a 4 KB page with a 16-bit PID of x‘00D5’ and a 20-bit effective address (EA) of x‘E9B6C’ must
be placed at an 8-bit UTLB index address of x‘E0’. This is calculated as shown here:
UTLB index address bit 7 = PID[15] XOR EA[19] XOR EA[7].
UTLB index address bit 3 = PID[11] XOR EA[15] XOR EA[11] XOR EA[3].
Each UTLB index holds up to four entries, referred to as ways 0 through 3.
Different page size hashes with different, nonoverlapping effective addresses can arrive at the same UTLB
index address. For example, a 4 KB hash used for an EA of x‘000F0’ translates to an index address of x‘F0’.
A 1 MB hash used for a nonoverlapping EA of x‘F0000’ also translates to an index address of x‘F0’. This is
not a problem, because way 0 can be used for the 4 KB page, and way 1 for the 1 MB page. However, software must take this into account when setting up the entries in the UTLB to avoid unintentionally overwriting
entries.
Table 4-2 UTLB Set Address Generation Hashing Function on page 107 shows the PID and EA address bits
that are used.
Table 4-2. UTLB Set Address Generation Hashing Function

           UTLB        PID    Effective Address Bits For Each Page Size
PID        Index Bit   Bits   4 KB        16 KB       64 KB    1 MB     16 MB   256 MB   1 GB
PID ≠ ‘0’  7           31     19, —, 7    17, —, 7    15, 7    11, —    7       —        —
           6           30     18, —, 6    16, —, 6    14, 6    10, —    6       —        —
           5           29     17, —, 5    15, —, 5    13, 5    9, —     5       —        —
           4           28     16, —, 4    14, —, 4    12, 4    8, —     4       —        —
           3           27     15, 11, 3   13, —, 3    11, 3    7, 3     3       3        —
           2           26     14, 10, 2   12, —, 2    10, 2    6, 2     2       2        —
           1           25     13, 9, 1    11, 9, 1    9, 1     5, 1     1       1        1
           0           24     12, 8, 0    10, 8, 0    8, 0     4, 0     0       0        0
PID = ‘0’  7           —      19, —, 7    17, —, 7    15, 7    11, —    7       —        —
           6           —      18, —, 6    16, —, 6    14, 6    10, —    6       —        —
           5           —      17, —, 5    15, —, 5    13, 5    9, —     5       —        —
           4           —      16, —, 4    14, —, 4    12, 4    8, —     4       —        —
           3           —      15, 11, 3   13, —, 3    11, 3    7, 3     3       3        —
           2           —      14, 10, 2   12, —, 2    10, 2    6, 2     2       2        —
           1           —      13, 9, 1    11, 9, 1    9, 1     5, 1     1       1        1
           0           —      12, 8, 0    10, 8, 0    8, 0     4, 0     0       0        0
Note: A dash (—) indicates that an EA is not used in the hash calculation for the page size.
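The hash in Table 4-2 can be expressed compactly as XORs of EA bit fields. The following Python sketch is one reading of the table, not a definitive implementation: the names `HASH_TERMS`, `ea_field`, and `utlb_index` are hypothetical, bits are numbered with bit 0 as the most significant, and the PID bits 24:31 in the table are assumed to be the low-order eight bits of the 16-bit PID (PID[8:15] in the numbering used by the worked example).

```python
# For each page size: (first EA bit, last EA bit, left shift in the 8-bit index).
HASH_TERMS = {
    "4KB":   [(12, 19, 0), (8, 11, 4), (0, 7, 0)],
    "16KB":  [(10, 17, 0), (8, 9, 6), (0, 7, 0)],
    "64KB":  [(8, 15, 0), (0, 7, 0)],
    "1MB":   [(4, 11, 0), (0, 3, 4)],
    "16MB":  [(0, 7, 0)],
    "256MB": [(0, 3, 4)],
    "1GB":   [(0, 1, 6)],
}

def ea_field(ea20: int, first: int, last: int) -> int:
    """Extract EA[first:last] from the 20-bit EA[0:19] (EA[0] is the MSB)."""
    width = last - first + 1
    return (ea20 >> (19 - last)) & ((1 << width) - 1)

def utlb_index(pid: int, ea20: int, page_size: str) -> int:
    """Return the 8-bit UTLB index address for a PID, EA[0:19], and page size."""
    index = pid & 0xFF                      # low-order PID bits; zero when PID = 0
    for first, last, shift in HASH_TERMS[page_size]:
        index ^= ea_field(ea20, first, last) << shift
    return index
```

With these definitions, `utlb_index(0x00D5, 0xE9B6C, "4KB")` reproduces the x‘E0’ index of the worked example, and the 4 KB and 1 MB examples with EAs of x‘000F0’ and x‘F0000’ both map to x‘F0’.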
4.3.3 Initialize a Single UTLB Entry
Hardware initializes one UTLB entry at reset. This entry is set up to access privileged cache inhibited and
guarded space at the 4 GB location (top of the 4 GB space). This entry corresponds to the reset vector of the
processor at x‘FFFF FFFC’, and has the following characteristics:
Index address     x‘F0’ (4 KB hash equivalent to PID[0:15] = x‘0000’ and EA[0:19] = x‘FFFFF’)
Way               3
EPN               x‘FFFFF’
TS                ‘0’
DSIZ              ‘000000’ (4 KB page)
RPN               x‘FFFFF’
WIMG              ‘0101’
IL1I, IL1D        ‘11’
UX, UW, UR        ‘000’
SX, SW, SR        ‘101’
ERPN, U0-U3, E    From chip implementation-specific configuration values.
Initializing this single UTLB entry duplicates the function of driving the same values to the ICU and DCU
during reset, initializing the reset vector entry into the ITLB and DTLB. However, the entry in the ITLB and
DTLB can be invalidated by snooping an msync, isync, rfi, or CSI (context switching instruction) before software writes entries into the UTLB. Writing the reset vector entry into the UTLB ensures that an ITLB and
DTLB miss finds a matching entry in this case.
4.3.4 Tag Array
The hashed index address (only the tlbre index address is not hashed) is presented to the tag array. The tag
array contains the information required to determine if a UTLB request from the EU, ICU, DCU, or snoop
interface matches a valid entry. The tag array consists of two SRAM1Ps, each 256 × 94 bits. Each tag entry is
44 bits plus 3 bits of parity. Therefore, one SRAM1P stores tag way 0 and way 1, and the other SRAM1P
stores way 2 and way 3. The index address is presented to both SRAM1Ps in the same clock so that a
comparison of all four ways can be performed at the same time in a subsequent clock. The information stored
in the tag array is listed in Table 4-3. Odd parity is stored when CCR1[MMUTPEI] = ‘0’. Even parity is stored
when CCR1[MMUTPEI] = ‘1’.
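The relationship between a protected field and its parity bit can be sketched as follows. This is a minimal Python illustration that assumes the conventional definition of parity (odd parity makes the total number of 1 bits, including the parity bit, odd); the function name and the `even` parameter, which stands in for the CCR1[MMUTPEI] setting, are hypothetical.

```python
def parity_bit(field: int, even: bool = False) -> int:
    """Return the parity bit stored alongside a tag field.

    even=False models CCR1[MMUTPEI] = 0 (odd parity stored);
    even=True models CCR1[MMUTPEI] = 1 (even parity stored).
    """
    ones = bin(field).count("1")
    odd_parity = (ones + 1) & 1     # 1 when the field has an even number of 1 bits
    return odd_parity ^ 1 if even else odd_parity
```

Storing even parity inverts the stored bit relative to normal operation, so the two settings always produce complementary parity bits for the same field.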
Table 4-3. UTLB Tag Field Description

WS   Bits    Name      Size   Description
0    0:19    EPN       20     Effective page number.
                              The EPN, with the DSIZ, defines the page (both size and starting address)
                              that this UTLB entry represents. Unused bits in this field due to a larger
                              than minimum DSIZ are ignored.
0    20      EPNPar    1      The parity bit that covers the EPN.
0    21      Valid     1      If set, the remaining fields describe a valid UTLB entry.
0    22      TS        1      Translation space.
                              This entry only matches if the translation space bit matches the request bit.
0    23:28   DSIZ      6      This field describes the page size for this entry. Supported page sizes are
                              4 KB (minimum size), 16 KB, 64 KB, 1 MB, 16 MB, 256 MB, and 1 GB.
                                  Entry       Page Size
                                  ‘000000’    4 KB
                                  ‘000001’    16 KB
                                  ‘000011’    64 KB
                                  ‘000111’    1 MB
                                  ‘001111’    16 MB
                                  ‘011111’    256 MB
                                  ‘111111’    1 GB
                              Note: This decoding is used so that logic can use dedicated bits to
                              conditionally enable comparators on the appropriate address bits.
0    29      DSIZPar   1      The parity bit that covers the Valid, TS, and DSIZ fields.
0    30:45   TID       16     Translation ID (MMUCR[0:15]).
                              This field describes the process ID for which this entry is valid. If TID = ‘0’,
                              it is considered a match with any PID. If TID != ‘0’, the TID field must match
                              the PID for a search to come back as a positive match.
0    46      TIDPar    1      The parity bit that covers the TID.
4.3.5 Comparison
A virtual address matches a TLB entry when the Valid, TS, EPN, and TID tags have the following values:
• Valid == 1.
• TS == the requested address space (AS).
• EPN == the requested EA, where the number of bits compared depends on the DSIZ tag.
• TID == 0 (a global entry that matches any PID), or TID != 0 and TID == the requested process identifier (PID).
Table 4-4 defines the number of EPN and EA bits that are compared, depending on the page size.
Table 4-4. EPN and EA Comparison

Page Size   DSIZ        Comparison
4 KB        ‘000000’    EPN[0:19] == EA[0:19]
16 KB       ‘000001’    EPN[0:17] == EA[0:17]
64 KB       ‘000011’    EPN[0:15] == EA[0:15]
1 MB        ‘000111’    EPN[0:11] == EA[0:11]
16 MB       ‘001111’    EPN[0:7] == EA[0:7]
256 MB      ‘011111’    EPN[0:3] == EA[0:3]
1 GB        ‘111111’    EPN[0:1] == EA[0:1]
The ITLB and DTLB maximum page size is 256 MB. Therefore, a 1 GB page must be converted into 256 MB
granules. This is done automatically by hardware.
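The tag comparison can be summarized in a few lines of code. The sketch below is a software model only; the dictionary keys and the name `entry_matches` are hypothetical, and it assumes that only the leading EPN bits selected by DSIZ take part in the comparison and that a TID of 0 matches any PID, as described for the TID field in Table 4-3.

```python
# Number of leading EPN/EA bits compared for each DSIZ encoding.
DSIZ_COMPARE_BITS = {
    0b000000: 20,   # 4 KB
    0b000001: 18,   # 16 KB
    0b000011: 16,   # 64 KB
    0b000111: 12,   # 1 MB
    0b001111: 8,    # 16 MB
    0b011111: 4,    # 256 MB
    0b111111: 2,    # 1 GB
}

def entry_matches(entry: dict, as_bit: int, pid: int, ea20: int) -> bool:
    """Return True if a UTLB tag entry matches the request (AS, PID, EA[0:19])."""
    if not entry["valid"] or entry["ts"] != as_bit:
        return False
    unused = 20 - DSIZ_COMPARE_BITS[entry["dsiz"]]   # low-order bits ignored
    if (entry["epn"] >> unused) != (ea20 >> unused):
        return False
    return entry["tid"] == 0 or entry["tid"] == pid
```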
4.3.6 Data Array
The data array contains the address translation information, storage control, and permission bits for the entry.
The data array also consists of two SRAM1Ps, each 256 × 100 bits. Each data entry is 47 bits, plus 3 bits of
parity. Therefore, one SRAM1P stores data way 0 and way 1, and the other SRAM1P stores way 2 and way
3. The tag index address is latched and presented to both data SRAM1Ps in the same clock so that data is
available to be latched back to the requesting unit if a comparison of the tag results in a match.
The information stored in the data array is listed in Table 4-5 on page 110. Odd parity is stored when
CCR1[MMUDPEI] = ‘0’. Even parity is stored when CCR1[MMUDPEI] = ‘1’.
Table 4-5. UTLB Data Field Description

WS   Bits    Name          Size   Description
1    0:19    RPN           20     Real page number.
                                  This field contains the real address bits that replace the EPN of the search
                                  address. The entire RPN is used for 4 KB pages. For larger pages, the
                                  least significant bits of the RPN are unused. However, the unused bits
                                  must be set to 0 or indeterminate results will occur.
1    20      RPNPar        1      The parity bit that covers the RPN.
1    21:30   ERPN          10     Extended real page number.
                                  This field contains the 10 most significant bits of the real address. They are
                                  always used and prepended to the RPN.
1    31      ERPNPar       1      The parity bit that covers the ERPN.
2    32:36   WIMGE         5      This field describes the page type for this entry at the L2 cache level.
                                      W    Write through (L1 is always write-through).
                                      I    Cache inhibited.
                                      M    Coherent page (all L1 pages are coherent).
                                      G    Guarded access.
                                      E    Endianness (this describes L1 page).
2    37      IL1I          1      Inhibit L1 instruction.
                                  If set, this page is treated as cache-inhibited for the L1 instruction cache,
                                  regardless of the I bit used for the L2 attribute.
2    38      IL1D          1      Inhibit L1 data.
                                  If set, this page is treated as cache-inhibited for the L1 data cache,
                                  regardless of the I bit used for the L2 attribute.
2    39:42   U             4      User defined bits.
                                  This field can be used at the system level for any purpose.
2    43      UX            1      User execute-permission bit.
2    44      UW            1      User write-permission bit.
2    45      UR            1      User read-permission bit.
2    46      SX            1      Supervisor execute-permission bit.
2    47      SW            1      Supervisor write-permission bit.
2    48      SR            1      Supervisor read-permission bit.
2    49      StorPermPar   1      The parity bit that covers storage attributes and permissions.
4.3.6.1 Hardware Enforced I = 1 = IL1I = IL1D
If software sets I = 1 on a tlbwe instruction (WS = 2), the MMU hardware forces IL1I = IL1D = 1.
4.3.7 Writing UTLB Entries
Software places entries in the UTLB by using the tlbwe instruction. The operating system must understand
the details of the page size hashes to select process IDs and effective addresses to obtain the best utilization
within the UTLB. Each tlbwe instruction provides the page size, PID, and EA. The tlbwe can also specify one
of the four ways or allow the hardware to select the way to be written.
Because the processor is a 32-bit architecture, and UTLB tag and data entries require 91 bits, a sequence of
three tlbwe instructions (WS = 0, WS = 1, and WS = 2) is required to write a new UTLB entry. On the first
tlbwe, with WS = 0, values are latched in the MMUCR for valid (V), TLB index, and TLB way. These latched TLB
index and way fields are used in the two subsequent tlbwe instructions with WS = 1 and WS = 2 to select the
TLB entry. In addition, the V bit of the entry is set to the latched value (LVALID) when WS = 2.
In this way, the MMU supports atomic writes of UTLB entries.
Note: WS is a field that designates which word of the TLB entry is to be transferred (that is, WS = 0 specifies
TLB word 0, and so on).
4.3.8 Bolted UTLB Entries
The MMU enables software to specify up to six UTLB bolted entries. These entries are typically used for
pages containing the operating system kernel or interrupt handler. Bolted entries are automatically avoided
by hardware-assisted way selection. Bolted entries are also protected from both local and remote tlbivax
instructions. Bolted entries can be overwritten through software with the way specifically provided by the
tlbwe instruction. The location of a bolted entry within MMU Bolted Entries Registers (MMUBE0 or MMUBE1)
is specified with the tlbwe instruction, in RA[5:7]. When written, the index address of a bolted entry can be
obtained by a mfspr instruction from MMUBE0 or MMUBE1. Only way 0 can support a bolted entry, so software must ensure that bolted entries are not placed at the same UTLB index address.
4.3.9 Hardware Assisted Way Selection
To reduce the burden in software of understanding where entries are placed, and to automatically avoid overwriting bolted entries, the hardware provides assistance by selecting the way that can be written by the next
tlbwe. Each UTLB index address has a corresponding 2-bit counter. A tlbwe, WS = 0 instruction specifies
that the counter value is used for way selection when RA[0] = ‘0’. The counter is incremented each time there
is a tlbwe, WS = 2 when the corresponding tlbwe, WS = 0 had V = 1. The counter is reset to a way when that
way receives a tlbwe, WS = 0, V = 0. This places the next entry written to that index to the way that was just
“vacated” by the tlbwe, WS = 0, V = 0. Similarly, the counter also resets to a way when that way is invalidated
by a local or snooped tlbivax instruction. The counter automatically skips a bolted entry. For example, if the
counter points to way 3 and way 0 contains a bolted entry, the counter increments to way 1 after the next
tlbwe, WS = 2.
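The counter advance with a bolted entry can be modeled in a few lines. This is a minimal Python sketch of the behavior described above, not the hardware logic; the function name and the `way0_bolted` parameter are hypothetical, and the sketch covers only the increment step, not the reset-to-vacated-way behavior.

```python
def next_way(counter: int, way0_bolted: bool) -> int:
    """Advance the 2-bit way counter after a tlbwe with WS = 2, skipping a bolted way 0."""
    nxt = (counter + 1) % 4          # 2-bit counter wraps from way 3 to way 0
    if way0_bolted and nxt == 0:     # only way 0 can hold a bolted entry
        nxt = 1
    return nxt
```

With way 0 bolted, successive writes to the same index therefore rotate through ways 1, 2, and 3 only.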
4.3.10 Searching UTLB Entries
UTLB entries are searched by the ICU in response to an instruction-side TLB miss, by the DCU in response
to a data-side TLB miss, and by the EU in response to the tlbsx instruction.
The PowerPC 476FP core implements two shadow TLB arrays: one for instruction fetches and one for data
accesses. These arrays shadow the value of a subset of the entries in the main unified TLB (the UTLB in the
context of this discussion). The purpose of the shadow TLB arrays is to reduce the latency of the address
translation operation and to avoid contention for the UTLB array between instruction fetches and data
accesses.
Both shadow TLBs (ITLB and DTLB) contain eight entries. No latency is associated with accessing the
shadow TLB arrays, and instruction execution continues in a pipelined fashion provided that the requested
address is found in the shadow TLB. If the requested address is not found in the shadow TLB, the instruction
fetch or data storage access is automatically stalled while the address is looked up in the UTLB. If the
address is found in the UTLB, the penalty associated with the miss in the shadow array is five cycles if there
is no contention. If the address is also a miss in the UTLB, an instruction or data TLB miss exception is
reported.
The replacement of entries in the shadow TLBs is managed by hardware in a round-robin fashion. Upon a
shadow TLB miss that leads to a UTLB hit, the hardware casts out the oldest entry in the shadow TLB and
replaces it with the new translation.
The hardware also invalidates all of the entries in both of the shadow TLBs upon any context synchronization.
Context synchronizing operations follow:
• Any interrupt (including machine check)
• Execution of isync
• Execution of rfi, rfci or rfmci
• Execution of sc
Note that there are other context changing operations that do not cause automatic context synchronization in
the hardware. For example, execution of a tlbwe instruction changes the UTLB contents but does not cause
a context synchronization, and thus, does not invalidate or otherwise update the shadow TLB entries. For
changes to the entries in the UTLB (or to other address-related resources such as the PID) to be reflected in
the shadow TLBs, software must ensure that a context synchronizing operation occurs before any attempt to
use any address associated with the updated UTLB entries (either the old or new contents of those entries).
By invalidating the shadow TLB arrays, a context synchronizing operation forces the hardware to refresh the
shadow TLB entries with the updated information in the UTLB as each memory page is accessed.
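The need for a context synchronizing operation after a UTLB update can be illustrated with a toy model of a shadow TLB. The Python sketch below is an assumption-laden illustration: the class and method names are hypothetical, entries are reduced to an (EPN, RPN) pair, and the round-robin pointer is assumed to keep its position across an invalidation.

```python
class ShadowTLB:
    """Toy model of an 8-entry shadow TLB with round-robin replacement."""
    SIZE = 8

    def __init__(self):
        self.slots = [None] * self.SIZE
        self.victim = 0                      # next slot to be replaced

    def lookup(self, epn):
        """Return the cached RPN for epn, or None on a shadow TLB miss."""
        for slot in self.slots:
            if slot is not None and slot[0] == epn:
                return slot[1]
        return None

    def fill(self, epn, rpn):
        """Install a translation after a UTLB hit, casting out the oldest entry."""
        self.slots[self.victim] = (epn, rpn)
        self.victim = (self.victim + 1) % self.SIZE

    def context_sync(self):
        """Invalidate every entry, as on an interrupt, isync, rfi, or sc."""
        self.slots = [None] * self.SIZE
```

In this model, changing the translation held by the main TLB does not change what `lookup` returns; the stale value persists until `context_sync` is called, which mirrors the requirement placed on software above.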
4.3.10.1 Instruction-Side and Data-Side TLB Miss Searches
When an instruction-side or data-side TLB miss occurs, the ICU or DCU presents the EA with the request.
The AS bit is driven by the ICU or IU (for the DCU). The EA, with an ICU or DCU-latched version of the PID
register, is hashed to obtain the UTLB index address, based on the page-size hash-order specified in either
the Supervisor Search Priority Configuration Register (SSPCR) or the User Search Priority Configuration
Register (USPCR). Because it takes four cycles to determine if there is a matching entry in the tag array,
subsequent page size hashes specified in either the SSPCR or USPCR are used to pipeline index addresses
to the tag array.
Entries placed in the UTLB with TID[0:15] = x‘0000’ are considered global pages. They match whether the
request PID[0:15] = x‘0000’ or not. However, because part of the PID is used in the hash, it is possible for a
global page to exist in the UTLB, even though the TLB index does not match the EA-PID hash. This is
handled by going through the search order twice when the requested PID[0:15] does not equal x‘0000’: once
using a value of x‘0000’ instead of the actual PID, and again using the actual PID[0:15] value.
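The two-pass search for global pages can be sketched as follows. This Python fragment only restates the paragraph above; the function names probe_sequence and tid_matches are hypothetical, the hash itself is not modeled, and the per-order PID control bit of the SSPCR and USPCR (described later in this section) is ignored for simplicity.

```python
def probe_sequence(page_sizes, request_pid):
    """Return the (pid_used_in_hash, page_size) probes in the order they are tried.

    page_sizes is the page-size hash order taken from the search priority register.
    """
    # A nonzero PID causes the order to be walked twice: first with PID = 0
    # (to find global pages), then with the actual PID.
    pids = [0] if request_pid == 0 else [0, request_pid]
    return [(pid, size) for pid in pids for size in page_sizes]


def tid_matches(entry_tid, request_pid):
    """An entry with TID = 0 is global and matches any requesting PID."""
    return entry_tid == 0 or entry_tid == request_pid
```

For a request with PID = 5 and a search order of 4 KB followed by 64 KB, probe_sequence yields four probes: both page sizes hashed with PID = 0, then both hashed with PID = 5.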
UTLB entries are searched by the EU in response to a tlbsx instruction. The EU presents the effective
address (EA) with the request. The EA, along with the set translation ID (STID) field set in the MMUCR, is
hashed to obtain the UTLB index address, based on the page size hash order specified in the ISPCR. When
a matching UTLB entry is located, the way and index address are returned to the EU. Software can compare
the returned index address with the MMUBE0 and MMUBE1 SPRs to determine if the returned TLB entry is a
bolted entry. Software can then use the tlbre instruction to read the contents of the UTLB entry.
4.3.11 Reading UTLB Entries
The UTLB entry can be read by the EU using the tlbre instruction. The way and index address specified in
the tlbre instruction must first be obtained by a tlbsx instruction. Three tlbre instructions are required to read
the entry information stored in the tag (WS = 0) and data (WS = 1 and WS = 2) arrays.
4.3.12 Invalidating UTLB Entries
A UTLB entry can be invalidated by the tlbwe instruction with WS = 0 and V = 0.
A UTLB entry can also be invalidated by a local or remote (snooped) tlbivax instruction. A local tlbivax
instruction is received as an EU request, with a corresponding effective address (EA). The EA, along with the
STID set in the MMUCR SPR, is hashed to obtain the UTLB index address, based on the page size hash
order specified in the ISPCR. If a matching entry is found in the UTLB, the corresponding V bit in the tag is
written to ‘0’ unless that entry is bolted; a bolted TLB entry is not invalidated.
A remote tlbivax instruction is received as a snoop request. The MMU holds the SnpAvail signal active until a
second snoop request is sampled active. Thus, two snoop requests can be serviced at the same time. Each
snoop request is presented with the corresponding AS, PID, and EA fields. The EA and PID fields are hashed
to obtain the UTLB index address, based on the page size hash order specified in the ISPCR. If a matching
entry is found in the UTLB, the corresponding V bit in the tag is written to ‘0’ unless that entry is bolted; a
bolted TLB entry is not invalidated. If the index address of the matching entry is equal to the value in
the latched index address (LINDEX), then the LVALID bit is cleared, again unless the entry is bolted. This
allows an entry that is partially written to be snoop invalidated.
4.4 Access Control
When a matching TLB entry has been identified and the address has been translated, the access control
mechanism determines whether the program has execute, read, write, or read and write access to the page
the address refers to.
4.4.1 Execute Access
The UX or SX bit of a TLB entry controls execute access to a page of storage, depending on the operating
mode (user or supervisor) of the processor.
User mode (MSR[PR] = ‘1’)
Instructions can be fetched and executed from a page in storage while in user mode if the UX access control
bit for that page is equal to ‘1’. If the UX access control bit is equal to ‘0’, instructions from that page are not
fetched and will not be placed into any cache as the result of a fetch request to that page while in user mode.
Furthermore, if the sequential execution model calls for the execution in user mode of an instruction from a
page that is not enabled for execution in user mode (that is, UX = ‘0’ when MSR[PR] = ‘1’), an execute access
control exception type instruction storage interrupt is taken (see Section 7 Processor Interrupts and Exceptions on page 167 for more information).
Supervisor mode (MSR[PR] = ‘0’)
Instructions can be fetched and executed from a page in storage while in supervisor mode if the SX access
control bit for that page is equal to ‘1’. If the SX access control bit is equal to ‘0’, instructions from that page
are not fetched and will not be placed into any cache as the result of a fetch request to that page while in
supervisor mode.
Furthermore, if the sequential execution model calls for the execution in supervisor mode of an instruction
from a page that is not enabled for execution in supervisor mode (that is, SX = ‘0’ when MSR[PR] = ‘0’), an
execute access control exception type instruction storage interrupt is taken (see Section 7 Processor Interrupts and Exceptions on page 167 for more information).
4.4.2 Write Access
The UW or SW bit of a TLB entry controls write access to a page, depending on the operating mode (user or
supervisor) of the processor.
User mode (MSR[PR] = ‘1’)
Store operations (including the store-class cache management instructions dcbz and dcbtst) are permitted
to a page in storage while in user mode if the UW access control bit for that page is equal to ‘1’. If execution
of a store operation is attempted in user mode to a page for which the UW access control bit is ‘0’, a write
access control exception occurs. If the instruction is an stswx with string length 0, no interrupt is taken and no
operation is performed. For all other store operations, execution of the instruction is suppressed and a data
storage interrupt is taken.
Although the dcbi cache management instruction is a store-class instruction, its execution is privileged and
thus will not cause a data storage interrupt if execution of it is attempted in user mode (a privileged instruction
exception type program interrupt will occur instead).
Supervisor mode (MSR[PR] = ‘0’)
Store operations (including the store-class cache management instructions dcbz, dcbtst, and dcbtstls) are
permitted to a page in storage while in supervisor mode if the SW access control bit for that page is equal to
‘1’. If execution of a store operation is attempted in supervisor mode to a page for which the SW access
control bit is ‘0’, a write access control exception occurs. If the instruction is an stswx with string length 0, no
interrupt is taken and no operation is performed. For all other store operations, execution of the instruction is
suppressed and a data storage interrupt is taken.
4.4.3 Read Access
The UR or SR bit of a TLB entry controls read access to a page, depending on the operating mode (user or
supervisor) of the processor.
User mode (MSR[PR] = ‘1’)
Load operations (including the load-class cache management instructions dcbst, dcbf, dcbt, icbi, and icbt)
are permitted from a page in storage while in user mode if the UR access control bit for that page is equal to
‘1’. If execution of a load operation is attempted in user mode to a page for which the UR access control bit is
‘0’, a read access control exception occurs. If the instruction is a load (not including lswx with string length 0)
or is a dcbst, dcbf, or icbi, execution of the instruction is suppressed and a data storage interrupt is taken.
However, if the instruction is an lswx with string length 0, or is a dcbt or icbt, no interrupt is taken and no
operation is performed.
Supervisor mode (MSR[PR] = ‘0’)
Load operations (including the load-class cache management instructions dcbst, dcbf, dcbt, dcbi, dcbtls,
dcblc, icbi, icbt, and icblc) are permitted from a page in storage while in supervisor mode if the SR access
control bit for that page is equal to ‘1’. If execution of a load operation is attempted in supervisor mode to a
page for which the SR access control bit is ‘0’, a read access control exception occurs. If the instruction is a
load (not including lswx with string length 0) or is a dcbst, dcbf, dcbi or icbi, execution of the instruction is
suppressed and a data storage interrupt is taken. However, if the instruction is an lswx with string length 0, or
is a dcbt or icbt, no interrupt is taken and no operation is performed.
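The selection of the permission bit in Sections 4.4.1 through 4.4.3 can be summarized in a few lines. The following Python sketch is illustrative only; the function name access_allowed and the dictionary representation of a TLB entry's permission bits are assumptions made for the example, and the special cases (zero-length string operations and cache touch instructions) are not modeled.

```python
def access_allowed(msr_pr, kind, perms):
    """Return True if the access passes the access control check.

    msr_pr: 1 for user mode, 0 for supervisor mode (MSR[PR]).
    kind:   'execute', 'read', or 'write'.
    perms:  dict holding the UX, SX, UR, SR, UW, and SW bits of the matching entry.
    """
    mode_prefix = "U" if msr_pr == 1 else "S"
    kind_suffix = {"execute": "X", "read": "R", "write": "W"}[kind]
    return perms[mode_prefix + kind_suffix] == 1
```

For example, a page with SX = 1 and UX = 0 can be executed from in supervisor mode, while a user-mode attempt to execute from it fails the check and leads to an execute access control exception.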
4.4.4 Access Control Applied to Cache Management Instructions
This section summarizes how each of the cache management instructions is affected by the access control
mechanism.
dcbz
This instruction is treated as a store with respect to access control because it changes the data in
a cache block. As such, it can cause write access control exception type data storage interrupts.
dcbi
This instruction is treated as a load with respect to access control because it can change the
value of a storage location by invalidating the current copy of the location in the data cache,
effectively restoring the value of the location to the former value that is contained in memory. As
such, it can cause read access control exception type data storage interrupts.
dcba
This instruction is treated as a no-op under all circumstances, and thus cannot cause any form of
data storage interrupt.
icbi
This instruction is treated as a load with respect to access control. As such, it can cause read
access control exception type data storage interrupts. This instruction can cause a data storage
interrupt (and not an instruction storage interrupt), even though it otherwise would perform its
operation on the instruction cache. Instruction storage interrupts are associated with exceptions
that occur upon the fetch of an instruction whereas data storage interrupts are associated with
exceptions that occur upon the execution of a storage access or cache management instruction.
dcbt and
icbt
These instructions are treated as loads with respect to access control. As such, they can cause
read access control exceptions. However, because these instructions act merely as hints that the
specified cache block will likely be accessed by the processor in the near future, such exceptions
do not result in a data storage interrupt. Instead, if a read access control exception occurs, the
instruction is treated as a no-op.
dcbtst
This instruction is treated as a store with respect to access control. As such, it can cause write
access control exceptions. However, because this instruction is intended to act merely as a hint
that the specified cache block will likely be accessed by the processor in the near future, such
exceptions do not result in a data storage interrupt. Instead, if a write access control exception
occurs, the instruction is treated as a no-op.
dcbf and
dcbst
These instructions are treated as loads with respect to access control. As such, they can cause
read access control exception type data storage interrupts. Flushing or storing a dirty line from
the cache is not considered a store because an earlier store operation has already updated the
cache line, and the dcbf or dcbst instruction is simply causing the results of that earlier store
operation to be propagated to memory.
dci and
ici
These instructions do not generate an address. Also, the access control mechanism does not
affect these instructions. They are privileged instructions, and if executed in supervisor mode,
they flash invalidate the entire associated cache.
4.5 Storage Attributes
Each TLB entry specifies a number of storage attributes for the memory page with which it is associated.
Storage attributes affect the manner in which storage accesses to a given page are performed. The storage
attributes (and their corresponding TLB entry fields) are:
• Write-through (W)
• Caching inhibited (I)
• Memory coherence required (M)
• Guarded (G)
• Endianness (E)
• User-definable (U0, U1, U2, U3)
All combinations of these attributes are supported except combinations that simultaneously specify a region
as write-through and caching inhibited.
4.5.1 Write-Through (W)
The PowerPC 476FP processor data cache ignores the write-through attribute. The data for all store operations is written to memory, as opposed to only being written into the data cache. If the referenced line also
exists in the data cache (that is, the store operation is a hit), then the data is also written into the data cache.
An alignment exception occurs if a dcbz instruction targets a memory page that is either write-through
required or caching inhibited. A data storage exception occurs if an lwarx or stwcx. instruction targets a
memory page that is either write-through required or caching inhibited. See Section 5 Instruction and Data
Caches on page 133 for more information on the handling of accesses to write-through storage.
4.5.2 Caching Inhibited (I)
If a memory page is marked as caching inhibited (I = 1), then all load, store, and instruction fetch operations
perform their access in memory, as opposed to in the respective cache. If I = 0, then the page is cacheable
and the operations may be performed in the cache.
An alignment exception occurs if a dcbz instruction targets a memory page that is either write-through
required or caching inhibited. A data storage exception occurs if an lwarx or stwcx. instruction targets a
memory page that is either write-through required or caching inhibited. It is a programming error for the target
location of a load, store, dcbz, or fetch access to caching inhibited storage to be in the respective cache; the
results of such an access are undefined. It is not a programming error for the target locations of the other
cache management instructions to be in the cache when the caching inhibited storage attribute is set. The
behavior of these instructions is defined for both I = 0 and I = 1 storage. See Section 5 for more information
about the handling of accesses to caching-inhibited storage.
4.5.3 Hardware Enforced IL1I and IL1D
The inhibit L1 instruction (IL1I) field and inhibit L1 data (IL1D) field indicate the page will be treated as
cache-inhibited for the L1 cache, regardless of the I bit used for the L2 cache attribute.
Table 4-6. Access Control Applied to Cache Management Instructions
Instruction          Notes
dcbf, dcbst, icbi    Generates read-protection permission violation interrupt if page UR = ‘0’.
dcbi                 Generates read-protection permission violation interrupt if page UR = ‘0’. If MSR[PR] = ‘1’,
                     a privileged or illegal instruction error exception occurs.
dcbt, icbt           These instructions are no-ops if UR = ‘0’.
dcbtst, dcbz         Generates write-protection permission violation interrupt if page UW = ‘0’.
dcba                 This instruction is a no-op.
4.5.4 Memory Coherence Required (M)
The memory coherence required (M) storage attribute is defined by the architecture to support cache and
memory coherency within multiprocessor shared memory systems. If a TLB entry is created with M = 1, any
storage accesses to the page associated with that TLB entry are indicated, using the corresponding transfer
attribute interface signal, as being memory coherence required, but the setting has no effect on the operation
within the PowerPC 476FP processor.
4.5.5 Guarded (G)
The guarded storage attribute is provided to control speculative access to non-well-behaved memory locations. Storage is said to be well behaved if the corresponding real storage exists and is not defective, and if
the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. As
such, data and instructions can be fetched out of order from well-behaved storage without causing undesired
side effects.
In general, storage that is not well behaved should be marked as guarded. Because such storage might
represent a control register on an I/O device or might include locations that do not exist, an out-of-order
access to such storage might cause an I/O device to perform unintended operations or may result in a
machine-check exception. For example, if the input buffer of a serial I/O device is memory-mapped, then an
out-of-order or speculative access to that location could result in the loss of an item of data from the input
buffer, if the instruction execution is interrupted and later reattempted.
A data access to a guarded storage location is performed only if either the access is caused by an instruction
that is known to be required by the sequential execution model, or the access is a load and the storage location is already in the data cache. Once a guarded data storage access is initiated, if the storage is also
caching inhibited then only the bytes specifically requested are accessed in memory, according to the
operand size for the instruction type. Data storage accesses to guarded storage that is marked as cacheable
can access the entire cache block, either in the cache itself or in memory. To avoid unintended results, the
storage should be guarded and cache-inhibited to maintain a well-behaved storage model.
Instruction fetch is not affected by guarded storage. While the architecture does not prohibit instruction
fetching from guarded storage, system software should generally prevent such instruction fetching by
marking all guarded pages as no-execute (UX/SX = 0). Then, if an instruction fetch is attempted from such a
page, the memory access will not occur and an execute access control exception type instruction storage
interrupt will result if and when execution is attempted for an instruction at any address within the page.
See Section 5 Instruction and Data Caches on page 133 for more information about the handling of accesses
to guarded storage.
4.5.6 Endian (E)
The endian (E) storage attribute controls the byte ordering with which load, store, and fetch operations are
performed. Byte ordering refers to the order in which the individual bytes of a multiple-byte scalar operand are
arranged in memory. The operands in a memory page with E = 0 are arranged with big-endian byte ordering,
which means that the bytes are arranged with the most-significant byte at the lowest-numbered memory
address. The operands in a memory page with E = 1 are arranged with little-endian byte ordering, which
means that the bytes are arranged with the least-significant byte at the lowest-numbered address.
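The two byte orderings can be seen by laying out a 4-byte scalar in memory order. The following Python fragment is only an illustration; the variable names and the value x‘11223344’ are chosen for the example and do not come from the processor documentation.

```python
value = 0x11223344  # a 4-byte scalar operand

# E = 0 (big-endian): most-significant byte at the lowest-numbered address.
big_endian_bytes = value.to_bytes(4, "big")

# E = 1 (little-endian): least-significant byte at the lowest-numbered address.
little_endian_bytes = value.to_bytes(4, "little")

print(big_endian_bytes.hex())     # 11223344
print(little_endian_bytes.hex())  # 44332211
```

Each string lists the bytes from the lowest-numbered address to the highest, so the same operand appears byte-reversed between a page with E = 0 and a page with E = 1.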
4.5.7 User-Definable (U0 - U3)
The PowerPC 476FP core provides four user-definable (U0 - U3) storage attributes that can be used to
control system-dependent behavior of the storage system. By default, these storage attributes do not have
any effect on the operation of the PowerPC 476FP core, although all storage accesses indicate to the
memory subsystem the values of U0 - U3 using the corresponding transfer attribute interface signals. The
specific system design can then take advantage of these attributes to control some system-level behaviors.
4.5.8 Supported Storage Attribute Combinations
Storage modes where both W = 1 and I = 1 (that would represent write-through but caching inhibited storage)
are not supported. For all supported combinations of the W and I storage attributes, the G, E, and U0 - U3
storage attributes can be used in any combination.
4.5.9 Aliasing
For multiple pages that are mapped to the same real address, the following rules apply:
1. If the multiple pages exist on a single processor, then:
   • The I bits (I, IL1I, IL1D) must match the corresponding I bits on all pages (see note below).
   • The W bits do not need to match on all pages.
   • The M bits do not need to match on all pages. In such a case, it is the software’s responsibility to
     maintain data coherency.
2. If the multiple pages exist on multiple processors, then:
   • The I bits (I, IL1I, IL1D) do not need to match on all pages.
   • The W bit must match on all pages (Book E requirement).
   • The M bits do not need to match on all pages. In such a case, it is the software’s responsibility to
     maintain data coherency.
Note: For multiple pages that exist on a single processor that map to the same real address, the I bits (I, IL1I,
IL1D) do not need to match under the following conditions that must be guaranteed by software:
1. For those pages where the I bit is zero, the page must be marked as guarded and no execute to prevent
speculative accesses.
2. For those addresses where the cacheability attributes are different, software must ensure that only those
pages where all I bits are the same access the overlapped real address. Alternatively, software can manage
the cache appropriately between different cacheability accesses to guarantee that the target of an access to
any I = 1 page is not found in the associated cache. When the I bit is a one, the data must not be in any
level of cache.
For example, consider a cacheable 64 KB page and a noncacheable 4 KB page (the smallest page size for
the PowerPC 476FP core) that both map to the same real address (for example, the 4 KB page maps to the
last 4 KB of real addresses that the 64 KB page maps to). In this case, the 64 KB page is marked as guarded
and cacheable. In addition, software must ensure that, when operating in the 64 KB page, no accesses are
performed to the last 4 KB of addresses.
4.6 MMU Registers
Table 4-7 summarizes the MMU Special Purpose Registers (SPRs) available to software. In the rest of this
section, these SPRs are described in detail.
Table 4-7. MMU SPR Summary
Name      Reset Value                    SPRN      Description
PID       x‘XXXX XXXX’                   x‘030’    Process ID Register.
RMPD      x‘XXXX XXXX’                   x‘339’    Real Mode Page Description Register.
MMUBE0    x‘XXXX XXXX’, ‘000’            x‘334’    MMU Bolted Entries 0 Register.
MMUBE1    x‘XXXX XXXX’, ‘000’            x‘335’    MMU Bolted Entries 1 Register.
SSPCR     x‘0000 000X’                   x‘33E’    Supervisor Search Priority Configuration Register.
USPCR     x‘0000 000X’                   x‘33F’    User Search Priority Configuration Register.
ISPCR     x‘0000 000X’                   x‘33D’    Invalidate/Search Priority Configuration Register.
RSTCFG    strapped                       x‘39B’    Reset Configuration Register.
MMUCR     ‘00000’, ‘X’, x‘000 0000’      x‘3B2’    MMU Configuration Register.
4.6.1 Process ID Register (PID)
Bits     Field Name   Description
0:15     Reserved
16:31    PID          Process ID. This field is used for trace broadcast by the MMU, with EA[0:19], to hash an
                      index address into the UTLB.
4.6.2 Real Mode Page Description Register (RMPD)
Real mode paging is used in debugging only and must be used with care. Thus, this register is reserved for
debugging by designers.
Bits     Field Name   Description
0:1      Reserved
2:11     ERPN         Extended real page number.
12       Reserved
13       W            Write through.
14       I            Cache inhibited.
15       M            Memory coherency required.
16       G            Guarded.
17       E            Endian.
18       IL1I         L1 instruction cache inhibit.
19       IL1D         L1 data cache inhibit.
20:21    Reserved
22       SX           Supervisor execute permission.
23       SR           Supervisor read permission.
24       SW           Supervisor write permission.
25       UX           User execute permission.
26       UR           User read permission.
27       UW           User write permission.
28:31    U            U0 - U3. User-defined bits.
4.6.3 MMU Bolted Entries 0 Register (MMUBE0)
Bits     Field Name   Description
0:7      IBE0         Read only. UTLB index address for bolted entry 0.
8:15     IBE1
16:23    IBE2
24:28    Reserved
29       VBE0         Valid bit for bolted entry 0.
                      0  There is no bolted entry at the index address in IBE0.
                      1  There is a bolted entry in way 0 at the index address in IBE0.
30       VBE1
31       VBE2
4.6.4 MMU Bolted Entries 1 Register (MMUBE1)
Bits     Field Name   Description
0:7      IBE3
8:15     IBE4
16:23    IBE5
24:28    Reserved
29       VBE3
30       VBE4
31       VBE5
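Software that needs to know whether a UTLB entry located by tlbsx is bolted can compare the returned index address against the MMUBE0 and MMUBE1 fields. The following Python sketch shows one way to do that comparison on register values that have already been read; the helper names field, bolted_indexes, and is_bolted are hypothetical. It assumes that IBE1 through IBE5 and VBE1 through VBE5 behave like IBE0 and VBE0, and that bolted entries reside only in way 0.

```python
def field(reg, first, last):
    """Extract bits first:last of a 32-bit register (bit 0 is the most significant)."""
    width = last - first + 1
    return (reg >> (31 - last)) & ((1 << width) - 1)


def bolted_indexes(mmube0, mmube1):
    """Return the UTLB index addresses whose valid bit (VBEn) is set."""
    indexes = []
    for reg in (mmube0, mmube1):
        for n in range(3):
            valid = field(reg, 29 + n, 29 + n)   # VBE bits are 29, 30, 31
            if valid:
                indexes.append(field(reg, 8 * n, 8 * n + 7))  # IBE fields are 0:7, 8:15, 16:23
    return indexes


def is_bolted(way, index, mmube0, mmube1):
    """Assumption: only way 0 can hold bolted entries."""
    return way == 0 and index in bolted_indexes(mmube0, mmube1)
```

With MMUBE0 = x‘1234 0004’ (IBE0 = x‘12’, IBE1 = x‘34’, VBE0 = 1) and MMUBE1 = 0, only index x‘12’ in way 0 is reported as bolted; index x‘34’ is ignored because VBE1 is 0.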
4.6.5 Search Priority Configuration Registers
There are three sets of registers to control UTLB look-up/search priority. Two registers, SSPCR and USPCR,
are used for instruction-side TLB and data-side TLB misses. The SSPCR is assigned for supervisor/privileged mode (MSR[PR] = ‘0’), and the USPCR is used for problem/user mode (MSR[PR] = ‘1’).
Another register, the Invalidate/Search Priority Configuration Register (ISPCR), is used for local tlbsx and
tlbivax operations, and for incoming snoops resulting from external tlbivax operations. Separating the registers reduces the number of pages searched to the minimum, improving performance by reducing search
latency.
All three sets of registers are written by software when the UTLB is set up. For example, if user mode uses
many 4 KB pages, several 64 KB pages, and a few 256 MB pages, software can program the USPCR so that the
hardware searches first using the 4 KB hash, then the 64 KB hash, and finally, the 256 MB hash.
4.6.6 Supervisor Search Priority Configuration Register (SSPCR)
This register is used when MSR[PR] = ‘0’. Figure 4-3 Supervisor Search Priority Configuration Registers on
page 123 illustrates how this register works.
Bits     Field Name   Description
0:3      ORD1         Order 1. See Figure 4-3 on page 123 for code values and page sizes for all these fields.
4:7      ORD2         Order 2.
8:11     ORD3         Order 3.
12:15    ORD4         Order 4.
16:19    ORD5         Order 5.
20:23    ORD6         Order 6.
24:27    ORD7         Order 7.
28:31    Reserved
Figure 4-3. Supervisor Search Priority Configuration Registers
Register layout: Order 1 (bits 0:3), Order 2 (4:7), Order 3 (8:11), Order 4 (12:15), Order 5 (16:19),
Order 6 (20:23), Order 7 (24:27), Reserved (28:31).
Binary Order Code    Page Size Searched
x001                 4 KB
x010                 16 KB
x011                 64 KB
x100                 1 MB
x101                 16 MB
x110                 256 MB
x111                 1 GB
0xxx                 Use input PID
1xxx                 If input PID ≠ 0, the first search uses PID = 0. Then repeat the search with input PID.
0000                 No search, with the exception of ‘order 1’ where ‘0000’ indicates a 4 KB page.
Search priority order:
1. Order 1
2. Order 2
3. Order 3
4. Order 4
5. Order 5
6. Order 6
7. Order 7
Any order code having ‘0000’ stops the search. For example, if order 1 = ‘0010’, order 2 = ‘0001’, and order 3 = ‘0000’,
then hardware stops the search after order 2.
A value of ‘0000’ for order 1 searches using 4 KB hash.
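The order codes in Figure 4-3 can be packed into a register value as sketched below. This Python fragment is illustrative only; the names PAGE_CODE and build_spcr are hypothetical, and applying the same PID control bit to every order field is a simplification made for the example. The resulting value would still have to be written to the SPR by software (for example, with mtspr), which is not shown.

```python
# 3-bit page-size codes from the figure (low three bits of each 4-bit order code).
PAGE_CODE = {
    "4 KB": 0b001, "16 KB": 0b010, "64 KB": 0b011, "1 MB": 0b100,
    "16 MB": 0b101, "256 MB": 0b110, "1 GB": 0b111,
}


def build_spcr(order, search_pid0_first=False):
    """Build an SSPCR/USPCR value from a list of page sizes in search priority order.

    Unused order fields are left as '0000', which stops the search.
    """
    if not 1 <= len(order) <= 7:
        raise ValueError("between one and seven order fields are available")
    value = 0
    for n, size in enumerate(order, start=1):
        code = PAGE_CODE[size]
        if search_pid0_first:
            code |= 0b1000          # '1xxx': search with PID = 0 first, then the input PID
        value |= code << (32 - 4 * n)   # ORDn occupies bits 4(n-1) : 4n-1, bit 0 being the MSB
    return value


print(hex(build_spcr(["4 KB", "64 KB", "256 MB"])))  # 0x13600000
```

The example order (4 KB, then 64 KB, then 256 MB) gives order codes ‘0001’, ‘0011’, and ‘0110’ in ORD1 through ORD3, with ORD4 left at ‘0000’ so that the hardware stops after the third hash.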
4.6.7 Invalidate Search Priority Configuration Register (ISPCR)
The ISPCR is used for local tlbsx and tlbivax instructions, and for incoming snoops that result from external
tlbivax instructions. Figure 4-4 on page 124 illustrates how this register works.
Bits     Field Name   Description
0        Reserved
1:3      ORD1         Order 1.
4        Reserved
5:7      ORD2         Order 2.
8        Reserved
9:11     ORD3         Order 3.
12       Reserved
13:15    ORD4         Order 4.
16       Reserved
17:19    ORD5         Order 5.
20       Reserved
21:23    ORD6         Order 6.
24       Reserved
25:27    ORD7         Order 7.
28:31    Reserved
Figure 4-4. Invalidate Search Priority Configuration Register
Binary Order Code    Page Size Searched
001                  4 KB
010                  16 KB
011                  64 KB
100                  1 MB
101                  16 MB
110                  256 MB
111                  1 GB
000                  No search, with the exception of ‘order 1’ where ‘000’ indicates a 4 KB page.
Search priority order:
1. Order 1
2. Order 2
3. Order 3
4. Order 4
5. Order 5
6. Order 6
7. Order 7
Any order code having ‘000’ stops the search. For example, if order 1 = ‘010’, order 2 = ‘001’, and order 3 = ‘000’,
then hardware stops the search after order 2.
A value of ‘000’ for order 1 searches using a 4 KB hash.
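Reading an ISPCR value back into a list of page sizes shows how the stop rule works. The following Python sketch is an illustration only; the names PAGE_SIZE and decode_ispcr are hypothetical, and the sketch assumes that a code of ‘000’ in order 1 behaves like the 4 KB code and that the search then continues with order 2.

```python
PAGE_SIZE = {1: "4 KB", 2: "16 KB", 3: "64 KB", 4: "1 MB", 5: "16 MB", 6: "256 MB", 7: "1 GB"}


def decode_ispcr(value):
    """Return the page sizes searched, in priority order, for a 32-bit ISPCR value."""
    sizes = []
    for n in range(1, 8):
        # ORDn is the 3-bit field in bits 4n-3 : 4n-1 (bit 0 is the most significant bit).
        code = (value >> (32 - 4 * n)) & 0b111
        if code == 0:
            if n != 1:
                break               # '000' in orders 2 through 7 stops the search
            sizes.append("4 KB")    # assumption: '000' in order 1 selects the 4 KB hash
        else:
            sizes.append(PAGE_SIZE[code])
    return sizes
```

For example, a value of x‘1360 0000’ decodes to a search of 4 KB, 64 KB, and 256 MB pages, and a value of zero decodes to a single 4 KB search.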
4.6.8 User Search Priority Configuration Register (USPCR)
This register is used when MSR[PR] = ‘1’. Figure 4-5 User Search Priority Configuration Registers (USPCR)
on page 126 illustrates how this register works.
Bits     Field Name   Description
0:3      ORD1         Order 1. See Figure 4-5 on page 126 for code values and page sizes for all these fields.
4:7      ORD2         Order 2.
8:11     ORD3         Order 3.
12:15    ORD4         Order 4.
16:19    ORD5         Order 5.
20:23    ORD6         Order 6.
24:27    ORD7         Order 7.
28:31    Reserved
Figure 4-5. User Search Priority Configuration Registers (USPCR)
Register layout: Order 1 (bits 0:3), Order 2 (4:7), Order 3 (8:11), Order 4 (12:15), Order 5 (16:19),
Order 6 (20:23), Order 7 (24:27), Reserved (28:31).
Binary Order Code    Page Size Searched
x001                 4 KB
x010                 16 KB
x011                 64 KB
x100                 1 MB
x101                 16 MB
x110                 256 MB
x111                 1 GB
0xxx                 Use input PID
1xxx                 If input PID ≠ 0, the first search uses PID = 0. Then, repeat the search with input PID.
0000                 No search, with the exception of ‘order 1’ where ‘0000’ indicates a 4 KB page.
Search priority order:
1. Order 1
2. Order 2
3. Order 3
4. Order 4
5. Order 5
6. Order 6
7. Order 7
Any order code having ‘0000’ stops the search. For example, if order 1 = ‘0010’, order 2 = ‘0001’, and order 3 = ‘0000’,
then hardware stops the search after order 2.
A value of ‘0000’ for order 1 searches using a 4 KB hash.
4.6.9 Reset Configuration Register (RSTCFG)
See Section 2.7.7 Reset Configuration (RSTCFG) on page 74.
4.6.10 MMU Configuration Register (MMUCR)
Bits     Field Name   Description
0        REALE        Real mode enable.
                      0  Address translation, storage, and permission bits are sourced from the UTLB.
                      1  Address translation, storage, and permission bits are sourced from RMPD.
1:2      LWAY         Latched way.
                      Read only. Set by tlbwe with word select (WS) = 0. Set to RA[1:2] when RA[0] = ‘1’ and
                      RA[4] = ‘0’. Set to ‘00’ when RA[4] = ‘1’ to ensure that bolted entries are placed in way 0.
                      Set to the hardware assist value when RA[0] = ‘0’. Used for way by tlbwe with WS = 1 and
                      WS = 2.
3        LVALID       Latched valid.
                      Read only. Set by tlbwe with WS = 0 to the V bit value. Cleared on tlbwe WS = 2 or by a
                      tlbivax matching LINDEX before tlbwe WS = 2.
4        DULXE        Data-side user locked line exception enable.
                      0  User mode attempt to execute dcbf does not generate an exception.
                      1  User mode attempt to execute dcbf generates an exception.
5        IULXE        Instruction-side user locked line exception enable.
                      0  User mode attempt to execute icbi does not generate an exception.
                      1  User mode attempt to execute icbi generates an exception.
6        Reserved
7:14     LINDEX       Latched index address.
                      Read only. Set by tlbwe with WS = 0. Used for the index address by tlbwe with WS = 1 and
                      WS = 2.
15       STS          Set translation space.
                      Set by software before tlbwe.
16:31    STID         Set translation ID.
                      Set by software before tlbwe. Written to the UTLB TID field during tlbwe, WS = 0. Used in
                      the index address hash during tlbsx and tlbivax (local).
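The MMUCR bit positions translate into shifts and masks as sketched below. This Python fragment only illustrates the field layout listed for the register; the helper names field, set_field, mmucr_for_search, and latched_target are hypothetical, and reading or writing the SPR itself is not shown.

```python
def field(reg, first, last):
    """Extract bits first:last of a 32-bit register (bit 0 is the most significant)."""
    width = last - first + 1
    return (reg >> (31 - last)) & ((1 << width) - 1)


def set_field(reg, first, last, value):
    """Return reg with bits first:last replaced by value."""
    width = last - first + 1
    mask = ((1 << width) - 1) << (31 - last)
    return (reg & ~mask & 0xFFFFFFFF) | ((value << (31 - last)) & mask)


def mmucr_for_search(mmucr, sts, stid):
    """Set STS (bit 15) and STID (bits 16:31), as software does before tlbsx or tlbwe."""
    mmucr = set_field(mmucr, 15, 15, sts)
    return set_field(mmucr, 16, 31, stid)


def latched_target(mmucr):
    """Decode the read-only fields latched by tlbwe with WS = 0."""
    return {
        "LWAY": field(mmucr, 1, 2),
        "LVALID": field(mmucr, 3, 3),
        "LINDEX": field(mmucr, 7, 14),
    }
```

For example, starting from an MMUCR value of zero, selecting translation space 1 and translation ID x‘1234’ produces x‘0001 1234’.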
4.7 UTLB Block Descriptions
The UTLB design can be broken into the following parts:
• Tag array
• Data array
• TLB coherency
The arrays are arranged so that four entries can be read at a time.
4.7.1 Tag Array
The tag array contains the necessary information to determine if a UTLB request from either the EU, ICU,
DCU, or snoop interface matches a valid entry.
4.8 Software Considerations
The PowerPC 476FP UTLB is a software-managed entity. Initializing and invalidating entries (except
invalidations caused by an external tlbivax) require explicit software instructions. Typical UTLB searches,
resulting from misses in the ITLB or DTLB, are handled by the hardware. Four instructions must be handled by
the software: tlbsx, tlbre, tlbwe, and tlbivax.
4.8.1 TLB Search Indexed (tlbsx)
The tlbsx instruction provides a translation space (through MMUCR[STS]), translation ID (through
MMUCR[STID]), and an effective address (EA), used to search the UTLB for a matching entry. The UTLB is
searched following the order specified in ISPCR. If a match is found, the way and index address are returned.
The format of the tlbsx instruction is shown here:
tlbsx RT, RA, RB Rc
The effective address is computed as described here:
    EA = (RA) + (RB)
    if RA = 0, EA = 0 + (RB)
    if Rc = 1
        CR[CR0[0]] ← 0
        CR[CR0[1]] ← 0
        CR[CR0[3]] ← XER[SO]
The virtual address equals MMUCR[STS] | MMUCR[STID] | EA[0:n], where n = 19 for a 4 KB page.
If a valid entry that satisfies all of the following matches is found in the UTLB:
    MMUCR[STS] matches tlbentry[TS], and
    if tlbentry[TID] ≠ ‘0’, MMUCR[STID] matches tlbentry[TID], and
    EA[0:19] matches tlbentry EPN[0:19]
then
    RT[33:34] ← Way hit, driven on MMU_iwbRdData[1:2]
    RT[40:47] ← index address of the matching TLB entry, driven on MMU_iwbRdData[8:15]
    if Rc = 1
        CR[CR0[2]] ← 1
else
    (RT) ← undefined
    if Rc = 1
        CR[CR0[2]] ← 0
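The match conditions and the layout of the result in RT can be modeled in a few lines of Python. This is a sketch under stated assumptions, not a description of the hardware: it covers only 4 KB pages, treats the EA as a 32-bit value, ignores the ISPCR search order and the index hash, and uses hypothetical names (TlbEntry, tlbsx_match, tlbsx_result).

```python
from dataclasses import dataclass

@dataclass
class TlbEntry:
    valid: bool
    ts: int    # translation space
    tid: int   # translation ID; 0 matches any STID
    epn: int   # EPN[0:19] for a 4 KB page

def tlbsx_match(entry, sts, stid, ea):
    """Apply the three match conditions listed above to one 4 KB entry."""
    if not entry.valid:
        return False
    if entry.ts != sts:
        return False
    if entry.tid != 0 and entry.tid != stid:
        return False
    return entry.epn == ((ea >> 12) & 0xFFFFF)  # EA[0:19] of a 32-bit EA

def tlbsx_result(way, index):
    """Pack the way into RT[33:34] and the index into RT[40:47] (low word of RT)."""
    return ((way & 0x3) << 29) | ((index & 0xFF) << 16)

global_entry = TlbEntry(valid=True, ts=1, tid=0, epn=0x12345)
private_entry = TlbEntry(valid=True, ts=1, tid=5, epn=0x12345)
print(tlbsx_match(global_entry, 1, 0x77, 0x12345678))   # TID of 0 ignores STID
print(tlbsx_match(private_entry, 1, 0x77, 0x12345678))  # TID mismatch
print(hex(tlbsx_result(2, 0x5A)))
```

The first entry matches because a TID of 0 is not compared against MMUCR[STID]; the second does not match because its nonzero TID differs from the STID supplied.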
4.8.2 TLB Read Entry (tlbre)
The tlbre instruction is used to read the contents of an entry. A tlbsx must be performed before a tlbre
instruction to obtain the UTLB way and index address.
The format of the tlbre instruction is shown here:
tlbre RT, RA, WS
TLB[(RA)33:34] indicates the tlbentry way to be read. TLB[(RA)40:47] indicates the tlbentry index address.
Software can determine if the entry is bolted by comparing it to the MMUBE0 and MMUBE1 Registers.
if WS = 0
    RT[32:60] ← tlbentry[EPN(0:19), V, TS, DSIZ(0:5), BLTD]
    if CCR0[CRPE] = 0
        RT[61:63] ← 000
    else
        RT[61:63] ← tlbentry[EPNPar, DSIZPar, TIDPar]
    MMUCR[STID] ← tlbentry[TID]
else if WS = 1
    RT[32:51] ← tlbentry[RPN]
    RT[54:63] ← tlbentry[ERPN]
    if CCR0[CRPE] = 0
        RT[52] ← 0
        RT[53] ← 0
    else
        RT[52] ← tlbentry[RPNPar]
        RT[53] ← tlbentry[ERPNPar]
else if WS = 2
    RT[46:56] ← tlbentry[IL1I, IL1D, U(0:3),W,I,M,G,E], driven on MMU_iwbRdData[14:24]
    RT[58:63] ← tlbentry[UX,UW,UR,SX,SW,SR]
    if CCR0[CRPE] = 0
        RT[32] ← 0’b0
    else
        RT[32] ← Storage/Permission Parity
else
    (RT), MMUCR[STID] ← undefined
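As an illustration of the WS = 0 layout, the following Python sketch splits the low word of RT into the fields named above. It assumes that RT[32:63] is available as a 32-bit integer (so RT[32] becomes bit 0 of that word); the function names are hypothetical, and the example value is arbitrary rather than taken from real hardware.

```python
def field(reg, first, last, width=32):
    """Extract bits first:last, where bit 0 is the most significant bit."""
    nbits = last - first + 1
    return (reg >> (width - 1 - last)) & ((1 << nbits) - 1)

def decode_tlbre_ws0(rt_low):
    """Split RT[32:63] from a tlbre with WS = 0 into its fields."""
    return {
        "EPN": field(rt_low, 0, 19),     # RT[32:51]
        "V": field(rt_low, 20, 20),      # RT[52]
        "TS": field(rt_low, 21, 21),     # RT[53]
        "DSIZ": field(rt_low, 22, 27),   # RT[54:59]
        "BLTD": field(rt_low, 28, 28),   # RT[60]
        "PARITY": field(rt_low, 29, 31), # RT[61:63], zero when CCR0[CRPE] = 0
    }

decoded = decode_tlbre_ws0(0x12345818)
print(decoded)
```

The field widths add up to the 29 bits of RT[32:60] (20 + 1 + 1 + 6 + 1), followed by the three parity bits.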
4.8.3 TLB Write Entry (tlbwe)
The tlbwe instruction is used by software to place entries in the UTLB. Software can choose to specify the
way to be written, or to use the hardware assist way counters. If the way is specified in software, the entry can
be bolted, which protects it from subsequent hardware assisted way selection writes, or tlbivax (local or
remote). Software can choose to invalidate entries by using tlbwe to clear the valid bit. Bolted entries invalidated in this manner must either specify that the entry is bolted (with RA[36] = ‘1’) or specify way 0 in the
tlbwe instruction (with RA[32:34] = ‘100’).
The format of the instruction is as follows:
tlbwe RS, RA, WS
If TLB[(RA)36] = 1, tlbentry way to be written is '0'
else if TLB[(RA)36] = 0 and TLB[(RA)32] = 1, tlbentry way is specified by TLB[(RA)33:34]
else provided by the way counter corresponding to the index address
If TLB[(RA)36] = 1, tlbentry to be written is bolted. TLB[(RA)37:39] specifies the bolted entry, 000 - 101
if WS = 0
    UTLB index address obtained by hashing input EPN[0:19] and DSIZ with MMUCR STID[8:15]
    UTLB way, index address, and valid bit are latched into MMUCR[LWAY], MMUCR[LINDEX], and
    MMUCR[LVALID] for use with WS = 1 and WS = 2
    if V-bit = ‘1’,
        tlbentry[EPN, V, TS, DSIZ] ← RS[32:59]
        EPN, TS, and DSIZ are written to the UTLB. The V bit is not written until WS = 2.
        tlbentry[TID] ← MMUCR[STID]
    if V-bit = ‘0’,
        tlbentry[V] ← ‘0’
else if WS = 1
    tlbentry[RPN] ← RS[32:51]
    tlbentry[ERPN] ← RS[54:63]
    write to UTLB way and index address specified by MMUCR[LWAY] and MMUCR[LINDEX]
else if WS = 2
    tlbentry[IL1I,IL1D,U(0:3),W,I,M,G,E] ← RS[46:56]
    tlbentry[UX,UW,UR,SX,SW,SR] ← RS[58:63]
    write to UTLB way and index address specified by MMUCR[LWAY] and MMUCR[LINDEX]
    tlbentry[V] ← MMUCR[LVALID]
else
    tlbentry ← undefined
Parity bits are automatically calculated in hardware and entered into the tag and data arrays of the UTLB.
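The way-selection rules at the start of the tlbwe description can be summarized in a short Python sketch. It assumes that RA[32:63] is available as a 32-bit integer, so RA[32], RA[33:34], and RA[36] become bits 0, 1:2, and 4 of that word. The hardware way counter is represented by a plain argument, and the function names are hypothetical.

```python
def field(reg, first, last, width=32):
    """Extract bits first:last, where bit 0 is the most significant bit."""
    nbits = last - first + 1
    return (reg >> (width - 1 - last)) & ((1 << nbits) - 1)

def tlbwe_way(ra_low, hw_way_counter):
    """Select the UTLB way for a tlbwe, following the three rules above."""
    bolted = field(ra_low, 4, 4)         # RA[36]
    way_specified = field(ra_low, 0, 0)  # RA[32]
    if bolted:
        return 0                         # bolted entries always go to way 0
    if way_specified:
        return field(ra_low, 1, 2)       # RA[33:34]
    return hw_way_counter                # hardware-assisted selection

print(tlbwe_way(0xE0000000, hw_way_counter=1))  # software-specified way 3
print(tlbwe_way(0xE8000000, hw_way_counter=1))  # bolted, so way 0
print(tlbwe_way(0x60000000, hw_way_counter=1))  # way counter value
```

The second call shows that the bolted bit takes priority over a way specified in RA[33:34].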
4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax)
The tlbivax instruction searches the UTLB and, if a match is found, invalidates the first matching entry.
The format of the tlbivax instruction is shown here:
tlbivax RA, RB
The effective address is computed as follows:
EA = (RA) + (RB)
if RA = 0, EA = 0 + (RB)
If a valid entry that satisfies all of the following matches is found in the UTLB, and that entry is not listed
as a bolted entry in the MMUBE0 or MMUBE1 registers, then that entry is invalidated by writing its V bit to 0:
• MMUCR[STS] matches tlbentry[TS], and
• if tlbentry[TID] ≠ 0, MMUCR[STID] matches tlbentry[TID], and
• EA[0:19] matches tlbentry EPN[0:19]
The search uses the hash search order defined by ISPCR. Because the software sets the MMUCR[STID], there is
no need to force a nonzero PID to be equal to 0 for the first set of searches.
The MMUCR[STS], MMUCR[STID], and EA[0:19] values are also broadcast from the core, through the L2, and on to
the PLB6 bus such that other coherent processors can invalidate a matching entry.
The tlbivax instruction will not invalidate the shadow TLBs (ITLB and DTLB). This can be accomplished with
an additional tlbsync instruction.
4.9 UTLB Coherency
Support for UTLB coherency involves four operations:
• local tlbivax
• local tlbsync (handled by the execution unit, generate idle signal)
• remote tlbivax
• remote tlbsync (handled by L2, generate idle signal)
The MMU responds to a local tlbivax instruction by invalidating a matching unbolted UTLB entry. The SYNC
unit responds to a local tlbivax instruction by broadcasting the corresponding MMUCR[STS], MMUCR[STID],
and EA[0:19] through the L2 and onto the PLB6 bus.
Software, more specifically the kernel, manages the PowerPC 476FP UTLB. It is imperative that tlbivax and
tlbsync are executed one at a time in the system. Multiple processors must never process or execute these
instructions simultaneously.
4.10 tlbsync Special Operations
The master processor executes the following instruction sequence to ensure invalidation of TLB entries and
synchronize the effects of tlbivax:
mtmmucr RS    Sets STS and STID fields for the tlbivax operation.
isync         Ensures MMUCR is complete.
tlbivax       Invalidates a TLB entry.
...           A sequence of a number of mtmmucr and tlbivax instructions occurs.
isync         Flushes the local ITLB and DTLB to force synchronization with the UTLB.
mbar          Ensures the subsequent tlbsync instruction will not bypass any of the tlbivax
              instructions.
tlbsync       Synchronizes remote processors and ensures that all processors complete any
              pending tlbivax instructions.
msync         Ensures that all tlbivax and tlbsync instructions complete before starting the
              next instruction.
4.10.1 Remote tlbsync operation
A remote tlbsync operation is heavyweight: the L2 cache ensures that the processor completes the tlbsync
instruction before accepting the subsequent msync instruction. The msync is retried until the tlbsync is
complete. When the tlbsync is complete, the system can detect all storage operations, including instruction
fetches.
During the remote tlbsync operation, the processor ensures the completion of all preceding operations
including instruction fetches, and ensures that the context sync operation is completed.
4.10.1.1 CPU Remote tlbsync Operation
The remote tlbsync operation sequence follows:
1. Detect the remote tlbsync by the IU through the SYNC unit.
2. Flush to clear all uncommitted instructions and stop fetching any instructions. The IU will prevent the ICU
from fetching. Both the ITLB and DTLB shadow TLBs are invalidated (context switch operation).
3. A pseudo remote tlbsync operation is issued to the LRACC to mark the pseudo operation through the
EU and the DCU. Because this is a pseudo operation, there is no confirm and commit.
4. Ensure all read and write operations are completed.
5. A write command of x‘F’ is issued to the SYNC unit.
6. The SYNC unit sends the command to the L2 cache indicating the remote tlbsync operation is complete.
When the SYNC unit receives an acknowledgement from the L2 cache (or becomes available), the SYNC
unit then indicates to the IU that the remote tlbsync operation is complete.
7. The IU sends a fetch request to the ICU to resume instruction executions.
4.10.1.2 L2 Cache Remote tlbsync Operations
The L2 cache must ensure that all preceding remote tlbivax and tlbsync operations are completed by the
remote accompanying processor before accepting the subsequent msync instructions. This remote L2 cache
does not guarantee the performing of loads and stores that might have used the previous translations.
The remote tlbsync operation sequence follows:
1. Detect the remote tlbsync instruction on the PLB6.
2. Set up a flag to mark a remote tlbsync instruction in progress. Any subsequent remote msync is retried
until the tlbsync is completed by the accompanying processor.
3. The L2 cache completes all pending requests from the accompanying processor, including instruction
fetches. The L2 cache continues all normal operations except the msync instruction from PLB6.
4. The L2 cache completes the tlbsync operation by resetting the tlbsync flag when the accompanying
processor returns the lwsync, x‘F’ write command.
5. The L2 cache accepts the remote msync and acknowledgement.
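The flag-and-retry behavior in the steps above can be pictured with a very small Python model. It is a sketch for intuition only: the class and method names are invented for this example, the model tracks a single tlbsync at a time, and it leaves out everything else that the L2 cache does while the flag is set.

```python
class L2TlbsyncTracker:
    """Toy model of the L2 'tlbsync in progress' flag described above."""

    def __init__(self):
        self.tlbsync_in_progress = False

    def snoop_tlbsync(self):
        # Steps 1 and 2: a remote tlbsync is seen on the PLB6, so set the flag.
        self.tlbsync_in_progress = True

    def snoop_msync(self):
        # Steps 2 and 5: a remote msync is retried while the flag is set.
        return "retry" if self.tlbsync_in_progress else "ack"

    def core_sync_write(self):
        # Step 4: the accompanying processor returns the x'F' write command.
        self.tlbsync_in_progress = False

l2 = L2TlbsyncTracker()
l2.snoop_tlbsync()
first_response = l2.snoop_msync()   # tlbsync still pending
l2.core_sync_write()
second_response = l2.snoop_msync()  # tlbsync complete
print(first_response, second_response)
```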
5. Instruction and Data Caches
The PowerPC 476FP core provides separate level 1 (L1) instruction cache (I-cache) and L1 data cache (D-cache)
controllers and arrays. These controllers and arrays allow concurrent access and minimize pipeline
stalls. The cache arrays are 32 KB each. Both cache controllers have 32-byte lines. Both cache controllers
are four-way set associative. The PowerPC 476FP core implementation also provides special debug instructions
that can directly read the data and tag arrays. Both the instruction cache and data cache
controllers interface to the level 2 (L2) cache. The L2 cache interface consists of a 256-bit shared read bus
(reads from the L2 cache) and a 128-bit write bus (writes to the L2 cache).
Both caches support symmetrical multiprocessor (SMP) coherency through a processor local bus 6 (PLB6)
interconnect, and allow up to eight coherent masters and processors. Both caches are Power Instruction Set
Architecture (ISA) Version 2.05 compliant to ease programming.
Both caches are parity-protected against soft errors. If such errors are detected, the processor vectors to the
machine check interrupt handler where software can take appropriate action.
The rest of this section provides more detailed information about the operation of the instruction and data
cache controllers and arrays.
5.1 Cache Array Organization and Operation
The instruction and data cache arrays are organized identically. However, the fields of the tag and data
portions of the arrays are slightly different because the functions of the arrays differ. Both instruction cache
and data cache are real address (RA) tagged.
Each cache is 4-way set-associative. Each cache has 256 sets, and the line size of each
cache is 32 bytes.
Table 5-1 illustrates generically the ways and sets of the cache arrays, and Table 5-2 on page 134 provides
specific values for the parameters used in Table 5-1.
Table 5-1. Instruction and Data Cache Array Organization
           Way 0         Way 1          Way 2                 Way 3
Set 0      Line 0        Line n         Line 2n               Line (w – 1)n
Set 1      Line 1        Line n + 1     Line (w – 2)n + 1     Line (w – 1)n + 1
•          •             •              •                     •
•          •             •              •                     •
•          •             •              •                     •
Set 254    Line n – 2    Line 2n – 2    Line (w – 1)n – 2     Line wn – 2
Set 255    Line n – 1    Line 2n – 1    Line (w – 1)n – 1     Line wn – 1
As shown in Table 5-2, the tag field for each line in each way holds the high-order address bits associated
with the line that currently resides in that way. The middle-order address bits form an index to select a specific
set of the cache, and the five lowest-order address bits form a byte-offset to choose a specific byte (or bytes,
depending on the size of the operation) from the 32-byte cache line.
Table 5-2. Cache Size and Parameters
Cache Size   Ways (w)   Sets (n)   Tag Address Bits   Set Address Bits   Byte Offset Address Bits
32 KB        4          256        RA[0:18]           RA[29:36]          RA[37:41] or EA[27:31]
The tag address bits shown in Table 5-2 refer to the RA bits and are for illustrative purposes only. Because
the instruction cache is tagged with the virtual address, and the data cache is tagged with the real address,
the actual tag address bits contained within each array are different.
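The relationship between cache size, ways, sets, and the address fields can be checked with a little arithmetic: 32 KB divided by (4 ways × 32 bytes per line) gives 256 sets, so 5 address bits select the byte within a line and 8 bits select the set. The Python sketch below is a simplified model for illustration. It treats the address as a plain integer whose low-order bits are the byte offset, and the function names are hypothetical.

```python
CACHE_BYTES = 32 * 1024
WAYS = 4          # w in Table 5-1
LINE_BYTES = 32
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # n in Table 5-1

def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    set_index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, set_index, offset

def line_number(way, set_index):
    """Line numbering used in Table 5-1: ways are laid out one after another."""
    return way * SETS + set_index

print(SETS)
print(split_address(0x12345))
print(line_number(3, 1))  # Line (w - 1)n + 1 in Table 5-1
```

With n = 256 and w = 4, the entry for way 3, set 1 in Table 5-1 evaluates to line 769, which is what line_number(3, 1) returns.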
See Section 5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL) on page 137 and Section 5.2.2.10
Instruction Cache Debug Tag Register High (ICDBTRH) on page 138 for instruction cache tag information. See
Section 5.5.22 Data Cache Debug Tag Register Low (DCDBTRL) on page 152 and Section 5.5.23 Data Cache
Debug Tag Register High (DCDBTRH) on page 153 for data cache tag information.
5.2 Instruction Cache Controller
The instruction cache unit (ICU) delivers four instructions per cycle to the instruction unit (IU) of the PowerPC
476FP core.
The ICU also handles the execution of the PowerPC instruction cache management instructions, for touching
(prefetching) or invalidating cache lines, or for flash invalidation of the entire cache. Resources for controlling
and debugging the instruction cache operation are also provided.
5.2.1 I-Cache Operations
The instruction cache can accept four types of instruction cache operations:
• icbt (instruction cache block touch), including icbtls (instruction cache block touch and lock set) and
icblc (instruction cache block lock clear)
• icbi (instruction cache block invalidate)
• ici (instruction cache invalidate)
• icread (instruction cache read)
Also, the instruction cache supports parity operations. The instruction cache contains parity bits and multihit
detection hardware to protect against soft data errors.
5.2.2 Instruction Cache Parity Operations
The instruction cache contains parity bits and multihit detection hardware to protect against soft data errors.
Two types of errors can be detected by the instruction cache parity logic. In the first type, the parity bits stored
in the RAM array are checked against the appropriate data in the instruction cache line when the RAM line is
read for an instruction fetch. Note that a parity error is not signaled as a result of an icread instruction.
The second type of parity error that can be detected is a multihit. This type of error occurs when a tag address
bit is corrupted, leaving two tags in the instruction cache array that match the same input address. Multihit
errors can be detected on any instruction fetch. No parity errors of any kind are detected on speculative fetch
lookups or icbt lookups. Rather, such lookups are treated as cache hits and cause no further action until an
instruction fetch lookup at the offending address causes an error to be detected.
If a parity error is detected, and the MSR[ME] is asserted (that is, machine check interrupts are enabled), the
processor vectors to the machine check interrupt handler. As is the case for any machine check interrupt,
after vectoring to the machine check handler, the MCSRR0 contains the value of the oldest uncommitted
instruction in the pipeline at the time of the exception, and MCSRR1 contains the old MSR context. The interrupt handler can query the Machine Check Status Register (MCSR) to determine if it was called because of
an instruction cache parity error, and then must invalidate the instruction cache using the iccci instruction.
The handler returns to the interrupted process using the rfmci instruction.
If parity checking and machine check interrupts are enabled, instruction cache parity errors are always recoverable. Also note that the machine check interrupt is asynchronous; that is, the return address in the
MCSRR0 does not point at the instruction address that contains the parity error. Rather, the machine check
interrupt is taken as soon as the parity error is detected. Some instructions in progress are flushed and reexecuted after the interrupt, just as if the machine were responding to an external interrupt.
5.2.2.1 Instruction Cache Block Lock Clear (icblc)
The icblc instruction clears the lock bit for a given line in the instruction cache. If the CT field is set to ‘0’ (for
L1 cache), icblc clears the lock bit in the least recently used (LRU), valid, or lock array in the instruction
cache. If the CT field is set to 2, the icblc instruction sends a request to clear the lock in the L2. The target
line remains valid in the cache.
5.2.2.2 Instruction Cache Block Invalidate (icbi)
If the block containing the byte addressed by the EA is in the instruction cache of any processors, the block
(cache line) is invalidated in the instruction cache. This instruction is broadcast to all processors on the PLB.
5.2.2.3 Instruction Cache Invalidate (ici)
This instruction invalidates the entire L1 instruction cache. The ici instruction is not sent to the L2 cache. The
ici instruction generates an exception if it is not executed in supervisor mode. If the CT field of the instruction
is 2, the instruction is treated as a no-op.
The Power Instruction Set Architecture (ISA) specifies that software must place an isync instruction after the
ici instruction so that any instructions that might have already been fetched from the previous contents
of the instruction cache are discarded and refetched.
5.2.2.4 icbt
The icbt instruction is a hint to establish a specified line in the cache, but the operation is not guaranteed to
establish the cache line. Therefore, the cache line can be snooped out, or the line can be flushed with an icbi
instruction.
When an icbt is received, the ICU checks the line fill buffers, fetch queue, and the tag to determine if the
requested line is already in the cache. If the requested line is not in the cache, the request drops into the fetch
queue. This operation uses the CT field of the instruction. If CT is set to ‘0’ and the request misses in the
instruction cache, a fill buffer is requested and the request is sent to the L2 cache. No action is required for a
cache hit if CT = ‘0’.
When CT = 2, no action is taken for the instruction cache, but the request is made to the L2 cache. Because
icbt does not guarantee the data is written to the cache, the instruction cache controller does not require a
response from the L2 cache controller.
5.2.2.5 icbtls
The icbtls instruction operates similarly to the icbt, except it locks the data in the cache on a write. This
means along with the read request, a lock signal is sent to the L2 on a fetch. For lines that are not already in
the cache, the lock bit is set on the line fill.
This instruction also makes use of the CT field. When CT = 0, only the line in the instruction cache is locked.
If the line is already in the instruction cache, the line is locked and no request is made to the L2 cache. When
CT = 2, a control signal is sent to the L2 cache controller indicating the line is also to be locked in the L2.
Again, the instruction cache does not require a response from the L2 cache.
5.2.2.6 icread
This instruction reads the content of a specified physical location in the instruction cache and stores the data
into debug registers. The cache controller does no address translation or exception processing for this
instruction. Because only the content of a specific cache location is accessed, the icread request uses a
modified format for the EA. Table 5-3 describes the EA format for the icread instruction.
Table 5-3. EA Format icread
Address Bits   Description
0:16           Unused.
17:18          Instruction cache way.
19:26          Instruction cache index.
27:29          Word address within L1 instruction cache line.
30:31          Unused.
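To make the EA format in Table 5-3 concrete, the following Python sketch composes an EA value from a way, an index, and a word number. It is a minimal illustration that assumes a 32-bit EA with bit 0 as the most significant bit; icread_ea is a hypothetical helper name, and the unused bits are simply left at zero.

```python
def icread_ea(way, index, word):
    """Build the EA for icread from the fields in Table 5-3."""
    if not 0 <= way <= 3:
        raise ValueError("way must be 0..3")
    if not 0 <= index <= 255:
        raise ValueError("index must be 0..255")
    if not 0 <= word <= 7:
        raise ValueError("word must be 0..7")
    # Bits 17:18, 19:26, and 27:29 end 13, 5, and 2 bits above bit 31.
    return (way << 13) | (index << 5) | (word << 2)

print(hex(icread_ea(1, 0x10, 2)))
print(hex(icread_ea(3, 0xFF, 7)))
```

In the example sequence that follows, such a value would be the sum of the two general purpose registers passed to icread.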
Note: The icread instruction is not sent to the L2 cache.
Note: The PowerPC 476FP core does not automatically synchronize context between an icread instruction
and the subsequent mfspr instructions that read the results of the icread instruction into general purpose
registers (GPRs). To guarantee that the mfspr instructions obtain the results of the icread instruction, a
sequence such as the following example must be used:
icread    regA,regB    # Read cache information. The contents of GPR A and GPR B are added,
                       # and the result is used to specify a cache line index to be read.
isync                  # Ensure icread is completed before attempting to read results.
mficdbdr0 regC         # Move instruction information into GPR C.
mficdbdr1 regD         # Move instruction information into GPR D.
mficdbtrh regE         # Move the high portion of the tag into GPR E.
mficdbtrl regF         # Move the low portion of the tag into GPR F.
The following special purpose registers (SPRs) are written with the icread operation:
• ICDBDR0
• ICDBDR1
• ICDBTRL
• ICDBTRH
5.2.2.7 Instruction Cache Debug Data Register 0 (ICDBDR0)
Bits    Description
0:31    Instruction word.
5.2.2.8 Instruction Cache Debug Data Register 1 (ICDBDR1)
Bits     Description
0:7      Instruction predecode bits.
8:9      Parity.
10:31    Reserved.
5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL)
Bit      Field Name   Description
0:3      LRUV         The LRU valid bits. One bit for each way in the set.
4:9      LRU          The LRU value for the set.
10:13    LOCK         Lock bits. One bit for each way in the set.
14:15    Reserved
16:18    LRUP         LRU parity.
                      16   Even parity for array bits [0:5].
                      17
                      18
19:27    Reserved
28       CONF         Way conflict bit.
29:31    Reserved
5.2.2.10 Instruction Cache Debug Tag Register High (ICDBTRH)
Bit      Field Name   Description
0:18     ADDR         Tag address.
19       VALID        Valid bit for this entry. This bit is the tag valid bit and obtained from the LRU array.
20:21    TAGP         Tag parity bits.
22:31    EXTADDR      Extended tag address.
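After an icread, software typically needs to pick the individual fields out of the two tag registers. The Python sketch below shows one way to do that with the bit positions given for ICDBTRL and ICDBTRH. It is an illustration under assumptions: the registers are treated as 32-bit integers with bit 0 as the most significant bit, the helper names are hypothetical, and the example values are arbitrary.

```python
def field(reg, first, last, width=32):
    """Extract bits first:last, where bit 0 is the most significant bit."""
    nbits = last - first + 1
    return (reg >> (width - 1 - last)) & ((1 << nbits) - 1)

def decode_icdbtrl(value):
    return {
        "LRUV": field(value, 0, 3),
        "LRU": field(value, 4, 9),
        "LOCK": field(value, 10, 13),
        "LRUP": field(value, 16, 18),
        "CONF": field(value, 28, 28),
    }

def decode_icdbtrh(value):
    return {
        "ADDR": field(value, 0, 18),
        "VALID": field(value, 19, 19),
        "TAGP": field(value, 20, 21),
        "EXTADDR": field(value, 22, 31),
    }

trl = decode_icdbtrl(0xFA84A008)
trh = decode_icdbtrh(0x2468B955)
print(trl)
print(trh)
```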
5.2.2.11 Instruction Cache Parity Operations
The instruction cache contains parity bits and multihit detection hardware to protect against soft data errors.
Two types of errors can be detected by the instruction cache parity logic. In the first type, the parity bits stored
in the RAM array are checked against the appropriate data in the instruction cache line when the RAM line is
read for an instruction fetch. Note that a parity error is not signaled as a result of an icread instruction.
The second type of parity error that can be detected is a multihit. This type of error occurs when a tag address
bit is corrupted, leaving two tags in the instruction cache array that match the same input address. Multihit
errors can be detected on any instruction fetch. No parity errors of any kind are detected on speculative fetch
lookups or icbt lookups. Rather, such lookups are treated as cache hits and cause no further action until an
instruction fetch lookup at the offending address causes an error to be detected.
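If a parity error is detected, and MSR[ME] is asserted (that is, machine check interrupts are enabled), the
processor vectors to the machine check interrupt handler. As is the case for any machine check interrupt,
after vectoring to the machine check handler, the MCSRR0 contains the value of the oldest uncommitted
instruction in the pipeline at the time of the exception, and MCSRR1 contains the old MSR context. The interrupt handler can query the Machine Check Status Register (MCSR) to determine if it was called because of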
an instruction cache parity error and then must invalidate the instruction cache using the iccci instruction.
The handler returns to the interrupted process using the rfmci instruction.
If parity checking and machine check interrupts are enabled, instruction cache parity errors are always recoverable. Also note that the machine check interrupt is asynchronous; that is, the return address in the
MCSRR0 does not point at the instruction address that contains the parity error. Rather, the machine check
interrupt is taken as soon as the parity error is detected. Some instructions in progress are flushed and reexecuted after the interrupt, just as if the machine were responding to an external interrupt.
5.2.3 Speculative Prefetch
In general, all instructions are fetched speculatively. The ICU fetches (or prefetches) two code streams of
instructions: one for the sequential stream and another for a branch predicted stream.
Because the ICU submits four instructions at a time, it accesses the subsequent cache line or branch
predicted target instruction cache line even though it does not detect whether the program code requires
those instructions. Thus, the instructions are speculatively prefetched.
The processor limits speculative fetches because overly speculative fetches might reduce cache efficiency
by replacing useful cache lines with unnecessary prefetched cache lines.
All fetches from the L2 cache are performed using real (physical) addresses.
5.2.4 Exceptions
The instruction cache generates three different types of exceptions:
• Instruction storage interrupt
• Instruction-side unified translation lookaside buffer (UTLB) miss
• Instruction-side machine check
These exceptions are passed to the IU, where the instruction is tagged as a special instruction-side exception. It is then propagated to the instruction pipeline writeback stage as a faulty commitment performed by the
IU. These three exceptions are mutually exclusive. The faulty commitment consists of four no-op instructions
passed to the decode-and-issue (DISS) and marked with the error. These instructions remain valid until the
flush. Subsequent fetches to the L2 cache are blocked. Any data the L2 cache returns with machine-check
status is not written into the cache. Snoops, though, continue to be processed.
5.2.4.1 Instruction Storage Interrupt
The instruction cache generates an instruction storage interrupt (ISI) only for an execute protection
violation. This happens when a page requested from the memory management unit (MMU) is returned
without the execute permissions.
5.2.4.2 Instruction-Side UTLB Miss
The instruction cache generates an instruction-side UTLB miss when an MMU request for an instruction-side
TLB (ITLB) entry results in a UTLB miss.
5.2.4.3 Instruction-Side Machine Check
The instruction cache generates an instruction-side machine check whenever a hardware error in the ICU is
detected. This can be a read error from the L2 or a parity error from a number of sources. The instruction
cache contains the Instruction Cache Error Syndrome Register (ICESR), an SPR, to differentiate between the
various encountered errors. ICESR is accessible in supervisor mode only and is cleared on a write.
5.3 ICU Special Purpose Registers
Table 5-4 lists the SPRs used in the ICU.
Table 5-4. ICU Special Purpose Registers
Register   Name                                                 Address   Read/Write   Privileged
ICESR      Instruction Cache Error Syndrome Register            x‘851’    R/W          Yes
ICDBDR0    Instruction Cache Debug Data Register, Instruction   x‘979’    R            Yes
ICDBDR1    Instruction Cache Debug Data Register, predecode     x‘980’    R            Yes
ICDBTRL    Instruction Cache Debug Tag Register Low             x‘926’    R            Yes
ICDBTRH    Instruction Cache Debug Tag Register High            x‘927’    R            Yes
5.3.1 Instruction Cache Error Syndrome Register (ICESR)
The ICESR provides a syndrome to differentiate between the different kinds of exceptions that can generate
the same interrupt type. Upon the generation of one of these interrupt types, the bit or bits corresponding to
the specific exception that generated the interrupt are set, and all other ICESR bits are cleared.
Bit      Field Name   Description
0:3      ICRDPE       Instruction cache read interface parity error.
                      The bit number represents which word contains the error on the data bus. Multiple bits
                      can be set.
4:7      ICTESPE      Tag even set parity error.
8:11     ICTESPE      Tag odd set parity error.
12       ICTAPE       Parity error in tag SRAM.
13:20    ICINDXPE     Index of parity error in cache.
                      Represents bits 19:26 of the real address.
21:24    ICDAPE       Parity error in ISD.
25       ICLESPE      Parity error in LRU/valid SRAM, even set.
26       ICLOSPE      Parity error in LRU/valid SRAM, odd set.
27       ICSNPPE      Instruction cache snoop parity error.
                      A parity error exists on the snoop request received from the L2 cache.
28       ICDAHIT      Instruction cache data array hit.
                      This bit modifies ICDAPE when set. If both ICDAPE and ICDAHIT are set, there is a data
                      parity error on a load request that hits in the instruction cache. If only ICDAPE is set,
                      the parity error is from a request that was serviced from the line fill buffers. If ICDAPE
                      is not set, this bit should be ignored.
29:31    Reserved
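A machine check handler usually has to turn the ICESR value into a list of error sources. The Python sketch below shows the idea using the bit positions given for the ICESR fields. It is illustrative only: the register is treated as a 32-bit integer with bit 0 as the most significant bit, the function names are hypothetical, and the suffixes _even and _odd are added in this example solely to tell the two tag set parity fields apart.

```python
def field(reg, first, last, width=32):
    """Extract bits first:last, where bit 0 is the most significant bit."""
    nbits = last - first + 1
    return (reg >> (width - 1 - last)) & ((1 << nbits) - 1)

ICESR_FIELDS = [
    ("ICRDPE", 0, 3),
    ("ICTESPE_even", 4, 7),   # suffix added for this example only
    ("ICTESPE_odd", 8, 11),   # suffix added for this example only
    ("ICTAPE", 12, 12),
    ("ICINDXPE", 13, 20),
    ("ICDAPE", 21, 24),
    ("ICLESPE", 25, 25),
    ("ICLOSPE", 26, 26),
    ("ICSNPPE", 27, 27),
    ("ICDAHIT", 28, 28),
]

def decode_icesr(value):
    return {name: field(value, lo, hi) for name, lo, hi in ICESR_FIELDS}

def data_parity_source(fields):
    """Interpret ICDAHIT as a modifier of ICDAPE, as described for bit 28."""
    if fields["ICDAPE"] == 0:
        return None                      # ICDAHIT is ignored in this case
    return "cache hit" if fields["ICDAHIT"] else "line fill buffer"

syndrome = decode_icesr(0x0001E088)
print(syndrome["ICINDXPE"], data_parity_source(syndrome))
```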
Core Configuration Registers, CCR0, CCR1, and CCR2, are provided to assist debug and function control.
See Section 2 Programming Model on page 33 for further details.
5.4 Self-Modifying Code
This example of self-modifying code illustrates the use of cache management instructions to enforce instruction cache coherency. In this example, the program executing on the PowerPC 476FP core stores new data
to memory for the purpose of later branching to and executing this new data, which consists of instructions.
The following code example illustrates the required sequence for software to use when writing self-modifying
code. This example assumes that addr1 references a cacheable memory page.
stw    regN, addr1    # Store the data (an instruction) in regN to addr1 in the data cache.
dcbst  addr1          # Write the new instruction from the data cache to memory.
msync                 # Wait until the data reaches the memory.
icbi   addr1          # Invalidate addr1 in the instruction cache if it exists.
isync                 # Flush any prefetched instructions within the ICU and instruction
                      # unit and refetch them. An older copy of the instruction at addr1
                      # might have already been fetched.
At this point, software can begin executing the instruction at addr1 and be guaranteed that the new instruction
is recognized.
5.5 Data Cache Controller
The data cache is a write-through model and thus all store data is written into the data cache and L2 cache
memory at the same time. In addition, the data cache follows a weakly consistent storage model, and it allows
out-of-order loads and store data forwarding within the processor. In general, stores are done in order.
PowerPC storage has two levels: the L1 cache and L2 cache structure. This is in addition to system caches,
such as the L3 cache and system memory in some systems.
In PowerPC storage, the L1 and L2 cache line sizes are different. The L1 cache line size is 32 bytes, and the
L2 cache line size is 128 bytes. Therefore, all cache operations are handled in a way that the L1 cache operates
on its four cache lines independently from the L2 cache. These independent cache operations are transparent to users and programmers.
The data cache unit (DCU) primarily consists of the following three subunits: data cache arrays, data cache
control (DCC), and the data translation lookaside buffer (DTLB).
The array subunit contains three arrays: the LRU, tag, and data arrays. The tag and data arrays are standard
SRAMs. The LRU array is a smaller, dual-port register array. Both the tag and data arrays are 4-way set
associative and have a pipelined, 2-cycle access. The DCU uses an LRU replacement algorithm that uses a
6-bit age vector in combination with way-locking to determine the best candidate for a replacement. The DCU
can receive 256 bits of read data at once from the bus and can send up to 128 bits of write data. The D-cache
is nonblocking, and cache coherency is supported by way of the L2 interface in write-through mode.
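This manual does not describe how the 6-bit age vector is encoded. As a hedged illustration, the following Python sketch shows one common scheme for a 4-way set: one bit per pair of ways (six pairs), recording which way of the pair was used more recently. The class AgeVector and its methods are hypothetical names, and the encoding is an assumption, not the documented hardware behavior. Locked ways are excluded from replacement.

```python
from itertools import combinations

WAYS = 4
PAIRS = list(combinations(range(WAYS), 2))   # 6 way pairs -> 6 age bits

class AgeVector:
    """Pairwise age bits for one cache set (assumed encoding)."""

    def __init__(self):
        # newer[(i, j)] is True when way i was used more recently than way j.
        self.newer = {pair: False for pair in PAIRS}

    def touch(self, way):
        """Record an access: 'way' becomes newer than every other way."""
        for i, j in PAIRS:
            if i == way:
                self.newer[(i, j)] = True
            elif j == way:
                self.newer[(i, j)] = False

    def is_older(self, a, b):
        """True when way a was used less recently than way b."""
        return not self.newer[(a, b)] if a < b else self.newer[(b, a)]

    def victim(self, locked=frozenset()):
        """Return the oldest unlocked way, or None if every way is locked."""
        candidates = [w for w in range(WAYS) if w not in locked]
        for w in candidates:
            if all(self.is_older(w, u) for u in candidates if u != w):
                return w
        return None
```

For example, after touching ways 0, 1, 2, 3 and then way 0 again, victim() returns way 1; if way 1 is locked, victim(locked={1}) returns way 2.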
The DCC subunit includes all of the DCU pipeline controls, cache arbitration, the SPRs, and the snoop pipe.
It manages most of the data path flow and operation.
The DTLB is an 8-entry, fully associative cache that uses the EA to quickly calculate the real address. It is
accessed in parallel with the other three arrays. If a DTLB miss occurs, a request is made to the UTLB to
calculate the real address. The DCC also handles the execution of the PowerPC data cache management
instructions for touching (prefetching), flushing, invalidating, or zeroing cache lines, or for flash invalidation of
the entire cache. Resources for controlling and debugging the data cache operation are also provided.
5.5.1 DCU Operations
The data cache is a nonblocking cache and operates in-order. However, load misses can be operated
out-of-order because the latencies of L2 cache versus memory differ. Load completion is kept in-order
because all instructions are committed (allowed to complete) in-order.
All loads and stores are performed according to operand alignment, up to a two-word boundary. Therefore, any
operand that crosses the boundary causes the operation to be replicated. For example, such an operand creates an extra
pipeline cycle.
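The boundary rule can be sketched numerically. The following Python fragment is a simplified model, not a description of the pipeline implementation: it assumes that the two-word boundary is an 8-byte aligned boundary and that each aligned doubleword touched by an operand costs one pipeline operation. The function name pipeline_operations is hypothetical.

```python
BOUNDARY_BYTES = 8   # two words of 4 bytes each (assumed aligned boundary)

def pipeline_operations(addr, size):
    """Count the aligned two-word regions touched by an operand."""
    first = addr // BOUNDARY_BYTES
    last = (addr + size - 1) // BOUNDARY_BYTES
    return last - first + 1

print(pipeline_operations(0x1004, 4))   # prints 1 (stays inside one doubleword)
print(pipeline_operations(0x1006, 4))   # prints 2 (crosses the boundary)
```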
5.5.1.1 Load Operations
Load instructions that reference cacheable memory L2 cache pages and miss in the data cache result in
cache line read requests being presented to the data-read PLB interface. Load operations to caching-inhibited memory pages, however, only access the bytes specifically requested, according to the type of load
instruction. This behavior of only accessing the requested bytes is only architecturally required. However, the
DCU enforces this requirement on any load to a caching-inhibited memory page. Subsequent L2 cache load
operations to the same caching-inhibited locations cause new requests to be sent to the data read PLB interface. Data from caching-inhibited locations is not reused from the data cache line fill data (DCLFD) buffer.
The DCU includes four DCLFD buffers, such that a total of four independent data cache line fill requests can
be in progress at one time. The DCU can continue to process subsequent load and store accesses while
these line fills are in progress.
The DCU also includes an 8-entry load miss queue (LMQ), which holds up to eight outstanding load instructions that have either missed in the data cache or accessed caching-inhibited memory pages. A load instruction in the LMQ remains there until the requested data arrives in the DCLFD buffer. The data is then delivered to
the register file, and the instruction is removed from the LMQ.
5.5.1.2 Store Operations
The processing of store instructions in the DCU is affected by several factors, including the caching-inhibited
(I), write-through (W), and guarded (G) storage attributes, and whether the allocation of data cache lines is
enabled for cacheable store misses. There are three different behaviors to consider:
• Whether a data cache line is allocated (if the line is not already in the data cache)
• Whether the data is written directly to memory or only into the data cache
• Whether the store data can be gathered with store data from previous or subsequent store instructions
before being written to memory; store data is in-order.
5.5.2 Store Gathering
In general, memory write operations caused by separate store instructions that specify locations in either
write-through or caching-inhibited storage can be gathered into one simultaneous access to memory in
16-byte units, though store gathering in cache inhibited or write-through cases can be disabled by CCR2
register setting. See CCR2[DCSTGW, DISTG] in Section 2.7.6 Core Configuration Register 2 (CCR2) on
page 73.
A given sequence of two store operations can only be gathered together if the targeted bytes are contained
within the same aligned quadword of memory and if they are contiguous with respect to each other. Subsequent store operations might continue to be gathered with the previously gathered sequence, subject to the
same two rules (same aligned quadword and contiguous with the collection of previously gathered bytes). For
example, a sequence of three store word operations to addresses 4, 8, and 0 can all be gathered together
because the first two are contiguous with each other and the third (store word to address 0) is contiguous with
the gathered combination of the previous two.
An additional requirement for store gathering applies to stores that target caching-inhibited memory pages.
Specifically, a given store to a caching-inhibited page can only be gathered with previous store operations if
the bytes targeted by the given store do not overlap with any of the previously gathered bytes. In other words,
a store to a caching-inhibited page must be both contiguous and nonoverlapping with the previous store operations with which it is being gathered. This ensures that the multiple write operations associated with a
sequence of store instructions that each target a common caching-inhibited location will each be performed
independently on that target location.
Finally, a given store operation is not gathered with an earlier store operation if it is separated from the earlier
store operation by the msync or mbar instructions and if either of the two store operations reference a
memory page that is both guarded and caching inhibited.
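The gathering rules in this section can be sketched as a small software model. The following Python fragment is an illustration under stated assumptions, not the hardware algorithm: it treats "contiguous" as adjacent to or overlapping the gathered bytes, it ignores the msync/mbar and guarded-page restriction, and the names can_gather and gather are hypothetical. Each store is an (address, size) pair, and each result is a (start address, byte count) pair for one gathered write.

```python
QUADWORD_BYTES = 16

def can_gather(gathered, addr, size, caching_inhibited=False):
    """Check whether a store may join the set of already gathered byte addresses."""
    first, last = addr, addr + size - 1
    lo, hi = min(gathered), max(gathered)
    same_quadword = (first // QUADWORD_BYTES == last // QUADWORD_BYTES
                     == lo // QUADWORD_BYTES)
    contiguous = first <= hi + 1 and last >= lo - 1   # touches or overlaps
    overlaps = any(b in gathered for b in range(first, last + 1))
    if caching_inhibited and overlaps:
        return False          # caching-inhibited stores must not overlap
    return same_quadword and contiguous

def gather(stores, caching_inhibited=False):
    """Group a sequence of (addr, size) stores into gathered writes."""
    groups, current = [], set()
    for addr, size in stores:
        if current and not can_gather(current, addr, size, caching_inhibited):
            groups.append((min(current), len(current)))
            current = set()
        current |= set(range(addr, addr + size))
    if current:
        groups.append((min(current), len(current)))
    return groups

# Store words to addresses 4, 8, and 0, as in the example in this section.
print(gather([(4, 4), (8, 4), (0, 4)]))   # prints [(0, 12)]
```

Two stores to the same caching-inhibited word are kept separate in this model, which matches the requirement that each write to a common caching-inhibited location is performed independently.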
5.5.3 Line Flush Operations
Because the data cache is write-through, the L1 cache and L2 cache memory are always updated at the same
time. Therefore, no dirty bits exist in the data cache, and flush operations are not required.
The dcbi, dcbf, and dcbz operations are sent to the L2 cache with the operand RA. The L2 cache performs the operation on its cache line, while the data cache performs the corresponding operation on its own cache lines.
The following list describes data-cache flush operations:
dcbf    The L2 cache flushes the dirty line specified by RA to memory and invalidates the cache line. The L1 cache invalidates up to four hit cache lines of the same L2 cache line.
dcbi    The L2 cache invalidates the cache line specified by RA. The L1 cache invalidates up to four hit cache lines of the same L2 cache line.
dcbz    The L2 cache writes zeros to the entire cache line specified by RA. The L1 cache writes zeros to up to four hit cache lines of the same L2 cache line.
5.5.4 Storage Access Ordering
In general, the DCU can perform load and store operations out-of-order with respect to the instruction stream.
That is, the memory accesses associated with a sequence of load and store instructions can be performed in
memory in an order different from that implied by the order of the instructions. For example, loads can be
processed ahead of earlier stores, or stores can be processed ahead of earlier loads. Also, later loads and
stores that hit in the data cache can be processed before earlier loads and stores that miss in the data cache.
The DCU enforces the requirements of the ISA sequential execution model, such that the net result of a
sequence of load and store operations is the same as that implied by the order of the instructions. This
means, for example, if a later load reads the same address written by an earlier store, the DCU guarantees
that the load will use the data written by the store, and not the older prestore data. But the memory subsystem
might still detect a read access associated with an even later load before it detects the write access associated with the earlier store.
If the DCU must make a read request to the data read L2 cache interface, and this request conflicts with (that
is, references one or more of the same bytes as) an earlier write request that is being made to the data write
L2 cache interface, the DCU withholds the read request from the data read L2 cache interface until the write
request has been acknowledged on the data write L2 cache interface. When the earlier write request has
been acknowledged, the read request is presented, and the L2 cache subsystem must ensure that the data
returned for the read request reflects the value of the data written by the write operation.
Conversely, if a write request conflicts with an earlier read request, the DCU withholds the write request until
the read request has been acknowledged.
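The conflict rule (two requests conflict when they reference one or more of the same bytes) reduces to a byte-range overlap test. The following Python sketch is illustrative only; the function names are hypothetical, and a request is modeled simply as an (address, length) pair.

```python
def requests_conflict(addr_a, len_a, addr_b, len_b):
    """True when the two byte ranges share at least one byte."""
    return addr_a < addr_b + len_b and addr_b < addr_a + len_a

def must_withhold(request, unacknowledged):
    """True when 'request' conflicts with any earlier unacknowledged request."""
    addr, length = request
    return any(requests_conflict(addr, length, a, n) for a, n in unacknowledged)

# A 4-byte read at 0x100 conflicts with an unacknowledged 1-byte write at 0x103.
print(must_withhold((0x100, 4), [(0x200, 8), (0x103, 1)]))   # prints True
```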
The PowerPC system provides storage synchronization instructions to enable software to control the order in
which the memory accesses associated with a sequence of instructions are performed. Also, the affected
cache lines in the L1 and L2 caches should be snoop invalidated if an I/O device writes data over the PLB6.
5.5.5 Data Cache Coherency
Because the PowerPC 476FP data cache is write-through, and the L2 cache is write-back and inclusive of the
data cache, all PLB6 data transactions are monitored by the L2 cache and filtered for the L1 data cache
snoop-invalidate. This reduces the snoop traffic to the L1 data cache and improves the data cache performance. Because the L2 cache keeps the cache states, the data cache does not maintain and manipulate
many of those coherency protocol activities for the system.
Therefore, the data cache emphasizes processor performance rather than system level maintenance.
5.5.6 Data Cache Control and Debug
The PowerPC 476FP core provides various registers and instructions to control data cache operation and to
help debug data cache problems. See Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69,
Section 2.7.5 Core Configuration Register 1 (CCR1) on page 70, and Section 2.7.6 Core Configuration
Register 2 (CCR2) on page 73 for more information.
5.5.7 Data Cache Management and Debug Instruction Summary
For detailed descriptions of the instructions summarized in this section, see the Power ISA Version 2.05
specification.
In the instruction descriptions, the term block describes the unit of storage operated on by the cache block
instructions. For the PowerPC 476FP core, this is the same as a cache line.
Software uses the following instructions to manage the data cache.
5.5.7.1 Data Cache Block Zero (dcbz)
The data cache block zero (dcbz) instruction writes zeros to the specified cache line in both the L1 and L2
caches. Because the L2 cache line size is larger than the L1 cache line size, the DCU replicates this instruction such that there is a dcbz request for each L1 cache line that exists within the L2 cache line. Each replicated operation is treated as an independent instruction within the L-pipe, and only the first dcbz of the
replicated group is broadcast to the L2. All of the replicated operations search the L1 data cache and each
replicated operation that is a hit in the cache writes zeros to the L1 cache line. If the operation is a miss, no
resources are updated in the L1, but it is still sent to the L2 if it is the first replicated operation.
The dcbz is broadcast to the L2 using the DCU store interface. It allocates an entry in the store buffer
queue (SBQ) and generates a dcbz request to the L2 once it reaches the head of the queue.
A dcbz generates a write access control exception if write permission does not exist.
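The replication described in this section can be sketched with a toy model. In the following Python fragment, the L1 data cache is modeled as a dictionary from L1 line base address to line contents, which is a simplification chosen for illustration; the function name replicate_dcbz and the request tuple format are hypothetical. The sketch zeros every L1 line that hits and records that only the first replicated operation is broadcast to the L2.

```python
L1_LINE_BYTES = 32
L2_LINE_BYTES = 128

def replicate_dcbz(l1_cache, addr):
    """Apply one dcbz to a toy L1 model and return the requests sent to the L2."""
    l2_base = addr & ~(L2_LINE_BYTES - 1)
    l2_requests = []
    for n in range(L2_LINE_BYTES // L1_LINE_BYTES):
        line = l2_base + n * L1_LINE_BYTES
        if line in l1_cache:                      # hit: write zeros to the L1 line
            l1_cache[line] = bytes(L1_LINE_BYTES)
        if n == 0:                                # only the first piece goes to the L2
            l2_requests.append(("dcbz", l2_base))
    return l2_requests

l1 = {0x1200: b"\xff" * 32, 0x1240: b"\xaa" * 32}
print(replicate_dcbz(l1, 0x1234))   # prints [('dcbz', 4608)]
```

A miss (for example, the line at 0x1220 in this usage) leaves the L1 model unchanged and does not allocate a line.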
5.5.8 Data Cache Block Lock Clear (dcblc)
This instruction unlocks a line in the L1 or L2 cache. The DCU verifies that a valid address translation exists
and searches the tag array to determine if that address exists in the cache. If it does, the lock bit is cleared for
the matching way in the cache, but this can only occur when the instruction is committed. A lock clear operation does not invalidate the line from the cache. Even though the line might exist in both the L1 and L2
caches, the dcblc only unlocks the requested line in the cache specified by the CT field.
If the CT is set to the L2 cache only, the dcblc is broadcast to the L2 cache using the DCU read interface. It
requires allocation of a line fill buffer and generates a no-data request to the L2 cache. When the request is
accepted, the line fill buffer deallocates.
A dcblc generates a read access control exception if read permission does not exist.
5.5.9 Data Cache Block Store (dcbst)
The dcbst instruction flushes dirty data from the L2 cache. This instruction is effectively a no-op for the L1
because the L1 cache is write-through only and cannot contain dirty data.
The dcbst is broadcast to the L2 cache using the DCU store interface.
A dcbst generates a read access control exception if read permission does not exist.
5.5.10 Data Cache Block Flush (dcbf)
In local mode, which is described in the Power ISA Version 2.05 specifications, the dcbf instruction invalidates the line in processors in the system. Note that only L = 0 (local) mode is supported.
The dcbf instruction forces dirty data in the specified L2 cache line to be written to memory and then invalidates the line. Because the L1 is strictly a write-through cache, there is no dirty data in the L1. Thus, a dcbf to
the L1 only invalidates the line in the L1 cache. No data flush is required. Note that even though the specified
line might be in cache inhibited mode, it must still be searched and flushed from the cache if it exists.
Because this instruction operates on an L2 cache line, and the L2 cache line size is larger than the L1 cache
line size, the DCU replicates this instruction such that there is a dcbf request for each L1 cache line that
exists within the L2 cache line. Each replicated operation is treated as an independent instruction within the
L-pipe, and only the first operation of the replicated group is sent to the L2. All of the replicated operations
search the L1 data cache, and each replicated operation that hits in the cache invalidates the cache line that
is hit. If the operation is a miss, no resources are updated in the L1, but it is still sent to the L2 cache if it is the
first replicated operation. A dcbf cannot invalidate a line in the L1 cache until it is committed.
A dcbf to a locked line clears the lock in both the L1 and L2 caches.
The dcbf is broadcast to the L2 cache by using the DCU store interface. It allocates an entry in the SBQ and
generates a no-data request to the L2 cache.
A dcbf instruction generates an exception if any of the following conditions occur:
• The dcbf instruction is executed in user mode, and the DULXE bit (MMUCR[4]) is set.
• The target block of the dcbf does not have read permission.
5.5.11 Data Cache Block Invalidate (dcbi)
The dcbi and dcbf instructions perform the same function for both the L1 and L2 caches. That is, the dcbi
flushes dirty data in the L2 cache out to memory. Even though dcbi does not have the L field, it operates in
local mode, just like the dcbf instruction.
The dcbi instruction invalidates a line in the L1 and L2 caches. The L2 treats the dcbi as a dcbf (that is, it
invalidates the line and flushes the data). Even though the specified line might be in cache inhibited mode, it
must still be searched and invalidated in the cache if it exists.
Because this instruction operates on an L2 cache line, and the L2 cache line size is larger than the L1 cache
line size, the DCU replicates this instruction such that there is a dcbi request for each L1 cache line that
exists within the L2 cache line. Each replicated operation is treated as an independent instruction within the
L-pipe, and only the first operation of the replicated group is sent to the L2 cache. All of the replicated operations search the L1 data cache, and each replicated operation that hits in the cache invalidates the cache line
that is hit. If the operation is a miss, no resources are updated in the L1, but it is still sent to the L2 cache if it
is the first replicated operation. A dcbi cannot invalidate a line in the L1 cache until it is committed.
A dcbi to a locked line clears the lock in both the L1 and L2 caches.
The dcbi is broadcast to the L2 using the DCU store interface. It allocates an entry in the SBQ and generates
a no-data request to the L2 cache.
The dcbi generates an exception if any of the following conditions are true:
• The dcbi instruction is executed in user mode.
• The target block of the dcbi does not have read permission.
5.5.12 Data Cache Invalidate (dci)
This instruction invalidates the entire L1 or L2 data cache based on the CT field. If destined for the L2 cache,
the DCU broadcasts one dci instruction to the L2 cache. If destined for the L1 cache, the DCU generates
replicated dci requests until the entire data cache is invalidated. Each dci can invalidate two sets in the data
cache. Nonfinal replicated pieces require virtual commitment before invalidating any cache lines. The final
piece requires regular commitment.
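As a rough worked example, the number of replicated dci requests needed to invalidate the L1 data cache can be estimated from the number of sets. The following Python sketch assumes that the set count follows from the 8-bit data cache index field (address bits 19:26) in the dcread effective address format; that inference, and the function name, are assumptions made for illustration.

```python
INDEX_BITS = 26 - 19 + 1        # data cache index field, bits 19:26
L1_SETS = 1 << INDEX_BITS       # 256 sets (assumed from the index width)
SETS_PER_DCI = 2                # each dci can invalidate two sets

def replicated_dci_requests(sets=L1_SETS, sets_per_dci=SETS_PER_DCI):
    """Number of replicated dci requests needed to cover every set."""
    return -(-sets // sets_per_dci)   # ceiling division

print(replicated_dci_requests())   # prints 128
```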
A dci instruction can only go to the L1 or L2 cache, thus it is possible to violate the inclusive nature of the L1
and L2 cache relationship by only performing a dci instruction to the L2 cache. In other words, it is only
permissible to perform a dci instruction to the L1 and L2, in that order, followed by an isync.
Power ISA specifies that software must place an msync instruction after a dci instruction to guarantee that
the dci completes before any subsequent data storage accesses are performed. However, instead of the
msync instruction, isync suffices for a dci to the L1 cache.
If the L2 cache is unified, invalidating the L2 cache requires the software to invalidate both the instruction-side
and data-side L1 caches.
The dci is broadcast to the L2 cache using the DCU store interface. It allocates an entry in the SBQ and
generates a no-data request to the L2 cache.
The dci generates an exception if not in supervisor mode.
5.5.13 Data Cache Block Touch (dcbt)
This instruction gives a hint that the specified line might be accessed in the future. It is considered a speculative operation, and it affects either the L1 or the L2 cache, based on the CT field. Both types of dcbt instructions send a request to the L2 cache, with the difference being that an L2-only dcbt request does not require data
to be returned to the L1. A request can be sent to the L2 cache before commitment because the instruction is
only a hint.
If CT is set to L1, the DCU searches the L1 cache for the appropriate cache line. If it is a hit in the cache, the
dcbt does nothing and is not broadcast to the L2 cache. If it is a miss, it allocates a line fill buffer (CT = L1)
and sends a request for data to the L2 cache.
If CT is set to L2, the L1 cache is not searched and a request is sent to the L2 as a dcbt with no data
returned.
The dcbt is broadcast to the L2 using the DCU read interface. It allocates a line fill buffer and generates
either a data or no-data request, based on CT field and L1 hit status.
A dcbt does not generate an exception if read permission does not exist. If this occurs, the dcbt becomes a
no-op and has no effect on the L1. It is also a no-op if it references a cache-inhibited or guarded page.
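The dcbt cases described in this section can be summarized as a decision function. The following Python sketch is a paraphrase of the text for illustration; the function name, parameter names, and returned strings are hypothetical and do not correspond to any hardware signal names.

```python
def dcbt_action(ct, l1_hit, readable=True, cache_inhibited=False, guarded=False):
    """Summarize what a dcbt does for a given CT target and L1 lookup result."""
    if ct not in ("L1", "L2"):
        raise ValueError("ct must be 'L1' or 'L2'")
    if not readable or cache_inhibited or guarded:
        return "no-op"                      # no exception is generated
    if ct == "L2":
        return "no-data request to L2"      # the L1 cache is not searched
    if l1_hit:
        return "no-op"                      # hit: not broadcast to the L2
    return "data request to L2"             # miss: line fill buffer allocated

print(dcbt_action("L1", l1_hit=False))   # prints data request to L2
```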
5.5.14 Data Cache Block Touch with Lock Set (dcbtls)
This instruction can either load and lock a line in the cache or lock a line that already exists in the cache. It is
not a speculative operation, and it affects either the L1 or the L2 cache based on the CT field. The dcbtls
instruction allocates a cache line if a miss occurs and then locks the line in either the L1 or L2 cache. There
are several possible actions for the dcbtls:
• If CT = L1 and dcbtls is a cache miss in the L1, a line fill buffer is allocated and eventually goes into the
L1 cache as a locked line.
• If CT = L1, dcbtls is a cache hit, and the line is currently unlocked, the line is locked.
• If CT = L1, dcbtls is a cache hit, and the line is currently locked, it is treated as a no-op.
If CT = L2, a line fill buffer is allocated and a no-data dcbtls request is sent to the L2 cache. This does not
change anything in the L1.
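The possible dcbtls actions listed above can be captured in a small lookup. The following Python sketch restates the four cases for illustration only; the enumeration DcbtlsAction and the function dcbtls_action are hypothetical names.

```python
from enum import Enum

class DcbtlsAction(Enum):
    FILL_AND_LOCK_L1 = "allocate line fill buffer, fill L1 line as locked"
    LOCK_EXISTING_L1 = "lock the existing L1 line"
    NO_OP = "no-op"
    NO_DATA_REQUEST_L2 = "allocate line fill buffer, no-data request to L2"

def dcbtls_action(ct, l1_hit=False, l1_locked=False):
    """Map the CT field and L1 state to the dcbtls behavior described above."""
    if ct == "L2":
        return DcbtlsAction.NO_DATA_REQUEST_L2     # the L1 is unchanged
    if ct != "L1":
        raise ValueError("ct must be 'L1' or 'L2'")
    if not l1_hit:
        return DcbtlsAction.FILL_AND_LOCK_L1
    return DcbtlsAction.NO_OP if l1_locked else DcbtlsAction.LOCK_EXISTING_L1

print(dcbtls_action("L1", l1_hit=True, l1_locked=False).name)
# prints LOCK_EXISTING_L1
```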
The dcbtls is broadcast to the L2 using the DCU read interface. It allocates a line fill buffer and generates
either a data or no-data request, based on the CT field and L1 hit status.
A dcbtls generates an exception if any of the following conditions occur:
• The dcbtls instruction is executed in user mode.
• The target block of the dcbtls does not have read permission.
No exception is generated if an overlocked condition exists in the cache.
Note:
A unique scenario in the L1 cache can occur in which a locked line is replaced even though the lock was
recently set. Upon allocation of a line fill buffer, the destination way in the set is selected based on current
LRU and lock status. A subsequent dcbtls can match the cache line that the line fill buffer will eventually
replace. In this scenario, the dcbtls might set the lock bit before the line fill occurs. Therefore, when the line
fill occurs, it overwrites the entry, and the locked line is removed. To prevent this scenario, an lwsync must
be placed before the dcbtls. The requirements for this scenario follow:
• A cacheable line fill buffer exists that will replace way X, set Y in the L1.
• A dcbtls instruction hits an unlocked line in way X of set Y in the L1.
• A dcbtls sets the lock bit for way X of set Y before the occurrence of the line fill.
• When the cacheable line fill buffer is ready to perform the line fill, it replaces the locked line in way X, set Y.
5.5.15 Data Cache Block Touch for Store (dcbtst)
This instruction gives a hint that the specified line might be written in the near future. It is considered a speculative operation and it affects either the L1 or the L2 based on the CT field. Both types of dcbtst instructions
send a request to the L2 with the difference being that an L2-only dcbtst request does not require data to be
returned to the L1 cache. A request can be sent to the L2 before commitment because the instruction is only
a hint.
If the CT field is set to the L1, the DCU searches the L1 cache for the appropriate cache line. If it is a hit in the
cache, the dcbtst instruction does nothing and is not broadcast to the L2. If it is a miss, it allocates a line fill
buffer (CT = L1) and sends a request for data to the L2.
If CT is set to L2, the L1 cache is not searched and a request is sent to the L2 as a dcbtst with no data
returned.
The dcbtst instruction is broadcast to the L2 using the DCU read interface. It allocates a line fill buffer and
generates either a data or no-data request, based on the CT field and L1 hit status.
A dcbtst instruction does not generate an exception if write permission does not exist. If this occurs, the
dcbtst instruction becomes a no-op and has no effect on the L1. It is also a no-op if it references a cache
inhibited or guarded page.
5.5.16 Data Cache Block Touch For Store with Lock Set (dcbtstls)
This instruction can either load and lock a line in the cache (with the expectation that a store to the specified
line will occur in the near future), or lock a line that already exists in the cache. It is not a speculative operation
and it affects either the L1 or the L2 cache based on the CT field. The dcbtstls instruction allocates a cache
line if a miss occurs, and then locks the line in either the L1 or L2 cache. There are several possible actions
for the dcbtstls instruction:
• If CT = L1 and dcbtstls is a cache miss in the L1 cache, a line fill buffer is allocated and eventually goes
into the L1 cache as a locked line.
• If CT = L1, dcbtstls is a cache hit, and the line is currently unlocked, the line is locked.
• If CT = L1, dcbtstls is a cache hit, and the line is currently locked, it is treated as a no-op.
If CT = L2, a line fill buffer is allocated and a no-data dcbtstls request is sent to the L2. This does not change
anything in the L1.
A dcbtstls generates an exception if any of the following conditions occur:
• The dcbtstls instruction is executed in user mode.
• The target block of the dcbtstls does not have read permission.
No exception is generated if an overlocked condition exists in the cache.
The dcbtstls is broadcast to the L2 using the DCU read interface. It allocates a line fill buffer and generates
either a data or no-data request, based on the CT field and L1 hit status.
Note:
A unique scenario in the L1 cache can occur in which a locked line is replaced even though the lock was
recently set. Upon allocation of a line fill buffer, the destination way in the set is selected, based on current
LRU and lock status. A subsequent dcbtstls can match the cache line that the line fill buffer will eventually
replace. In this scenario, the dcbtstls might set the lock bit before the line fill occurs. Therefore, when the line
fill occurs, it overwrites the entry and the locked line is removed. To prevent this scenario, an lwsync must be
placed before the dcbtstls. The requirements for this scenario follow:
• A cacheable line fill buffer exists that will replace way X, set Y in the L1 cache.
• A dcbtstls instruction hits an unlocked line in way X of set Y in the L1 cache.
• A dcbtstls sets the lock bit for way X of set Y before the occurrence of the line fill.
• When the cacheable line fill buffer is ready to perform the line fill, it replaces the locked line in way X, set Y.
5.5.17 Data Cache Read (dcread)
The dcread instruction reads the content of a specific, physical location in the data cache and stores the data
in debug registers. The DCU does not do any address translation nor exception processing for this instruction. Because only the content of a specific cache location is accessed, a dcread request uses a modified
format for the EA.
Address Bits    Description
0:16            Unused.
17:18           Data cache way.
19:26           Data cache index.
27:29           Word address within L1 Data cache line.
30:31           Unused.
The dcread instruction is not broadcast to the L2.
The dcread instruction generates an exception if it is not in supervisor mode.
5.5.18 Memory Barrier Instructions
The PowerPC 476FP processor provides three types of memory barrier or storage barrier instructions:
msync, mbar, and lwsync. See Power ISA Version 2.05 for more details.
The msync instruction is equivalent to sync L = 0, and lwsync is equivalent to sync L = 1. The mbar instruction is intended
to be similar to eieio, but in the PowerPC 476FP implementation, mbar behaves similarly to msync. However, for
future compatibility, use mbar wherever mbar functionality is intended. In other words,
do not substitute msync for mbar.
5.5.18.1 Memory Synchronization (msync)
The msync instruction is the same as sync L = 0 in the Power ISA Version 2.05 specification.
The msync instruction blocks the issue of all further instructions until the IU receives a signal from the L2
indicating that the operation has completed. The instruction must proceed down the L-pipe to be broadcast to
the L2.
The DCU is responsible only for confirmation, commitment, and broadcasting the instruction to the L2 using
the store interface. The msync must ensure that all load and store operations have completed before
sending the request to the L2. Thus, it cannot allocate a line in the SBQ until the LMQ, line fill buffers, and
store hit queue are empty. It does not have to wait for the SBQ to be empty. Because the SBQ is used to
send the msync to the L2 and is a FIFO queue, all store operations are guaranteed to leave the DCU before
the msync. The msync must be committed before leaving the SBQ, but can allocate an entry in the SBQ
before receiving a commitment.
The msync instruction also has a system synchronization function, and it guarantees completion of all
preceding operations.
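The ordering argument in this section (msync waits for the other queues to drain, then relies on the FIFO order of the SBQ) can be sketched with a simple queue model. The following Python fragment is illustrative only; the queue contents and the function name msync_may_enter_sbq are hypothetical, and commitment timing is not modeled.

```python
from collections import deque

def msync_may_enter_sbq(lmq, line_fill_buffers, store_hit_queue):
    """msync can allocate an SBQ entry only when these three are empty."""
    return not lmq and not line_fill_buffers and not store_hit_queue

# The SBQ itself does not have to be empty: it is a FIFO, so earlier
# stores leave the DCU before the msync does.
sbq = deque(["store A", "store B"])
if msync_may_enter_sbq(lmq=[], line_fill_buffers=[], store_hit_queue=[]):
    sbq.append("msync")

drained = [sbq.popleft() for _ in range(len(sbq))]
print(drained)   # prints ['store A', 'store B', 'msync']
```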
5.5.18.2 Memory Barrier (mbar)
The mbar instruction is the same as sync L = 0 in the Power ISA Version 2.05 specification.
The mbar instruction blocks the issue of all further instructions until the IU receives a signal from the L2 indicating that the operation has completed. The instruction must proceed down the L-pipe to be broadcast to the
L2.
The DCU is responsible only for confirmation, commitment, and broadcasting the instruction to the L2 using
the store interface. The mbar must ensure that all load and store operations have completed before sending
the request to the L2. Thus, it cannot allocate a line in the SBQ until the LMQ, line fill buffers, and store hit
queue are empty. It does not have to wait for the SBQ to be empty. Because the SBQ is used to send the
mbar to the L2 and is a FIFO queue, all store operations are guaranteed to leave the DCU before the mbar.
The mbar must be committed before leaving the SBQ, but can allocate an entry in the SBQ before receiving
commitment.
5.5.18.3 Lightweight Sync (lwsync)
The lwsync instruction is the same as sync L = 1 in the Power ISA Version 2.05 specification.
The lwsync instruction blocks the issue of all further instructions until the operation has completed. The operation is considered complete when the request is acknowledged by the L2 cache.
5.5.19 Core Configuration Registers (CCR0, CCR1, and CCR2)
Core Configuration Registers CCR0, CCR1, and CCR2 are provided to assist with debug and function control.
See Section 2 Programming Model on page 33 for further details.
5.5.20 dcbt and dcbtst Operation
The dcbt instruction is typically used as a hint to the processor that a particular block of data is likely to be
referenced by the executing program in the near future. Thus, the processor can begin filling that block into
the data cache so that when the executing program eventually performs a load from the block, it is already
present in the cache, thereby improving performance.
The dcbtst instruction is typically used for a similar purpose, but specifically for cases where the executing
program is likely to store to the referenced block in the near future. The difference in the purpose of the
dcbtst instruction relative to the dcbt instruction is only relevant within shared-memory systems with hardware-enforced support for cache coherency. In such systems, the dcbtst instruction attempts to establish the
block within the data cache in such a fashion that the processor is most readily able to subsequently write to
the block.
By default, the dcbt and dcbtst instructions are ignored if the filling of a requested cache block cannot be
immediately commenced, and waiting for such commencement might result in the DCU execution pipeline
being stalled. For example, the dcbt instruction is ignored if all four DCLFD buffers are already in use and
execution of subsequent storage access instructions is pending.
However, the dcbt and dcbtst instructions can also be used as a convenient mechanism for setting up a
fixed, known environment within the data cache. This is useful for establishing contents for cache line locking,
deterministic performance on a particular sequence of code, or debugging of low-level hardware and software
problems.
Because the PowerPC 476FP core supports hardware coherency, these touch instructions are not guaranteed operations. Therefore, the target cache line might not be accelerated under certain conditions, such
as snooped cases, and the DCU pipeline might be stalled as mentioned previously.
5.5.21 dcread Operation
The dcread instruction can be used to directly read both the tag information and a specified data word in a
specified entry of the data cache. The data word is read into the target GPR specified in the instruction
encoding. The tag information is read into a pair of SPRs: the DCDBTRH Register and the DCDBTRL
Register. The tag information can subsequently be moved into the GPRs using mfspr instructions.
The execution of the dcread instruction generates the equivalent of an EA, which is then used to select a
specific data word from a specific cache line, as shown in Table 5-5 on page 151.
Table 5-5. Effective Address Format for icread and dcread

Address Bits   Description
0:16           Unused.
17:18          Cache way.
19:26          Data cache index.
27:29          Word address within L1 cache line.
30:31          Unused.
The EA generated by the dcread instruction must be word-aligned (that is, EA[30:31] must be 0); otherwise,
it is a programming error and the result is undefined.
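The EA layout in Table 5-5 can be sketched as a small host-side Python helper. This is only an illustrative model, not code for the core: the function name dcread_ea and its argument checks are assumptions, and bit 0 is taken to be the most-significant bit of a 32-bit EA, as in the rest of this manual.

```python
def dcread_ea(way: int, index: int, word: int) -> int:
    """Compose a 32-bit dcread EA from the Table 5-5 fields (bit 0 = MSB)."""
    if not 0 <= way < 4:
        raise ValueError("way must fit in EA[17:18]")
    if not 0 <= index < 256:
        raise ValueError("index must fit in EA[19:26]")
    if not 0 <= word < 8:
        raise ValueError("word must fit in EA[27:29]")
    # A field whose last bit is b is shifted left by (31 - b).
    # EA[30:31] stay zero, so the result is always word-aligned.
    return (way << 13) | (index << 5) | (word << 2)
```

For example, way 1, index 2, word 3 gives x‘0000 204C’. Keeping EA[30:31] at zero by construction avoids the undefined result described above.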
If the CCR0[CRPE] bit is set, execution of the dcread instruction also loads parity information into the
DCDBTRL Register. Note that the DCDBTRL[DATAP] field, unlike all the other parity fields, loads the check
values of the parity instead of the raw parity values. That is, the DATAP field will always load with zeros
unless a parity error has occurred or has been inserted intentionally using the appropriate bits in the CCR1.
Execution of the dcread instruction is privileged and is intended for use for debugging purposes only.
Note: The use of the dcread instruction might not provide correct information when the DCU is still in the
process of performing cache operations associated with previously executed instructions such as line fills and
line flushes. Also, the PowerPC 476FP core does not automatically synchronize context between a dcread
instruction and the subsequent mfspr instructions that read the results of the dcread instruction into GPRs.
To guarantee that the dcread instruction operates correctly and that the mfspr instructions obtain the results
of the dcread instruction, a sequence such as the following must be used:
    msync                       # Ensure that all previous cache operations have completed.
    dcread    regT,regA,regB    # Read cache information; the contents of GPR A and GPR B are
                                # added and the result is used to specify a cache line index to be
                                # read. The data word is moved into GPR T and the tag information
                                # is read into DCDBTRH and DCDBTRL.
    isync                       # Ensure dcread completes before attempting to read the results.
    mfdcdbtrh regD              # Move the high portion of the tag into GPR D.
    mfdcdbtrl regE              # Move the low portion of the tag into GPR E.
5.5.22 Data Cache Debug Tag Register Low (DCDBTRL)

Bit     Name       Description
0:3     LRUV       LRU valid bits. One bit for each way in the set.
4:9     LRU        The LRU value for the set.
10:13   LOCK       Lock bit. One bit for each way in the set.
14:15   Reserved
16:17   LRUP       LRU parity.
18:27   Reserved
28:31   DATAP      Data parity for the word being accessed.
5.5.23 Data Cache Debug Tag Register High (DCDBTRH)

Bits    Field Name   Description
0:18    ADDR         Tag address. RA[10:28].
19      VALID        Valid bit for this entry.
20:21   TAGP         Tag parity bits. Parity of RA[0:28].
22:31   EXTADDR      Extended tag address. RA[0:9].
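As an illustration of the two register layouts above, the following Python sketch splits DCDBTRH and DCDBTRL values (as obtained through mfspr) into their fields. The helper names are hypothetical, and the sketch assumes the manual's bit numbering, in which bit 0 is the most-significant bit of the 32-bit register.

```python
def field(value: int, first: int, last: int) -> int:
    """Extract bits first:last of a 32-bit value (bit 0 = MSB)."""
    nbits = last - first + 1
    return (value >> (31 - last)) & ((1 << nbits) - 1)


def decode_dcdbtrh(value: int) -> dict:
    """Split a DCDBTRH value into ADDR, VALID, TAGP, and EXTADDR."""
    return {
        "ADDR": field(value, 0, 18),      # RA[10:28]
        "VALID": field(value, 19, 19),
        "TAGP": field(value, 20, 21),
        "EXTADDR": field(value, 22, 31),  # RA[0:9]
    }


def decode_dcdbtrl(value: int) -> dict:
    """Split a DCDBTRL value into its non-reserved fields."""
    return {
        "LRUV": field(value, 0, 3),
        "LRU": field(value, 4, 9),
        "LOCK": field(value, 10, 13),
        "LRUP": field(value, 16, 17),
        "DATAP": field(value, 28, 31),
    }


def tag_ra_0_28(dcdbtrh: int) -> int:
    """Rebuild RA[0:28] by concatenating EXTADDR (RA[0:9]) and ADDR (RA[10:28])."""
    f = decode_dcdbtrh(dcdbtrh)
    return (f["EXTADDR"] << 19) | f["ADDR"]
```

Because DATAP holds check values rather than raw parity, a nonzero DATAP in the decoded result indicates a detected or deliberately inserted data parity error.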
5.5.24 Data Cache Parity Operations
The data cache contains parity bits and multihit detection hardware to protect against soft data errors. Both
the data cache tags and data are protected. The data parity is byte-based: each cache line holds 256 bits of data and 32 parity bits. The tag parity is computed over the stored real address, because the cache is real-address tagged; bits
0 to 28 of the RA have one parity bit. In addition, there is one parity bit for the six LRU bits, and one parity bit for the 4 valid
bits and 4 lock bits.
If a parity error is detected and MSR[ME] is asserted (that is, machine check interrupts are enabled), the processor takes a machine check interrupt. MCSRR0 contains the address of an
instruction in the pipeline at the time of the exception, and MCSRR1 contains the old (MSR) context. The
interrupt handler can check whether MCSR[DC] is set to determine whether it was called because of a
data cache parity error, and is then expected to either invalidate the data cache (using dci) or to invoke the
operating system to end the process or reset the processor, as appropriate. The handler returns to the interrupted process using the rfmci instruction.
5.5.24.1 Data Cache Exception Status Register (DCESR)
The Data Cache Exception Status Register (DCESR) provides further details about data cache parity errors.
This register identifies the operation that caused an error, the interface that detected the error, and the affected cache
line index. It also reports the multihit error case.

Bit     Name       Description
0:3     DCRDPE     Data cache read interface parity error.
                   The bit number represents which word contains the error on the data bus. Multiple bits can be set.
4:7     Reserved
8:11    DCESPE     Data cache even set parity error.
                   This field is used for tag array parity errors and can have multiple bits set. Errors can be reported
                   even though the request is for an odd set. Ways 0:3 are in the set with (addr[19] XOR addr[26]) equal
                   to ‘0’.
12:15   DCOSPE     Data cache odd set parity error.
                   This field is used for tag array parity errors and can have multiple bits set. Errors can be reported
                   even though the request is for an even set. Ways 0:3 are in the set with (addr[19] XOR addr[26]) equal
                   to ‘1’.
16:22   DCINDXPE   Represents bits 20:26 of the real address. Bit 19 can be inferred from the DCESPE and DCOSPE
                   fields.
23      DCDAPE     Data cache data array parity error.
                   If set, the requested data has a parity error. If the request is a miss, no error is reported.
24      DCTAPE     Data cache tag array parity error.
                   If set, at least one of the tags associated with a way in either the even or odd set has a parity error;
                   the affected way is specified by the DCESPE and DCOSPE fields.
25      Reserved
26      DCLRUPE    Data cache LRU/valid/lock parity error.
                   A parity error exists in either the even or odd LRU/Valid/Lock field for the requested set.
27      DCSNPPE    Data cache snoop parity error.
28      DCDAHIT    Data cache data array hit.
                   This bit modifies DCDAPE when set. If both DCDAPE and DCDAHIT are set, there is a data parity
                   error on a load request that hits in the data cache. If only DCDAPE is set, the parity error is from a
                   request that was serviced from the line fill buffers. If DCDAPE is not set, this bit should be ignored.
29      DCDAAPU    Data cache data array APU.
                   This bit modifies DCDAPE when set. If both DCDAPE and DCDAAPU are set, there is a data parity
                   error on a load request for the APU. If only DCDAPE is set, the parity error is from a CPU request. If
                   DCDAPE is not set, this bit should be ignored.
30:31   Reserved
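A machine check handler typically has to pick the DCESR apart before deciding how to react. The Python sketch below models that decoding on a host; the names DCESR_FIELDS, decode_dcesr, and data_error_source are illustrative assumptions, and bit 0 is again taken as the most-significant bit of the 32-bit register.

```python
# (first bit, last bit) of each non-reserved DCESR field, bit 0 = MSB.
DCESR_FIELDS = {
    "DCRDPE": (0, 3),
    "DCESPE": (8, 11),
    "DCOSPE": (12, 15),
    "DCINDXPE": (16, 22),
    "DCDAPE": (23, 23),
    "DCTAPE": (24, 24),
    "DCLRUPE": (26, 26),
    "DCSNPPE": (27, 27),
    "DCDAHIT": (28, 28),
    "DCDAAPU": (29, 29),
}


def decode_dcesr(value: int) -> dict:
    """Return a dict mapping each DCESR field name to its value."""
    out = {}
    for name, (first, last) in DCESR_FIELDS.items():
        nbits = last - first + 1
        out[name] = (value >> (31 - last)) & ((1 << nbits) - 1)
    return out


def data_error_source(fields: dict):
    """Interpret DCDAHIT as a modifier of DCDAPE, as the field text describes."""
    if not fields["DCDAPE"]:
        return None  # DCDAHIT is to be ignored when DCDAPE is clear
    return "cache hit" if fields["DCDAHIT"] else "line fill buffer"
```

Treating DCDAHIT only as a qualifier of DCDAPE mirrors the rule that the bit should be ignored whenever DCDAPE is not set.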
If the interrupt handler is executed before a parity error can corrupt the state of the machine, the executing
process is recoverable, and the interrupt handler can invalidate the data cache and resume the process. To
guarantee that all parity errors are recoverable, user code must have two characteristics. First, it must mark
all cacheable data pages as write-through instead of copy-back. Second, the software-settable bit
(CCR0[PRE]) must be set. This bit forces all load instructions to stall in the last stage of the load/store pipeline for one cycle, but only if required to ensure that parity errors are recoverable. The pipeline stall guarantees that any parity error is detected. Thus, the resulting machine check interrupt is taken before the load
instruction completes and the target GPR is corrupted. Setting CCR0[PRE] degrades overall application
performance. However, if the state of the load/store pipeline is such that a load instruction stalls in the last
stage for some reason unrelated to parity recoverability, CCR0[PRE] does not cause an additional cycle stall.
Note that the parity exception type machine check interrupt is asynchronous; that is, the return address in the
MCSRR0 does not necessarily point at the instruction address that detected the parity error in the data
cache. Rather, the machine check interrupt is taken as soon as the parity error is detected, and some instructions in progress can get flushed and re-executed after the interrupt as if the machine were responding to an
external interrupt.
5.5.25 Simulating Data Cache Parity Errors for Software Testing
Parity errors occur in the cache infrequently and unpredictably. Therefore, the CCR1[DCDPEI],
CCR1[DCTPEI], CCR1[DCUPEI], CCR1[DCMPEI], and CCR1[FCOM] fields can be used to simulate the
effect of a data cache parity error so that interrupt handling software can be exercised.
The 39 data cache parity bits in each cache line contain one parity bit per data byte (that is, 32 parity bits per
32 byte line) plus the following parity bits:
• Two parity bits for the address tag (the valid (V) bit is not included in the parity bit calculation for the tag)
• One parity bit for the 4-bit U field on the line
• A parity bit for each of the four modified (dirty) bits on the line
There are two parity bits for the tag data because the parity is calculated for alternating bits of the tag field to
guard against a single particle strike event that upsets two adjacent bits. The other data bits are physically
interleaved in such a way as to allow the use of a single parity bit per data byte or other field. All parity bits are
calculated and stored as the line is initially filled into the cache. In addition, the data and modified (dirty) parity
bits (but not the tag and user parity bits) are updated as the line is updated, as the result of executing a store
instruction or dcbz.
Usually, parity is calculated as the even parity for each set of bits to be protected, which the checking hardware expects. However, if any of the CCR1[DCTPEI] bits are set, the calculated parity for the corresponding
bits of the tag are inverted and stored as odd parity. Likewise, if the CCR1[DCUPEI] bit is set, the calculated
parity for the user bits is inverted and stored as odd parity. Similarly, if the CCR1[DCDPEI] bit is set, the parity
for any data bytes that are written, either during the process of a line fill or by execution of a store instruction,
is set to odd parity. Then, when the data stored with odd parity is subsequently loaded, it causes a parity
exception type machine check interrupt and exercises the interrupt handling software. The following
pseudocode is an example that uses the CCR1[DCDPEI] field to simulate a parity error on byte 0 of a target
cache line:
    dcbt <target line address>    # Get the target line into the cache.
    msync                         # Wait for the dcbt.
    mtspr CCR1, Rx                # Set CCR1[DCDPEI].
    isync                         # Wait for the CCR1 context to update.
    stb <target byte address>     # Store some data at byte 0 of the target line.
    msync                         # Wait for the store to finish.
    mtspr CCR1, Rz                # Reset CCR1[DCDPEI].
    isync                         # Wait for the CCR1 context to update.
    lb <byte 0 of target line>    # Load byte causes interrupt.
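The effect of CCR1[DCDPEI] on a single data byte can be illustrated with a minimal Python model. It is a sketch only: the function names are hypothetical, and the invert argument stands in for CCR1[DCDPEI] being set when the byte is written.

```python
def even_parity(byte: int) -> int:
    """Parity bit that makes the total number of 1 bits (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def stored_parity(byte: int, invert: bool = False) -> int:
    """Parity bit written with the byte; invert=True models CCR1[DCDPEI] = 1."""
    return even_parity(byte) ^ int(invert)


def parity_error(byte: int, parity_bit: int) -> bool:
    """Checking hardware expects even parity and flags anything else."""
    return parity_bit != even_parity(byte)
```

A byte written with invert=False passes the check on a later load, whereas the same byte written with invert=True fails it, which corresponds to the machine check interrupt raised by the final load in the sequence above.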
If the CCR1[DCMPEI] bit is set, the parity for any modified (dirty) bits that are written, either during the
process of a line fill or by execution of a store instruction or dcbz, is set to odd parity. If the CCR1[FFF] bit is
also set in addition to CCR1[DCMPEI], the parity for all four modified (dirty) bits is set to odd parity. Store
access to a cache line that is already in the cache and in a memory page for which the write-through storage
attribute is set does not update the modified (dirty) bits or the modified (dirty) parity bits. Thus for these
accesses, the CCR1[DCMPEI] setting has no effect.
The CCR1[FCOM] bit enables the simulation of a multihit parity error. When set, it causes a dcbt to seem to
be a miss, initiating a line fill even if the line is already in the cache. Thus, this bit allows the same line to be
filled to the cache multiple times, which generates a multihit parity error when an attempt is made to read
data from those cache lines. The following pseudocode is an example that uses the CCR1[FCOM] field to
simulate a multihit parity error in the data cache:
    mtspr CCR0, Rx                # Set CCR0[GDCBT].
    dcbt <target line address>    # This dcbt fills a first copy of the target line, if necessary.
    msync                         # Wait for the fill to finish.
    mtspr CCR1, Ry                # Set CCR1[FCOM].
    isync                         # Wait for the CCR1 context to update.
    dcbt <target line address>    # Fill a second copy of the target line.
    msync                         # Wait for the fill to finish.
    mtspr CCR1, Rz                # Reset CCR1[FCOM].
    isync                         # Wait for the CCR1 context to update.
    lb <byte 0 of target line>    # Load byte causes interrupt.
6. Timer Facilities
The PowerPC 476FP provides four timer facilities as described in the Power ISA Specification V2.05: a time
base, a decrementer (DEC), a fixed-interval timer (FIT), and a watchdog timer. These facilities share the
same source clock frequency and can support the following functions:
• Time of day
• General software timing
• Periodic service of peripherals
• General system maintenance
• System error recovery
Figure 6-1 shows the relationship between these facilities and the clock source.
Figure 6-1. Relationship of Timer Facilities to the Time Base
[Figure: A multiplexer controlled by CCR1[TSS] selects either the CPU clock or the external timer clock. The timer divide select, CCR1[TCS] (bits 22:23), passes the selected clock directly or divided by 4, 8, or 16. The resulting clock increments the 64-bit time base (Time Base Upper and Time Base Lower, 32 bits each) and decrements the 32-bit decrementer (DEC). Watchdog timer period taps: TBU[31] (2^33 clocks), TBL[3] (2^29 clocks), TBL[7] (2^25 clocks), and TBL[11] (2^21 clocks). Fixed-interval timer period taps shown include TBL[11] (2^21 clocks) and TBL[7] (2^25 clocks). Zero detection on the DEC signals the decrementer exception.]
Table 6-1 summarizes the timer registers in the PowerPC 476FP processor.
Table 6-1. Timer Register Summary

Register Name            Short Name   Read Address   Write Address                   See Page
Time Base Lower          TBL          x‘10C’         x‘11C’                          158
Time Base Upper          TBU          x‘10D’         x‘11D’                          158
Decrementer              DEC          x‘016’         x‘016’                          159
Decrementer Autoreload   DECAR        Write only     x‘036’                          159
Timer Control Register   TCR          x‘154’         x‘154’                          163
Timer Status Register    TSR          x‘150’         x‘150’ (clear), x‘350’ (set)    164
6.1 Time Base
The time base is a 64-bit register which increments once during each period of the source clock, and provides
a time reference. Access to the time base is through two Special Purpose Registers (SPRs). The Time Base
Upper (TBU) SPR contains the high-order 32 bits of the time base, and the Time Base Lower (TBL) SPR
contains the low-order 32 bits.
Software access to TBU and TBL is nonprivileged for reads but is privileged for writes. Therefore, different
SPR numbers are used for reading than for writing. TBU and TBL are written using the mtspr instruction and
are read using the mfspr instruction.
The period of the 64-bit time base registers is approximately 1462 years for a 400 MHz clock source. The
time base value itself does not generate any exceptions, even when it wraps. For most applications, the time
base is set once at system reset and only read thereafter. Note that fixed-interval timer and watchdog timer
exceptions (discussed in Section 6.3 Fixed-Interval Timer on page 160 and Section 6.4 Watchdog Timer on
page 161) are caused by ‘0’ to ‘1’ transitions of selected bits from the time base. Transitions of these bits
caused by software alteration of the time base have the same effect as transitions caused by normal incrementing of the time base. The TBL and TBU Registers are shown here.
Time Base Lower

Bits    Field Name        Description
32:63   Time Base Lower   Low-order 32 bits of the time base.

Time Base Upper

Bits    Field Name        Description
32:63   Time Base Upper   High-order 32 bits of the time base.
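The wrap period quoted above follows directly from the 64-bit width of the time base. The short Python calculation below reproduces it as a sketch; the function name is illustrative, and a 365-day year is assumed because that assumption matches the quoted figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def time_base_period_years(clock_hz: float) -> float:
    """Years until a 64-bit time base incrementing at clock_hz wraps."""
    return (2 ** 64) / clock_hz / SECONDS_PER_YEAR
```

With a 400 MHz source clock this evaluates to roughly 1462 years; halving the clock frequency doubles the period.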
6.1.1 Reading the Time Base
The following code provides an example of reading the time base. TBU and TBL are the symbolic names for
the TBU and TBL registers.
loop:
    mfspr   Rx,TBU    # Read TBU into general purpose register (GPR) Rx.
    mfspr   Ry,TBL    # Read TBL into GPR Ry.
    mfspr   Rz,TBU    # Read TBU again, this time into GPR Rz.
    cmpw    Rz, Rx    # See if old = new.
    bne     loop      # Loop/reread if rollover occurred.
The comparison and loop ensure that a consistent pair of values is obtained.
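The reason for the reread can be seen in a small Python simulation in which the time base keeps counting between the individual register reads. The TimeBase class and its ticks_per_read parameter are assumptions made purely for illustration; they do not describe actual instruction timing.

```python
MASK64 = 2 ** 64 - 1


class TimeBase:
    """Toy 64-bit time base that advances after every register read."""

    def __init__(self, value: int, ticks_per_read: int = 1):
        self.value = value & MASK64
        self.ticks_per_read = ticks_per_read

    def _tick(self) -> None:
        self.value = (self.value + self.ticks_per_read) & MASK64

    def read_tbu(self) -> int:
        upper = self.value >> 32
        self._tick()
        return upper

    def read_tbl(self) -> int:
        lower = self.value & 0xFFFFFFFF
        self._tick()
        return lower


def read_time_base(tb: TimeBase) -> int:
    """Same structure as the loop above: TBU, TBL, TBU again, compare."""
    while True:
        upper = tb.read_tbu()
        lower = tb.read_tbl()
        if tb.read_tbu() == upper:  # no carry into TBU between the reads
            return (upper << 32) | lower
```

Starting the model at x‘0000 0001 FFFF FFFF’, the first pass sees TBU change from 1 to 2 and is discarded; without the comparison, the stale upper half would be paired with a lower half of 0, giving a value almost 2^32 clocks in the past.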
6.1.2 Writing the Time Base
The following code provides an example of writing the time base.
    lwz     Rx, upper    # Load 64-bit time base value into GPRs Rx and Ry.
    lwz     Ry, lower    #
    li      Rz, 0        # Set GPR Rz to 0.
    mtspr   TBL,Rz       # Force TBL to 0 (thereby preventing wrap into TBU).
    mtspr   TBU,Rx       # Set TBU to initial value.
    mtspr   TBL,Ry       # Set TBL to initial value.
6.2 Decrementer and Decrementer Autoreload Registers
The Decrementer Register (DEC) is a 32-bit privileged SPR that decrements at the same rate that the time
base increments. The DEC is read using mfspr and is written using mtspr. When a nonzero value is written
to the DEC, it begins to decrement with the next time base clock. A decrementer exception is signaled when
a decrement occurs on a DEC count of 1, and the decrementer interrupt status (DIS) field of the Timer Status
Register (TSR[DIS]; see Section 6.6 Timer Status Register on page 164) is set. A decrementer interrupt
occurs if it is enabled by both the decrementer interrupt enable (DIE) field of the Timer Control Register
(TCR[DIE]; see Section 6.5 Timer Control Register on page 163) and by the external interrupt enable (EE)
field of the Machine State Register (MSR[EE]; see Section 7.4.1 Machine State Register (MSR) on page 173).
Section 7 Processor Interrupts and Exceptions on page 167 provides more information about the handling of
decrementer interrupts.
The decrementer interrupt handler software should clear TSR[DIS] before re-enabling MSR[EE] to avoid
another decrementer interrupt caused by the same exception (unless TCR[DIE] is cleared instead).
The behavior of the DEC itself upon a decrement from a DEC value of 1 depends on which of two modes it is
operating in: normal, or autoreload. The mode is controlled by the autoreload enable (ARE) field of the TCR.
When operating in normal mode (TCR[ARE] = ‘0’), the DEC decrements to the value 0 and then stops decrementing until it is reinitialized by software. When operating in autoreload mode (TCR[ARE] = ‘1’), instead of
decrementing to the value 0, the DEC is reloaded with the value in the Decrementer Autoreload Register
(DECAR), and continues to decrement with the next time base clock (assuming the DECAR value was
nonzero). The DECAR register is a 32-bit privileged, write-only SPR, and is written using mtspr.
The autoreload feature of the DEC is disabled upon reset, and must be enabled by software. The Decrementer Register and the Decrementer Autoreload Register (DECAR) are shown here.
Decrementer

Bits    Field Name         Description
0:31    Decrementer        The decrementer holds a value that decrements with each time base clock cycle, and is automatically reloaded through the Decrementer Autoreload Register.

Decrementer Autoreload

Bits    Field Name         Description
0:31    Autoreload value   The value in this register is copied to DEC at the next time base clock when DEC = ‘1’ and autoreload is enabled (TCR[ARE] = ‘1’).
When an mtspr instruction forces the DEC count to 0, a decrementer exception does not occur, and thus
TSR[DIS] is not set. However, if a time base clock causes a decrement from a DEC value of 1 to occur simultaneously with the writing of the DEC by an mtspr instruction, the decrementer exception does occur,
TSR[DIS] is set, and the DEC is written with the value from the mtspr.
For software to quiesce the activity of the DEC and eliminate all DEC exceptions, the following procedure
should be performed:
1. Write ‘0’ to TCR[DIE]. This prevents a decrementer exception from causing a decrementer interrupt.
2. Write ‘0’ to TCR[ARE]. This disables the DEC autoreload feature.
3. Write x‘0000 0000’ to the DEC to halt decrementing. Although this action does not itself cause a decrementer exception, it is possible that a decrement from a DEC value of 1 has occurred since the last time
that TSR[DIS] was cleared.
4. Write ‘1’ to TSR[DIS] (DEC interrupt status bit). This clears the decrementer exception by setting
TSR[DIS] to ‘0’. Because the DEC is no longer decrementing (because it was written with x‘0000 0000’ in
step 3), no further decrementer exceptions are possible.
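The DEC behavior described in this section, including the four-step quiesce procedure, can be summarized in a small Python model. This is a hedged sketch rather than a description of the hardware: the class and attribute names are invented for illustration, and the simultaneous mtspr-and-decrement case is not modeled.

```python
class Decrementer:
    """Toy model of DEC, DECAR, TCR[ARE], TCR[DIE], and TSR[DIS]."""

    def __init__(self):
        self.dec = 0
        self.decar = 0
        self.are = False  # TCR[ARE], autoreload enable
        self.die = False  # TCR[DIE], decrementer interrupt enable
        self.dis = False  # TSR[DIS], decrementer interrupt status

    def write_dec(self, value: int) -> None:
        """mtspr to DEC; writing 0 does not itself raise an exception."""
        self.dec = value & 0xFFFFFFFF

    def tick(self) -> None:
        """One time base clock."""
        if self.dec == 0:
            return  # stopped until software writes a nonzero value
        if self.dec == 1:
            self.dis = True  # exception on a decrement from a count of 1
            self.dec = self.decar if self.are else 0
        else:
            self.dec -= 1

    def quiesce(self) -> None:
        """Steps 1 to 4 of the procedure above, in order."""
        self.die = False   # 1. write 0 to TCR[DIE]
        self.are = False   # 2. write 0 to TCR[ARE]
        self.write_dec(0)  # 3. write x'0000 0000' to the DEC
        self.dis = False   # 4. write 1 to TSR[DIS], which clears it
```

The order of the steps matters in the model just as in the procedure: clearing TSR[DIS] last guarantees that no decrement from 1 can set it again afterwards.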
6.3 Fixed-Interval Timer
The FIT provides a mechanism for causing regular periodic exceptions. The FIT typically is used by system
software to start a periodic system maintenance function, executed by the FIT interrupt handler.
A FIT exception occurs on a ‘0’ to ‘1’ transition of a selected bit from the time base. Note that a FIT exception
also occurs if the selected time base bit changes from ‘0’ to ‘1’ when an mtspr instruction writes a ‘1’ to the
selected time base bit that is at ‘0’.
The fixed-interval timer period (FP) field of the TCR selects one of four bits from the TBL Register, as shown in
Table 6-2 on page 161.
Table 6-2. Fixed-Interval Timer Period Selection

TCR[FP]   Time Base Bit   Period (time base clocks)   Period (400 MHz clock)
00        TBL[19]         2^13 clocks                 20.48 μs
01        TBL[15]         2^17 clocks                 327.68 μs
10        TBL[11]         2^21 clocks                 5.2 ms
11        TBL[7]          2^25 clocks                 83.9 ms
When a fixed-interval timer exception occurs, the exception status is recorded by setting the fixed-interval
timer interrupt status (FIS) bit of the TSR to ‘1’. A fixed-interval timer interrupt occurs if it is enabled by both
the fixed-interval timer interrupt enable (FIE) field of the TCR and by MSR[EE]. Section 7.5.11 Fixed-Interval
Timer Interrupt on page 198 provides more information about the handling of fixed-interval timer interrupts.
The fixed-interval timer interrupt handler software should clear TSR[FIS] before re-enabling MSR[EE] to
avoid another fixed-interval timer interrupt caused by the same exception (unless TCR[FIE] is cleared
instead).
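The periods in Table 6-2 follow from the position of the selected bit in the 64-bit time base: a bit with weight 2^k makes a ‘0’ to ‘1’ transition once every 2^(k+1) clocks. The Python sketch below works this out; the function names are hypothetical, and the 400 MHz default is taken from the table heading. The same rule applies to the watchdog timer taps.

```python
def tap_period_clocks(register: str, bit: int) -> int:
    """Time base clocks between 0-to-1 transitions of TBU[bit] or TBL[bit]."""
    if register not in ("TBU", "TBL") or not 0 <= bit <= 31:
        raise ValueError("expected TBU or TBL and a bit number in 0..31")
    # Position within the 64-bit time base, where bit 0 is the MSB.
    position = bit if register == "TBU" else 32 + bit
    return 2 ** (64 - position)


def tap_period_seconds(register: str, bit: int, clock_hz: float = 400e6) -> float:
    """Same period expressed in seconds for a given time base clock."""
    return tap_period_clocks(register, bit) / clock_hz
```

For TBL[19] this gives 2^13 clocks, or 20.48 μs at 400 MHz, in agreement with the first row of the table.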
6.4 Watchdog Timer
The watchdog timer provides a mechanism for system error recovery in case the program running on the
PowerPC 476FP processor has stalled and cannot be interrupted by the normal interrupt mechanism. The
watchdog timer can be configured to cause a critical-class watchdog timer interrupt upon the expiration of a
single period of the watchdog timer. It can also be configured to start a processor-initiated reset upon the
expiration of a second period of the watchdog timer.
A watchdog timer exception occurs on a ‘0’ to ‘1’ transition of a selected bit from the time base. Note that a
watchdog timer exception also occurs if the selected time base bit changes from ‘0’ to ‘1’ when an mtspr
instruction writes a ‘1’ to the selected time base bit when it is at ‘0’.
The watchdog timer period (WP) field of the TCR selects one of four bits from the time base (three from the TBL Register and one from the TBU Register), as shown in
Table 6-3 on page 161.
Table 6-3. Watchdog Timer Period Selection

TCR[WP]   Time Base Bit   Period (time base clocks)   Period (400 MHz clock)
00        TBL[11]         2^21 clocks                 5.2 ms
01        TBL[7]          2^25 clocks                 83.9 ms
10        TBL[3]          2^29 clocks                 1.34 s
11        TBU[31]         2^33 clocks                 21.47 s
The action taken upon a watchdog timer exception depends upon the status of the enable next watchdog
(ENW) and watchdog timer interrupt status (WIS) fields of the TSR at the time of the exception. When
TSR[ENW] = ‘0’, the next watchdog timer exception is disabled, and the only action to be taken upon the
exception is to set TSR[ENW] to ‘1’. By clearing TSR[ENW], software can guarantee that the time until the
next enabled watchdog timer exception is at least one full watchdog timer period and a maximum of two full
watchdog timer periods.
When TSR[ENW] = ‘1’, the next watchdog timer exception is enabled, and the action to be taken upon the
exception depends on the value of TSR[WIS] at the time of the exception. If TSR[WIS] = ‘0’, the action is to
set TSR[WIS] to ‘1’, at which time a watchdog timer interrupt occurs if enabled by both the watchdog timer
interrupt enable (WIE) field of the TCR and by the critical interrupt enable (CE) field of the MSR. The
watchdog timer interrupt handler software should clear TSR[WIS] before re-enabling MSR[CE] to avoid
another watchdog timer interrupt caused by the same exception (unless TCR[WIE] is cleared instead).
Section 7.5.12 Watchdog Timer Interrupt on page 199 provides more information about the handling of
watchdog timer interrupts.
If TSR[WIS] is already ‘1’ at the time of the next watchdog timer exception, the action to take depends on the
value of the watchdog timer reset control (WRC) field of the TCR. If TCR[WRC] is nonzero and a watchdog
timer exception occurs, the value of the TCR[WRC] field is copied into the watchdog timer reset status (WRS)
field of the TSR, TCR[WRC] is cleared, and the reset selected by that value occurs. See Section 9.1 Processor Core State after
Reset on page 243 for more information about core behavior when reset.
Note: After software has set TCR[WRC] to a nonzero value, it cannot be reset by software; this feature prevents errant software from disabling the watchdog timer reset capability.
Table 6-4 summarizes the action to be taken upon a watchdog timer exception according to the values of
TSR[ENW] and TSR[WIS].
Table 6-4. Watchdog Timer Exception Behavior

TSR[ENW]   TSR[WIS]   Action upon Watchdog Timer Exception
0          0          Set TSR[ENW] to ‘1’.
0          1          Set TSR[ENW] to ‘1’.
1          0          Set TSR[WIS] to ‘1’. If watchdog timer interrupts are enabled (TCR[WIE] = ‘1’ and MSR[CE] = ‘1’), an
                      interrupt occurs.
1          1          Cause the watchdog timer reset action specified by TCR[WRC]. A reset causes the TCR[WRC] bit to be
                      copied into TSR[WRS], and then clears TCR[WRC].
A typical system use of the watchdog timer function is to enable the watchdog timer interrupt and the
watchdog timer reset function in the TCR (and MSR), and to start out with both TSR[ENW] and TSR[WIS]
cleared to zeros. A recurring software loop of reliable duration (or alternatively the interrupt handler for a periodic interrupt such as the fixed-interval timer interrupt) can perform a periodic check of system integrity. Upon
successful completion of the system check, software clears TSR[ENW], thereby ensuring that a minimum of
one full watchdog timer period and a maximum of two full watchdog timer periods must expire before an
enabled watchdog timer exception occurs.
If for some reason the recurring software loop is not successfully completed (and TSR[ENW] does not get
cleared) during this period of time, an enabled watchdog timer exception occurs. The exception sets
TSR[WIS], and a watchdog timer interrupt occurs (if enabled by both TCR[WIE] and MSR[CE]). The occurrence of a watchdog timer interrupt in this software-serviced system is interpreted as a system error, because
the system was unable to complete the periodic system integrity check in time to avoid the watchdog timer
exception. The action taken by the watchdog timer interrupt handler is system-dependent, but typically the
software attempts to determine the nature of the problem and correct it if possible. If and when the system
attempts to resume operation, the software typically clears both TSR[WIS] and TSR[ENW], thus providing a
minimum of another full watchdog timer period for a new system integrity check to occur.
Finally, if for some reason the watchdog timer interrupt is disabled or the watchdog timer interrupt handler is
unsuccessful in clearing TSR[WIS] and TSR[ENW] before another watchdog timer exception, or both, the
next exception causes a processor reset operation to occur, according to the value of TCR[WRC].
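The escalation described in this section (set TSR[ENW], then set TSR[WIS] and interrupt, then reset) can be sketched as a small Python state machine. The class, method names, and returned action strings are illustrative assumptions; in particular, the model keeps running after a reset, whereas a real reset reinitializes the core.

```python
class Watchdog:
    """Toy model of TSR[ENW], TSR[WIS], TSR[WRS], and TCR[WIE, WRC]."""

    def __init__(self, wie: bool = True, ce: bool = True, wrc: int = 0):
        self.enw = 0
        self.wis = 0
        self.wrs = 0
        self.wie = wie  # TCR[WIE]
        self.ce = ce    # MSR[CE]
        self.wrc = wrc  # TCR[WRC], 0 means no reset

    def exception(self) -> str:
        """Apply one watchdog timer exception and report the action taken."""
        if not self.enw:
            self.enw = 1
            return "set ENW"
        if not self.wis:
            self.wis = 1
            return "interrupt" if (self.wie and self.ce) else "set WIS"
        if self.wrc:
            self.wrs, self.wrc = self.wrc, 0  # WRC is copied to WRS, then cleared
            return "reset"
        return "none"

    def service(self) -> None:
        """Integrity check passed: software clears TSR[ENW] and TSR[WIS]."""
        self.enw = 0
        self.wis = 0
```

Calling service() at least once per watchdog period keeps the model cycling between the first two states, which is the normal operating pattern described above.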
Figure 6-2 illustrates the sequence of watchdog timer events that typically occurs in a system.
Figure 6-2. Watchdog State Machine
[Figure: State diagram with four states identified by TSR[ENW,WIS]. Transitions are labeled Exception, Software Loop, and Watchdog Timer Interrupt Handler.]
TSR[ENW,WIS] = ‘00’: The watchdog timer exception is disabled. The next exception sets TSR[ENW] so that a subsequent exception can set TSR[WIS].
TSR[ENW,WIS] = ‘10’: The watchdog timer exception is enabled. The next exception sets TSR[WIS] and causes an interrupt if enabled by TCR[WIE] and MSR[CE].
TSR[ENW,WIS] = ‘01’: The watchdog timer exception is disabled, but TSR[WIS] is already set. This state should not occur.
TSR[ENW,WIS] = ‘11’: The watchdog timer exception is enabled, and the first exception status is still set. The next exception causes a reset if enabled by TCR[WRC]: if TCR[WRC] ≠ ‘00’, then perform a RESET. Otherwise, do nothing.
6.5 Timer Control Register
The Timer Control Register (TCR) is a privileged SPR that controls DEC, FIT, and watchdog timer operation.
The TCR is read into a General Purpose Register (GPR) by using mfspr, and is written from a GPR by using
mtspr.
The WRC field of the TCR is cleared to zero by a processor reset (see Section 9.1 Processor Core State after
Reset on page 243). Each bit of this 2-bit field is set only by software and is cleared only by hardware. For
each bit of the field, after software has written it to a ‘1’, that bit remains ‘1’ until a processor reset occurs. This
prevents errant code from disabling the watchdog timer reset function.
The ARE bit of the TCR is also cleared to ‘0’ by a processor reset. This disables the autoreload feature of the
DEC.
Bits    Field Name   Description
32:33   WP           Watchdog timer period.
                     00   2^21 time base clocks.
                     01   2^25 time base clocks.
                     10   2^29 time base clocks.
                     11   2^33 time base clocks.
34:35   WRC          Watchdog timer reset control.
                     This field is reset to ‘00’. This field specifies the type of reset that is generated when a watchdog
                     timer exception occurs with TSR[ENW,WIS] = ‘11’. This field can be set by software, but cannot be
                     cleared by software, except by a software-induced reset.
                     00   No watchdog timer reset.
                     01   Processor core reset.
                     10   Chip reset.
                     11   System reset.
36      WIE          Watchdog timer interrupt enable.
                     0    Disable the watchdog timer interrupt.
                     1    Enable the watchdog timer interrupt.
37      DIE          Decrementer interrupt enable.
                     0    Disable decrementer interrupt.
                     1    Enable decrementer interrupt.
38:39   FP           FIT period.
                     00   2^13 time base clocks.
                     01   2^17 time base clocks.
                     10   2^21 time base clocks.
                     11   2^25 time base clocks.
40      FIE          FIT interrupt enable.
                     0    Disable FIT interrupt.
                     1    Enable FIT interrupt.
41      ARE          Autoreload enable.
                     This bit is reset to ‘0’.
                     0    Disable autoreload.
                     1    Enable autoreload.
42:63   Reserved
6.6 Timer Status Register
The Timer Status Register (TSR) is a privileged SPR that records the status of DEC, FIT, and watchdog timer
events. The fields of the TSR are generally set to ‘1’ only by hardware and are cleared to ‘0’ only by software.
Hardware cannot clear any fields in the TSR, nor can software set any fields. Software can read the TSR into
a GPR by using the mfspr instruction. Clearing the TSR is performed using the mtspr instruction by placing
a ‘1’ in the GPR source register in all bit positions that are to be cleared in the TSR, and a ‘0’ in all other bit
positions. The data written from the GPR to the TSR is not direct data, but a mask. A ‘1’ clears the bit, and a
‘0’ leaves the corresponding TSR bit unchanged.
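The write-one-to-clear behavior of the TSR can be illustrated with a short Python sketch. The constant and function names are assumptions for illustration; the bit positions come from the TSR field layout in this section (bits 32:63, where bit 32 is the most-significant bit of the 32-bit register).

```python
def tsr_bit(n: int) -> int:
    """Mask for TSR bit n in the manual's 32:63 numbering."""
    return 1 << (63 - n)


TSR_ENW = tsr_bit(32)
TSR_WIS = tsr_bit(33)
TSR_DIS = tsr_bit(36)
TSR_FIS = tsr_bit(37)


def tsr_after_clear_write(tsr: int, mask: int) -> int:
    """Model of mtspr to the TSR clear address: 1s in mask clear, 0s preserve."""
    return tsr & ~mask & 0xFFFFFFFF
```

For example, writing TSR_DIS while both DIS and FIS are set leaves only FIS set, so a decrementer interrupt handler can acknowledge its own exception without disturbing a pending fixed-interval timer exception.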
Bits    Field Name   Description
32      ENW          Enable next watchdog timer exception.
                     0    The action on the next watchdog timer exception is to set TSR[ENW] = ‘1’.
                     1    The action on the next watchdog timer exception is governed by TSR[WIS]. See Table 6-4
                          on page 162 for information about the action taken.
33      WIS          Watchdog timer interrupt status.
                     0    The watchdog timer exception has not occurred.
                     1    The watchdog timer exception has occurred.
34:35   WRS          Watchdog timer reset status.
                     00   No watchdog timer reset has occurred.
                     01   A core reset was forced by the watchdog timer.
                     10   A chip reset was forced by the watchdog timer.
                     11   A system reset was forced by the watchdog timer.
36      DIS          Decrementer interrupt status.
                     0    A decrementer exception has not occurred.
                     1    A decrementer exception has occurred.
37      FIS          Fixed-interval timer interrupt status.
                     0    A fixed-interval timer exception has not occurred.
                     1    A fixed-interval timer exception has occurred.
38:63   Reserved     Reserved.
6.7 Halting the Timer Facilities
The debug mechanism provides a means for temporarily halting the timers upon a debug exception. Whenever a debug exception is recorded in the Debug Status Register (DBSR), the time base can be prevented
from incrementing, and the decrementer can be prevented from decrementing. This allows a debugger to
simulate the appearance of real-time operation, even though the application has been temporarily stopped to
service the debug event.
Section 8.5.1 Debug Control Register 0 (DBCR0) on page 235 describes the use of the freeze timers (FT) bit.
6.8 Selection of the Timer Clock Source
CCR1[TSS] selects the timer clock source: either the CPU clock or CPMC476TIMERCLOCK. The timer clock
select (TCS) field of the Core Configuration Register 1 (CCR1) then determines the frequency at which the
timers run. When set to ‘00’, CCR1[TCS] selects the frequency of the selected source clock; this is the
highest-frequency timer clock setting. When set to ‘01’, CCR1[TCS] selects a quarter-rate processor clock as
the timer clock. Other TCS settings select other submultiples of the processor clock.
See Section 2.7.5 Core Configuration Register 1 (CCR1) on page 70 for more information about these fields.
7. Processor Interrupts and Exceptions
This section begins by defining the terminology and classification of interrupts and exceptions in Section 7.1
Overview on page 167 and Section 7.2 Interrupt Classes on page 167.
Section 7.3 Interrupt Processing on page 170 explains in general how interrupts are processed, including the
requirements for partial execution of instructions.
Several registers support interrupt handling and control. Section 7.4 Interrupt Processing Registers on
page 173 describes these registers.
Table 7-2 on page 183 lists the interrupts and exceptions handled by the PowerPC 476FP core, in the order
of Interrupt Vector Offset Register (IVOR) usage. Detailed descriptions of each interrupt type follow, in the
same order.
Finally, Section 7.6 Interrupt Ordering and Masking on page 207 and Section 7.7 Exception Priorities on
page 210 define the priority order for the processing of simultaneous interrupts and exceptions.
7.1 Overview
An interrupt is the action in which the processor saves its old context (Machine State Register [MSR] and next
instruction address) and begins execution at a predetermined interrupt-handler address with a modified MSR.
Exceptions are the events that will, if enabled, cause the processor to take an interrupt.
Exceptions are generated by signals from internal and external peripheral devices, instructions, the internal
timer facility, debug events, or error conditions. Interrupts are divided into four classes, and when they are
processed no program state is lost. However, because the save/restore register pairs SRR0/SRR1, CSRR0/CSRR1, and
MCSRR0/MCSRR1 are serially reusable resources used by base (noncritical), critical, and machine check interrupts,
respectively, program state might be lost when an unordered interrupt is taken.
All interrupts, except machine check, are context synchronizing. A machine check interrupt acts like a context
synchronizing operation with respect to subsequent instructions.
Exceptions might be generated by the execution of instructions or by signals from devices external to the
PowerPC 476FP core, the internal timer facilities, debug events, or error conditions.
7.2 Interrupt Classes
All interrupts, except for machine check, can be categorized according to two independent characteristics of
the interrupt:
• Asynchronous or synchronous
• Critical or noncritical
7.2.1 Asynchronous Interrupts
Asynchronous interrupts are caused by events that are independent of instruction execution. For asynchronous interrupts, the address reported to the interrupt handling routine is the address of the instruction that
would have executed next, had the asynchronous interrupt not occurred.
7.2.2 Synchronous Interrupts
Synchronous interrupts are those that are caused directly by the execution (or attempted execution) of
instructions, and are further divided into two classes, precise and imprecise.
Synchronous, precise interrupts are those that precisely indicate the address of the instruction causing the
exception that generated the interrupt or, for certain synchronous, precise interrupt types, the address of the
immediately following instruction.
Synchronous, imprecise interrupts are those that can indicate the address of the instruction that caused the
exception that generated the interrupt or the address of some instruction after the one that caused the exception.
7.2.2.1 Synchronous, Precise Interrupts
When the execution or attempted execution of an instruction causes a synchronous, precise interrupt, the
following conditions exist when the associated interrupt handler begins execution:
• SRR0 (see Section 7.4.2 Save/Restore Register 0 (SRR0) on page 174) or CSRR0 (see Section 7.4.4
Critical Save/Restore Register 0 (CSRR0) on page 175) addresses either the instruction that caused the
exception that generated the interrupt or the instruction immediately following this instruction. Which
instruction is addressed can be determined from a combination of the interrupt type and the setting of
certain fields of the ESR (see Section 7.4.11 Exception Syndrome Register (ESR) on page 179).
• The interrupt is generated such that all instructions preceding the instruction that caused the exception
appear to have completed with respect to the executing processor. However, some storage accesses
associated with these preceding instructions might not have been performed with respect to other processors and mechanisms.
• The instruction that caused the exception might appear not to have begun execution (except for having
caused the exception), might have been partially executed, or might have completed, depending on the
interrupt type (see Section 7.3.1 Partially Executed Instructions on page 172).
• Architecturally, no instruction beyond the one that caused the exception has executed.
7.2.2.2 Synchronous, Imprecise Interrupts
When the execution or attempted execution of an instruction causes a synchronous, imprecise interrupt, the
following conditions exist when the associated interrupt handler begins execution:
• SRR0 or CSRR0 addresses either the instruction that caused the exception that generated the interrupt,
or some instruction following this instruction.
• The interrupt is generated such that all instructions preceding the instruction addressed by SRR0 or
CSRR0 appear to have completed with respect to the executing processor.
• If the imprecise interrupt is forced by the context synchronizing mechanism due to an instruction that
causes another exception that generates an interrupt (for example, alignment, data storage), SRR0
addresses the interrupt-forcing instruction, and the interrupt-forcing instruction might have been partially
executed (see Section 7.3.1 Partially Executed Instructions on page 172).
• If the imprecise interrupt is forced by the execution synchronizing mechanism due to executing an execution synchronizing instruction other than msync or isync, SRR0 or CSRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction appears not to have begun execution (except for its
forcing the imprecise interrupt). If the imprecise interrupt is forced by an msync or isync instruction,
SRR0 or CSRR0 can address either the msync or isync instruction, or the following instruction.
• If the imprecise interrupt is not forced by either the context synchronizing mechanism or the execution
synchronizing mechanism, the instruction addressed by SRR0 or CSRR0 might have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).
• No instruction following the instruction addressed by SRR0 or CSRR0 has executed.
The only synchronous, imprecise interrupts in the PowerPC 476FP core are the special cases of delayed
interrupts, which can result when certain kinds of exceptions occur while the corresponding interrupt type is
disabled. The first of these is the floating-point enabled exception type program interrupt. For this type of
interrupt to occur, a floating-point unit must be attached to the auxiliary processor interface of the PowerPC
476FP core, and the floating-point enabled exception summary bit of the Floating-Point Status and Control
Register (FPSCR[FEX]) must be set while floating-point enabled exception type program interrupts are
disabled due to MSR[FE0,FE1] both being ‘0’. When such interrupts are subsequently enabled by setting
both of MSR[FE0,FE1] to ‘1’ while FPSCR[FEX] is still ‘1’, a synchronous, imprecise form of floating-point
enabled exception type program interrupt occurs, and SRR0 is set to the address of the instruction that would
have executed next (that is, the instruction after the one that updated MSR[FE0,FE1]). If the MSR was
updated by an rfi, rfci, or rfmci instruction, SRR0 will be set to the address to which the rfi, rfci, or rfmci was
returning and not to the instruction address that is sequentially after the rfi, rfci, or rfmci.
The second type of delayed interrupt that is handled as a synchronous, imprecise interrupt is the debug interrupt. Similar to the floating-point enabled exception type program interrupt, the debug interrupt can be temporarily disabled by an MSR bit, MSR[DE]. Accordingly, certain kinds of debug exceptions can occur and be
recorded in the DBSR while MSR[DE] = ‘0’, and later lead to a delayed debug interrupt if MSR[DE] is set to ‘1’
while a debug exception is still set in the DBSR. When this occurs, the interrupt will either be synchronous
and imprecise, or it will be asynchronous, depending on the type of debug exception causing the interrupt. In
either case, CSRR0 is set to the address of the instruction that would have executed next (that is, the instruction after the one that set MSR[DE] to ‘1’). If MSR[DE] is set to ‘1’ by rfi, rfci, or rfmci, CSRR0 is set to the
address to which the rfi, rfci, or rfmci was returning, and not to the address of the instruction that was
sequentially after the rfi, rfci, or rfmci.
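The rule for the address saved by a delayed interrupt (SRR0 for the floating-point enabled program interrupt, CSRR0 for the debug interrupt) can be summarized in a small C sketch. The function and parameter names are hypothetical; the only fact assumed beyond the text above is that PowerPC instructions are 4 bytes long, so the sequentially next instruction is at the enabling instruction's address plus 4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address saved in SRR0 or CSRR0 when a delayed interrupt is taken because
 * an MSR update has just enabled it.
 *   enabling_insn_addr: address of the instruction that updated the MSR
 *   via_return:         true if that instruction was rfi, rfci, or rfmci
 *   return_target:      address the rfi, rfci, or rfmci was returning to */
static inline uint32_t delayed_interrupt_saved_addr(uint32_t enabling_insn_addr,
                                                    bool via_return,
                                                    uint32_t return_target)
{
    if (via_return)
        return return_target;           /* not the address after the rfi */
    return enabling_insn_addr + 4u;     /* the next sequential instruction */
}
```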
Besides these special cases of program and debug interrupts, all other synchronous interrupts are handled
precisely by the PowerPC 476FP core, including FP enabled exception type program interrupts even when
the processor is operating in one of the architecturally-defined imprecise modes (MSR[FE0,FE1] = ‘01’ or
‘10’). The PowerPC 476FP core generates a precise interrupt when MSR[FE0, FE1] = ‘01’ or ‘10’.
See Section 7.5.7 Program Interrupt on page 193 and Section 7.5.15 Debug Interrupt on page 201 for a more
detailed description of these interrupt types, including both the precise and imprecise cases.
7.2.3 Critical and Noncritical Interrupts
Interrupts can also be classified as critical or noncritical interrupts. Certain interrupt types demand immediate
attention, even if other interrupt types are currently being processed and have not yet saved the state of the
machine (that is, return address and captured state of the MSR). To enable taking a critical interrupt immediately after a noncritical interrupt has occurred (that is, before the state of the machine has been saved), two
sets of Save/Restore Register pairs are provided. Critical interrupts use the Save/Restore Register pair
CSRR0/CSRR1. Noncritical interrupts use Save/Restore Register pair SRR0/SRR1.
7.2.4 Machine Check Interrupts
Machine check interrupts are a special case. They are typically caused by some kind of hardware or storage
subsystem failure or by an attempt to access an invalid address. A machine check can be caused indirectly
by the execution of an instruction but not be recognized or reported until long after the processor has
executed past the instruction that caused the machine check. As such, machine check interrupts cannot
properly be classified as either synchronous or asynchronous, nor as precise or imprecise. They also do not
belong to either the critical or the noncritical interrupt class but instead, have associated with them a unique
pair of save/restore registers, Machine Check Save/Restore Register 0 (MCSRR0) and Machine Check
Save/Restore Register 1 (MCSRR1).
Architecturally, the following general rules apply for machine check interrupts:
1. No instruction after the one whose address is reported to the machine check interrupt handler in
MCSRR0 has begun execution.
2. The instruction whose address is reported to the machine check interrupt handler in MCSRR0, and all
prior instructions, might or might not have completed successfully. All those instructions that are ever
going to complete appear to have done so already, and have done so within the context existing before
the machine check interrupt. No further interrupt other than possible additional machine check interrupts
occurs as a result of those instructions.
With the PowerPC 476FP core, machine check interrupts can be caused by machine check exceptions on a
memory access for an instruction fetch, a data access, or a TLB access. Some of the interrupts generated
behave as synchronous, precise interrupts, and others are handled in an asynchronous fashion.
In the case of an instruction synchronous machine check exception, the PowerPC 476FP core handles the
interrupt as a synchronous, precise interrupt, assuming machine check interrupts are enabled
(MSR[ME] = ‘1’). That is, if a machine check exception is detected during an instruction fetch, the exception is
not reported to the interrupt mechanism unless execution is attempted for the instruction address at which the
machine check exception occurred. For example, if the direction of the instruction stream is changed
(perhaps due to a branch instruction) such that the instruction at the address associated with the machine
check exception will not be executed, the exception is not reported and no interrupt occurs. If an instruction
machine check exception is reported, and if machine check interrupts are enabled at the time of the reporting
of the exception, the interrupt is synchronous and precise, and MCSRR0 is set to the instruction address that
led to the exception. If machine check interrupts are not enabled at the time of the reporting of an instruction
machine check exception, a machine check interrupt will not be generated (even if MSR[ME] is subsequently
set to ‘1’) although the ESR[ISMC] field is set to ‘1’ to indicate that the exception has occurred and that the
instruction associated with the exception has been executed.
Instruction asynchronous machine check, data asynchronous machine check, and TLB asynchronous
machine check exceptions, however, are handled in an asynchronous fashion. That is, the address reported
in MCSRR0 might not be related to the instruction that prompted the access that led directly or indirectly to
the machine check exception. The address can be that of an instruction before or after the exception-causing
instruction, or it can reference the exception causing instruction, depending on the nature of the access, the
type of error encountered, and the circumstances of the instruction execution within the processor pipeline. If
MSR[ME] = ‘0’ at the time of a machine check exception that is handled in this asynchronous way, a machine
check interrupt subsequently occurs if MSR[ME] is set to ‘1’.
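The difference between the two reporting styles when MSR[ME] = ‘0’ can be illustrated with a small state model. This is a sketch under stated assumptions, not a description of the hardware implementation: the structure and function names are hypothetical, only the behavior described in the two paragraphs above is modeled, and the clearing of MSR[ME] by the interrupt itself is not modeled.

```c
#include <stdbool.h>

typedef struct {
    bool msr_me;        /* MSR[ME]: machine check interrupts enabled */
    bool esr_ismc;      /* ESR[ISMC]: set when a disabled instruction
                           synchronous machine check exception is reported */
    bool async_pending; /* asynchronous exception waiting for MSR[ME] = 1 */
} mc_model;

/* Each function returns true when a machine check interrupt is taken. */

/* Instruction synchronous machine check exception is reported. */
static inline bool report_instruction_sync_mc(mc_model *m)
{
    if (m->msr_me)
        return true;
    m->esr_ismc = true; /* recorded, but no interrupt now or later */
    return false;
}

/* Instruction, data, or TLB asynchronous machine check exception is reported. */
static inline bool report_async_mc(mc_model *m)
{
    if (m->msr_me)
        return true;
    m->async_pending = true; /* remembered until MSR[ME] is set */
    return false;
}

/* Software writes MSR[ME]. */
static inline bool write_msr_me(mc_model *m, bool me)
{
    m->msr_me = me;
    if (me && m->async_pending) {
        m->async_pending = false;
        return true; /* the delayed asynchronous interrupt occurs */
    }
    return false;
}
```

In this model, a synchronous exception reported while machine checks are disabled never produces an interrupt, whereas an asynchronous one produces an interrupt as soon as MSR[ME] is set to ‘1’.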
See Section 7.5.2 Machine Check Interrupt on page 186 for more detailed information about machine check
interrupts.
7.3 Interrupt Processing
Associated with each kind of interrupt is an interrupt vector, that is, the address of the initial instruction that is
executed when the corresponding interrupt occurs.
Interrupt processing consists of saving a small part of the processor state in certain registers, identifying the
cause of the interrupt in another register, and continuing execution at the corresponding interrupt vector location. When an exception exists and the corresponding interrupt type is enabled, the following actions are
performed in order:
1. SRR0 (for noncritical class interrupts), CSRR0 (for critical class interrupts), or MCSRR0 (for machine
check interrupts) is loaded with an instruction address that depends on the type of interrupt; see the specific interrupt description for details.
2. The ESR is loaded with information specific to the exception type. Note that many interrupt types can only
be caused by a single type of exception and thus, do not need nor use an ESR setting to indicate the
cause of the interrupt. Machine check interrupts load the Machine Check Syndrome Register (MCSR).
3. SRR1 (for noncritical class interrupts), CSRR1 (for critical class interrupts), or MCSRR1 (for machine
check interrupts) is loaded with a copy of the contents of the MSR.
4. The MSR is updated as follows. The new values take effect beginning with the first instruction
following the interrupt:
• MSR[WE,EE,PR,FP,FE0,DWE,FE1,IS,DS] are set to ‘0’ by all interrupts.
• MSR[CE,DE] are set to ‘0’ by all critical class interrupts and left unchanged by all noncritical class
interrupts.
• MSR[ME] is set to ‘0’ by machine check interrupts and left unchanged by all other interrupts.
See Section 7.4.1 Machine State Register (MSR) on page 173 for more detail on the definition of the
MSR.
5. Instruction fetching and execution resumes using the new MSR value at the interrupt vector address,
which is specific to the interrupt type and is determined as follows:
IVPR[0:15] || IVORn[16:27] || 0b0000
where n specifies the IVOR register to be used for a particular interrupt type (see Section 7.4.9 Interrupt
Vector Offset Registers (IVOR0 - IVOR15) on page 178).
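The concatenation in step 5 can be sketched in C as follows. The function name is hypothetical, and the sketch assumes that the bit indices in the formula count from the most significant bit of each 32-bit register value, so that bits 0:15 are the upper halfword and bits 16:27 are the next 12 bits.

```c
#include <stdint.h>

/* Interrupt vector address = IVPR[0:15] || IVORn[16:27] || 0b0000.
 * Assumes index 0 denotes the most significant bit of each 32-bit value. */
static inline uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor_n)
{
    uint32_t prefix = ivpr & 0xFFFF0000u;   /* upper 16 bits from IVPR */
    uint32_t offset = ivor_n & 0x0000FFF0u; /* 12 offset bits from IVORn */
    return prefix | offset;                 /* low 4 bits are always zero */
}
```

Under that assumption, every vector is aligned on a 16-byte boundary, and all vectors share the 64 KB region selected by the IVPR.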
At the end of a noncritical interrupt handling routine, execution of an rfi causes the MSR to be restored from
the contents of SRR1 and instruction execution to resume at the address contained in SRR0. Likewise,
execution of an rfci performs the same function at the end of a critical interrupt handling routine using CSRR0
instead of SRR0 and CSRR1 instead of SRR1. The rfmci instruction uses MCSRR0 and MCSRR1 in the
same manner.
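The save, update, and restore sequence described above can be modeled in C to show why separate register pairs let a critical interrupt nest inside a noncritical one. This is an illustrative model only: the type and function names are hypothetical, the MSR masks are derived from the bit positions in Section 7.4.1 (bit 32 is the most significant bit), and machine check interrupts, which follow the same pattern with MCSRR0, MCSRR1, rfmci, and MSR[ME], are omitted.

```c
#include <stdint.h>

/* Mask for MSR bit n, where bit 32 is the most significant bit. */
#define MSR_BIT(n) (1u << (63u - (n)))
#define MSR_WE  MSR_BIT(45)
#define MSR_CE  MSR_BIT(46)
#define MSR_EE  MSR_BIT(48)
#define MSR_PR  MSR_BIT(49)
#define MSR_FP  MSR_BIT(50)
#define MSR_FE0 MSR_BIT(52)
#define MSR_DWE MSR_BIT(53)
#define MSR_DE  MSR_BIT(54)
#define MSR_FE1 MSR_BIT(55)
#define MSR_IS  MSR_BIT(58)
#define MSR_DS  MSR_BIT(59)

typedef enum { IRQ_NONCRITICAL, IRQ_CRITICAL } irq_class;

typedef struct {
    uint32_t msr, pc;
    uint32_t srr0, srr1;   /* noncritical save/restore pair */
    uint32_t csrr0, csrr1; /* critical save/restore pair */
} cpu_model;

/* Steps 1, 3, 4, and 5 of interrupt processing (the ESR update is omitted). */
static inline void take_interrupt(cpu_model *c, irq_class cls,
                                  uint32_t saved_addr, uint32_t vector)
{
    uint32_t clear = MSR_WE | MSR_EE | MSR_PR | MSR_FP | MSR_FE0 |
                     MSR_DWE | MSR_FE1 | MSR_IS | MSR_DS;
    if (cls == IRQ_CRITICAL) {
        c->csrr0 = saved_addr;
        c->csrr1 = c->msr;
        clear |= MSR_CE | MSR_DE; /* cleared only by critical interrupts */
    } else {
        c->srr0 = saved_addr;
        c->srr1 = c->msr;
    }
    c->msr &= ~clear;
    c->pc = vector;
}

/* rfi (noncritical) or rfci (critical). */
static inline void return_from_interrupt(cpu_model *c, irq_class cls)
{
    if (cls == IRQ_CRITICAL) {
        c->msr = c->csrr1;
        c->pc = c->csrr0;
    } else {
        c->msr = c->srr1;
        c->pc = c->srr0;
    }
}
```

Because a noncritical interrupt leaves MSR[CE] unchanged, a critical interrupt can be taken at the first instruction of a noncritical handler without overwriting SRR0 or SRR1.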
Note: In general, at a process switch, because of possible process interlocks and data availability
requirements, the operating system must consider executing the following instructions:
• stwcx., to clear the reservation if one is outstanding to ensure that an lwarx in the old process is not
paired with a stwcx. in the new process.
• msync, to ensure that all storage operations of an interrupted process are complete with respect to other
processors before that process begins executing on another processor.
• isync, rfi, rfci, or rfmci, to ensure that the instructions in the new process execute in the new context.
7.3.1 Partially Executed Instructions
In general, the architecture permits load and store instructions to be partially executed, interrupted, and then
to be restarted from the beginning upon return from the interrupt. To guarantee that a particular load or store
instruction will complete without being interrupted and restarted, software must mark the storage being
referred to as Guarded, and must use an elementary (not a string or multiple) load or store that is aligned on
an operand-sized boundary.
To guarantee that load and store instructions can, in general, be restarted and completed correctly without
software intervention, the following rules apply when an instruction is partially executed and then interrupted:
• For an elementary load, no part of the target register, GPR(RT), FPR(FRT), or auxiliary processor register,
will have been altered.
• For the update forms of load and store instructions, the update register, GPR(RA), will not have been
altered.
However, the following effects are permissible when certain instructions are partially executed and then
restarted:
• For any store instruction, some of the bytes at the addressed storage location might have been accessed
or updated (if write access to that page in which bytes were altered is permitted by the access control
mechanism). In addition, for the stwcx. instruction, if the address is not aligned on a word boundary, the
value in CR[CR0] is undefined, as is whether the reservation (if one existed) has been cleared.
• For any load, some of the bytes at the addressed storage location might have been accessed (if read
access to that page in which bytes were accessed is permitted by the access control mechanism). In
addition, for the lwarx instruction, if the address is not aligned on a word boundary, it is undefined
whether a reservation has been set.
• For load multiple and load string instructions, some of the registers in the range to be loaded might have
been altered. Including the addressing registers (GPR(RA), and possibly GPR(RB)) in the range to be
loaded is an invalid form of these instructions (and a programming error), and thus, the rules for partial
execution do not protect against overwriting of these registers. Such possible overwriting of the addressing registers makes these invalid forms of load multiple and load strings inherently nonrestartable.
In no case is access control violated.
As previously stated, the only load or store instructions that are guaranteed to not be interrupted after being
partially executed are elementary, aligned, and guarded loads and stores. All others can be interrupted after
being partially executed. The following list identifies the specific instruction types for which interruption after
partial execution can occur and the specific interrupt types that can cause the interruption:
• Any load or store (except elementary, aligned, guarded):
– Critical input
– Machine check
– External input
– Program (imprecise mode floating-point enabled)
Note: This type of interrupt can lead to partial execution of a load or store instruction under the architectural definition only; the PowerPC 476FP core handles the imprecise modes of the floating-point
enabled exceptions precisely, and hence, this type of interrupt does not lead to partial execution.
– Decrementer
– Fixed-interval timer
– Watchdog timer
– Debug (unconditional debug event)
• Unaligned elementary load or store, or any load or store multiple or string:
All of those listed previously plus the following items:
– Alignment
– Data storage (if the access crosses a memory page boundary)
– Debug (data address compare, data value compare)
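The classification in this section can be expressed as two predicates. The following C sketch uses hypothetical names and takes the access kind, effective address, operand size, and Guarded attribute as inputs; it encodes only the rules stated above and does not model which interrupt actually occurs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { LS_ELEMENTARY, LS_MULTIPLE, LS_STRING } ls_kind;

/* True if the operand is aligned on an operand-sized boundary. */
static inline bool ls_is_aligned(uint32_t ea, uint32_t size)
{
    return size != 0u && (ea % size) == 0u;
}

/* True only for elementary, aligned, guarded loads and stores, which are
 * guaranteed not to be interrupted after being partially executed. */
static inline bool ls_never_partially_executed(ls_kind kind, uint32_t ea,
                                               uint32_t size, bool guarded)
{
    return kind == LS_ELEMENTARY && guarded && ls_is_aligned(ea, size);
}

/* True if the access is also exposed to the second group of interrupts
 * (alignment, data storage on a page crossing, data compare debug events):
 * an unaligned elementary access, or any load or store multiple or string. */
static inline bool ls_extended_interrupt_set(ls_kind kind, uint32_t ea,
                                             uint32_t size)
{
    return kind != LS_ELEMENTARY || !ls_is_aligned(ea, size);
}
```

For example, a word load from a guarded page at an address that is a multiple of 4 satisfies the first predicate, while the same load from address 0x1002 does not.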
7.4 Interrupt Processing Registers
The interrupt processing registers include the Save/Restore Registers (SRR0 - SRR1), Critical Save/Restore
Registers (CSRR0 - CSRR1), Data Exception Address Register (DEAR), Interrupt Vector Offset Registers
(IVOR0 - IVOR15), Interrupt Vector Prefix Register (IVPR), and Exception Syndrome Register (ESR). Also
described in this section is the Machine State Register (MSR), which belongs to the category of processor
control registers.
7.4.1 Machine State Register (MSR)
The MSR is a register of its own unique type that controls important chip functions such as the enabling or
disabling of various interrupt types.
The MSR can be written from a GPR using the mtmsr instruction. The contents of the MSR can be read into
a GPR using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically using the wrtee or
wrteei instructions. The MSR contents are also saved, altered, and restored by the interrupt-handling mechanism.
Register layout (bits 32:63): Reserved (32:44), WE (45), CE (46), Reserved (47), EE (48), PR (49), FP (50),
ME (51), FE0 (52), DWE (53), DE (54), FE1 (55), Reserved (56:57), IS (58), DS (59), Reserved (60), PMM (61),
Reserved (62:63).
Bits    Field Name  Description
32:44   Reserved
45      WE          Wait state enable.
                    0  The processor is not in the wait state.
                    1  The processor is in the wait state.
                    If MSR[WE] = ‘1’, the processor remains in the wait state until an interrupt is
                    taken, a reset occurs, or an external debug tool clears WE.
46      CE          Critical interrupt enable.
                    0  Critical input and watchdog timer interrupts are disabled.
                    1  Critical input and watchdog timer interrupts are enabled.
47      Reserved
48      EE          External interrupt enable.
                    0  External input, decrementer, and fixed interval timer interrupts are disabled.
                    1  External input, decrementer, and fixed interval timer interrupts are enabled.
49      PR          Problem state.
                    0  Supervisor state (the processor is in privileged state).
                    1  Problem state (the processor is in problem state).
50      FP          Floating point available.
                    0  The processor cannot execute floating-point instructions.
                    1  The processor can execute floating-point instructions.
51      ME          Machine check enable.
                    0  Machine check interrupts are disabled.
                    1  Machine check interrupts are enabled.
52      FE0         Floating-point exception mode 0.
                    0  If MSR[FE1] = ‘0’, ignore exceptions mode; if MSR[FE1] = ‘1’, imprecise
                       nonrecoverable mode.
                    1  If MSR[FE1] = ‘0’, imprecise recoverable mode; if MSR[FE1] = ‘1’, precise mode.
53      DWE         Debug wait enable.
                    0  Disable debug wait mode.
                    1  Enable debug wait mode.
54      DE          Debug interrupt enable.
                    0  Debug interrupts are disabled.
                    1  Debug interrupts are enabled.
55      FE1         Floating-point exception mode 1.
                    0  If MSR[FE0] = ‘0’, ignore exceptions mode; if MSR[FE0] = ‘1’, imprecise
                       recoverable mode.
                    1  If MSR[FE0] = ‘0’, imprecise nonrecoverable mode; if MSR[FE0] = ‘1’, precise mode.
56:57   Reserved
58      IS          Instruction address space.
                    0  All instruction storage accesses are directed to address space 0 (TS = ‘0’ in
                       the relevant TLB entry).
                    1  All instruction storage accesses are directed to address space 1 (TS = ‘1’ in
                       the relevant TLB entry).
59      DS          Data address space.
                    0  All data storage accesses are directed to address space 0 (TS = ‘0’ in the
                       relevant TLB entry).
                    1  All data storage accesses are directed to address space 1 (TS = ‘1’ in the
                       relevant TLB entry).
60      Reserved
61      PMM         Performance monitor mark.
                    0  Disable gathering statistics for marked processes.
                    1  Enable gathering statistics for marked processes.
62:63   Reserved
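The FE0 and FE1 descriptions define four floating-point exception modes between them. As a minimal sketch, the following C function decodes the mode from an MSR value. The enumeration and function names are hypothetical, and the masks are derived from bit positions 52 and 55 with bit 32 as the most significant bit of the 32-bit register.

```c
#include <stdint.h>

#define MSR_FE0 0x00000800u /* MSR bit 52 */
#define MSR_FE1 0x00000100u /* MSR bit 55 */

typedef enum {
    FP_EXC_IGNORE,                  /* FE0 = 0, FE1 = 0 */
    FP_EXC_IMPRECISE_NONRECOVERABLE,/* FE0 = 0, FE1 = 1 */
    FP_EXC_IMPRECISE_RECOVERABLE,   /* FE0 = 1, FE1 = 0 */
    FP_EXC_PRECISE                  /* FE0 = 1, FE1 = 1 */
} fp_exc_mode;

/* Decode the architecturally defined floating-point exception mode. */
static inline fp_exc_mode msr_fp_exc_mode(uint32_t msr)
{
    int fe0 = (msr & MSR_FE0) != 0u;
    int fe1 = (msr & MSR_FE1) != 0u;
    if (fe0)
        return fe1 ? FP_EXC_PRECISE : FP_EXC_IMPRECISE_RECOVERABLE;
    return fe1 ? FP_EXC_IMPRECISE_NONRECOVERABLE : FP_EXC_IGNORE;
}
```

The decoded value is the architectural mode name only; as Section 7.2.2.2 notes, the PowerPC 476FP core handles the two imprecise modes precisely.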
7.4.2 Save/Restore Register 0 (SRR0)
The SRR0 is an SPR that is used to save the machine state on noncritical interrupts and to restore machine
state when an rfi is executed. When a noncritical interrupt occurs, SRR0 is set to an address associated with
the process that was executing at the time. When rfi is executed, instruction execution returns to the address
in SRR0.
In general, SRR0 contains the address of the instruction that caused the noncritical interrupt or the address of
the instruction to return to after a noncritical interrupt is serviced. See the individual descriptions under
Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded in SRR0 for
each noncritical interrupt type.
SRR0 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
Register layout (bits 32:63): ADDR (32:61), Reserved (62:63).
Bits    Field Name  Description
32:61   ADDR        The return address for noncritical interrupts.
62:63   Reserved    Reserved.
7.4.3 Save/Restore Register 1 (SRR1)
The SRR1 is an SPR that is used to save machine state on noncritical interrupts, and to restore machine
state when an rfi is executed. When a noncritical interrupt is taken, the contents of the MSR (before the MSR
was cleared by the interrupt) are placed into SRR1. When rfi is executed, the MSR is restored with the
contents of SRR1.
Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.
Note: An MSR bit that is reserved can be altered by rfi consistent with the value being restored from SRR1.
SRR1 can be written from a GPR using mtspr, and can be read into a GPR using mfspr.
Register layout (bits 32:63): same fields as the MSR (WE, CE, EE, PR, FP, ME, FE0, DWE, DE, FE1, IS, DS,
PMM); all other bits are reserved.
Bits    Field Name  Description
32:63               A copy of the MSR at the time of a noncritical interrupt.
7.4.4 Critical Save/Restore Register 0 (CSRR0)
The CSRR0 is an SPR that is used to save machine state on critical interrupts and to restore machine state
when an rfci is executed. When a critical interrupt occurs, CSRR0 is set to an address associated with the
process that was executing at the time. When rfci is executed, instruction execution returns to the address in
CSRR0.
In general, CSRR0 contains the address of the instruction that caused the critical interrupt or the address of
the instruction to return to after a critical interrupt is serviced. See the individual descriptions under
Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded in CSRR0
for each critical interrupt type.
CSRR0 can be written from a GPR using mtspr, and can be read into a GPR using mfspr.
Register layout (bits 32:63): ADDR (32:61), Reserved (62:63).
Bits    Field Name  Description
32:61   ADDR        Return address for critical interrupts.
62:63   Reserved
7.4.5 Critical Save/Restore Register 1 (CSRR1)
The CSRR1 is an SPR that is used to save machine state on critical interrupts and to restore machine state
when an rfci is executed. When a critical interrupt is taken, the contents of the MSR (before the MSR was
cleared by the interrupt) are placed into CSRR1. When rfci is executed, the MSR is restored with the
contents of CSRR1.
Bits of CSRR1 that correspond to reserved bits in the MSR are also reserved. Because CSRR1 is a 32-bit
register, CSRR1[0:31] corresponds to MSR[32:63].
Note: An MSR bit that is reserved can be altered by rfci, consistent with the value being restored from
CSRR1.
CSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
Register layout (bits 32:63): same fields as the MSR (WE, CE, EE, PR, FP, ME, FE0, DWE, DE, FE1, IS, DS,
PMM); all other bits are reserved.
Bits    Field Name  Description
32:63               A copy of the MSR when a critical interrupt is taken.
7.4.6 Machine Check Save/Restore Register 0 (MCSRR0)
The MCSRR0 is an SPR that is used to save machine state on machine check interrupts, and to restore
machine state when an rfmci is executed. When a machine check interrupt occurs, MCSRR0 is set to an
address associated with the process that was executing at the time. When rfmci is executed, instruction
execution returns to the address in MCSRR0.
In general, MCSRR0 contains the address of the instruction that caused the machine check interrupt, or the
address of the instruction to return to after a machine check interrupt is serviced. See the individual descriptions under Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded
in MCSRR0 for each machine check interrupt type.
MCSRR0 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
Bits     Field Name   Description
32:61    ADDR         The return address for machine check interrupts.
62:63    Reserved
7.4.7 Machine Check Save/Restore Register 1 (MCSRR1)
The MCSRR1 is an SPR that is used to save machine state on machine check interrupts and to restore
machine state when an rfmci is executed. When a machine check interrupt is taken, the contents of the MSR
(before the MSR was cleared by the interrupt) are placed into MCSRR1. When rfmci is executed, the MSR is
restored with the contents of MCSRR1.
Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved. Because MCSRR1 is a 32-bit
register, MCSRR1[0:31] corresponds to MSR[32:63].
Note: An MSR bit that is reserved can be altered by rfmci, consistent with the value being restored from
MCSRR1.
MCSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
[Register figure: MCSRR1, bits 32:63. Fields shown: PMM, IS, DS, DE, FE1, EE, PR, FP, ME, DWE, WE, CE, FE0; the remaining bits are reserved.]
Bits     Field Name   Description
32:63                 A copy of the Machine State Register (MSR) at the time of a machine check interrupt.
7.4.8 Data Exception Address Register (DEAR)
The DEAR contains the address that was referenced by a load, store, or cache management instruction that
caused an alignment, data TLB miss, or data storage exception.
The DEAR can be written from a GPR using mtspr, and can be read into a GPR using mfspr.
Bits     Field Name   Description
32:63    DEAR         Data Cache Effective Address Register.
                      Upon exceptions that are detected in the data cache unit, such as a UTLB miss or a DSI,
                      the DEAR is written when a faulty commitment occurs in the LWB with the effective
                      address of the data request. The DEAR can also be written with an mtspr instruction.
7.4.9 Interrupt Vector Offset Registers (IVOR0 - IVOR15)
An IVOR specifies the quadword (16-byte)-aligned interrupt vector offset from the base address provided by
the IVPR (see Section 7.4.10 Interrupt Vector Prefix Register (IVPR) on page 179) for its respective interrupt
type. IVOR0 - IVOR15 are provided for the defined interrupt types. The interrupt vector effective address is
formed as follows:
IVPR[32:47] || IVORn[48:59] || 0b0000
where n specifies the IVOR register to be used for the particular interrupt type.
Any IVOR can be written from a GPR using mtspr and can be read into a GPR using mfspr.
Table 7-1 identifies the specific IVOR register associated with each interrupt type.
Bits     Field Name   Description
32:47    Reserved
48:59    OFFSET       The address used for the interrupt vector.
60:63    Reserved
Table 7-1 on page 178 identifies the specific IVOR register associated with each interrupt type.
Table 7-1. Interrupt Types Associated with each IVOR
IVOR      Interrupt Type
IVOR0     Critical input
IVOR1     Machine check
IVOR2     Data storage
IVOR3     Instruction storage
IVOR4     External input
IVOR5     Alignment
IVOR6     Program
IVOR7     Floating point unavailable
IVOR8     System call
IVOR9     Auxiliary processor unavailable
IVOR10    Decrementer
IVOR11    Fixed interval timer
IVOR12    Watchdog timer
IVOR13    Data translation lookaside buffer (TLB) error
IVOR14    Instruction TLB error
IVOR15    Debug
7.4.10 Interrupt Vector Prefix Register (IVPR)
The IVPR provides the high-order 16 bits of the effective address of the interrupt vectors, for all interrupt
types. The interrupt vector effective address is formed as follows:
IVPR[32:47] || IVORn[48:59] || 0b0000
where n specifies the IVOR register to be used for the particular interrupt type.
The IVPR can be written from a GPR using mtspr, and can be read into a GPR using mfspr.
Bits     Field Name   Description
32:47    ADDR         Address prefix.
48:63    Reserved
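The concatenation that forms the interrupt vector effective address can be illustrated with a short C sketch. It assumes that IVPR and IVORn have already been read into 32-bit variables (for example with mfspr) and that register bit 32 is the most significant bit of each value. The function name interrupt_vector_ea is hypothetical and is not part of any IBM-supplied interface.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: combine 32-bit images of IVPR and IVORn into a vector address.
 * Register bit 32 is treated as the most significant bit of each value. */
static inline uint32_t interrupt_vector_ea(uint32_t ivpr, uint32_t ivor)
{
    uint32_t prefix = ivpr & 0xFFFF0000u;  /* address prefix: high-order 16 bits */
    uint32_t offset = ivor & 0x0000FFF0u;  /* 12-bit offset field, bits 48:59 */
    uint32_t ea = prefix | offset;         /* low four bits remain 0b0000 */

    assert((ea & 0xFu) == 0);              /* vectors are quadword (16-byte) aligned */
    return ea;
}
```

For instance, a prefix image of 0xFFF00000 combined with an offset image of 0x00000100 gives 0xFFF00100. Reserved bits in either image are masked off rather than trusted.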
7.4.11 Exception Syndrome Register (ESR)
The ESR provides a syndrome to differentiate between the different kinds of exceptions that can generate the
same interrupt type. Upon the generation of one of these types of interrupt, the bit or bits corresponding to the
specific exception that generated the interrupt are set, and all other ESR bits are cleared. Other interrupt types
do not affect the contents of the ESR. See the individual interrupt descriptions under Section 7.5 Interrupt
Definitions on page 182 for an explanation of the ESR settings for each interrupt type, and for a more detailed
explanation of the function of certain ESR fields.
The ESR can be written from a GPR using mtspr and can be read into a GPR using mfspr.
[Register figure: ESR, bits 32:63. Fields shown: ISMC, SS, POT1, POT2, PIL, PPR, PTR, FP, ST, DLK, AP, PUO, BO, PIE, PCRE, PCMP, PCRF; bit 41 and bits 48:58 are reserved.]
Bits     Field Name   Description
32       ISMC         Instruction side machine check.
                      0   An instruction did not cause an exception.
                      1   An instruction caused an exception.
33       SS           Storage synchronization.
                      0   A storage synchronization exception did not occur.
                      1   A storage synchronization exception occurred.
                      This exception occurs when the lwarx or stwcx. instructions are issued and both the
                      write-through (W) and caching-inhibited (I) storage attributes are enabled. See
                      Section 4.5 Storage Attributes on page 116 for more information.
34       POT1         Program interrupt: opcode trap 1.
                      This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR1.
35       POT2         Program interrupt: opcode trap 2.
                      This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR2.
36       PIL          Program interrupt: illegal instruction exception.
                      0   An illegal instruction exception did not occur.
                      1   An illegal instruction exception occurred.
37       PPR          Program interrupt: privileged instruction exception.
                      0   A privileged instruction exception did not occur.
                      1   A privileged instruction exception occurred.
38       PTR          Program interrupt: trap exception.
                      0   A trap exception did not occur.
                      1   A trap exception occurred.
39       FP           Floating-point operation.
                      0   The exception was not caused by a floating-point instruction.
                      1   The exception was caused by a floating-point instruction.
40       ST           Store operation.
                      0   The exception was not caused by a store-type storage access or cache management
                          instruction.
                      1   The exception was caused by a store-type storage access or cache management
                          instruction.
41       Reserved     Reserved.
42:43    DLK          Data storage interrupt: locking exception.
                      00  A locking exception did not occur.
                      01  A dcbf instruction issued in user mode caused the locking exception.
                      10  An icbi issued in user mode caused the locking exception.
                      11  Reserved.
44       AP           Auxiliary processor operation.
                      0   The exception was not caused by an auxiliary processor instruction.
                      1   The exception was caused by an auxiliary processor instruction.
                      This bit is used with program, alignment, data storage interrupt (DSI), and data-side
                      TLB miss interrupt types.
45       PUO          Program interrupt: unimplemented operation exception.
                      0   An unimplemented operation exception did not occur.
                      1   An unimplemented operation exception occurred.
46       BO           Byte ordering exception.
                      0   A byte ordering exception did not occur.
                      1   A byte ordering exception occurred.
47       PIE          Program interrupt: imprecise exception.
                      0   An exception occurred precisely. Save/Restore Register 0 (SRR0) contains the
                          address of the instruction that caused the exception.
                      1   An exception occurred imprecisely. SRR0 contains the address of an instruction
                          after the one that caused the exception.
                      This field is only set for a floating-point enabled exception type program interrupt when
                      the interrupt occurs imprecisely due to MSR[FE0,FE1] being set to a nonzero value when an
                      attached floating-point unit is already signaling the floating-point enabled exception
                      (FPSCR[FEX] is already ‘1’).
48:58    Reserved     Reserved.
59       PCRE         Program interrupt: condition register enable (see Note 1).
                      0   The instruction that caused the exception is not a floating-point CR-updating
                          instruction.
                      1   The instruction that caused the exception is a floating-point CR-updating instruction.
                      This field is only defined for a floating-point enabled exception type program interrupt,
                      and then only when ESR[PIE] = ‘0’.
60       PCMP         Program interrupt: compare (see Note 1).
                      0   Instruction that caused the exception is not a floating-point compare type instruction.
                      1   Instruction that caused the exception is a floating-point compare type instruction.
61:63    PCRF         Program interrupt: condition register field (see Note 1).
                      If ESR[PCRE] = ‘1’, this field indicates which CR field was to be updated by the
                      floating-point instruction that caused the exception.
Note:
1. The PCRE, PCMP, and PCRF fields are implementation-dependent fields of the ESR and not part of the Power Instruction Set
Architecture (ISA) Version 2.05 Architecture.
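An interrupt handler typically needs to pull individual fields out of the ESR value it has read with mfspr. The following C sketch shows one way to do that using the 32:63 bit numbering of the field table, where bit 63 is the least significant bit. The helper names (esr_field, esr_pil, and so on) are hypothetical; only the bit positions come from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Extract ESR[first:last], using the manual's 32:63 bit numbering
 * (bit 32 is the most significant bit of the 32-bit value). */
static inline uint32_t esr_field(uint32_t esr, unsigned first, unsigned last)
{
    assert(first >= 32 && first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << width) - 1u);
    return (esr >> (63 - last)) & mask;
}

/* Hypothetical convenience wrappers; positions are taken from the ESR table. */
static inline uint32_t esr_ismc(uint32_t esr) { return esr_field(esr, 32, 32); }
static inline uint32_t esr_pil(uint32_t esr)  { return esr_field(esr, 36, 36); }
static inline uint32_t esr_st(uint32_t esr)   { return esr_field(esr, 40, 40); }
static inline uint32_t esr_dlk(uint32_t esr)  { return esr_field(esr, 42, 43); }
static inline uint32_t esr_pcrf(uint32_t esr) { return esr_field(esr, 61, 63); }
```

With this numbering, a DLK value of 1 corresponds to the dcbf case and a value of 2 to the icbi case. A single extraction routine keeps the IBM bit numbering in one place instead of scattering hand-computed masks through the handler.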
7.4.12 Machine Check Syndrome Register (MCSR)
The MCSR contains status that allows the machine check interrupt handler software to determine the cause of a
machine check exception. Any machine check exception that is handled as an asynchronous interrupt sets
MCSR[MCS] and other appropriate bits of the MCSR. If MSR[ME] and MCSR[MCS] are both set, the
machine takes a machine check interrupt. See Section 7.5.2 Machine Check Interrupt on page 186.
The MCSR can be read into a GPR using mfspr and written from a GPR using mtspr at SPR address x‘23C’. See
Table A-1 on page 263. Clearing the MCSR is performed using mtspr by placing a ‘1’ in the GPR source
register in all bit positions that are to be cleared in the MCSR, and a ‘0’ in all other bit positions. The data
written from the GPR to the MCSR is not direct data, but a mask. A ‘1’ clears the bit, and a ‘0’ leaves the
corresponding MCSR bit unchanged. Note that the SPR address for this clearing operation is x‘33C’. See
Table A-1 on page 263.
[Register figure: MCSR, bits 32:63. Fields shown: MCS, TLB, IC, DC, GPR, FPR, IMP, L2, DCR; bits 33:35 and 44:63 are reserved.]
Bits     Field Name   Description
32       MCS          Machine check summary.
33:35    Reserved
36       TLB          UTLB parity error.
37       IC           I-cache asynchronous error.
38       DC           D-cache error.
39       GPR          GPR parity error.
40       FPR          FPR parity error.
41       IMP          Imprecise machine check.
42       L2           Error or system error reported through the L2 cache.
43       DCR          DCR timeout (enabled by CCR2[MCDTO]).
44:63    Reserved
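The mask semantics of the MCSR clearing operation are easy to get backwards, so the following C sketch models them in software. It is a model of the described behavior only, not an access to the register: mcsr_after_clear and ibm_bit are hypothetical names, and the bit positions are taken from the field table above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for one bit named in the manual's 32:63 numbering
 * (bit 63 is the least significant bit). */
static inline uint32_t ibm_bit(unsigned n)
{
    assert(n >= 32 && n <= 63);
    return (uint32_t)1 << (63 - n);
}

/* Bit positions from the MCSR field table. */
enum {
    MCSR_MCS = 32, MCSR_TLB = 36, MCSR_IC = 37, MCSR_DC = 38, MCSR_GPR = 39,
    MCSR_FPR = 40, MCSR_IMP = 41, MCSR_L2 = 42, MCSR_DCR = 43
};

/* Model of the clear operation: each '1' in clear_mask clears the matching
 * MCSR bit, and each '0' leaves the matching bit unchanged. */
static inline uint32_t mcsr_after_clear(uint32_t mcsr, uint32_t clear_mask)
{
    return mcsr & ~clear_mask;
}
```

Under this model, writing back the value that was just read clears exactly the bits the handler observed, and leaves alone any bit that was not part of that value.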
7.5 Interrupt Definitions
Table 7-2 on page 183 provides a summary of each interrupt type, in the order corresponding to their associated IVOR register. The table also summarizes the various exception types that might cause that interrupt
type; the classification of the interrupt; which ESR bits can be set, if any; and which mask bits can mask the
interrupt type, if any.
Table 7-2. Interrupt and Exception Types
[Classification columns (Asynchronous; Synchronous, Precise; Synchronous, Imprecise; Critical; Context Synchronous): the column positions of the X marks are not recoverable from this transcription. A hyphen marks an empty cell.]
IVOR    Interrupt Type       Exception Type                    ESR                                MSR Mask  DBCR Mask  Notes
IVOR0   Critical input       Critical input                    -                                  CE        -          1
IVOR1   Machine check        Instruction machine check         -                                  ME        -          2
                             Data machine check                -                                  ME        -          2
IVOR2   Data storage         Read access control               [FP,AP]                            -         -          -
                             Write access control              ST, [FP,AP]                        -         -          -
                             Cache locking                     DLK, [ST]                          -         -          -
                             lwarx/stwcx. to W = 1 or IL1 = 1  SS, [ST]                           -         -          -
IVOR3   Instruction storage  Execute access control            -                                  -         -          -
IVOR4   External input       External input                    -                                  EE        -          1
IVOR5   Alignment            Load/store alignment              [ST], [FP]                         -         -          -
                             Load/store multiple               [ST]                               -         -          -
                             dcbz to W = 1 or I = 1            -                                  -         -          -
IVOR6   Program              Illegal instruction               PIL                                -         -          -
                             mtspr/mfspr to undefined UM SPR   PIL                                -         -          -
                             Privileged instruction            PPR, [AP]                          -         -          -
                             Trap                              PTR                                -         -          -
                             IOC enabled trap                  [POT1], [POT2]                     -         -          -
                             FP enabled                        FP, [PIE], [PCRE], [PCMP], [PCRF]  -         -          6
                             AP enabled                        AP                                 -         -          6
                             Unimplemented operation           PUO, [FP,AP]                       -         -          5
IVOR7   FP unavailable       FP unavailable                    -                                  -         -          6
IVOR8   System call          System call                       -                                  -         -          -
IVOR9   AP unavailable       AP unavailable                    -                                  -         -          6
IVOR10  Decrementer          Decrementer                       -                                  EE        -          -
IVOR11  FIT                  FIT                               -                                  EE        -          -
IVOR12  Watchdog timer       Watchdog timer                    -                                  CE        -          -
IVOR13  DTLB miss            DTLB miss                         [ST], [FP,AP]                      -         -          -
IVOR14  ITLB miss            ITLB miss                         -                                  -         -          -
IVOR15  Debug                Trap                              -                                  DE        IDM        3
                             IAC                               -                                  DE        IDM        3
                             DAC                               -                                  DE        IDM        3
                             ICMP                              -                                  DE        IDM        3
                             Branch taken                      -                                  DE        IDM        3
                             Return                            -                                  DE        IDM        3
                             Interrupt                         -                                  DE        IDM        -
                             Unconditional                     -                                  DE        IDM        -
Notes:
1. Although it is not specified as part of Book E, it is common for system implementations to provide, as part of the interrupt controller,
independent mask and status bits for the various sources of critical input and external input interrupts.
2. Machine check interrupts are not classified as asynchronous nor synchronous. They are also not classified as critical or noncritical
because they use their own unique set of Save/Restore Registers, MCSRR0 and MCSRR1. See Section 7.2.4 Machine Check
Interrupts on page 169 and Section 7.5.2 Machine Check Interrupt on page 186.
3. Debug exceptions have special rules regarding their interrupt classification (synchronous or asynchronous, and precise or imprecise), depending on the particular debug mode being used and other conditions (see Section 7.5.15 Debug Interrupt on page 201).
4. In general, when an interrupt causes a particular ESR bit or bits to be set as indicated in the table, it also causes all other ESR bits
to be cleared. Special rules apply to the ESR[ISMC] field; see Section 7.5.2 Machine Check Interrupt on page 186. If no ESR setting is indicated for any of the exception types within a given interrupt type, the ESR is unchanged for that interrupt type.
The syntax for the ESR setting indication is as follows:
• [xxx] means ESR[xxx] might be set.
• [xxx,yyy,zzz] means any one (or none) of ESR[xxx] or ESR[yyy] or ESR[zzz] might be set, but never more than one.
• {xxx,yyy,zzz} means that any combination of ESR[xxx], ESR[yyy], and ESR[zzz] might be set, including all or none.
• xxx means ESR[xxx] will be set.
5. Unimplemented operation exception type program interrupts can only occur when the PowerPC 476FP core is connected to a
floating-point unit or auxiliary processor, and then only when executing instruction opcodes that are recognized by the floating-point
unit or auxiliary processor but are not implemented within the hardware.
6. Floating-point unavailable and auxiliary processor unavailable interrupts and floating-point enabled and auxiliary processor
enabled exception type program interrupts can only occur when the PowerPC 476FP core is connected to a floating-point unit or
auxiliary processor and then only when executing instruction opcodes that are recognized by the floating-point unit or auxiliary processor.
7.5.1 Critical Input Interrupt
A critical input interrupt occurs when no higher priority exception exists, a critical input exception is presented
to the interrupt mechanism, and MSR[CE] = ‘1’. A critical input exception is caused by the activation of an
asynchronous input to the PowerPC 476FP core. Although the only mask for this interrupt type within the core
is the MSR[CE] bit, system implementations typically provide an alternative means for independently masking
the interrupt requests from the various devices that can collectively activate the PowerPC 476FP core critical
input interrupt request input.
Note: MSR[CE] also enables the watchdog timer interrupt.
When a critical input interrupt occurs, the interrupt processing registers are updated as follows (all
registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVOR0[IVO] || 0b0000.
Critical Save/Restore Register 0 (CSRR0)
Set to the effective address of the next instruction to be executed.
Critical Save/Restore Register 1 (CSRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
ME
Unchanged.
All other MSR bits are set to ‘0’.
Note: Software is responsible for taking any actions that are required by the implementation to clear any critical input exception status such that the critical input interrupt request input signal is deasserted before reenabling MSR[CE] to avoid another redundant critical input interrupt.
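The register updates for a critical input interrupt, and their reversal by rfci, can be summarized in a small software model. This is a sketch for illustration, not a description of the hardware implementation. The struct core_state type and both function names are hypothetical, and the MSR[ME] mask assumes that ME is MSR bit 51, which should be checked against the MSR definition.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ME is MSR bit 51 in 32:63 numbering, which gives mask 0x00001000. */
#define MSR_ME 0x00001000u

/* Hypothetical software model of the registers involved. */
struct core_state {
    uint32_t pc;     /* address of the next instruction to execute */
    uint32_t msr;
    uint32_t csrr0;
    uint32_t csrr1;
};

/* Critical input interrupt entry, following the register update list. */
static inline void take_critical_input(struct core_state *s, uint32_t next_pc,
                                       uint32_t ivpr, uint32_t ivor0)
{
    assert(s != 0);
    s->csrr0 = next_pc;        /* return address */
    s->csrr1 = s->msr;         /* MSR as it was before the interrupt */
    s->msr &= MSR_ME;          /* ME unchanged, all other MSR bits set to 0 */
    s->pc = (ivpr & 0xFFFF0000u) | (ivor0 & 0x0000FFF0u);
}

/* rfci: resume at CSRR0 with the MSR restored from CSRR1. */
static inline void return_from_critical(struct core_state *s)
{
    assert(s != 0);
    s->pc = s->csrr0;
    s->msr = s->csrr1;
}
```

Because entry clears MSR[CE], the handler runs with further critical input interrupts masked until rfci restores the saved MSR, which is why the exception source has to be cleared before that point.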
7.5.2 Machine Check Interrupt
A machine check interrupt occurs when no higher priority exception exists, a machine check exception is
presented to the interrupt mechanism, and MSR[ME] = ‘1’. The PowerPC architecture specifies machine
check interrupts as neither synchronous nor asynchronous, and the exact causes and details of handling
such interrupts are implementation dependent. Regardless, for the PowerPC 476FP core, it is useful to describe
the handling of interrupts caused by various types of machine check exceptions in those terms. The PowerPC
476FP core includes the following types of machine check exceptions:
• Instruction synchronous machine check exception
An instruction synchronous machine check exception is caused when a timeout or read error is signaled
on the instruction read PLB interface during an instruction fetch operation.
Such an exception is not presented to the interrupt handling mechanism, however, until execution is
attempted of an instruction at an address associated with the instruction fetch for which the instruction
machine check exception was asserted. When the exception is presented, the ESR[ISMC] bit is set to
indicate the type of exception, regardless of the state of the MSR[ME] bit.
If MSR[ME] = ‘1’ when the instruction machine check exception is presented to the interrupt mechanism,
execution of the instruction associated with the exception is suppressed, a machine check interrupt
occurs, and the interrupt processing registers are updated as described in Machine Check Save/Restore
Register 0 (MCSRR0) on page 187. If MSR[ME] = ‘0’, however, the instruction associated with the exception is processed as though the exception did not exist, and a machine check interrupt does not occur
(even if MSR[ME] is subsequently set to ‘1’), although the ESR is still updated as described in Machine
Check Save/Restore Register 0 (MCSRR0).
• Instruction asynchronous machine check exception
An instruction asynchronous machine check exception is caused when one of the following events
occurs:
– An instruction cache parity error is detected.
– The read interrupt request is asserted on the instruction read PLB interface.
• Data asynchronous machine check exception
A data asynchronous machine check exception is caused when one of the following occurs:
– A timeout, read error, or read interrupt request is signaled on the data read PLB interface during a
data read operation.
– A timeout, write error, or write interrupt request is signaled on the data write PLB interface during a
data write operation.
– A parity error is detected on an access to the data cache.
• TLB asynchronous machine check exception
A TLB asynchronous machine check exception is caused when a parity error is detected on an access to
the TLB.
• GPR asynchronous machine check exception
A parity error is detected in one of the GPRs.
• FPR asynchronous machine check exception
A parity error is detected in one of the FPRs.
When any machine check exception that is handled as an asynchronous interrupt occurs, it is immediately
presented to the interrupt handling mechanism. MCSR[MCS] and other bits of the MCSR, as appropriate, are
set. A machine check interrupt occurs immediately if MSR[ME] = ‘1’, and the interrupt processing registers
are updated as described in the following subsections. If MSR[ME] = ‘0’, however, the exception is recorded
by the setting of the MCSR[MCS] bit and deferred until such time as MSR[ME] is subsequently set to ‘1’.
When MCSR[MCS] and MSR[ME] are both set to ‘1’, the machine check interrupt is taken. Therefore,
MCSR[MCS] must be cleared by software in the machine check interrupt handler before executing an rfmci
to return to processing with MSR[ME] set to ‘1’.
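The deferral rule for asynchronous machine checks can be expressed as a pair of predicates over the MSR and the MCSR. The C sketch below is a software model only, and all three function names are hypothetical. The summary bit mask follows the MCSR field table (bit 32), and the MSR[ME] mask assumes that ME is MSR bit 51.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCSR_SUMMARY 0x80000000u  /* MCSR bit 32, machine check summary */
#define MSR_ME       0x00001000u  /* assumption: ME is MSR bit 51 */

/* Record an asynchronous machine check: the summary bit is set together
 * with whichever cause bits apply. */
static inline void record_async_machine_check(uint32_t *mcsr, uint32_t cause_bits)
{
    assert(mcsr != 0);
    *mcsr |= MCSR_SUMMARY | cause_bits;
}

/* The interrupt is taken only when the summary bit and MSR[ME] are both 1. */
static inline bool machine_check_pending(uint32_t msr, uint32_t mcsr)
{
    return (msr & MSR_ME) != 0 && (mcsr & MCSR_SUMMARY) != 0;
}

/* rfmci restores the MSR from MCSRR1, so the summary bit must already be
 * clear if the restored MSR has ME = 1. */
static inline bool safe_to_rfmci(uint32_t mcsrr1, uint32_t mcsr)
{
    return !machine_check_pending(mcsrr1, mcsr);
}
```

A handler that returned with the summary bit still set and ME = 1 in MCSRR1 would, under this model, satisfy the interrupt condition again as soon as the MSR was restored.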
When a machine check interrupt occurs, the interrupt processing registers are updated as follows. All registers not listed are unchanged, and instruction execution resumes at address IVPR[IVP] || IVOR1[IVO] || 0b0000.
Machine Check Save/Restore Register 0 (MCSRR0)
For an instruction synchronous machine check exception, set to the effective address of the instruction
presenting the exception. For an instruction asynchronous machine check, data asynchronous machine
check, or TLB asynchronous machine check exception, set to the effective address of the next instruction to
be executed.
Machine Check Save/Restore Register 1 (MCSRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
All MSR bits are set to ‘0’.
Exception Syndrome Register (ESR)
ISMC
Set to ‘1’ for an instruction machine check exception; otherwise left unchanged.
All other defined ESR bits are set to ‘0’ for an instruction machine check exception; otherwise, they are left
unchanged.
Note: If an instruction synchronous machine check exception is associated with an instruction, and execution of that instruction is attempted while MSR[ME] = ‘0’, no machine check interrupt occurs, but ESR[ISMC]
is still set to ‘1’ when the instruction executes. When set, ESR[ISMC] cannot be cleared except by software
using the mtspr instruction. When processing a machine check interrupt handler, software should query
ESR[ISMC] to determine the type of machine check exception and then clear ESR[ISMC]. Then, before reenabling machine check interrupts by setting MSR[ME] to ‘1’, software should query the status of ESR[ISMC]
again to determine whether any additional instruction machine check exceptions have occurred while
MSR[ME] was disabled.
Machine Check Syndrome Register (MCSR)
The MCSR collects status for the machine check exceptions that are handled as asynchronous interrupts.
MCSR[MCS] is set by any instruction asynchronous machine check exception, data asynchronous machine
check exception, or TLB asynchronous machine check exception. Other bits in the MCSR are set to indicate
the exact type of machine check exception.
See Section 7.4.12 Machine Check Syndrome Register (MCSR) on page 181 for more information about the
handling of machine check interrupts within the PowerPC 476FP core.
7.5.3 Data Storage Interrupt
A data storage interrupt can occur when no higher priority exception exists and a data storage exception is
presented to the interrupt mechanism. The PowerPC 476FP core includes four types of data storage exception as follows:
• Read access control exception
A read access control exception is caused by one of the following occurrences:
– While in user mode (MSR[PR] = ‘1’), a load, icbi, dcbst, dcbf, dcbi, dcbtls, dcblc, icbtls, or icblc
instruction attempts to access a location in storage that is not enabled for read access in user mode
(that is, the TLB entry associated with the memory page being accessed has UR = ‘0’).
– While in supervisor mode (MSR[PR] = ‘0’), a load, icbi, dcbst, dcbf, dcbi, dcbtls, dcblc, icbtls, or
icblc instruction attempts to access a location in storage that is not enabled for read access in supervisor mode (that is, the TLB entry associated with the memory page being accessed has SR = ‘0’).
Note: The instruction cache management instructions icbi and icbt are treated as loads from the
addressed byte with respect to address translation and protection. These instruction cache management
instructions use MSR[DS] rather than MSR[IS] to determine translation for their target effective address.
Similarly, they use the read access control field (UR or SR) rather than the execute access control field
(UX or SX) of the TLB entry to determine whether a data storage exception should occur. Instruction storage exceptions and instruction TLB miss exceptions are associated with the fetching of instructions, not
with the execution of instructions. Data storage exceptions and data TLB miss exceptions are associated
with the execution of instruction cache management instructions, and with the execution of load, store,
and data cache management instructions.
• Write access control exception
A write access control exception is caused by one of the following:
– While in user mode (MSR[PR] = ‘1’), a store, dcbz, dcbtst, or dcbtstls instruction attempts to
access a location in storage that is not enabled for write access in user mode (that is, the TLB entry
associated with the memory page being accessed has UW = ‘0’).
– While in supervisor mode (MSR[PR] = ‘0’), a store, dcbz, dcbtst, or dcbtstls instruction attempts to
access a location in storage that is not enabled for write access in supervisor mode (that is, the TLB
entry associated with the memory page being accessed has SW = ‘0’).
• Cache locking exception
A cache locking exception is caused by one of the following:
– While in user mode (MSR[PR] = ‘1’) with MMUCR[IULXE] = ‘1’, execution of an icbi instruction is
attempted. The exception occurs whether or not the cache line targeted by the icbi instruction is actually
locked in the instruction cache.
– While in user mode (MSR[PR] = ‘1’) with MMUCR[DULXE] = ‘1’, execution of a dcbf instruction is
attempted. The exception occurs whether or not the cache line targeted by the dcbf instruction is actually
locked in the data cache.
See Section 5 Instruction and Data Caches on page 133 and Section 4.6.10 MMU Configuration Register
(MMUCR) on page 126 for more information about cache locking and cache locking exceptions, respectively.
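The read and write access control rules described above reduce to selecting one permission bit from the TLB entry according to MSR[PR] and the kind of access. The following C sketch models that selection. The struct tlb_perms type, the enum, and the function name are hypothetical; only the bit names UR, UW, SR, and SW come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the read and write access control bits of a TLB entry. */
struct tlb_perms {
    bool ur;  /* user read */
    bool uw;  /* user write */
    bool sr;  /* supervisor read */
    bool sw;  /* supervisor write */
};

enum access_kind { ACCESS_READ, ACCESS_WRITE };

/* Returns true when the access is permitted. A false result models a read or
 * write access control exception. msr_pr is MSR[PR]: true means user mode. */
static inline bool access_permitted(const struct tlb_perms *p, bool msr_pr,
                                    enum access_kind kind)
{
    assert(p != 0);
    if (kind == ACCESS_READ)
        return msr_pr ? p->ur : p->sr;
    return msr_pr ? p->uw : p->sw;
}
```

In this model an icbi or dcbf would be passed as ACCESS_READ, because the text treats those instructions as loads with respect to protection. The cache locking exception is a separate check and is not modeled here.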
If an stwcx. instruction causes a write access control exception, but the processor does not have the reservation from an lwarx instruction, a data storage interrupt does not occur, and the instruction completes,
updating CR[CR0] to indicate the failure of the store due to the lost reservation.
If a data storage exception occurs on any of the following instructions, the instruction is treated as a no-op,
and a data storage interrupt does not occur.
• lswx or stswx with a length of zero (although the target register of lswx will still be undefined, as it is
whether or not a data storage exception occurs)
• icbt
• dcbt
For all other instructions, if a data storage exception occurs, execution of the instruction causing the exception is suppressed, a data storage interrupt is generated, the interrupt processing registers are updated as
follows (all registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVOR2[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the data storage interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE
Unchanged.
All other MSR bits are set to ‘0’.
Data Exception Address Register (DEAR)
If the instruction causing the data storage exception does so with respect to the memory page targeted by the
initial effective address calculated by the instruction, the DEAR is set to this calculated effective address.
However, if the data storage exception only occurs due to the instruction causing the exception crossing a
memory page boundary, in that the exception is with respect to the attributes of the page accessed after
crossing the boundary, the DEAR is set to the address of the first byte within that page.
For example, consider a misaligned load word instruction that targets effective address x‘0000 0FFF’, and
that the page containing that address is a 4 KB page. The load word will thus cross the page boundary, and
access the next page starting at address x‘0000 1000’. If a read access control exception exists within the first
page because the read access control field for that page is ‘0’, the DEAR is set to x‘0000 0FFF’. However, if
the read access control field of the first page is ‘1’, but the same field is ‘0’ for the next page, the read access
control exception exists only for the second page and the DEAR is set to x‘0000 1000’. Furthermore, the load
word instruction in this latter scenario will have been partially executed (see Section 7.3.1 Partially Executed
Instructions on page 172).
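The DEAR rule in the example above can be checked with a few lines of C. The sketch assumes a power-of-two page size and an access that crosses at most one page boundary. The function name dear_for_fault and its parameters are hypothetical, and address wraparound at the top of the address space is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DEAR value for an access of `size` bytes at `ea`.
 * first_page_faults: true when the exception is with respect to the page that
 * contains `ea`; false when it arises only in the page entered after crossing
 * the page boundary. */
static inline uint32_t dear_for_fault(uint32_t ea, uint32_t size,
                                      uint32_t page_size, bool first_page_faults)
{
    assert(size > 0 && page_size != 0 && (page_size & (page_size - 1u)) == 0);
    if (first_page_faults)
        return ea;                                   /* calculated effective address */

    uint32_t next_page = (ea & ~(page_size - 1u)) + page_size;
    assert(ea + size - 1u >= next_page);             /* access must cross the boundary */
    return next_page;                                /* first byte of the second page */
}
```

Called with the values from the example (effective address 0x00000FFF, a 4-byte load, 4 KB pages), the function returns 0x00000FFF when the first page faults and 0x00001000 when only the second page faults.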
See Section 7.4.11 Exception Syndrome Register (ESR) on page 179.
7.5.4 Instruction Storage Interrupt
An instruction storage interrupt occurs when no higher priority exception exists and an instruction storage
exception is presented to the interrupt mechanism. Note that, although an instruction storage exception can
occur during an attempt to fetch an instruction, such an exception is not actually presented to the interrupt
mechanism until an attempt is made to execute that instruction. The PowerPC 476FP core includes one type
of instruction storage exception, the execute access control exception.
Execute Access Control Exception
An execute access control exception is caused by one of the following:
• While in user mode (MSR[PR] = ‘1’), an instruction fetch attempts to access a location in storage that is
not enabled for execute access in user mode (that is, the TLB entry associated with the memory page
being accessed has UX = ‘0’).
• While in supervisor mode (MSR[PR] = ‘0’), an instruction fetch attempts to access a location in storage
that is not enabled for execute access in supervisor mode (that is, the TLB entry associated with the
memory page being accessed has SX = ‘0’).
Note: Book-III E of Power ISA Version 2.05 defines an additional instruction storage exception: the byte
ordering exception. This exception is defined to assist implementations that cannot support dynamically
switching byte ordering between consecutive instruction fetches or cannot support a given byte order at all.
The PowerPC 476FP core, however, supports instruction fetching from both big-endian and little-endian
memory pages, so this exception cannot occur.
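When an instruction storage interrupt occurs, the processor suppresses the execution of the instruction
causing the instruction storage exception, the interrupt processing registers are updated as follows (all
registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVOR3[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the instruction storage interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE
Unchanged.
All other MSR bits are set to ‘0’.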
See Section 7.4.11 Exception Syndrome Register (ESR) on page 179.
7.5.5 External Input Interrupt
An external input interrupt occurs when no higher priority exception exists, an external input exception is
presented to the interrupt mechanism, and MSR[EE] = ‘1’. An external input exception is caused by the activation of an asynchronous input to the PowerPC 476FP core. Although the only mask for this interrupt type
within the core is the MSR[EE] bit, system implementations typically provide an alternative means for independently masking the interrupt requests from the various devices that can collectively activate the core’s
external input interrupt request input.
Note: MSR[EE] also enables the decrementer and fixed interval timer interrupts.
When an external input interrupt occurs, the interrupt processing registers are updated as indicated as
Version 2.2
July 31, 2014
Page 191 of 322
User’s Manual
CE, ME, DE
Unchanged.
Note: Software is responsible for taking any actions that are required by the implementation to clear any
external input exception status (such that the external input interrupt request input signal is deasserted)
before reenabling MSR[EE] to avoid another redundant external input interrupt.
7.5.6 Alignment Interrupt
An alignment interrupt occurs when no higher priority exception exists and an alignment exception is
presented to the interrupt mechanism. An alignment exception occurs in any of the following cases:
• CCR0[23] (FLSTA) is set and a load/store instruction is not operand aligned. Halfword requests must be
aligned on a halfword boundary, word requests on a word boundary, and doubleword requests (APU/FPU
only) on a doubleword boundary. Load/store multiple requests are considered to be word requests. This
case does not apply to string instructions because they are considered to be byte operations and are
thus always operand aligned.
• An FPU load/store operation is not operand aligned.
• A dcbz instruction targets a memory page that is either write-through required or caching inhibited.
• An lwarx or stwcx. request is not aligned on a word boundary.
If an stwcx. instruction causes an alignment exception, and the processor does not have the reservation from
an lwarx instruction, an alignment interrupt still occurs.
Note: The architecture does not support the use of an unaligned effective address by the lwarx and stwcx.
instructions. If an alignment interrupt occurs due to the attempted execution of one of these instructions, the
alignment interrupt handler must not attempt to emulate the instruction but instead, should treat the instruction as a programming error.
When an alignment interrupt occurs, the processor suppresses the execution of the instruction causing the
alignment exception, the interrupt processing registers are updated as follows (all registers not
listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR5[IVO] || 0b0000.
Set to the effective address of the instruction causing the alignment interrupt.
CE, ME, DE
Unchanged
Set to the effective address of the target data operand as calculated by the instruction causing the alignment
exception. Note that for dcbz, this effective address is not necessarily the address of the first byte of the
targeted cache block, but could be the address of any byte within the block (it will be the address calculated
by the dcbz instruction).
FP
Set to ‘1’ if the instruction causing the interrupt is a floating-point load or store; otherwise set
to ‘0’.
ST
Set to ‘1’ if the instruction causing the interrupt is a store, dcbz, or dcbi instruction; otherwise
set to ‘0’.
AP
Set to ‘1’ if the instruction causing the interrupt is an auxiliary processor load or store; otherwise set to ‘0’.
All other defined ESR bits are set to ‘0’.
7.5.7 Program Interrupt
A program interrupt occurs when no higher priority exception exists, a program exception is presented to the
interrupt mechanism, and, for the floating-point enabled form of program exception only, MSR[FE0,FE1] is
nonzero. The PowerPC 476FP core includes the following types of program exception:
• Illegal instruction exception
An illegal instruction exception occurs when execution is attempted of any of the following kinds of
instructions:
– A reserved-illegal instruction.
– When MSR[PR] = ‘1’ (user mode), an mtspr or mfspr that specifies an SPRN value with SPRN[5] = ‘0’
(user-mode accessible) that represents an unimplemented Special Purpose Register. For mtspr, this
includes any SPR number other than the XER, LR, CTR, or USPRG0. For mfspr, this includes any
SPR number other than the ones listed for mtspr, plus SPRG4-7, TBU, and TBL.
– A defined instruction that is not implemented within the PowerPC 476FP core and that is not a floating-point instruction. This includes all instructions that are defined for 64-bit implementations only and
mfapidi (see the PowerPC Book-E specification).
– A defined floating-point instruction that is not recognized by an attached floating-point unit or when no
such floating-point unit is attached.
– An allocated instruction that is not implemented within the PowerPC 476FP core and that is not recognized by an attached auxiliary processor (or when no such auxiliary processor is attached).
See Section 2.3 Instruction Classes on page 47 for more information about the PowerPC 476FP core’s
support for defined and allocated instructions.
• Privileged instruction exception
A privileged instruction exception occurs when MSR[PR] = ‘1’ and execution is attempted of any of the
following kinds of instructions:
– A privileged instruction.
– An mtspr or mfspr instruction that specifies an SPRN value with SPRN[5] = ‘1’ (a privileged instruction exception occurs regardless of whether the SPR referenced by the SPRN value is defined).
• IOC enabled trap exception
An IOC enabled trap exception occurs when an opcode match is made according to IOCR1 and IOCR2
registers and properly enabled in the IOCCR. The operation is converted to a special exception and
issued to IRACC with an indication that the operation is illegal.
The DISS logic faulty confirms the operation when it is equal to the CS tail. ESR[POT1] or ESR[POT2] is
set as a result, and a program interrupt is taken.
• Trap exception
A trap exception occurs when any of the conditions specified in a tw or twi instruction are met. However,
if trap debug events are enabled (DBCR0[TRAP] = ‘1’), internal debug mode is enabled
(DBCR0[IDM] = ‘1’), and debug interrupts are enabled (MSR[DE] = ‘1’), a trap exception causes a debug
interrupt to occur rather than a program interrupt.
See Section 8 Debug Facilities on page 217 for more information about trap debug events.
• Unimplemented operation exception
An unimplemented operation exception occurs when execution is attempted of any of the following kinds
of instructions:
– A defined floating-point instruction that is recognized but not supported by an attached floating-point
unit, when floating-point instruction processing is enabled (MSR[FP] = ‘1’).
– An allocated instruction that is not implemented within the PowerPC 476FP core, and is recognized
but not supported by an attached auxiliary processor, when auxiliary processor instruction processing is enabled. The enabling of auxiliary processor instruction processing is implementation-dependent.
• Floating-point enabled exception
A floating-point enabled exception occurs when the execution or attempted execution of a defined floating-point instruction causes FPSCR[FEX] to be set to ‘1’ in an attached floating-point unit. FPSCR[FEX]
is the Floating-Point Status and Control Register Floating-Point Enabled Exception Summary bit.
If MSR[FE0,FE1] is nonzero when the floating-point enabled exception is presented to the interrupt
mechanism, a program interrupt occurs, and the interrupt processing registers are updated as described
below. If MSR[FE0,FE1] are both ‘0’, however, a program interrupt does not occur and the instruction
associated with the exception executes according to the definition of the floating-point unit (see the user’s
manual for the floating-point unit implementation). If MSR[FE0,FE1] are subsequently set to a nonzero
value, and the floating-point enabled exception is still being presented to the interrupt mechanism (that is,
FPSCR[FEX] is still set), a delayed program interrupt occurs, updating the interrupt processing registers
as described below.
See Section 7.2.2.2 Synchronous, Imprecise Interrupts on page 168 for more information about this special form of delayed floating-point enabled exception.
• Auxiliary processor enabled exception
An auxiliary processor enabled exception might occur due to the execution or attempted execution of an
allocated instruction that is not implemented within the PowerPC 476FP core, but is recognized and supported by an attached auxiliary processor. The cause of such an exception is implementation-dependent.
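The user-mode mtspr/mfspr rules given under the illegal instruction and privileged instruction exceptions can be sketched as a small classifier. This is a hedged illustration: the function and type names are hypothetical, the SPRN field is assumed to be 10 bits wide and numbered 0 to 9 from the most significant bit (so SPRN[5] corresponds to the value 0x10), and whether a given user-mode SPR is implemented is supplied by the caller rather than looked up.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SPR_ACCESS_OK,          /* access proceeds */
    SPR_ACCESS_ILLEGAL,     /* illegal instruction exception */
    SPR_ACCESS_PRIVILEGED   /* privileged instruction exception */
} spr_access_result_t;

/* Assumption: SPRN[0:9] with bit 0 most significant, so SPRN[5] is 0x10. */
#define SPRN_BIT5 0x010u

/* Classifies an mtspr or mfspr executed with MSR[PR] = 1 (user mode).
 * implemented_for_user: caller-supplied flag saying whether this SPR number
 * is one of the user-accessible SPRs implemented for the instruction. */
static spr_access_result_t classify_user_spr_access(uint32_t sprn,
                                                    bool implemented_for_user)
{
    if (sprn & SPRN_BIT5) {
        /* Privileged regardless of whether the SPR is defined. */
        return SPR_ACCESS_PRIVILEGED;
    }
    return implemented_for_user ? SPR_ACCESS_OK : SPR_ACCESS_ILLEGAL;
}
```

The order of the checks mirrors the text: SPRN[5] = ‘1’ is tested first because that case does not depend on whether the SPR is defined.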
When a program interrupt occurs, the processor suppresses the execution of the instruction causing the
program exception (for all cases except the delayed form of floating-point enabled exception described previously), the interrupt processing registers are updated as follows (all registers not listed are
unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR6[IVO] || 0b0000.
Set to the effective address of the instruction causing the program interrupt, for all cases except the delayed
form of floating-point enabled exception described previously.
For the special case of the delayed floating-point enabled exception, where the exception was already being
presented to the interrupt mechanism at the time MSR[FE0,FE1] was changed from 0 to a nonzero value,
SRR0 is set to the address of the instruction that would have executed after the MSR-changing instruction. If
the instruction that set MSR[FE0,FE1] was rfi, rfci, or rfmci, SRR0 is set to the address to which the rfi,
rfci, or rfmci was returning, and not to the address of the instruction that was sequentially after the rfi, rfci, or
rfmci.
CE, ME, DE
Unchanged.
ISMC
Instruction machine check.
SS
Data storage interrupt (DSI), storage synchronization (lwarx/stwcx to W/I = ‘1’).
POT1
Program interrupt: opcode trap 1. This bit is set if an interrupt occurs and the opcode
matches the opcode specified in IOCR1.
POT2
Program interrupt: opcode trap 2. This bit is set if an interrupt occurs and the opcode
matches the opcode specified in IOCR2.
PIL
Program interrupt, illegal instruction exception.
PPR
Program interrupt, privileged instruction exception.
PTR
Program interrupt, trap instruction exception.
FP
FP operation. This is used with program, alignment, DSI, and data-side TLB miss interrupt
types.
ST
Store operation.
DLK[0:1]
Data storage interrupt (DSI), locking exception.
00 No locking exception.
01 dcbf in user mode to a locked line.
10 icbi in user mode to a locked line.
11 Reserved.
AP
AP operation. This is used with program, alignment, DSI, and data-side TLB miss interrupt
types.
PUO
Program interrupt, unimplemented operation exception.
BO
Byte order error.
PIE
Program interrupt, imprecise exception.
PCRE
Program interrupt, condition register (CR) enable.
PCMP
Program interrupt, compare.
PCRF
Program interrupt, condition register (CR) field.
Note: The ESR[PCRE,PCMP,PCRF] fields are provided to assist the program interrupt handler with the
emulation of part of the function of the various floating-point CR-updating instructions when any of these
instructions cause a precise (nondelayed) floating-point enabled exception type program interrupt. The Power
ISA Version 2.05, Book-III E floating-point architecture defines that when such exceptions occur, the CR is to
be updated even though the rest of the instruction execution can be suppressed. The PowerPC 476FP core,
however, does not support such CR updates when the instruction that is supposed to cause the update is
being suppressed due to the occurrence of a synchronous, precise interrupt. Instead, the PowerPC 476FP
core records in the ESR[PCRE,PCMP,PCRF] fields information about the instruction causing the interrupt to
assist the program interrupt handler software in performing the appropriate CR update manually.
7.5.8 Floating-Point Unavailable Interrupt
A floating-point unavailable interrupt occurs when no higher priority exception exists, an attempt is made to
execute a floating-point instruction that is recognized by an attached floating-point unit, and MSR[FP] = ‘0’.
When a floating-point unavailable interrupt occurs, the processor suppresses the execution of the instruction
causing the floating-point unavailable exception, the interrupt processing registers are updated as follows (all
registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
floating-point unavailable interrupt.
Set to the effective address of the instruction causing the floating-point unavailable interrupt.
CE, ME, DE
Unchanged.
7.5.9 System Call Interrupt
A system call interrupt occurs when no higher priority exception exists and a system call (sc) instruction is
executed.
When a system call interrupt occurs, the interrupt processing registers are updated as follows (all registers
not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR8[IVO] || 0b0000.
Set to the effective address of the instruction after the system call instruction.
CE, ME, DE
Unchanged.
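The vector addresses in this section all take the form IVPR[IVP] || IVORn[IVO] || 0b0000. A minimal sketch of that concatenation follows. The field widths are an assumption not stated in this section: IVP is taken to be the 16 high-order bits of IVPR and IVO the next 12 bits of IVORn, which leaves the 4 low-order bits zero. The function name is hypothetical.

```c
#include <stdint.h>

/* Assumed field layout (not stated in this section):
 * IVPR[IVP] occupies the 16 high-order bits of the 32-bit register;
 * IVORn[IVO] occupies the following 12 bits; the low 4 bits are 0b0000. */
#define IVPR_IVP_MASK 0xFFFF0000u
#define IVOR_IVO_MASK 0x0000FFF0u

/* Forms IVPR[IVP] || IVORn[IVO] || 0b0000 from raw register values. */
static uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & IVPR_IVP_MASK) | (ivor & IVOR_IVO_MASK);
}
```

Because the low 4 bits are forced to zero, every vector under this layout is aligned on a 16-byte boundary, and bits outside the assumed IVP and IVO fields are ignored.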
7.5.10 Decrementer Interrupt
A decrementer interrupt occurs when no higher priority exception exists, a decrementer exception exists
(TSR[DIS] = ‘1’), and the interrupt is enabled (TCR[DIE] = ‘1’ and MSR[EE] = ‘1’).
Note: MSR[EE] also enables the external input and fixed interval timer interrupts.
When a decrementer interrupt occurs, the interrupt processing registers are updated as follows (all registers
not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
decrementer interrupt.
CE, ME, DE
Unchanged.
Note: Software is responsible for clearing the decrementer exception status by writing to TSR[DIS] before
reenabling MSR[EE] to avoid another redundant decrementer interrupt.
7.5.11 Fixed-Interval Timer Interrupt
A fixed-interval timer interrupt occurs when no higher priority exception exists, a fixed interval timer exception
exists (TSR[FIS] = ‘1’), and the interrupt is enabled (TCR[FIE] = ‘1’ and MSR[EE] = ‘1’). See Section 6 Timer
Facilities on page 157 for more information about fixed interval timer exceptions.
Note: MSR[EE] also enables the external input and decrementer interrupts.
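When a fixed interval timer interrupt occurs, the interrupt processing registers are updated as follows (all
registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
fixed-interval timer interrupt.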
CE, ME, DE
Unchanged.
Note: Software is responsible for clearing the fixed interval timer exception status by writing to TSR[FIS],
before reenabling MSR[EE] to avoid another redundant fixed interval timer interrupt.
7.5.12 Watchdog Timer Interrupt
A watchdog timer interrupt occurs when no higher priority exception exists, a watchdog timer exception exists
(TSR[WIS] = ‘1’), and the interrupt is enabled (TCR[WIE] = ‘1’ and MSR[CE] = ‘1’). See Section 6 Timer
Facilities on page 157 for more information about watchdog timer exceptions.
Note: MSR[CE] also enables the critical input interrupt.
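When a watchdog timer interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
watchdog timer interrupt.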
ME
Unchanged.
Note: Software is responsible for clearing the watchdog timer exception status by writing to TSR[WIS],
before reenabling MSR[CE] to avoid another redundant watchdog timer interrupt.
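The enabling conditions for the three timer interrupts (decrementer, fixed-interval timer, and watchdog timer) share one pattern: a status bit in the TSR, an enable bit in the TCR, and an MSR enable. The sketch below models only that pattern. The structure and function names are hypothetical, the bits are represented as booleans rather than at their register positions, and the "no higher priority exception exists" condition is not modeled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the relevant status and enable bits. */
typedef struct {
    bool tsr_dis, tsr_fis, tsr_wis;   /* TSR[DIS], TSR[FIS], TSR[WIS] */
    bool tcr_die, tcr_fie, tcr_wie;   /* TCR[DIE], TCR[FIE], TCR[WIE] */
    bool msr_ee, msr_ce;              /* MSR[EE], MSR[CE] */
} timer_state_t;

/* Decrementer: TSR[DIS] = 1, TCR[DIE] = 1, and MSR[EE] = 1. */
static bool decrementer_interrupt_ready(const timer_state_t *s)
{
    return s->tsr_dis && s->tcr_die && s->msr_ee;
}

/* Fixed-interval timer: TSR[FIS] = 1, TCR[FIE] = 1, and MSR[EE] = 1. */
static bool fit_interrupt_ready(const timer_state_t *s)
{
    return s->tsr_fis && s->tcr_fie && s->msr_ee;
}

/* Watchdog timer: TSR[WIS] = 1, TCR[WIE] = 1, and MSR[CE] = 1.
 * Note the critical enable (CE) rather than the external enable (EE). */
static bool watchdog_interrupt_ready(const timer_state_t *s)
{
    return s->tsr_wis && s->tcr_wie && s->msr_ce;
}
```

The model also shows why each handler must clear its TSR status bit before reenabling MSR[EE] or MSR[CE]: while the status bit remains set, the predicate becomes true again as soon as the MSR enable is restored.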
7.5.13 Data TLB Error Interrupt
A data TLB error interrupt can occur when no higher priority exception exists and a data TLB miss exception
is presented to the interrupt mechanism. A data TLB miss exception occurs when a load, store, icbi, icbt,
dcbst, dcbf, dcbz, dcbi, dcbt, or dcbtst instruction attempts to access a virtual address for which a valid
TLB entry does not exist. See Section 4 Memory Management Unit on page 103 for more information about
the TLB.
A data TLB error interrupt also occurs when an operand access targets a page with W = I = ‘1’, that is, a
page that is marked write-through and caching inhibited at the same time.
Note: The instruction cache management instructions icbi and icbt are treated as loads from the addressed
byte with respect to address translation and protection and therefore, use MSR[DS] rather than MSR[IS] as
part of the calculated virtual address when searching the TLB to determine translation for their target storage
address. Instruction TLB miss exceptions are associated with the fetching of instructions, not with the execution of instructions. Data TLB miss exceptions are associated with the execution of instruction cache management instructions and with the execution of load, store, and data cache management instructions.
If an stwcx. instruction causes a data TLB miss exception, and the processor does not have the reservation
from an lwarx instruction, a data TLB error interrupt still occurs.
If a data TLB miss exception occurs on any of the following instructions, the instruction is treated as a no-op,
and a data TLB error interrupt does not occur:
• lswx or stswx with a length of zero (although the target register of lswx will be undefined)
• icbt
• dcbt
• dcbtst
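For all other instructions, if a data TLB miss exception occurs, execution of the instruction causing the exception is suppressed, a data TLB error interrupt is generated, the interrupt processing registers are updated as
follows (all registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
data TLB error interrupt.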
Set to the effective address of the instruction causing the data TLB error interrupt.
CE, ME, DE
Unchanged.
If the instruction causing the data TLB miss exception does so with respect to the memory page targeted by
the initial effective address calculated by the instruction, the DEAR is set to this calculated effective address.
However, if the data TLB miss exception only occurs due to the instruction causing the exception crossing a
memory page boundary in that the missing TLB entry is for the page accessed after crossing the boundary,
the DEAR is set to the address of the first byte within that page.
For example, consider a misaligned load word instruction that targets effective address x‘0000 0FFF’, and
suppose that the page containing that address is a 4 KB page. The load word will thus cross the page boundary and
attempt to access the next page starting at address x‘0000 1000’. If a valid TLB entry does not exist for the
first page, the DEAR will be set to x‘0000 0FFF’. However, if a valid TLB entry exists for the first page, but not
for the second, the DEAR will be set to x‘0000 1000’. Furthermore, the load word instruction in this latter
scenario will have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).
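The DEAR rule described above can be expressed as a short function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, page_size is a power of two, and when first_page_miss is false the access is assumed to cross into the following page, which is the page that lacks a valid TLB entry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the value recorded in the DEAR for a data TLB miss.
 * ea:              effective address calculated by the instruction
 * page_size:       size in bytes of the page containing ea (power of two)
 * first_page_miss: true if the page containing ea has no valid TLB entry;
 *                  false if only the page after the boundary misses */
static uint32_t dear_for_data_tlb_miss(uint32_t ea, uint32_t page_size,
                                       bool first_page_miss)
{
    if (first_page_miss) {
        return ea;   /* the calculated effective address itself */
    }
    /* Miss occurs only after crossing the boundary: first byte of next page. */
    return (ea & ~(page_size - 1u)) + page_size;
}
```

With the values from the example in the text (effective address x‘0000 0FFF’ and a 4 KB page), the function returns x‘0000 0FFF’ when the first page misses and x‘0000 1000’ when only the second page misses.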
FP
Set to ‘1’ if the instruction causing the interrupt is a floating-point load or store; otherwise set
to ‘0’.
ST
Set to ‘1’ if the instruction causing the interrupt is a store, dcbz, or dcbi instruction; otherwise
set to ‘0’.
AP
Set to ‘1’ if the instruction causing the interrupt is an auxiliary processor load or store; otherwise set to ‘0’.
ISMC
Unchanged.
7.5.14 Instruction TLB Error Interrupt
An instruction TLB error interrupt occurs when no higher priority exception exists and an instruction TLB miss
exception is presented to the interrupt mechanism. Note that although an instruction TLB miss exception
might occur during an attempt to fetch an instruction, such an exception is not actually presented to the interrupt mechanism until an attempt is made to execute that instruction. An instruction TLB miss exception
occurs when an instruction fetch attempts to access a virtual address for which a valid TLB entry does not
exist. See Section 4 Memory Management Unit on page 103 for more information about the TLB.
An instruction TLB error interrupt also occurs when an instruction fetch targets a page with W = I = ‘1’, that is, a page that is marked write-through and caching inhibited at the same time.
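When an instruction TLB error interrupt occurs, the processor suppresses the execution of the instruction
causing the instruction TLB miss exception, the interrupt processing registers are updated as follows (all
registers not listed are unchanged), and instruction execution resumes at address
IVPR[IVP] || IVORn[IVO] || 0b0000, where IVORn is the Interrupt Vector Offset Register assigned to the
instruction TLB error interrupt.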
Set to the effective address of the instruction causing the instruction TLB error interrupt.
CE, ME, DE
Unchanged.
7.5.15 Debug Interrupt
A debug interrupt occurs when no higher priority exception exists, a debug exception exists in the Debug
Status Register (DBSR), the processor is in internal debug mode (DBCR0[IDM] = ‘1’), and debug interrupts
are enabled (MSR[DE] = ‘1’). A debug exception occurs when a debug event causes a corresponding bit in
the DBSR to be set.
DBCR0[IDM] and MSR[DE] must be set to enable debug interrupts. However, if DBCR0[IDM] is set but
MSR[DE] is not set, the processor operates in trace mode. In trace mode, no debug interrupts occur, but
DBSR is still set. To enable the core to broadcast instruction trace data, additional register settings are
required. See Section 8.2.3 Trace Mode on page 218.
There are several types of debug exceptions, as follows:
• Instruction address compare (IAC) exception
An IAC debug exception occurs when execution is attempted of an instruction whose address matches
the IAC conditions specified by the various debug facility registers. This exception can occur regardless
of debug mode, and regardless of the value of MSR[DE].
• Data address compare (DAC) exception
A DAC debug exception occurs when the DVC mechanism is not enabled and execution is attempted of
a load, store, icbi, icbt, dcbst, dcbf, dcbz, dcbi, dcbt, or dcbtst instruction whose target storage operand address matches the DAC conditions specified by the various debug facility registers. This exception
can occur if MSR[DE] and DBCR0[IDM] are set.
Note: The instruction cache management instructions icbi and icbt are treated as loads from the
addressed byte with respect to debug exceptions. IAC debug exceptions are associated with the fetching
of instructions not with the execution of instructions. DAC debug exceptions are associated with the execution of instruction cache management instructions and with the execution of load, store, and data
cache management instructions.
• Data value compare (DVC) exception
A DVC debug exception occurs when execution is attempted of a load, store, or dcbz instruction whose
target storage operand address matches the DAC and DVC conditions specified by the various debug
facility registers. This exception can occur if MSR[DE] and DBCR0[IDM] are set.
• Branch taken (BRT) exception
A BRT debug exception occurs when BRT debug events are enabled (DBCR0[BRT] = ‘1’), and execution
is attempted of a branch instruction for which the branch conditions are met. This exception cannot occur
in internal debug mode when MSR[DE] = ‘0’ unless external debug mode or debug wait mode is also
enabled. Table 7-3 on page 202 lists BRT debug event actions.
Table 7-3. BRT Debug Event Actions
DBCR0[BRT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           –           –           –        –                       No action.
1           0           0           –        0                       DBSR[BRT] is set through a normal commit.
1           –           1           –        –                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           –           0           –        1                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           1           0           0        0                       No action.
1           1           0           1        0                       DBSR[BRT] is set through a faulty commit. A debug interrupt is taken. CSRR0 is set to the address of the branch instruction.
• Trap (TRAP) exception
A TRAP debug exception occurs when TRAP debug events are enabled (DBCR0[TRAP] = ‘1’), and execution is attempted of a tw or twi instruction that matches any of the specified trap conditions. This
exception can occur regardless of debug mode and regardless of the value of MSR[DE]. Table 7-4 lists
TRAP debug event actions.
Table 7-4. TRAP Debug Event Actions
DBCR0[TRAP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       Program interrupt taken.
1            0           0           –        0                       DBSR[TRAP] is set.
1            –           1           –        –                       DBSR[TRAP] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[TRAP] is set. Transition to the STOP state.
1            1           0           0        0                       DBSR[TRAP] is set. DBSR[IDE] is set. A program interrupt is taken. SRR0 is set to the address of the trap instruction.
1            1           0           1        0                       DBSR[TRAP] is set. A debug interrupt is taken. CSRR0 is set to the address of the trap instruction.
• Return (RET) exception
An RET debug exception occurs when RET debug events are enabled (DBCR0[RET] = ‘1’) and execution is attempted of an rfi, rfci, or rfmci instruction. For rfi, the RET debug exception can occur regardless of debug mode and regardless of the value of MSR[DE]. For rfci or rfmci, the RET debug exception
cannot occur in internal debug mode when MSR[DE] = ‘0’ unless external debug mode or debug wait
mode is also enabled. Table 7-5 on page 203 lists RET debug event actions.
Table 7-5. RET Debug Event Actions
DBCR0[RET]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           –           –           –        –                       None.
1           0           0           –        0                       DBSR[RET] is set through a normal commit.
1           –           1           –        –                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           –           0           –        1                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           1           0           0        0                       DBSR[RET] is set through a normal commit. DBSR[IDE] is set.
1           1           0           1        0                       rfi faulty committed. DBSR[RET] is set. A debug interrupt is taken. CSRR0 is set to the address of the rfi instruction.
• Instruction complete (ICMP) exception
An ICMP debug exception occurs when ICMP debug events are enabled (DBCR0[ICMP] = ‘1’), and execution of any instruction is completed. This exception cannot occur in internal debug mode when
MSR[DE] = ‘0’ unless external debug mode or debug wait mode is also enabled. Table 7-6 lists ICMP
debug event actions.
Table 7-6. ICMP Debug Event Actions
DBCR0[ICMP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       No action.
1            0           0           –        0                       No action.
1            –           1           –        –                       DBSR[ICMP] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[ICMP] is set. Transition to the STOP state.
1            1           0           0        0                       No action.
1            1           0           1        0                       DBSR[ICMP] is set. A debug interrupt is taken. CSRR0 is set to the address of the next instruction to be executed after the ICMP instruction.
• Interrupt (IRPT) exception
An IRPT debug exception occurs when IRPT debug events are enabled (DBCR0[IRPT] = ‘1’), and an
interrupt occurs. For noncritical class interrupt types, the IRPT debug exception can occur regardless of
debug mode and regardless of the value of MSR[DE]. For critical class interrupt types, the IRPT debug
exception cannot occur in internal debug mode (regardless of the value of MSR[DE]) unless external
debug mode or debug wait mode is also enabled. Table 7-7 on page 204 lists IRPT debug event actions.
Table 7-7. IRPT Debug Event Actions
DBCR0[IRPT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       None.
1            0           0           –        0                       DBSR[IRPT] is set.
1            –           1           –        –                       DBSR[IRPT] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[IRPT] is set. Transition to the STOP state.
1            1           0           0        0                       DBSR[IRPT] is set. DBSR[IDE] is set.
1            1           0           1        0                       DBSR[IRPT] is set. A debug interrupt is taken. CSRR0 is set to the address of the first instruction in the base class interrupt handler.
• Unconditional debug event (UDE) exception
A UDE debug exception occurs when an unconditional debug event is signaled over the JTAG interface
to the PowerPC 476FP core. This exception can occur regardless of debug mode and regardless of the
value of MSR[DE]. Table 7-8 lists UDE debug event actions.
Table 7-8. UDE Debug Event Actions
DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           0           –        0                       DBSR[UDE] is set.
–           1           –        –                       DBSR[UDE] is set. Transition to the STOP state.
–           0           –        1                       DBSR[UDE] is set. Transition to the STOP state.
1           0           0        0                       DBSR[UDE] is set.
1           0           1        0                       DBSR[UDE] is set. A debug interrupt is taken. CSRR0 is set to the address of the CS trail at the time of the interrupt flush.
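The debug event action tables in this section can be read as decision functions over the same five inputs. As an illustration, the sketch below encodes one reading of Table 7-3 (BRT debug event actions), in which external debug mode or debug wait mode leads to the STOP state, and internal debug mode alone leads to a debug interrupt only when MSR[DE] = ‘1’. The enumeration and function names are hypothetical, and the dwe parameter stands for the combined "MSR[DWE] and JDCR[DWE]" column.

```c
#include <stdbool.h>

typedef enum {
    BRT_NO_ACTION,        /* nothing is recorded */
    BRT_SET_DBSR_ONLY,    /* DBSR[BRT] set through a normal commit */
    BRT_STOP,             /* DBSR[BRT] set; transition to the STOP state */
    BRT_DEBUG_INTERRUPT   /* DBSR[BRT] set; debug interrupt taken */
} brt_action_t;

/* One reading of Table 7-3; inputs correspond to DBCR0[BRT], DBCR0[IDM],
 * DBCR0[EDM], MSR[DE], and the combined MSR[DWE]/JDCR[DWE] column. */
static brt_action_t brt_event_action(bool brt, bool idm, bool edm,
                                     bool msr_de, bool dwe)
{
    if (!brt) {
        return BRT_NO_ACTION;        /* BRT debug events disabled */
    }
    if (edm || dwe) {
        return BRT_STOP;             /* external debug or debug wait mode */
    }
    if (!idm) {
        return BRT_SET_DBSR_ONLY;    /* no debug mode enabled */
    }
    /* Internal debug mode only: the event is not recorded when MSR[DE] = 0. */
    return msr_de ? BRT_DEBUG_INTERRUPT : BRT_NO_ACTION;
}
```

The other event types follow the same column structure but differ in individual rows (for example, a TRAP event with MSR[DE] = ‘0’ in internal debug mode results in a program interrupt), so each table would need its own function under this approach.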
The PowerPC 476FP core supports the following four debug modes:
• Internal debug mode
• External debug mode
• Debug wait mode
• Trace mode
Debug exceptions and interrupts are affected by the debug modes that are enabled at the time of the debug
exception. Debug interrupts occur only when internal debug mode is enabled, although it is possible for
external debug mode or debug wait mode to be enabled as well. The remainder of this section assumes that
internal debug mode is enabled and that external debug mode and debug wait mode are not enabled, at the
time of a debug exception.
See Section 8 Debug Facilities on page 217 for more information about the different debug modes and the
behavior of each of the debug exception types when operating in each of the modes.
Note: It is a programming error for software to enable internal debug mode (by setting DBCR0[IDM] to ‘1’)
while debug exceptions are already present in the DBSR. Software must first clear all DBSR debug exception
status (that is, all fields except IDE, MRR, IAC12ATS, and IAC34ATS) before setting DBCR0[IDM] to ‘1’.
If an stwcx. instruction causes a DAC or DVC debug exception but the processor does not have the reservation from an lwarx instruction, the debug exception is not recorded in the DBSR, and a debug interrupt does
not occur. Instead, the instruction completes and updates CR[CR0] to indicate the failure of the store due to
the lost reservation.
If a DAC exception occurs on an lswx or stswx with a length of zero, the instruction is treated as a no-op, the
debug exception is not recorded in the DBSR, and a debug interrupt does not occur.
If a DAC exception occurs on an icbt, dcbt, or dcbtst instruction that is being no-op’ed due to some other
reason (either the referenced cache block is in a caching inhibited memory page or a data storage or data
TLB miss exception occurs), the debug exception is not recorded in the DBSR, and a debug interrupt does
not occur. However, if the icbt, dcbt, or dcbtst instruction is not being no-op’ed for one of these other
reasons, the DAC debug exception does occur and is handled in the same fashion as other DAC debug
exceptions.
For all other cases, when a debug exception occurs, it is immediately presented to the interrupt handling
mechanism. A debug interrupt occurs immediately if MSR[DE] = ‘1’, and the interrupt processing registers are
updated as described in the following subsections. If MSR[DE] = ‘0’, however, the exception condition
remains set in the DBSR. When MSR[DE] is subsequently set to ‘1’ and the exception condition is still present
in the DBSR, a delayed debug interrupt then occurs either as a synchronous, imprecise interrupt, or as an
asynchronous interrupt, depending on the type of debug exception.
When a debug interrupt occurs, the interrupt processing registers are updated as follows (all registers not
listed are unchanged) and instruction execution resumes at address IVPR[IVP] || IVOR15[IVO] || 0b0000.
For debug exceptions that occur while debug interrupts are enabled (MSR[DE] = ‘1’), CSRR0 is set as
follows:
• For IAC, BRT, TRAP, and RET debug exceptions, set to the address of the instruction causing the debug
interrupt. Execution of the instruction causing the debug exception is suppressed, and the interrupt is
synchronous and precise.
• For DAC and DVC debug exceptions, if DBCR2[DAC12A] = ‘0’, set to the address of the instruction causing the debug interrupt. Execution of the instruction causing the debug exception is suppressed, and the
interrupt is synchronous and precise.
If DBCR2[DAC12A] = ‘1’, however, DAC and DVC debug exceptions are handled asynchronously, and
CSRR0 is set to the address of the instruction that would have executed next had the debug interrupt not
occurred. This could either be the address of the instruction causing the DAC or DVC debug exception, or
the address of a subsequent instruction.
• For ICMP debug exceptions, set to the address of the next instruction to be executed (the instruction after
the one whose completion caused the ICMP debug exception). The interrupt is synchronous and precise.
Because the ICMP debug exception does not suppress the execution of the instruction causing the
exception, but rather allows it to complete before causing the interrupt, the behavior of the interrupt is different in the special case where the instruction causing the ICMP debug exception is itself setting
MSR[DE] to ‘0’. In this case, the interrupt is delayed and occurs if MSR[DE] is again set to ‘1’, assuming
DBSR[ICMP] is still set. If the debug interrupt occurs in this fashion, it will be synchronous and imprecise,
and CSRR0 will be set to the address of the instruction after the one that set MSR[DE] to ‘1’ (not the one
that originally caused the ICMP debug exception and in so doing set MSR[DE] to ‘0’). If the instruction
that set MSR[DE] to ‘1’ was rfi, rfci, or rfmci, CSRR0 is set to the address to which the rfi, rfci, or rfmci
was returning and not to the address of the instruction that was sequentially after the rfi, rfci, or rfmci.
• For IRPT debug exceptions, set to the address of the first instruction in the interrupt handler associated
with the interrupt type that caused the IRPT debug exception. The interrupt is asynchronous.
• For UDE debug exceptions, set to the address of the instruction that would have executed next if the
debug interrupt had not occurred. The interrupt is asynchronous.
For all debug exceptions that occur while debug interrupts are disabled (MSR[DE] = ‘0’), the debug interrupt
is delayed and occurs if and when MSR[DE] is again set to ‘1’, assuming the debug exception status is still
set in the DBSR. If the debug interrupt occurs in this fashion, CSRR0 is set to the address of the instruction
after the one that set MSR[DE]. If the instruction that set MSR[DE] was rfi, rfci, or rfmci, CSRR0 is set to the
address to which the rfi, rfci, or rfmci was returning, and not to the address of the instruction that was
sequentially after the rfi, rfci, or rfmci. The interrupt is either synchronous and imprecise, or asynchronous,
depending on the type of debug exception, as follows:
• For IAC and RET debug exceptions, the interrupt is synchronous and imprecise.
• For BRT debug exceptions, this scenario cannot occur. BRT debug exceptions are not recognized when
MSR[DE] = ‘0’ if operating in internal debug mode.
• For TRAP debug exceptions, the debug interrupt is synchronous and imprecise. However, under these
conditions (TRAP debug exception occurring while MSR[DE] is 0), the attempted execution of the trap
instruction for which one or more of the trap conditions is met will itself lead to a trap exception type program interrupt. The corresponding debug interrupt that occurs later if debug interrupts are enabled is in
addition to the program interrupt.
• For DAC and DVC debug exceptions, if DBCR2[DAC12A] = ‘0’, the interrupt is synchronous and imprecise. If DBCR2[DAC12A] = ‘1’, the interrupt is asynchronous.
• For ICMP debug exceptions, this scenario cannot occur in this fashion. ICMP debug exceptions are not
recognized when MSR[DE] = ‘0’ if operating in internal debug mode. However, a similar scenario can
occur when MSR[DE] = ‘1’ at the time of the ICMP debug exception, but the instruction whose completion
is causing the exception is itself setting MSR[DE] to ‘0’. This scenario is described in the subsection
about the ICMP debug exception for which MSR[DE] = ‘1’ at the time of the exception. In that scenario,
the interrupt is synchronous and imprecise.
• For IRPT and UDE debug exceptions, the interrupt is asynchronous.
CE, ME, DE
Unchanged.
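The classification rules given in the two preceding lists can be summarized as a small lookup function. The following is a sketch, assuming internal debug mode and taking MSR[DE] as it was when the debug exception occurred. All type and function names are hypothetical. The special ICMP case, in which the completing instruction itself clears MSR[DE], is not modeled.

```c
#include <stdbool.h>

/* Hypothetical names for the debug exception types discussed above. */
enum debug_event {
    DBG_IAC, DBG_BRT, DBG_TRAP, DBG_RET,
    DBG_DAC, DBG_DVC, DBG_ICMP, DBG_IRPT, DBG_UDE
};

enum debug_irpt_kind {
    DBG_SYNC_PRECISE,
    DBG_SYNC_IMPRECISE,
    DBG_ASYNC,
    DBG_NOT_RECOGNIZED   /* exception is not recorded while MSR[DE] = 0 */
};

/* msr_de: MSR[DE] at the time of the debug exception.
 * dac12a: DBCR2[DAC12A]. */
static enum debug_irpt_kind
classify_debug_interrupt(enum debug_event ev, bool msr_de, bool dac12a)
{
    switch (ev) {
    case DBG_IRPT:
    case DBG_UDE:
        return DBG_ASYNC;
    case DBG_DAC:
    case DBG_DVC:
        if (dac12a)
            return DBG_ASYNC;
        return msr_de ? DBG_SYNC_PRECISE : DBG_SYNC_IMPRECISE;
    case DBG_BRT:
    case DBG_ICMP:
        /* Not recognized with MSR[DE] = 0 in internal debug mode. */
        return msr_de ? DBG_SYNC_PRECISE : DBG_NOT_RECOGNIZED;
    case DBG_IAC:
    case DBG_TRAP:
    case DBG_RET:
        return msr_de ? DBG_SYNC_PRECISE : DBG_SYNC_IMPRECISE;
    }
    return DBG_NOT_RECOGNIZED;
}
```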
7.6 Interrupt Ordering and Masking
Multiple exceptions can exist simultaneously, each of which can cause the generation of an interrupt. Furthermore, the Power ISA architecture does not provide for the generation of more than one interrupt of the same
class (critical or noncritical) at a time. Therefore, the architecture defines that interrupts are ordered with
respect to each other and provides a masking mechanism for certain persistent interrupt types.
When an interrupt type is masked (disabled) and an event causes an exception that would normally generate
an interrupt of that type, the exception persists as a status bit in a register (which register depends upon the
exception type). However, no interrupt is generated. Later, if the interrupt type is enabled (unmasked), and
the exception status has not been cleared by software, the interrupt due to the original exception event will
then finally be generated.
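The masking behavior described above can be modeled with two flags per interrupt type. This is a conceptual sketch only; the structure and function names are hypothetical and do not correspond to any register layout in the PowerPC 476FP core.

```c
#include <stdbool.h>

/* Conceptual model of one maskable interrupt type. */
struct masked_irpt {
    bool status;   /* exception status bit; persists until software clears it */
    bool enabled;  /* enable (unmask) bit for the interrupt type */
};

/* Records an exception event. The status bit is set even when masked. */
static void raise_exception(struct masked_irpt *m)
{
    m->status = true;
}

/* Returns true when an interrupt of this type would be generated now. */
static bool interrupt_pending(const struct masked_irpt *m)
{
    return m->status && m->enabled;
}
```

In this model, an exception raised while the type is masked produces no interrupt until the enable flag is later set, and produces none at all if software clears the status flag first.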
All asynchronous interrupt types can be masked. Machine check interrupts can be masked as well. In addition, certain synchronous interrupt types can be masked. The two synchronous interrupt types that can be
masked are the floating-point enabled exception type program interrupt (masked by MSR[FE0,FE1]), and the
IAC, DAC, DVC, RET, and ICMP exception type debug interrupts (masked by MSR[DE]).
Note: When an otherwise synchronous, precise interrupt type is delayed in this fashion through masking,
and the interrupt type is later enabled, the interrupt that is then generated due to the exception event that
occurred while the interrupt type was disabled is then considered a synchronous, imprecise class of interrupt.
To prevent a subsequent interrupt from causing the state information (saved in SRR0/SRR1,
CSRR0/CSRR1, or MCSRR0/MCSRR1) from a previous interrupt to be overwritten and lost, the PowerPC
476FP core performs certain functions. As a first step, upon any noncritical class interrupt, the processor
automatically disables any further asynchronous, noncritical class interrupts (external input, decrementer,
and fixed interval timer) by clearing MSR[EE]. Likewise, upon any critical class interrupt, hardware automatically disables any further asynchronous interrupts of either class (critical and noncritical) by clearing
MSR[CE] and MSR[DE], in addition to MSR[EE]. The additional interrupt types that are disabled by the
clearing of MSR[CE,DE] are the critical input, watchdog timer, and debug interrupts. For machine check interrupts, the processor automatically disables all maskable interrupts by clearing MSR[ME] and
MSR[EE,CE,DE].
This first step of clearing MSR[EE] (and MSR[CE,DE] for critical class interrupts, and MSR[ME] for machine
checks) prevents any subsequent asynchronous interrupts from overwriting the relevant save/restore registers (SRR0/SRR1, CSRR0/CSRR1, or MCSRR0/MCSRR1) before software can save their contents. The
processor also automatically clears, on any interrupt, MSR[WE,PR,FP,FE0,FE1,IS,DS]. The clearing of these
bits assists in the avoidance of subsequent interrupts of certain other types. However, guaranteeing that
these interrupt types do not occur and thus do not overwrite the save/restore registers also requires the cooperation of system software. Specifically, system software must avoid the execution of instructions that could
cause (or enable) a subsequent interrupt, if the contents of the save/restore registers have not yet been
saved.
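The automatic MSR clearing described above can be sketched as a function of the interrupt class. The MSR bit masks below are assumptions (they are not given in this section), and the function models only the bits named in the preceding paragraphs; all names are hypothetical.

```c
#include <stdint.h>

/* Assumed MSR bit masks; verify against the MSR definition before use. */
#define MSR_WE   0x00040000u
#define MSR_CE   0x00020000u
#define MSR_EE   0x00008000u
#define MSR_PR   0x00004000u
#define MSR_FP   0x00002000u
#define MSR_ME   0x00001000u
#define MSR_FE0  0x00000800u
#define MSR_DE   0x00000200u
#define MSR_FE1  0x00000100u
#define MSR_IS   0x00000020u
#define MSR_DS   0x00000010u

enum irpt_class { IRPT_NONCRITICAL, IRPT_CRITICAL, IRPT_MACHINE_CHECK };

/* Returns the MSR value after the bits named in the text are cleared. */
static uint32_t msr_after_interrupt(uint32_t msr, enum irpt_class cls)
{
    /* Cleared on any interrupt. */
    uint32_t clear = MSR_EE | MSR_WE | MSR_PR | MSR_FP |
                     MSR_FE0 | MSR_FE1 | MSR_IS | MSR_DS;

    /* Critical class and machine check interrupts also clear CE and DE. */
    if (cls != IRPT_NONCRITICAL)
        clear |= MSR_CE | MSR_DE;

    /* Machine check interrupts additionally clear ME. */
    if (cls == IRPT_MACHINE_CHECK)
        clear |= MSR_ME;

    return msr & ~clear;
}
```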
7.6.1 Interrupt Ordering Software Requirements
The following list identifies the actions that system software must avoid, before saving the save/restore registers’ contents:
• Reenabling of MSR[EE] (or MSR[CE,DE] in critical class interrupt handlers). This prevents any asynchronous interrupts and in the case of MSR[DE], any debug interrupts, which include both synchronous and
asynchronous types.
• Branching (or sequential execution) to addresses not mapped by the TLB or mapped without execute
access permission. This prevents instruction storage and instruction TLB error interrupts.
• Load, store, or cache management instructions to addresses not mapped by the TLB or not having the
necessary access permission (read or write).
This prevents data storage and data TLB error interrupts.
• Execution of system call (sc) or trap (tw, twi) instructions.
This prevents system call and trap exception type program interrupts.
• Execution of any floating-point instructions. This prevents floating-point unavailable interrupts. Note that
this interrupt would occur upon the execution of any floating-point instruction due to the automatic clearing of MSR[FP]. However, even if software were to reenable MSR[FP], floating-point instructions must still
be avoided to prevent program interrupts due to the possibility of floating-point enabled or unimplemented
operation exceptions.
• Reenabling of MSR[PR]. This prevents privileged instruction exception type program interrupts. Alternatively, software can re-enable MSR[PR], but avoid the execution of any privileged instructions.
• Execution of any auxiliary processor instructions that are not implemented in the PowerPC 476FP core.
This prevents auxiliary processor unavailable interrupts and auxiliary processor enabled and unimplemented operation exception type program interrupts. Note that the auxiliary processor instructions that
are implemented within the PowerPC 476FP core do not cause any of these types of exceptions and can
therefore be executed before software saves the save/restore register contents.
• Execution of any illegal instructions or any defined instructions not implemented within the PowerPC
476FP core (64-bit instructions, mfapidi). This prevents illegal instruction exception type program interrupts.
• Execution of any instruction that could cause an alignment interrupt. This prevents alignment interrupts.
See Section 7.5.6 Alignment Interrupt on page 192 for a complete list of instructions that might cause
alignment interrupts.
• In the machine check handler, use of the caches and TLBs until any detected parity errors have been corrected. This will avoid additional parity errors.
It is not necessary for hardware or software to avoid critical class interrupts from within noncritical class interrupt handlers (and hence the processor does not automatically clear MSR[CE,ME,DE] upon a noncritical
interrupt) because the two classes of interrupts use different pairs of save/restore registers to save the
instruction address and MSR. The converse, however, is not true. That is, hardware and software must cooperate in the avoidance of both critical and noncritical class interrupts from within critical class interrupt
handlers, even though the two classes of interrupts use different save/restore register pairs. This is because
the critical class interrupt might have occurred from within a noncritical class interrupt handler before the
noncritical class interrupt handler saved SRR0 and SRR1. Therefore, within the critical class interrupt
handler, both pairs of save/restore registers might contain data that is necessary to the system software.
Similarly, the machine check handler must avoid further machine checks and both critical and noncritical
interrupts because the machine check handler might have been called from within a critical or noncritical
interrupt handler.
7.6.2 Interrupt Order
The following is a prioritized listing of the various enabled interrupt types for which exceptions might exist
simultaneously:
1. Synchronous (nondebug) interrupts:
a. Data storage
b. Instruction storage
c. Alignment
d. Program
e. Floating-point unavailable
f. System call
g. Auxiliary processor unavailable
h. Data TLB error
i. Data TLB miss exception
j. Instruction TLB error
k. Instruction TLB miss exception
Only one of these types of synchronous interrupts can have an existing exception generating it at any
given time. This is guaranteed by the exception priority mechanism (see Section 7.7 Exception Priorities
on page 210) and the requirements of the sequential execution model defined by the Power ISA architecture.
2. Machine check
3. Debug
4. Critical input
5. Watchdog timer
6. External input
7. Fixed-interval timer
8. Decrementer
Even though, as indicated previously, the noncritical, synchronous exception types listed under item 1 are
generated with higher priority than the critical interrupt types listed in items 2 through 5, these noncritical
interrupts are immediately followed by the highest-priority existing critical interrupt type, without executing
any instructions at the noncritical interrupt handler. This is because the noncritical interrupt types do not
automatically clear MSR[ME,DE,CE] and hence do not automatically disable the critical interrupt types. In all
other cases, a particular interrupt type from the preceding list automatically disables any subsequent interrupts
of the same type and all other interrupt types that are listed after it in the priority order.
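The prioritized list above can be expressed as a simple selection over a set of pending, enabled exceptions. This is a minimal sketch; the enumeration and the bit-per-type encoding of the pending set are assumptions made for illustration, not a hardware interface.

```c
#include <stdint.h>

/* Interrupt types in the priority order listed above (highest first). */
enum ordered_irpt {
    ORD_SYNCHRONOUS = 0,   /* item 1: synchronous (nondebug) interrupts */
    ORD_MACHINE_CHECK,
    ORD_DEBUG,
    ORD_CRITICAL_INPUT,
    ORD_WATCHDOG,
    ORD_EXTERNAL_INPUT,
    ORD_FIT,
    ORD_DEC,
    ORD_COUNT
};

/* pending: bit n set means an enabled exception of type n exists.
 * Returns the highest-priority pending type, or -1 if none is pending. */
static int next_interrupt(uint32_t pending)
{
    for (int t = 0; t < ORD_COUNT; t++) {
        if (pending & (1u << t))
            return t;
    }
    return -1;
}
```

This selection does not model the follow-on behavior described in the paragraph above, in which a noncritical synchronous interrupt is immediately followed by the highest-priority existing critical interrupt.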
7.7 Exception Priorities
Power ISA requires all synchronous (precise and imprecise) interrupts to be reported in program order, as
implied by the sequential execution model. The one exception to this rule is the case of multiple synchronous
imprecise interrupts. Upon a synchronizing event, all previously executed instructions are required to report
any synchronous imprecise interrupt-generating exceptions, and the interrupts are then generated according
to the general interrupt ordering rules outlined in Section 7.6.2 Interrupt Order on page 209. For example, if a
mtmsr instruction causes MSR[FE0,FE1,DE] to all be set, it is possible that a previous floating-point enabled
exception (in the FPSCR) and a previous debug exception (in the DBSR) both are still being presented. In
such a scenario, a floating-point enabled exception type program interrupt occurs first, followed immediately
by a debug interrupt.
For any single instruction attempting to cause multiple exceptions for which the corresponding synchronous
interrupt types are enabled, this section defines the priority order by which the instruction is permitted to
cause a single enabled exception, thus generating a particular synchronous interrupt. Note that it is this
exception priority mechanism, along with the requirement that synchronous interrupts be generated in
program order, that guarantees that at any given time there exists for consideration only one of the synchronous interrupt types listed in item 1 of Section 7.6.2 Interrupt Order on page 209. The exception priority
mechanism also prevents certain debug exceptions from existing in combination with certain other synchronous interrupt-generating exceptions.
This section does not define the permitted setting of multiple exceptions for which the corresponding interrupt
types are disabled. The generation of exceptions for which the corresponding interrupt types are disabled will
have no effect on the generation of other exceptions for which the corresponding interrupt types are enabled.
Conversely, if a particular exception for which the corresponding interrupt type is enabled is shown in the
following sections to be of a higher priority than another exception, the occurrence of that enabled higher
priority exception will prevent the setting of the other exception, independent of whether that other exception’s
corresponding interrupt type is enabled or disabled.
Except as specifically noted in the following subsections, only one of the exception types listed for a given
instruction type is permitted to be generated at any given time, assuming the corresponding interrupt type is
enabled. The priorities of the exception types are listed in the following sections, ranging from highest to lowest
within each instruction type.
Finally, note that machine check exceptions are defined by the PowerPC architecture to be neither synchronous nor asynchronous. Therefore, machine check exceptions are not considered in the remainder of this
section, which specifically addresses the priority of synchronous interrupts.
7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions
The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any integer load, store, or cache management instruction. Included in this category is the former opcode for the icbt instruction, which is an allocated opcode still
supported by the PowerPC 476FP core.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
Only applies to the defined 64-bit load, store, and cache management instructions, which are not recognized by the PowerPC 476FP core.
5. Program (privileged instruction)
Only applies to the dcbi instruction, and only occurs if MSR[PR] = ‘1’.
6. Data TLB error (data TLB miss exception)
7. Data storage (all exception types except byte ordering exception)
8. Alignment (alignment exception)
9. Debug (DAC or DVC exception)
10. Debug (ICMP exception)
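As an illustration of how the priority list above resolves an instruction that attempts several enabled exceptions at once, the list can be encoded as an enumeration in priority order. This is a sketch under the assumption that all corresponding interrupt types are enabled; the identifiers are hypothetical.

```c
#include <stdbool.h>

/* Exception types for integer load, store, and cache management
 * instructions, in the priority order listed above (highest first). */
enum ls_exception {
    LS_DEBUG_IAC,
    LS_ITLB_MISS,
    LS_ISTORAGE,
    LS_ILLEGAL,
    LS_PRIVILEGED,
    LS_DTLB_MISS,
    LS_DSTORAGE,
    LS_ALIGNMENT,
    LS_DEBUG_DAC_DVC,
    LS_DEBUG_ICMP,
    LS_EXCEPTION_COUNT
};

/* attempted[i] is true if the instruction attempts exception i.
 * Returns the single exception permitted to occur, or -1 if none. */
static int ls_highest_priority(const bool attempted[LS_EXCEPTION_COUNT])
{
    for (int i = 0; i < LS_EXCEPTION_COUNT; i++) {
        if (attempted[i])
            return i;
    }
    return -1;
}
```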
7.7.2 Exception Priorities for Floating-Point Load and Store Instructions
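The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any floating-point load or store instruction.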
This exception occurs if no floating-point unit is attached to the PowerPC 476FP core or if the particular
floating-point load or store instruction is not recognized by the attached floating-point unit.
5. Floating-point unavailable (floating-point unavailable exception)
This exception occurs if an attached floating-point unit recognizes the instruction, but floating-point
instruction processing is disabled (MSR[FP] = ‘0’).
6. Program (unimplemented operation exception)
This exception occurs if an attached floating-point unit recognizes but does not support the instruction,
and floating-point instruction processing is enabled (MSR[FP] = ‘1’).
7. Data TLB error (data TLB miss exception)
8. Data storage (all exception types except cache locking exception)
9. Alignment (alignment exception)
10. Debug (DAC or DVC exception)
11. Debug (ICMP exception)
7.7.3 Exception Priorities for Allocated Load and Store Instructions
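The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any allocated load or store instruction.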
This exception occurs if no auxiliary processor unit is attached to the PowerPC 476FP core, or if the particular allocated load or store instruction is not recognized by the attached auxiliary processor.
5. Program (privileged instruction exception)
This exception occurs if an attached auxiliary processor unit recognizes the instruction and indicates that
the instruction is privileged, but MSR[PR] = ‘1’.
6. Auxiliary processor unavailable (auxiliary processor unavailable exception)
This exception occurs if an attached auxiliary processor recognizes the instruction but indicates that auxiliary processor instruction processing is disabled (whether auxiliary processor instruction processing is
enabled is implementation-dependent).
This exception occurs if an attached auxiliary processor recognizes but does not support the instruction,
and also indicates that auxiliary processor instruction processing is enabled (whether auxiliary processor
instruction processing is enabled is implementation-dependent).
8. Data TLB error (data TLB miss exception)
9. Data storage (all exception types except cache locking exception)
10. Alignment (alignment exception)
11. Debug (DAC or DVC exception)
7.7.4 Exception Priorities for Floating-Point Instructions (Other)
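The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any floating-point instruction other than a load or store.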
This exception occurs if no floating-point unit is attached to the PowerPC 476FP core or if the particular
floating-point instruction is not recognized by the attached floating-point unit.
5. Floating-point unavailable (floating-point unavailable exception)
This exception occurs if an attached floating-point unit recognizes the instruction but floating-point
instruction processing is disabled (MSR[FP] = ‘0’).
This exception occurs if an attached floating-point unit recognizes but does not support the instruction,
and floating-point instruction processing is enabled (MSR[FP] = ‘1’).
7. Program (floating-point enabled exception)
This exception occurs if an attached floating-point unit recognizes and supports the instruction, floating-point instruction processing is enabled (MSR[FP] = ‘1’), and the instruction sets FPSCR[FEX] to ‘1’.
7.7.5 Exception Priorities for Allocated Instructions (Other)
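The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any allocated instruction other than a load or store,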
and which is not one of the allocated instructions implemented within the PowerPC 476FP core.
This exception occurs if no auxiliary processor unit is attached to the PowerPC 476FP core or if the particular allocated instruction is not recognized by the attached auxiliary processor and is not one of the
allocated instructions implemented within the PowerPC 476FP core.
This exception occurs if an attached auxiliary processor unit recognizes the instruction and indicates that
the instruction is privileged, but MSR[PR] = ‘1’.
6. Auxiliary processor unavailable (auxiliary processor unavailable exception)
This exception occurs if an attached auxiliary processor recognizes the instruction, but indicates that auxiliary processor instruction processing is disabled (whether auxiliary processor instruction processing is
enabled is implementation-dependent).
This exception occurs if an attached auxiliary processor recognizes but does not support the instruction,
and also indicates that auxiliary processor instruction processing is enabled (whether auxiliary processor
instruction processing is enabled is implementation-dependent).
8. Program (auxiliary processor enabled exception)
This exception occurs if an attached auxiliary processor recognizes and supports the instruction, indicates that auxiliary processor instruction processing is enabled, and the instruction execution results in
an auxiliary processor enabled exception. Whether auxiliary processor instruction processing is enabled
is implementation-dependent, as is whether a given auxiliary processor instruction results in an auxiliary
processor enabled exception.
7.7.6 Exception Priorities for Privileged Instructions
The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of any privileged instruction other than dcbi, rfi, rfci,
rfmci, or any allocated instruction not implemented within the PowerPC 476FP core (all of which are covered
elsewhere). This list covers, however, the dci, dcread, ici, and icread instructions, which are privileged,
allocated instructions that are implemented within the PowerPC 476FP core. This list also covers the defined
64-bit privileged instructions and the mfapidi instruction, neither of which is implemented by the PowerPC
476FP core.
Only applies to the defined 64-bit privileged instructions and the mfapidi instruction.
Does not apply to the defined 64-bit privileged instructions or the mfapidi instruction.
Does not apply to the defined 64-bit privileged instructions or the mfapidi instruction.
7.7.7 Exception Priorities for Trap Instructions
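The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of a trap (tw, twi) instruction.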
4. Debug (trap exception)
5. Program (trap exception)
7.7.8 Exception Priorities for System Call Instruction
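The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of a system call (sc) instruction: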
4. System call (system call exception)
Because the system call exception does not suppress the execution of the sc instruction, but rather the
exception occurs when the instruction has completed, an sc instruction can cause both a system call exception and an ICMP debug exception at the same time. In such a case, the associated interrupts occur in the
order indicated in Section 7.6.2 Interrupt Order on page 209.
7.7.9 Exception Priorities for Branch Instructions
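The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of a branch instruction: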
4. Debug (BRT exception)
7.7.10 Exception Priorities for Return From Interrupt Instructions
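The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of an rfi or rfci instruction: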
4. Debug (RET exception)
7.7.11 Exception Priorities for Preserved Instructions
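The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of a preserved instruction: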
Applies to all preserved instructions except the mftb instruction, which is the only preserved class instruction implemented within the PowerPC 476FP core.
Only applies to the mftb instruction, which is the only preserved class instruction implemented within the
PowerPC 476FP core.
7.7.12 Exception Priorities for Reserved Instructions
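The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of a reserved instruction: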
Applies to all reserved instruction opcodes except the reserved-no-op instruction opcodes.
Only applies to the reserved-no-op instruction opcodes.
7.7.13 Exception Priorities for All Other Instructions
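The following list identifies the priority order of the exception types that might occur within the PowerPC
476FP core as the result of the attempted execution of all other instructions (that is, those not covered in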
Section 7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions on page 211
through Section 7.7.12 Exception Priorities for Reserved Instructions on page 216). This includes both
defined instructions and allocated instructions implemented within the PowerPC 476FP core.
Applies only to the defined 64-bit instructions because these are not implemented within the PowerPC
476FP core.
Does not apply to the defined 64-bit instructions because these are not implemented by the PowerPC
476FP core.
8. Debug Facilities
The PowerPC 476FP Embedded Processor Core includes facilities for debugging during hardware and software development. Debug registers control debug modes and events that are provided by the debug facilities. Developers can control the debug process using these debug modes and debug events.
The debug registers can be accessed through software running on the processor. Also, the Joint Test Action
Group (JTAG) debug port of the PowerPC 476FP core provides access to the debug registers.
The PowerPC 476FP debug facility is core-centric. For multiprocessor (MP) debug, the following
methods are therefore recommended:
• Use an external trace logic unit or tool to trigger and gather information and the RISCWatch tool to provide readable information. Consult with the IBM PowerPC support team for further details.
• Target one processor of interest and debug the processor.
8.1 Development Tool Support
The RISCWatch product is a development tool that uses external debug mode, debug events, and the JTAG
debug port to implement a hardware and software development tool. The RISCTrace feature of RISCWatch
uses the real-time instruction trace capability of the PowerPC 476FP core.
8.2 Debug Modes
The PowerPC 476FP core provides debug modes for use with particular types of debug tools or operations
that are typically used in embedded systems development. When these debug modes are enabled, debug
events are enabled by setting the corresponding bits in Debug Control Register 0 (DBCR0). These debug
events are recorded in the Debug Status Register (DBSR).
The PowerPC 476FP core supports four debug modes:
• Internal debug mode
• External debug mode
• Debug wait enable mode
• Trace mode
The Power ISA Book-III E architecture specification focuses only on internal debug mode and the relationship
of debug interrupts to the rest of the interrupt architecture. Internal debug mode is the mode that involves
debug software running on the processor itself, typically in the form of the debug interrupt handler. The other
debug modes, on the other hand, are outside the scope of the Power ISA architecture, and involve
special-purpose debug hardware external to the PowerPC 476FP core, connected either to the JTAG interface (for
external debug mode and debug wait mode) or the trace interface (for trace debug mode). Details of these
interfaces and their operation are beyond the scope of this manual. See the PowerPC 476FP Core Support
Manual and consult with the PowerPC support team for further details.
8.2.1 Internal Debug Mode
When internal debug mode is enabled (DBCR0[IDM] = ‘1’), a debug event that sets the DBSR also causes a
debug interrupt if debug exceptions are enabled in the Machine State Register (MSR[DE] = ‘1’). See
Section 7.4.1 on page 173 for information about the MSR. Software at the debug interrupt vector location is
given control when a debug event occurs. Using normal instructions, software can then access all architected
processor resources. This way, debug software can control the processor, gather status, and interact with
debugging hardware connected to the processor.
However, if internal debug mode is enabled, and debug exceptions are not enabled (MSR[DE] = ‘0’), the
processor is operating in trace debug mode. A debug event sets the DBSR, but no debug interrupts occur. To
enable the core to broadcast instruction trace data, additional register settings are required. See
Section 8.2.3 Trace Mode on page 218.
8.2.2 External Debug Mode
External debug mode is enabled by setting DBCR0[EDM]. External debug mode provides access to architected processor resources. It supports stopping, starting, and stepping the processor; setting hardware and
software breakpoints; and monitoring processor status. In this mode, debug events (including a move to
debug-status register [mtdbsr] instruction) are recorded in the DBSR. The debug events then cause a transition to the stop state.
In stop state, normal instruction execution stops to allow an external mechanism to handle the debug event.
In stop state, architected processor resources and memory can be accessed and altered using the JTAG
interface. Also, interrupts are temporarily disabled.
This stop is considered to be a hard stop and is caused by the XXC476DEBUGHALT signal being asserted,
JDCR[STOP] being asserted, or by any DBSR debug event bit (any except for IDM or RST) and
DBCR0[EDM] being set. This stop state is exited when DBSR or DBCR0[EDM] is cleared.
A hard stop overrides a weak stop. The CPU must be in a stop (hard or weak) state without MSR[WE] set to
execute a step or stuff request from JTAG.
A weak stop accepts interrupts. When an interrupt arrives, the processor transitions back to the run state,
services the interrupt, and then returns to the stop state.
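The hard stop conditions described in this section can be written as a single predicate. This is a sketch only. The structure, the function name, and the caller-supplied mask of DBSR debug event bits are hypothetical; the mask is expected to exclude the two DBSR bits that the text names as exceptions, and DBSR bit positions are not assumed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the inputs that can request a hard stop. */
struct stop_inputs {
    bool     debug_halt_signal;  /* XXC476DEBUGHALT asserted */
    bool     jdcr_stop;          /* JDCR[STOP] asserted */
    bool     dbcr0_edm;          /* DBCR0[EDM] set */
    uint32_t dbsr;               /* current DBSR contents */
};

/* dbsr_event_mask: the DBSR debug event bits that count toward a stop. */
static bool hard_stop_requested(const struct stop_inputs *in,
                                uint32_t dbsr_event_mask)
{
    bool event_stop = in->dbcr0_edm && (in->dbsr & dbsr_event_mask) != 0;
    return in->debug_halt_signal || in->jdcr_stop || event_stop;
}
```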
8.2.3 Trace Mode
Trace mode is the absence of each of the other modes. That is, if internal debug mode, external debug mode,
and debug wait mode are all disabled, the processor is in trace debug mode. While in trace mode, all debug
events are simply recorded in the DBSR, and are indicated over the trace interface from the PowerPC 476FP
core. The processor does not enter the stop state, and a debug interrupt does not occur. See the PowerPC
476FP Core Support Manual for trace event and trace event trigger information.
Trace mode is an execution mode only. To allow the core to emit instruction trace data to be collected by an
external trace module, CCR0[ITE] must be set and CCR0[DTB] must be cleared.
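The two conditions described in this section can be sketched as predicates over the relevant control bits. The structure and function names are hypothetical, and each bit is passed as a plain boolean so that no register bit positions are assumed. Debug wait enable mode is taken to be enabled when either MSR[DWE] or JDCR[DWE] is set.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the mode-related control bits. */
struct debug_cfg {
    bool idm;       /* DBCR0[IDM]: internal debug mode */
    bool edm;       /* DBCR0[EDM]: external debug mode */
    bool msr_dwe;   /* MSR[DWE]:   debug wait enable */
    bool jdcr_dwe;  /* JDCR[DWE]:  debug wait enable */
    bool ccr0_ite;  /* CCR0[ITE] */
    bool ccr0_dtb;  /* CCR0[DTB] */
};

/* Trace mode is the absence of the other three debug modes. */
static bool in_trace_mode(const struct debug_cfg *c)
{
    return !c->idm && !c->edm && !c->msr_dwe && !c->jdcr_dwe;
}

/* Instruction trace output needs CCR0[ITE] set and CCR0[DTB] cleared. */
static bool trace_emission_configured(const struct debug_cfg *c)
{
    return c->ccr0_ite && !c->ccr0_dtb;
}
```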
8.2.4 Debug Wait Enable Mode
Debug wait enable mode is similar to external debug mode. It is set up by either MSR[DWE] = ‘1’ or
JDCR[DWE] = ‘1’. Any event (including an mtdbsr instruction) that sets the DBSR causes a transition to the
stop state. Unlike external debug mode, this is a weak stop request and can be exited by an interrupt or by
clearing DBSR, MSR[DWE], or JDCR[DWE].
8.3 Debug Events
Debug events are used to cause debug exceptions, which are recorded in the DBSR. By setting the corresponding bit in DBCR0 and Debug Control Register 1 (DBCR1), a debug event is enabled to set a DBSR bit.
When the DBSR bit is set, a debug exception occurs. Furthermore, when a DBSR bit is set, if debug mode is
enabled (MSR[DE] = ‘1’), a debug interrupt is generated.
Certain debug events cannot occur when debug mode is not enabled (MSR[DE] = ‘0’). In such situations, no
debug exception occurs, and no DBSR bit is set. Other debug events can cause debug exceptions and set
DBSR bits regardless of the state of MSR[DE]. The associated debug interrupts that result from such debug
exceptions are delayed until MSR[DE] is set to ‘1’, provided the exceptions have not been cleared from DBSR
in the meantime.
Anytime a DBSR bit is set when MSR[DE] = ‘0’, the imprecise debug event (DBSR[IDE]) is set. DBSR[IDE]
indicates that the associated debug exception bit in the DBSR is set while debug interrupts are disabled using
the MSR[DE] bit. Debug interrupt handler software can use this bit to determine whether the address
recorded in the Critical Save/Restore Register 0 (CSRR0) must be interpreted as the address associated with
the instruction causing the debug exception or the address of the instruction after the one that set MSR[DE],
thereby enabling the delayed debug interrupt.
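The relationship between MSR[DE], DBSR[IDE], and the meaning of CSRR0 can be sketched as a small Python model. This is an illustrative sketch only, not code for the core: the function names, the use of a set of bit names to stand in for the DBSR, and the returned strings are assumptions made for the example. The model also ignores the debug events that cannot occur while MSR[DE] = ‘0’.

```python
def record_debug_event(dbsr: set, event_bit: str, msr_de: bool) -> bool:
    """Record a debug exception in a model of the DBSR.

    Returns True if a debug interrupt is generated immediately,
    False if it is delayed until MSR[DE] is later set to 1.
    """
    dbsr.add(event_bit)
    if not msr_de:
        # A DBSR bit was set while debug interrupts were disabled,
        # so the imprecise debug event bit is set as well.
        dbsr.add("IDE")
        return False
    return True


def csrr0_meaning(dbsr: set) -> str:
    """How a debug interrupt handler might interpret CSRR0 (hypothetical helper)."""
    if "IDE" in dbsr:
        return "address of the instruction after the one that set MSR[DE]"
    return "address associated with the instruction that caused the debug exception"


# Example: an IAC1 event while MSR[DE] = 0 is recorded as imprecise.
dbsr = set()
taken_now = record_debug_event(dbsr, "IAC1", msr_de=False)
```

A handler built along these lines would test DBSR[IDE] first and only then decide how to use the saved address.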
All debug registers are privileged; therefore, debug setup is part of the software kernel. To access
the debug registers, call the debug utility.
The PowerPC 476FP core supports the following debug events:
• Instruction address comparison (IAC)
• Data address comparison (DAC)
• Trap
• Branch taken (BT)
• Instruction completed (ICMP)
• Interrupt (IRPT)
• Return (RET)
• Unconditional (UDE)
8.3.1 Broadcast of Debug Events
Debug events are enabled using DBCR0 and one of the previous modes.
All events (including an mtdbsr instruction) that set the DBSR are broadcast using the trace trigger event
bus. The functionality of the trace-trigger bus to user debug facilities depends on the chip-specific implementation. It is solely controlled by debug modes. The broadcast is done in a 4:1 clock ratio, provided by the
system-on-a-chip (SoC). See the documentation for your specific chip implementation for more information.
8.3.2 Exceptions
In general, a debug event causes an exception (sets the DBSR) based on the corresponding DBCR0 bit
being enabled, MSR[DE] being set, and the processor not being in stuff state.
However, branch taken (BT) and ICMP events do not cause exceptions if all of the following settings are true:
• DBCR0[IDM] = ‘1’
• MSR[DE] = ‘0’
• DBCR0[EDM] = ‘0’
• MSR[DWE] = ‘0’
Also, an IRPT event that results from a critical interrupt or machine check does not cause an exception if all of
the following settings are true:
• DBCR0[IDM] = ‘1’
• DBCR0[EDM] = ‘0’
• MSR[DWE] = ‘0’
Furthermore, an exception might cause the following actions:
• Hard stop (if DBCR0[EDM] = ‘1’)
• Weak stop (if MSR[DWE] = ‘1’ or JDCR[DWE] = ‘1’)
• Interrupt (if DBCR0[IDM] = ‘1’ and MSR[DE] = ‘1’)
Notes:
• Stuff state overrides all DBSR settings because debug events are not allowed in stuff state.
• DBSR[IDE] must be set to ‘1’ when setting any DBSR event. Both DBCR0[IDM] and MSR[DE] must also
be set to ‘1’.
See Section 7 Processor Interrupts and Exceptions on page 167 for more information about exceptions and
machine states.
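The conditions in this section can be collected into a few predicates. The following Python sketch is illustrative only; the function and parameter names are assumptions made for the example, and each argument is a Boolean standing for the named register bit. For the weak stop, the sketch assumes that either MSR[DWE] or JDCR[DWE] enables debug wait, as described in Section 8.2.4.

```python
def bt_icmp_exception_suppressed(idm: bool, msr_de: bool, edm: bool, msr_dwe: bool) -> bool:
    """BT and ICMP events cause no exception when all four settings hold."""
    return idm and not msr_de and not edm and not msr_dwe


def critical_irpt_exception_suppressed(idm: bool, edm: bool, msr_dwe: bool) -> bool:
    """IRPT events from a critical interrupt or machine check cause no
    exception when all three settings hold."""
    return idm and not edm and not msr_dwe


def actions_on_exception(edm: bool, idm: bool, msr_de: bool,
                         msr_dwe: bool, jdcr_dwe: bool) -> list:
    """List the actions that a debug exception might cause."""
    actions = []
    if edm:
        actions.append("hard stop")
    if msr_dwe or jdcr_dwe:  # assumption: either bit enables debug wait
        actions.append("weak stop")
    if idm and msr_de:
        actions.append("interrupt")
    return actions
```

The sketch does not model stuff state, which overrides all of these settings.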
8.3.3 Instruction Address Comparison
IAC debug events occur when execution of an instruction is attempted for which the instruction address and
other parameters match the IAC conditions specified by DBCR0, DBCR1, and the four IAC registers
(IAC1 - IAC4). Depending on the IAC mode specified by DBCR1, these IAC registers can be used to specify
four independent, exact IAC addresses. Also, they can be configured in pairs to specify ranges of instruction
addresses for which IAC debug events must occur.
The IAC registers can be paired as follows:
• IAC1 and IAC2
• IAC3 and IAC4
8.3.3.1 IAC Debug Events
For a given IAC event to occur, the corresponding IAC event enable bit in DBCR0 must be set. DBCR0 and
DBCR1 are used to specify the IAC conditions. The four IAC events, IAC1, IAC2, IAC3, and IAC4 are enabled
by setting the following bits:
• DBCR0[IAC1]
• DBCR0[IAC2]
• DBCR0[IAC3]
• DBCR0[IAC4]
When a given IAC event occurs, the corresponding DBSR[IAC1], DBSR[IAC2], DBSR[IAC3], or DBSR[IAC4]
bit is set.
IAC events can be enabled to operate in three modes. DBCR1[IAC12M] controls the comparison mode for
the IAC1/IAC2 pair, and DBCR1[IAC34M] controls the comparison mode for the IAC3/IAC4 events. The three
comparison modes are described in the following sections.
8.3.3.2 Exact Comparison Mode
This mode is enabled by setting DBCR1[IAC12M] = ‘00’ and DBCR1[IAC34M] = ‘00’. In this mode, the
instruction address is compared to the value in the corresponding IAC register. The IAC event occurs only if
the comparison is an exact match.
8.3.3.3 Range Inclusive Comparison Mode
This mode is enabled by setting DBCR1[IAC12M] = ‘10’ and DBCR1[IAC34M] = ‘10’. In this mode, the IAC1
or IAC2 event occurs only if the instruction address is within the range defined by the IAC1/IAC2 register
values as follows:
IAC1 ≤ address < IAC2.
Similarly, the IAC3 or IAC4 event occurs only if the instruction address is within the range defined by the
IAC3/IAC4 register values as follows:
IAC3 ≤ address < IAC4.
For a given IAC1/IAC2 or IAC3/IAC4 pair, when the instruction address falls within the specified range, either
one or both of the corresponding IAC debug event bits are set in the DBSR, as determined by which of the
two corresponding IAC event enable bits are set in DBCR0. For example, when the IAC1/IAC2 pair are set to
range inclusive comparison mode, and the instruction address falls within the defined range, DBCR0[IAC1]
and DBCR0[IAC2] determine whether DBSR[IAC1], DBSR[IAC2], or both are set. It is a programming error to
set either of the IAC pairs to a range comparison mode (either inclusive or exclusive) without also enabling at
least one of the corresponding IAC event enable bits in DBCR0.
The IAC range autotoggle mechanism can switch the IAC range mode from inclusive to exclusive or from
exclusive to inclusive. See IAC Range Mode Autotoggle Field on page 222.
8.3.3.4 Range Exclusive Comparison Mode
This mode is enabled by setting DBCR1[IAC12M] = ‘11’ and DBCR1[IAC34M] = ‘11’. In this mode, the IAC1
or IAC2 event occurs only if the instruction address is outside the range defined by the IAC1/IAC2 register
values, as follows:
address < IAC1 or address ≥ IAC2.
Similarly, the IAC3 or IAC4 event occurs only if the instruction address is outside the range defined by the
IAC3/IAC4 register values, as follows:
address < IAC3 or address ≥ IAC4.
For a given IAC1/IAC2 or IAC3/IAC4 pair, when the instruction address falls outside the specified range,
either one or both of the corresponding IAC debug event bits are set in the DBSR, as determined by which of
the two corresponding IAC event enable bits are set in DBCR0. For example, when the IAC1/IAC2 pair are
set to range exclusive comparison mode, and the instruction address falls outside the defined range,
DBCR0[IAC1] and DBCR0[IAC2] determine whether DBSR[IAC1], DBSR[IAC2], or both are set. It is a
programming error to set either of the IAC pairs to a range comparison mode (either inclusive or exclusive)
without also enabling at least one of the corresponding IAC event enable bits in DBCR0.
The IAC range autotoggle mechanism can switch the IAC range mode from inclusive to exclusive, or from
exclusive to inclusive. See IAC Range Mode Autotoggle Field on page 222.
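The two range comparisons and the role of the event enable bits can be expressed compactly. The Python sketch below is illustrative only: the function names and the string mode names are assumptions made for the example, and user/supervisor, address space, and autotoggle qualification are left out.

```python
def range_hit(mode: str, address: int, low: int, high: int) -> bool:
    """Apply the range inclusive or range exclusive comparison.

    low and high stand for IAC1/IAC2 (or IAC3/IAC4).
    """
    if mode == "inclusive":
        return low <= address < high
    if mode == "exclusive":
        return address < low or address >= high
    raise ValueError(f"not a range mode: {mode}")


def dbsr_bits_for_pair(mode: str, address: int, low: int, high: int,
                       enable_first: bool, enable_second: bool) -> tuple:
    """Return which of the two DBSR IAC bits of a pair would be set.

    enable_first and enable_second model the DBCR0 event enable bits
    of the pair; at least one of them is expected to be set.
    """
    hit = range_hit(mode, address, low, high)
    return (hit and enable_first, hit and enable_second)


# The upper bound is not part of the inclusive range.
inside = range_hit("inclusive", 0x1FFC, 0x1000, 0x2000)
at_upper_bound = range_hit("inclusive", 0x2000, 0x1000, 0x2000)
```

Because the inclusive range is half-open, the two modes are exact complements of each other for any given pair of register values.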
8.3.3.5 IAC User/Supervisor Field
DBCR1[IAC1US], DBCR1[IAC2US], DBCR1[IAC3US], and DBCR1[IAC4US] are the individual IAC
user/supervisor fields for each of the four IAC events. The IAC user/supervisor fields specify what operating
mode the processor must be in order for the corresponding IAC event to occur. The operating mode is determined by the problem state field of the Machine State Register (MSR[PR]). See Section 7.4.1 Machine State
Register (MSR) on page 173. When the IAC user/supervisor field is ‘00’, the operating mode does not matter;
the IAC debug event can occur independent of the state of MSR[PR]. When this field is ‘10’, the processor
must be operating in supervisor mode (MSR[PR] = ‘0’). When this field is ‘11’, the processor must be operating in user mode (MSR[PR] = ‘1’). The IAC user/supervisor field value of ‘01’ is reserved.
If a pair of IAC events (IAC1/IAC2 or IAC3/IAC4) are operating in range inclusive or range exclusive mode, it
is a programming error (and the results of any instruction address comparison are undefined) if the corresponding pair of IAC user/supervisor fields are not set to the same value. For example, if IAC1/IAC2 are operating in one of the range modes, both DBCR1[IAC1US] and DBCR1[IAC2US] must be set to the same value.
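The user/supervisor qualification amounts to a small decode of the 2-bit field against MSR[PR]. The Python sketch below is illustrative only; the function name is an assumption made for the example, and raising an error for the reserved value is a modeling choice, not documented hardware behavior.

```python
def us_field_allows(us_field: int, msr_pr: int) -> bool:
    """Return True if the operating mode satisfies the user/supervisor field.

    us_field is the 2-bit field value; msr_pr is MSR[PR]
    (0 = supervisor mode, 1 = user mode).
    """
    if us_field == 0b00:
        return True              # operating mode does not matter
    if us_field == 0b10:
        return msr_pr == 0       # supervisor mode only
    if us_field == 0b11:
        return msr_pr == 1       # user mode only
    raise ValueError("user/supervisor field value 0b01 is reserved")
```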
8.3.3.6 IAC Effective/Real Address Field
DBCR1[IAC1ER], DBCR1[IAC2ER], DBCR1[IAC3ER], and DBCR1[IAC4ER] are the individual IAC effective/real address fields for each of the four IAC events. The IAC effective/real address fields specify whether
the instruction address comparison is performed using the effective, virtual, or real address. When the IAC
effective/real address field is ‘00’, the comparison is performed using the effective address only; the IAC
debug event can occur independent of the instruction address space (MSR[IS]). When this field is ‘10’, the
IAC debug event occurs only if the effective address matches the IAC conditions and is in virtual address
space 0 (MSR[IS] = ‘0’). Similarly, when this field is ‘11’, the IAC debug event occurs only if the effective
address matches the IAC conditions and is in virtual address space 1 (MSR[IS] = ‘1’). In these latter two
modes, the virtual address space of the instruction is considered, not the entire virtual address. The process
identifier, which forms the final part of the virtual address, is not considered. Finally, the IAC effective/real
address field value of ‘01’ is reserved.
If a pair of IAC events (IAC1/IAC2 or IAC3/IAC4) are operating in range inclusive or range exclusive mode, it
is a programming error if the corresponding pair of IAC effective/real address fields are not set to the same
value. If this occurs, the results of any instruction address comparison are undefined. For example, if
IAC1/IAC2 are operating in one of the range modes, both DBCR1[IAC1ER] and DBCR1[IAC2ER] must be set
to the same value.
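A debugger front end might decode the effective/real address field and validate a range-mode pair before writing DBCR1. The Python sketch below is illustrative only; both function names are assumptions made for the example, and treating the reserved value as an error is a modeling choice.

```python
def er_field_allows(er_field: int, msr_is: int) -> bool:
    """Return True if the instruction address space satisfies the field.

    er_field is the 2-bit field value; msr_is is MSR[IS].
    """
    if er_field == 0b00:
        return True              # effective address only, any address space
    if er_field == 0b10:
        return msr_is == 0       # virtual address space 0
    if er_field == 0b11:
        return msr_is == 1       # virtual address space 1
    raise ValueError("effective/real address field value 0b01 is reserved")


def range_pair_fields_consistent(first_field: int, second_field: int) -> bool:
    """In a range mode, both fields of a pair must hold the same value."""
    return first_field == second_field
```

Checking the pair in software before enabling a range mode avoids the undefined comparison results that mismatched fields produce.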
8.3.3.7 IAC Range Mode Autotoggle Field
DBCR1[IAC12AT] controls the toggling mechanism for the IAC1/IAC2 events. DBCR1[IAC34AT] controls the
toggle mechanism for the IAC3/IAC4 events. When the IAC mode for one of the pairs of IAC debug events is
set to one of the range modes (either range inclusive or range exclusive), the IAC range mode autotoggle
field corresponding to that pair of IAC debug events controls whether the range mode automatically toggles
from inclusive to exclusive, and from exclusive to inclusive. When the IAC range mode toggle field is set to ‘1’,
toggling is enabled; otherwise, it is disabled. It is a programming error if an IAC range mode autotoggle field is
set to ‘1’ without the corresponding IAC mode field being set to one of the range modes. If this occurs, the
results of any instruction address comparison are undefined.
When toggling is enabled for a pair of IAC debug events, upon each occurrence of an IAC debug event within
that pair the value of the corresponding autotoggle status field in the DBSR (DBSR[IAC12ATS] and
DBSR[IAC34ATS]) is reversed. That is, if the autotoggle status field is set to ‘0’ before the occurrence of the
IAC debug event, it is changed to ‘1’ at the same time that the IAC debug event is recorded in the DBSR.
Conversely, if the autotoggle status field is set to ‘1’ before the occurrence of the IAC debug event, it is
changed to ‘0’ at the same time that the IAC debug event is recorded in the DBSR.
Furthermore, when autotoggle is enabled, the autotoggle status field of the DBSR affects the interpretation of
the IAC mode field of DBCR1. If the autotoggle status field is set to ‘0’, the IAC mode field value of ‘10’ selects
range-inclusive mode, whereas the value of ‘11’ selects range-exclusive mode. However, when the
autotoggle status field is set to ‘1’, the interpretation of the IAC mode field is reversed. That is, the IAC mode
field value of ‘10’ selects range-exclusive mode, whereas the value of ‘11’ selects range-inclusive mode.
The relationship of the IAC mode, IAC range mode autotoggle, and IAC range mode autotoggle status fields
is summarized in Table 8-1.
Table 8-1. IAC Range Mode Toggle Summary
DBCR1            DBCR1              DBSR
IAC12M/IAC34M    IAC12AT/IAC34AT    IAC12ATS/IAC34ATS    IAC Mode
‘10’             ‘0’                N/A                  Range Inclusive
‘10’             ‘1’                ‘0’                  Range Inclusive
‘10’             ‘1’                ‘1’                  Range Exclusive
‘11’             ‘0’                N/A                  Range Exclusive
‘11’             ‘1’                ‘0’                  Range Exclusive
‘11’             ‘1’                ‘1’                  Range Inclusive
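The rows of Table 8-1 reduce to a single rule: the mode field selects the base range mode, and the selection is reversed only when autotoggle is enabled and the autotoggle status bit is ‘1’. The Python sketch below is illustrative only; the function names are assumptions made for the example.

```python
def effective_range_mode(mode_field: int, autotoggle: int, ats: int) -> str:
    """Return the range mode in effect, following Table 8-1.

    mode_field is the IAC12M/IAC34M value, autotoggle is the
    IAC12AT/IAC34AT bit, and ats is the IAC12ATS/IAC34ATS bit.
    """
    if mode_field not in (0b10, 0b11):
        raise ValueError("mode field does not select a range mode")
    inclusive = mode_field == 0b10
    if autotoggle and ats:
        inclusive = not inclusive     # interpretation is reversed
    return "inclusive" if inclusive else "exclusive"


def ats_after_event(autotoggle: int, ats: int) -> int:
    """Autotoggle status after an IAC debug event within the pair."""
    return ats ^ 1 if autotoggle else ats
```

With autotoggle enabled, successive IAC events in a pair therefore alternate between detecting entry into a region and detecting exit from it.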
8.3.4 Data Address Comparison
DAC debug events occur when execution is attempted of a load, store, or cache management instruction for
which the data storage address and other parameters match the DAC conditions specified by DBCR0.
DAC events are written to the DBSR based on the following criteria:
• The DAC and data value comparison (DVC) exceptions that are enabled.
• Whether internal debug mode, external debug mode, or debug wait enable mode are enabled.
• A DAC address comparison match.
• A DVC data comparison match.
The four DAC events are as follows:
• Data address comparison 1 read (DAC1R)
• Data address comparison 1 write (DAC1W)
• Data address comparison 2 read (DAC2R)
• Data address comparison 2 write (DAC2W)
If these debug events occur in trace mode, the events are recorded in the DBSR when the instruction that
caused the event is committed. However, if debug events occur when the processor is in debug interrupt
mode, the data cache unit (DCU) does not perform a normal confirmation of the instruction. Instead, it performs a
faulty confirmation of the operation when it is in load write-back (LWB). If the instruction commits with a faulty
confirmation, the debug event is recorded in the DBSR.
8.3.4.1 DAC Debug Event Fields
The following fields in DBCR0 and DBCR2 are used to specify the DAC conditions.
DAC Event Enable Field
DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] are the individual DAC event enables for the two DAC events,
DAC1 and DAC2. For each of the two DAC events, one enable is for DAC read events, and the other is for
DAC write events. Load, dcbt, dcbtst, icbi, and icbt instructions might cause DAC read events, while store,
dcbst, dcbf, dcbi, and dcbz instructions might cause DAC write events. For a given DAC event to occur, the
corresponding DAC event enable bit in DBCR0 for the particular operation type must be set. When a DAC
event occurs, the corresponding DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bit is set. These same DBSR bits
are shared by DVC debug events.
DAC Mode Field
DBCR2[DAC12M] controls the comparison mode for the DAC1 and DAC2 events. There are four comparison
modes supported by the PowerPC 476FP core:
• Exact comparison mode (DBCR2[DAC12M] = ‘00’)
In this mode, the data address is compared to the value in the corresponding DAC register, and the DAC
event occurs only if the comparison is an exact match.
• Address bit mask mode (DBCR2[DAC12M] = ‘01’)
In this mode, the DAC1 or DAC2 event occurs only if the data address matches the value in the DAC1
register, as masked by the value in the DAC2 register. That is, the DAC1 register specifies an address
value, and the DAC2 register specifies an address bit mask that determines which bits of the data address
should participate in the comparison to the DAC1 value. For every bit set to 1 in the DAC2 register, the
corresponding data address bit must match the value of the same bit position in the DAC1 register. For
every bit set to 0 in the DAC2 register, the corresponding address bit comparison does not affect the
result of the DAC event determination.
This comparison mode is useful for detecting accesses to a particular byte address, when the accesses
might be of various sizes. For example, if the debugger is interested in detecting accesses to byte
address x‘00000003’, these accesses might occur because of a byte access to that specific address, or
because of a halfword access to address x‘00000002’, or because of a word access to address
x‘00000000’. By using address bit mask mode and specifying that the low-order two bits of the address
should be ignored (that is, setting the address bit mask in DAC2 to x‘FFFFFFFC’), the debugger can
detect each of these types of access to byte address x‘00000003’.
When the data address matches the address bit mask mode conditions, either one or both of the DAC
debug event bits corresponding to the operation type (read or write) are set in the DBSR, as determined
by which of the corresponding two DAC event enable bits are set in DBCR0. That is, when an address bit
mask mode DAC debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one or the other or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation type are set. It is a programming error to set the DAC mode field to address bit
mask mode without also enabling at least one of the four DAC event enable bits in DBCR0.
• Range inclusive comparison mode (DBCR2[DAC12M] = ‘10’)
In this mode, the DAC1 or DAC2 event occurs only if the data address is within the range defined by the
DAC1 and DAC2 register values, as follows:
DAC1 ≤ address < DAC2.
When the data address falls within the specified range, either one or both of the DAC debug event bits
corresponding to the operation type (read or write) are set in the DBSR, as determined by which of the
corresponding two DAC event enable bits are set in DBCR0. That is, when a range inclusive mode DAC
debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one
or the other or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation
type are set. It is a programming error to set the DAC mode field to a range comparison mode (either
inclusive or exclusive) without also enabling at least one of the four DAC event enable bits in DBCR0.
• Range exclusive comparison mode (DBCR2[DAC12M] = ‘11’)
In this mode, the DAC1 or DAC2 event occurs only if the data address is outside the range defined by the
DAC1 and DAC2 register values, as follows:
address < DAC1 or address ≥ DAC2.
When the data address falls outside the specified range, either one or both of the DAC debug event bits
corresponding to the operation type (read or write) are set in the DBSR, as determined by which of the
corresponding two DAC event enable bits are set in DBCR0. That is, when a range exclusive mode DAC
debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one
or the other or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation
type are set. It is a programming error to set the DAC mode field to a range comparison mode (either
inclusive or exclusive) without also enabling at least one of the four DAC event enable bits in DBCR0.
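The three paired DAC comparison modes can be sketched as one function. The Python code below is illustrative only: the function name is an assumption made for the example, and exact comparison mode, in which DAC1 and DAC2 are compared independently, is left out. The mode values are the DBCR2[DAC12M] encodings listed above.

```python
def dac_paired_hit(mode: int, address: int, dac1: int, dac2: int) -> bool:
    """Evaluate a paired DAC comparison for one data address."""
    if mode == 0b01:
        # Address bit mask mode: only bits set to 1 in DAC2 take part.
        return (address & dac2) == (dac1 & dac2)
    if mode == 0b10:
        return dac1 <= address < dac2            # range inclusive
    if mode == 0b11:
        return address < dac1 or address >= dac2  # range exclusive
    raise ValueError("mode 0b00 (exact) is not a paired mode")


# Watching byte address 0x00000003 with the low-order two bits ignored.
DAC1 = 0x00000003
DAC2 = 0xFFFFFFFC
byte_access = dac_paired_hit(0b01, 0x00000003, DAC1, DAC2)
halfword_access = dac_paired_hit(0b01, 0x00000002, DAC1, DAC2)
word_access = dac_paired_hit(0b01, 0x00000000, DAC1, DAC2)
next_word = dac_paired_hit(0b01, 0x00000004, DAC1, DAC2)
```

In this example the byte, halfword, and word accesses all match, while an access starting at x‘00000004’ does not, because its unmasked address bits differ from those of DAC1.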
DAC User/Supervisor Field
DBCR2[DAC1US, DAC2US] are the individual DAC user/supervisor fields for the two DAC events. The DAC
user/supervisor fields specify what operating mode the processor must be in for the corresponding DAC event
to occur. The operating mode is determined by the Problem State field of the Machine State Register
(MSR[PR]). When the DAC user/supervisor field is ‘00’, the operating mode does not matter; the DAC debug
event can occur independent of the state of MSR[PR]. When this field is ‘10’, the processor must be operating in supervisor mode (MSR[PR] = ‘0’). When this field is ‘11’, the processor must be operating in user
mode (MSR[PR] = ‘1’). The DAC user/supervisor field value of ‘01’ is reserved.
If the DAC mode is set to one of the paired modes (address bit mask mode, or one of the two range modes),
it is a programming error (and the results of any data address comparison are undefined) if DBCR2[DAC1US]
and DBCR2[DAC2US] are not set to the same value.
DAC Effective/Real Address Field
DBCR2[DAC1ER, DAC2ER] are the individual DAC effective/real address fields for the two DAC events.
The DAC effective/real address fields specify whether the data address comparison should be
performed using the effective, virtual, or real address. When the DAC effective/real address field is ‘00’,
the comparison is performed using the effective address only; the DAC debug event may occur independent of the data address space (MSR[DS]). When
this field is ‘10’, the DAC debug event occurs only if the effective address matches the DAC conditions and is
in virtual address space 0 (MSR[DS] = ‘0’). Similarly, when this field is ‘11’, the DAC debug event occurs only
if the effective address matches the DAC conditions and is in virtual address space 1 (MSR[DS] = ‘1’). Note
that in these latter two modes, in which the virtual address space of the data is considered, it is not the entire
virtual address which is considered.
The process ID, which forms the final part of the virtual address, is not considered. Finally, the DAC effective/real address field value of ‘01’ is reserved, and corresponds to the PowerPC Book-E architected real
address comparison mode, which is not supported by the PowerPC 476FP core.
If the DAC mode is set to one of the paired modes (address bit mask mode, or one of the two range modes),
it is a programming error (and the results of any data address comparison are undefined) if DBCR2[DAC1ER]
and DBCR2[DAC2ER] are not set to the same value.
DVC Byte Enable Field
DBCR2[DVC1BE, DVC2BE] are the individual data value compare (DVC) byte enable fields for the two DVC
events. These fields must be disabled (by being set to ‘0000’) for the corresponding DAC debug event to be
enabled. In other words, when any of the DVC byte enable field bits for a given DVC event are set to ‘1’, the
corresponding DAC event is disabled, and the various DAC field conditions are used with the DVC field
conditions to determine whether a DVC event should occur.
8.3.4.2 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses
Certain misaligned load and store instructions are handled by making multiple, independent storage
accesses. Similarly, load and store multiple and string instructions that access more than one register result
in more than one storage access. Load and Store Alignment provides a detailed description of the circumstances that lead to such multiple storage accesses being made as the result of the execution of a single
instruction.
Whenever the execution of a given instruction results in multiple storage accesses, the data address of each
access is independently considered for whether or not it will cause a DAC debug event.
8.3.4.3 DAC Debug Events Applied to Various Instruction Types
Various special cases apply to the cache management instructions, the store word conditional indexed
(stwcx.) instruction, and the load and store string indexed (lswx, stswx) instructions, with regards to DAC
debug events. These special cases are as follows:
dcbz
The dcbz instruction is considered a store with respect to both storage access control and DAC debug events.
The dcbz instruction directly changes the contents of a given storage location. As a store operation, it
can cause DAC write debug events.
dcbst, dcbf, dcbi
The dcbst, dcbf, and dcbi instructions are considered loads with respect to storage access control because
they do not change the contents of a given storage location. They might merely cause the data at that storage
location to be moved from the data cache out to memory. However, in a debug environment, the fact that
these instructions might lead to write operations on the external interface is typically the event of interest.
Therefore, these instructions are considered stores with respect to DAC debug events, and might cause DAC
write debug events.
dcbt, dcbtst, icbt
The touch instructions are considered loads, except for dcbtst, which is a store with respect to both storage
access control and DAC debug events. However, these instructions are treated as no-ops if they refer to
caching inhibited storage locations or if they cause data storage or data TLB miss exceptions. Consequently,
if a touch instruction is treated as a no-op for one of these reasons, it does not cause a DAC read debug
event.
However, if a touch instruction is not treated as a no-op for one of these reasons, it might cause a DAC read
debug event.
dcba
The dcba instruction is treated as a no-op, and thus will not cause a DAC debug
event.
icbi
The icbi instruction is considered a load with respect to both storage access
control and DAC debug events, and thus might cause a DAC read debug event.
dci, dcread, ici, icread
The dci and ici instructions do not generate an address. Rather, the dci
instruction affects the entire data cache, and the ici instruction affects the entire
instruction cache. Similarly, the dcread and icread instructions do not generate an
address, but rather an index that is used to select a particular location in the
respective cache, without regard to the storage address represented by that location. Therefore, none of these instructions cause DAC debug events.
stwcx.
If the execution of a stwcx. instruction would otherwise have caused a DAC write
debug event, but the processor does not have the reservation from a lwarx
instruction, the DAC write debug event does not occur because the storage location does not get written.
lswx, stswx
DAC debug events do not occur for lswx or stswx instructions with a length of 0
(XER[TBC] = ‘0’) because these instructions do not access storage.
8.3.4.4 Data Value Compare (DVC) Debug Event
DVC debug events occur when execution is attempted of a load, store, or dcbz instruction for which the data
storage address and other parameters match the DAC conditions specified by DBCR0, DBCR2, and the DAC
registers, and for which the data accessed matches the DVC conditions specified by DBCR2 and the DVC
registers. In other words, for a DVC debug event to occur, the conditions for a DAC debug event must first be
met, and then the data must also match the DVC conditions. In addition to the DAC conditions, there are two
DVC registers, DVC1 and DVC2. The DVC registers can be used to specify two independent, 4-byte data
values, which are selectively compared against the data being accessed by a given load, store, or cache
management instruction.
When a DVC event occurs, the corresponding DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bit is set. These
same DBSR bits are shared by DAC debug events.
DVC Debug Event Fields
In addition to the DAC debug event fields described in Section 8.3.4.1 DAC Debug Event Fields on page 223
and the DVC registers themselves, two fields in DBCR2 are used to specify the DVC conditions, as follows:
• DVC byte enable field
DBCR2[DVC1BE, DVC2BE] are the individual DVC byte enable fields for the two DVC events. When one
or the other (or both) of these fields is disabled (by being set to ‘0000’), the corresponding DVC debug
event is disabled (the corresponding DAC debug event can still be enabled, as determined by the DAC
debug event enable field of DBCR0). When either one or both of these fields is enabled (by being set to a
nonzero value), the corresponding DVC debug event is enabled.
Each bit of a given DVC byte enable field corresponds to a byte position within an aligned word of memory.
For a given aligned word of memory, the byte offsets (or byte lanes) within that word are numbered 0, 1,
2, and 3, starting from the left-most (most significant) byte of the word. Accordingly, bits 0:3 of a given
DVC byte enable field correspond to bytes 0:3 of an aligned word of memory being accessed.
For an access to match the DVC conditions for a given byte, the access must be transferring data on that
given byte position and the data must match the corresponding byte value within the DVC register.
For each storage access, the DVC comparison is made against the bytes that are being accessed within
the aligned word of memory containing the starting byte of the transfer. For example, consider a load
word instruction with a starting data address of x‘01’. The four bytes from memory are located at
addresses x‘01’ - x‘04’, but the aligned word of memory containing the starting byte consists of addresses
x‘00’ - x‘03’. Thus, the only bytes being accessed within the aligned word of memory containing the starting byte are the bytes at addresses x‘01’ - x‘03’, and only these bytes are considered in the DVC comparison. The byte transferred from address x‘04’ is not considered.
• DVC mode field
DBCR2[DVC1M, DVC2M] are the individual DVC mode fields for the two DVC events. Each one of these
fields specifies the particular data value comparison mode for the corresponding DVC debug event.
The PowerPC 476FP core supports three comparison modes:
– AND comparison mode (DBCR2[DVC1M, DVC2M] = ‘01’)
In this mode, all data byte lanes enabled by a DVC byte enable field must be being accessed and
must match the corresponding byte data value in the corresponding DVC1 or DVC2 register.
– OR comparison mode (DBCR2[DVC1M, DVC2M] = ‘10’)
In this mode, at least one data byte lane that is enabled by a DVC byte enable field must be being
accessed and must match the corresponding byte data value in the corresponding DVC1 or DVC2
register.
– AND-OR comparison mode (DBCR2[DVC1M, DVC2M] = ‘11’)
In this mode, the four byte lanes of an aligned word are divided into two pairs, with byte lanes 0 and 1
being in one pair, and byte lanes 2 and 3 in the other pair. The DVC comparison mode for each pair
of byte lanes operates in AND mode, and then the results of these two AND mode comparisons are
ORed together to determine whether a DVC debug event occurs. In other words, a DVC debug event
occurs if either one or both of the pairs of byte lanes satisfy the AND mode comparison requirements.
This mode may be used to cause a DVC debug event upon an access of a particular halfword data
value in either of the two halfwords of a word in memory.
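The three DVC comparison modes can be modeled on four byte lanes. The Python sketch below is illustrative only: the function name, the string mode names, and the list-based representation of the byte enable field are assumptions made for the example. It also assumes that a set of lanes with no byte enabled never satisfies an AND comparison, which the text does not state.

```python
def dvc_match(mode: str, byte_enable: list, accessed: list,
              data: bytes, dvc: bytes) -> bool:
    """Evaluate a DVC comparison for one aligned word.

    byte_enable and accessed hold four Booleans for byte lanes 0..3
    (lane 0 is the most significant byte). data and dvc hold the
    four bytes of the access and of the DVC register, by lane.
    """
    def lane_ok(lane: int) -> bool:
        # The lane must be transferring data and the byte must match.
        return accessed[lane] and data[lane] == dvc[lane]

    def and_over(lanes) -> bool:
        enabled = [lane for lane in lanes if byte_enable[lane]]
        return bool(enabled) and all(lane_ok(lane) for lane in enabled)

    if mode == "and":
        return and_over(range(4))
    if mode == "or":
        return any(lane_ok(lane) for lane in range(4) if byte_enable[lane])
    if mode == "and-or":
        # AND within each halfword pair, then OR of the two results.
        return and_over((0, 1)) or and_over((2, 3))
    raise ValueError(f"unknown DVC mode: {mode}")


# Look for halfword x'BEEF' in either halfword of a word.
ALL = [True, True, True, True]
DVC_VALUE = bytes([0xBE, 0xEF, 0xBE, 0xEF])
WORD = bytes([0x00, 0x00, 0xBE, 0xEF])
halfword_found = dvc_match("and-or", ALL, ALL, WORD, DVC_VALUE)
```

Loading the same halfword into both halves of the DVC register, as in the example, is what lets AND-OR mode match the value in either halfword position.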
DVC Debug Events Applied to Instructions that Result in Multiple Storage Accesses
Certain misaligned load and store instructions are handled by making multiple, independent storage
accesses. Similarly, load and store multiple and string instructions that access more than one register result
in more than one storage access. Whenever the execution of a given instruction results in multiple storage
accesses, the address and data of each access is independently considered for whether it will cause a DVC
debug event.
Data Matching
There are three modes of data matching: all bytes, any bytes, or halfword match. The bytes that are
compared are determined by the DVC[BE] bits, which also enable the DVC. The modes are set by DVC[M].
The DVC comparison matches against the data as it is read out of memory. Endianness and byte-reversal
are not accounted for when doing this comparison. Therefore, the data bytes must be in the same order as
they are stored in the memory location. The DVC comparison is performed on byte lanes. This means
DVC[0:7] can only match against bytes 0 or 4 of the double word, DVC[8:15] can only match against bytes 1
or 5 of the double word, and so on.
All memory accesses are segmented, or replicated, on double-word boundaries. If an access is not
word-aligned and crosses a double-word boundary, only the valid data from the first double word is
available for comparison. If an access does not cross a double-word boundary, the entire word is available for comparison. Because the data is aligned in the appropriate byte lanes, an unaligned access
might appear to have its data wrapped around in the DVC register.
For example, the data at address x‘0’ is x‘01234567_89abcdef’. An access to the word at location x‘0’ would
yield x‘01234567’, and that is the value that should be stored in the DVC registers for comparison. An access
to the word at location x‘2’ would yield x‘456789ab’. However, this address is unaligned, so the data as
written is not in the appropriate byte lanes. Aligning this data to the appropriate byte lanes yields a DVC value
of x‘89ab4567’ if a match is expected. If a word access is performed to the second half of the double word,
only the valid data from the first double word will be available for comparison. For example, if an access is for
a word at address x‘5’, only the bytes at address x‘5’, x‘6’, and x‘7’ will be available for comparison. The last
byte of the word (address x‘8’) will get fetched by a second operation. As a result, in the case of an unaligned
access, the DVC would have to be configured to only match on the number of bytes that would be accessed
from the first dword.
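The lane alignment in the example above can be reproduced with a short C sketch. The type dvc_setup_t and the function dvc_setup_for_word are hypothetical names introduced here, and the byte-enable convention (bit 0x8 selects lane 0) is an assumption. The function places each accessed byte in lane (address mod 4) and stops at the double-word boundary, so it also reports which lanes are available for comparison.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;        /* accessed bytes placed in their DVC byte lanes */
    unsigned byte_enables; /* assumed: bit 0x8 = lane 0 ... bit 0x1 = lane 3 */
} dvc_setup_t;

/* dword:  the eight bytes of the double word, in memory order.
 * offset: byte offset (0..7) of the word access within the double word. */
dvc_setup_t dvc_setup_for_word(const uint8_t dword[8], unsigned offset)
{
    dvc_setup_t s = { 0u, 0u };

    for (unsigned i = 0; i < 4; i++) {
        unsigned addr = offset + i;
        if (addr > 7u)
            break;  /* remaining bytes are fetched by a second operation */
        unsigned lane = addr & 3u;
        s.value |= (uint32_t)dword[addr] << (8u * (3u - lane));
        s.byte_enables |= 1u << (3u - lane);
    }
    return s;
}
```

For the double word x‘01234567_89abcdef’, offset 2 gives the value x‘89ab4567’ with all four lanes enabled, and offset 5 gives bytes x‘ab’, x‘cd’, and x‘ef’ in lanes 1 to 3 with lane 0 unavailable.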
DVC Debug Events Applied to Various Instruction Types
Various special cases apply to the cache management instructions, the store word conditional indexed
(stwcx.) instruction, and the load and store string indexed (lswx, stswx) instructions, with regards to DVC
debug events. These special cases are as follows:
dcbz
The dcbz instruction is the only cache management instruction that can cause a DVC debug
event. dcbz is the only such instruction that actually writes new data to a storage location (in
this case, an entire 128-byte data L2 cache line is written to zeroes).
stwcx.
If the execution of a stwcx. instruction would otherwise have caused a DVC write debug event,
but the processor does not have the reservation from a lwarx instruction, the DVC write debug
event does not occur because the storage location does not get written.
lswx, stswx
DVC debug events do not occur for lswx or stswx instructions with a length of 0
(XER[TBC] = ‘0’) because these instructions do not access storage.
8.3.5 Trap
A trap debug event occurs if trap debug events are enabled (DBCR0[TRAP] = ‘1’), a trap instruction (trap
word [tw] or trap word immediate [twi]) is executed, and the conditions specified by the instruction for the trap
are met. Table 8-2 summarizes the behavior and actions.
Table 8-2. Trap Debug Event Actions

DBCR0[TRAP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0            –           –           –        –                       Program interrupt taken.
1            0           0           –        0                       DBSR[TRAP] is set.
1            –           1           –        –
1            –           0           –        1
1            1           0           0        0                       DBSR[TRAP] is set. DBSR[IDE] is set. A program interrupt is taken. SRR0 is set to the address of the trap instruction.
1            1           0           1        0                       DBSR[TRAP] is set. Debug interrupt is taken. CSRR0 is set to the address of the trap instruction.
Note: This debug trap is different from opcode traps based on the IOCCR (Instruction Opcode Compare
Control Register).
8.3.6 Branch Taken
A Branch Taken (BRT) debug event occurs if the processor has internal debug mode, external debug mode,
or debug wait mode enabled; BRT debug events are enabled (DBCR0[BRT] = ‘1’); and a branch
instruction whose direction is taken is executed (that is, either an unconditional branch or a conditional
branch whose branch condition is met).
In internal debug mode, MSR[DE] must be set to ‘1’ for BRT events to be recorded in the DBSR. This is
because branch instructions occur frequently. Allowing these common events to be recorded as exceptions in
the DBSR when debug interrupts are disabled through MSR[DE] = ‘0’ would result in an inordinate number of
imprecise debug interrupts. Therefore, BRT debug events are not recognized if MSR[DE] = ‘0’ at the time of the
execution of the branch instruction, and DBSR[IDE] cannot be set by a branch taken debug event. Table 8-3
summarizes the debug register settings and the actions.
Table 8-3. BRT Debug Event Actions

DBCR0[BRT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0           –           –           –        –                       No action.
1           0           0           –        0                       DBSR[BRT] is set through a normal commit.
1           –           1           –        –                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           –           0           –        1                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           1           0           0        0                       No action.
1           1           0           1        0                       DBSR[BRT] is set through a faulty commit. A debug interrupt is taken. CSRR0 is set to the address of the branch instruction.
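The rows of Table 8-3 can be read as a small decision function. The following C sketch is one possible encoding; the enum and function names are hypothetical, and the dwe parameter is assumed to represent the combined "MSR[DWE] and JDCR[DWE]" column as a single Boolean.

```c
#include <stdbool.h>

typedef enum {
    BRT_NO_ACTION,          /* event is not recognized */
    BRT_SET_NORMAL_COMMIT,  /* DBSR[BRT] set through a normal commit */
    BRT_STOP,               /* DBSR[BRT] set, transition to the STOP state */
    BRT_DEBUG_INTERRUPT     /* DBSR[BRT] set, debug interrupt taken */
} brt_action_t;

/* brt, idm, edm: DBCR0[BRT], DBCR0[IDM], DBCR0[EDM].
 * de:            MSR[DE].
 * dwe:           assumed to be the combined MSR[DWE] and JDCR[DWE] condition. */
brt_action_t brt_action(bool brt, bool idm, bool edm, bool de, bool dwe)
{
    if (!brt)
        return BRT_NO_ACTION;
    if (edm || dwe)                 /* external debug mode or debug wait mode */
        return BRT_STOP;
    if (idm)                        /* internal debug mode only */
        return de ? BRT_DEBUG_INTERRUPT : BRT_NO_ACTION;
    return BRT_SET_NORMAL_COMMIT;   /* no debug mode enabled */
}
```

Checking external debug mode and debug wait mode first mirrors the table, where those rows treat DBCR0[IDM] and MSR[DE] as don't-care values.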
8.3.7 Instruction Completed
An ICMP debug event occurs if DBCR0[ICMP] = ‘1’, execution of any instruction is completed, and
MSR[DE] = ‘1’. The IU handles an ICMP debug event as a context synchronizing (rsync, which is similar to
csync [or CSI]) operation. However, the operation associated with the ICMP debug event is issued to its
normal pipe. When enabled as a trace event, the ICMP debug event sets the DBSR on a normal commitment
using the tag in the central scrutinizer (CS). For ICMP debug events in internal debug mode, external debug
mode, or debug wait mode, commitment of the csync operation (or a normal commitment) can occur at any
time to set the DBSR and cause the interrupt or stop.
If execution of an instruction is suppressed because the instruction is causing another exception that is
enabled to generate an interrupt, the attempted execution of that instruction does not cause an ICMP debug
event. However, the system call (sc) instruction does not fall into the category of an instruction whose execution is suppressed because the instruction completes execution and then generates a system call interrupt. In
this case, the ICMP debug exception is also set.
ICMP debug events are not recognized if MSR[DE] = ‘0’ at the time the instruction is executed. Also,
DBSR[IDE] cannot be set by an ICMP debug event. This is because if the common event of instruction
completion is recorded as an exception in the DBSR while debug interrupts are disabled through MSR[DE],
the debug interrupt handler software receives an inordinate number of imprecise debug interrupts every time
debug interrupts are re-enabled using MSR[DE].
When an ICMP debug event occurs, DBSR[ICMP] is set to ‘1’ to record the debug exception, a debug interrupt occurs immediately (provided no higher priority exception is enabled to cause an interrupt), and CSRR0
is set to the address of the instruction after the one causing the ICMP debug exception. Table 8-4 summarizes the debug register setting and the actions.
Table 8-4. ICMP Debug Event Actions

DBCR0[ICMP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0            –           –           –        –                       No action.
1            0           0           –        0                       No action.
1            –           1           –        –
1            –           0           –        1
1            1           0           0        0                       No action.
1            1           0           1        0                       DBSR[ICMP] is set. A debug interrupt is taken. CSRR0 is set to the address of the next instruction to be executed after the instruction that caused the ICMP debug event.
8.3.8 Return Debug Events
RET debug events occur if DBCR0[RET] = ‘1’ and an attempt is made to execute any of the following instructions:
• Return from interrupt (rfi)
• Return from critical interrupt (rfci)
• Return from machine-check interrupt (rfmci)
When a RET debug event occurs, DBSR[RET] is set to ‘1’ to record the debug exception.
A RET debug event operates similarly to BRT events in that RET debug events occur before the rfi, rfci, or
rfmci instruction is executed. That is, CSRR0 points to the rfi, rfci, or rfmci instruction, not to the instruction
to which the rfi, rfci, or rfmci instruction is returning.
If an rfci or rfmci instruction is executed, a RET debug event does not occur if MSR[DE] = ‘0’,
DBCR0[IDM] = ‘1’, and MSR[DWE] = ‘0’. In other words, RET debug events do not occur imprecisely in
internal debug mode if an rfci or rfmci instruction is executed. However, if DBCR0[EDM] = ‘1’ or
MSR[DWE] = ‘1’, RET debug events can occur imprecisely (because MSR[DE] = ‘0’) for the rfci or rfmci
instructions. Setting DBCR0[IDM] = ‘1’ does not affect this.
For the rfi instruction, imprecise RET debug events can occur regardless of debug mode. Table 8-5
describes debug register setting and the actions.
Table 8-5. RET Debug Event Actions

DBCR0[RET]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0           –           –           –        –                       None.
1           0           0           –        0                       DBSR[RET] is set through a normal commit.
1           –           1           –        –                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           –           0           –        1                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           1           0           0        0                       DBSR[RET] is set through a normal commit. DBSR[IDE] is set.
1           1           0           1        0                       rfi faulty committed. DBSR[RET] is set. A debug interrupt is taken. CSRR0 is set to the address of the rfi instruction.
8.3.9 Interrupt Debug Events
IRPT debug events occur when IRPT debug events are enabled (DBCR0[IRPT] = ‘1’) and an interrupt occurs.
When operating in external debug mode or debug wait mode, the occurrence of an IRPT debug event is
recorded in DBSR[IRPT] and causes the processor to enter the stop state and cease processing instructions.
The program counter will contain the address of the instruction that would have executed next had the IRPT
debug event not occurred. Because the IRPT debug event is caused by the occurrence of an interrupt, by
definition this address is that of the first instruction of the interrupt handler for the interrupt type that caused
the IRPT debug event.
When operating in internal debug mode with external debug mode and debug wait mode both disabled (and
regardless of the value of MSR[DE]), an IRPT debug event can only occur because of a noncritical class
interrupt. Critical class interrupts (machine check, critical input, watchdog timer, and debug interrupts) cannot
cause IRPT debug events in internal debug mode (unless also in external debug mode or debug wait mode),
as otherwise the debug interrupt which would occur as the result of the IRPT debug event would by necessity
always be imprecise because the critical class interrupt which would be causing the IRPT debug event would
itself be causing MSR[DE] to be set to ‘0’.
For a noncritical class interrupt which is causing an IRPT debug event while internal debug mode is enabled
and external debug mode and debug wait mode are both disabled, the occurrence of the IRPT debug event is
recorded in DBSR[IRPT]. If MSR[DE] is ‘1’ at the time of the IRPT debug event, a debug interrupt occurs with
CSRR0 set to the address of the instruction that would have executed next had the IRPT debug event not
occurred. Because the IRPT debug event is caused by the occurrence of some other interrupt, by definition
this address is that of the first instruction of the interrupt handler for the interrupt type that caused the IRPT
debug event. If MSR[DE] is ‘0’ at the time of the IRPT debug event, the imprecise debug event (IDE) field of
the DBSR is also set and a Debug interrupt does not occur immediately. Instead, instruction execution
continues, and a debug interrupt occurs if MSR[DE] is set to ‘1’, thereby enabling debug interrupts, assuming
software has not cleared the IRPT debug event status from the DBSR in the meantime. Upon such a delayed
interrupt, the debug interrupt handler software can query the DBSR[IDE] field to determine that the debug
interrupt has occurred imprecisely.
When operating in trace mode, the occurrence of an IRPT debug event is recorded in DBSR[IRPT] and is
indicated over the trace interface, and instruction execution continues. Table 8-6 describes the debug register
setting and the actions.
Table 8-6. IRPT Debug Event Actions

DBCR0[IRPT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0            –           –           –        –                       None.
1            0           0           –        0                       DBSR[IRPT] is set.
1            –           1           –        –                       DBSR[IRPT] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[IRPT] is set. Transition to the STOP state.
1            1           0           0        0                       DBSR[IRPT] is set. DBSR[IDE] is set.
1            1           0           1        0                       DBSR[IRPT] is set. A debug interrupt is taken. CSRR0 is set to the address of the first instruction in the base class interrupt handler.
8.3.10 Unconditional Debug Events
An unconditional debug event (UDE) occurs immediately upon being set using the JTAG debug port.
When a UDE occurs, DBSR[UDE] is set to ‘1’ to record the debug exception. If MSR[DE] = ‘0’, DBSR[IDE] is
also set to ‘1’ to record the imprecise debug event.
If MSR[DE] = ‘1’ at the time of the unconditional debug exception, a debug interrupt occurs immediately
(provided there exists no higher priority exception that is enabled to cause an interrupt). CSRR0 is set to the
address of the instruction that would have executed next had the interrupt not occurred.
If MSR[DE] = ‘0’ at the time of the UDE, a debug interrupt does not occur. Later, if the UDE has not been
reset by clearing DBSR[UDE], and MSR[DE] is set to ‘1’, a delayed debug interrupt occurs. In this case,
CSRR0 contains the address of the instruction after the one that enabled the debug interrupt by setting
MSR[DE] to ‘1’. Software in the debug interrupt handler can monitor DBSR[IDE] to determine how to interpret
the value in CSRR0. Table 8-7 summarizes the debug register setting and the actions.
Table 8-7. UDE Debug Event Actions

DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action
0           0           –        0                       DBSR[UDE] is set.
–           1           –        –
–           0           –        1
1           0           0        0                       DBSR[UDE] is set.
1           0           1        0                       DBSR[UDE] is set. A debug interrupt is taken. CSRR0 is set to the address of the CS tail at the time of the interrupt flush.
8.4 Debug Timer Freeze
To maintain the semblance of real time operation while a system is being debugged, DBCR0[FT] can be set
to ‘1’, which will cause all of the timers within the PowerPC 476FP core to stop incrementing or decrementing
for as long as a debug event bit is set in the DBSR, or until DBCR0[FT] is set to ‘0’. See Section 6 Timer
Facilities on page 157 for more information on the operation of the PowerPC 476FP core timers.
8.5 Debug Special Purpose Registers
This section lists the debug-related registers and describes their bit fields.
8.5.1 Debug Control Register 0 (DBCR0)

Bits   Field Name  Description
32     EDM         External debug mode.
                   0   External debug mode is disabled.
                   1   External debug mode is enabled.
33     IDM         Internal debug mode.
                   0   Internal debug mode is disabled.
                   1   Internal debug mode is enabled.
34:35  RST         Reset.
                   Setting this field starts a software-initiated reset.
                   00  No action.
                   01  Core reset.
                   10  Chip reset.
                   11  System reset.
                   Note: Writing ‘01’, ‘10’, or ‘11’ to these bits resets the processor.
36     ICMP        Instruction completed debug event; software single-step.
                   0   Instruction completed debug events are disabled.
                   1   Instruction completed debug events are enabled.
37     BRT         Branch taken debug event.
                   0   Branch taken debug events are disabled.
                   1   Branch taken debug events are enabled.
38     IRPT        Interrupt debug event.
                   0   Interrupt debug events are disabled.
                   1   Interrupt debug events are enabled.
39     TRAP        Trap debug event.
                   0   Trap debug events are disabled.
                   1   Trap debug events are enabled.
40     IAC1        Instruction address comparison 1 debug event.
                   0   IAC 1 debug events are disabled.
                   1   IAC 1 debug events are enabled.
41     IAC2        Instruction address comparison 2 debug event.
                   0   IAC 2 debug events are disabled.
                   1   IAC 2 debug events are enabled.
42     IAC3        Instruction address comparison 3 debug event.
                   0   IAC 3 debug events are disabled.
                   1   IAC 3 debug events are enabled.
43     IAC4        Instruction address comparison 4 debug event.
                   0   IAC 4 debug events are disabled.
                   1   IAC 4 debug events are enabled.
44     DAC1R       Data address comparison 1 read debug event.
                   0   DAC 1 read debug events are disabled.
                   1   DAC 1 read debug events are enabled.
45     DAC1W       Data address comparison 1 write debug event.
                   0   DAC 1 write debug events are disabled.
                   1   DAC 1 write debug events are enabled.
46     DAC2R       Data address comparison 2 read debug event.
                   0   DAC 2 read debug events are disabled.
                   1   DAC 2 read debug events are enabled.
47     DAC2W       Data address comparison 2 write debug event.
                   0   DAC 2 write debug events are disabled.
                   1   DAC 2 write debug events are enabled.
48     RET         Return debug event.
                   0   Return debug events are disabled.
                   1   Return debug events are enabled.
49:62  Reserved
63     FT          Freeze timers.
                   0   Freeze timers are disabled.
                   1   Freeze timers are enabled.
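The DBCR0 bit positions can be turned into masks for software that builds a register value. The sketch below assumes the usual PowerPC convention that bits 32:63 of the register description map onto a 32-bit value with bit 32 as the most significant bit; the macro and function names are hypothetical and are not taken from any IBM header.

```c
#include <stdint.h>

/* Mask for bit n (32..63 in the manual's numbering) of a 32-bit register.
 * Assumption: bit 32 is the most significant bit of the 32-bit value. */
#define SPR_BIT(n)  ((uint32_t)1u << (63u - (n)))

#define DBCR0_EDM    SPR_BIT(32)
#define DBCR0_IDM    SPR_BIT(33)
#define DBCR0_ICMP   SPR_BIT(36)
#define DBCR0_BRT    SPR_BIT(37)
#define DBCR0_IRPT   SPR_BIT(38)
#define DBCR0_TRAP   SPR_BIT(39)
#define DBCR0_IAC1   SPR_BIT(40)
#define DBCR0_IAC2   SPR_BIT(41)
#define DBCR0_IAC3   SPR_BIT(42)
#define DBCR0_IAC4   SPR_BIT(43)
#define DBCR0_DAC1R  SPR_BIT(44)
#define DBCR0_DAC1W  SPR_BIT(45)
#define DBCR0_DAC2R  SPR_BIT(46)
#define DBCR0_DAC2W  SPR_BIT(47)
#define DBCR0_RET    SPR_BIT(48)
#define DBCR0_FT     SPR_BIT(63)

/* Builds a DBCR0 value that enables internal debug mode plus the given
 * event-enable bits. The RST field is left at '00' (no action). */
uint32_t dbcr0_internal_debug(uint32_t event_enables)
{
    return DBCR0_IDM | event_enables;
}
```

For example, dbcr0_internal_debug(DBCR0_IAC1 | DBCR0_FT) yields a value with IDM, IAC1, and FT set and every other field zero.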
8.5.2 Debug Control Register 1 (DBCR1)

Bits   Field Name  Description
32:33  IAC1US      Instruction address comparison 1 user/supervisor.
                   00  Both.
                   01  Reserved.
                   10  Supervisor-only (MSR[PR] = ‘0’).
                   11  User-only (MSR[PR] = ‘1’).
                   The IAC1US field (and not the IAC2US field) is used when IAC12M is in range mode.
34:35  IAC1ER      Instruction address comparison 1 effective/real.
                   00  Effective (MSR[IS] = Don’t care).
                   01  Reserved.
                   10  Effective (MSR[IS] = ‘0’).
                   11  Effective (MSR[IS] = ‘1’).
                   The IAC1ER field (and not the IAC2ER field) is used when IAC12M is in range mode.
36:37  IAC2US      Instruction address comparison 2 user/supervisor. See IAC1US for field values.
38:39  IAC2ER      Instruction address comparison 2 effective/real. See IAC1ER for field values.
40:41  IAC12M      Instruction address comparison 1/2 mode.
                   00  Exact match.
                       Match if address[0:29] is the same as IAC1[0:29] or IAC2[0:29]. These are two
                       independent comparisons.
                   01  Reserved.
                   10  Range inclusive.
                       Match if IAC1 ≤ address < IAC2.
                   11  Range exclusive.
                       Match if (address < IAC1) or (IAC2 ≤ address).
42:46  Reserved
47     IAC12AT     Instruction address comparison 1/2 auto toggle.
                   0   Automatic toggling for IAC1 and IAC2 events is disabled.
                   1   Automatic toggling for IAC1 and IAC2 events is enabled.
48:49  IAC3US      Instruction address comparison 3 user/supervisor (see IAC1US).
                   The IAC3US field (and not the IAC4US field) is used when IAC34M is in range mode.
50:51  IAC3ER      Instruction address comparison 3 effective/real (see IAC1ER).
                   The IAC3ER field (and not the IAC4ER field) is used when IAC34M is in range mode.
52:53  IAC4US      Instruction address comparison 4 user/supervisor. See IAC1US for field values.
54:55  IAC4ER      Instruction address comparison 4 effective/real. See IAC1ER for field values.
56:57  IAC34M      Instruction address comparison 3/4 mode. See IAC12M for field values.
58:62  Reserved
63     IAC34AT     Instruction address comparison 3/4 auto toggle.
                   0   Automatic toggling for IAC3 and IAC4 events is disabled.
                   1   Automatic toggling for IAC3 and IAC4 events is enabled.
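The IAC12M comparison modes can be modeled with a short C function. This is a sketch under assumptions: the enum and function names are hypothetical, and the two low-order address bits are masked in every mode (the text states the address[0:29] comparison only for exact match; applying it to the range modes is an assumption based on instruction addresses being word-aligned).

```c
#include <stdbool.h>
#include <stdint.h>

/* Encodings as listed for the IAC12M field; '01' is reserved. */
enum iac_mode {
    IAC_MODE_EXACT           = 0,
    IAC_MODE_RANGE_INCLUSIVE = 2,
    IAC_MODE_RANGE_EXCLUSIVE = 3
};

bool iac12_match(enum iac_mode mode, uint32_t addr, uint32_t iac1, uint32_t iac2)
{
    const uint32_t word_mask = 0xFFFFFFFCu;  /* address bits 0:29 */

    addr &= word_mask;
    iac1 &= word_mask;
    iac2 &= word_mask;

    switch (mode) {
    case IAC_MODE_EXACT:            /* two independent comparisons */
        return addr == iac1 || addr == iac2;
    case IAC_MODE_RANGE_INCLUSIVE:  /* IAC1 <= address < IAC2 */
        return iac1 <= addr && addr < iac2;
    case IAC_MODE_RANGE_EXCLUSIVE:  /* address < IAC1 or IAC2 <= address */
        return addr < iac1 || iac2 <= addr;
    }
    return false;                   /* reserved encoding: no match in this model */
}
```

Note that the inclusive range excludes IAC2 itself, so a range that should cover the instruction at address X needs IAC2 set to at least X + 4.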
8.5.3 Debug Control Register 2 (DBCR2)
This register controls the operating modes of the data and the address compare registers.

Bits   Field Name  Description
32:33  DAC1US      Data address compare 1 user or supervisor.
                   00  Both user and supervisor.
                   01  Reserved.
                   10  Supervisor only. MSR[PR] = ‘0’.
                   11  User only. MSR[PR] = ‘1’.
34:35  DAC1ER      Data address compare 1 effective or real.
                   00  Effective. MSR[DS] = don’t care.
                   01  Reserved.
                   10  Effective. MSR[DS] = ‘0’.
                   11  Effective. MSR[DS] = ‘1’.
36:37  DAC2US      Data address compare 2 user or supervisor.
                   00  Both user and supervisor.
                   01  Reserved.
                   10  Supervisor only. MSR[PR] = ‘0’.
                   11  User only. MSR[PR] = ‘1’.
38:39  DAC2ER      Data address compare 2 effective or real.
                   00  Effective. MSR[DS] = don’t care.
                   01  Reserved.
                   10  Effective. MSR[DS] = ‘0’.
                   11  Effective. MSR[DS] = ‘1’.
40:41  DAC12M      Data address compare 1 and 2 mode.
                   00  Exact match. A match occurs if address[0:31] equals either DAC1[0:31] or DAC2[0:31].
                       Two independent comparisons are performed.
                   01  Address bit mask. A match occurs if the data address bits selected by DAC2 equal the bit
                       values in DAC1.
                   10  Range inclusive. A match occurs if DAC1 ≤ address < DAC2.
                   11  Range exclusive. A match occurs if either the address < DAC1 or if DAC2 ≤ address.
42:43  Reserved
44:45  DVC1M       Data value compare 1 mode.
                   00  Reserved.
                   01  AND all bytes enabled by DVC1BE.
                   10  OR all bytes enabled by DVC1BE.
                   11  AND-OR pairs of bytes enabled by DVC1BE: (0 AND 1) OR (2 AND 3).
46:47  DVC2M       Data value compare 2 mode.
                   00  Reserved.
                   01  AND all bytes enabled by DVC2BE.
                   10  OR all bytes enabled by DVC2BE.
                   11  AND-OR pairs of bytes enabled by DVC2BE: (0 AND 1) OR (2 AND 3).
48:51  Reserved
52:55  DVC1BE      DVC 1 byte enables 0:3 (see DVC1M).
56:59  Reserved
60:63  DVC2BE      DVC 2 byte enables 0:3 (see DVC2M).
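The address bit mask mode of DAC12M (‘01’) differs from the other modes in that DAC2 holds a mask rather than a second address. A minimal C sketch follows; the function name is hypothetical, and the model assumes that DAC1 bits outside the mask are ignored, which the field description does not state explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address bit mask mode: a match occurs if the data address bits selected
 * by DAC2 equal the corresponding bit values in DAC1.
 * Assumption: DAC1 bits not selected by DAC2 do not take part. */
bool dac_mask_match(uint32_t addr, uint32_t dac1, uint32_t dac2)
{
    return ((addr ^ dac1) & dac2) == 0u;
}
```

With DAC1 = x‘10000000’ and DAC2 = x‘FFFF0000’, any data address in the 64 KB block starting at x‘10000000’ matches in this model.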
8.5.4 Debug Status Register (DBSR)

Bits   Field Name  Description
32     IDE         Imprecise debug event.
33     UDE         Unconditional debug event.
34:35  MRR         Most recent reset type.
                   These two bits are set to one of three values when reset occurs.
                   These two bits are undefined at power-up.
                   00  No reset occurred since these bits last cleared.
                   01  Core reset.
                   10  Chip reset.
                   11  System reset.
36     ICMP        Instruction completed debug event.
37     BRT         Branch taken debug event.
38     IRPT        Interrupt debug event.
39     TRAP        Trap debug event.
40     IAC1        Instruction address comparison 1 debug event.
41     IAC2        Instruction address comparison 2 debug event.
42     IAC3        Instruction address comparison 3 debug event.
43     IAC4        Instruction address comparison 4 debug event.
44     DAC1R       Data address comparison 1 read debug event.
45     DAC1W       Data address comparison 1 write debug event.
46     DAC2R       Data address comparison 2 read debug event.
47     DAC2W       Data address comparison 2 write debug event.
48     RET         Return debug event.
49:61  Reserved
62     IAC12ATS    Instruction address comparison 1/2 auto toggle status.
63     IAC34ATS    Instruction address comparison 3/4 auto toggle status.
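A debug interrupt handler typically needs two pieces of information from the DBSR: whether the event was recorded imprecisely and which reset occurred most recently. The C sketch below decodes both from a 32-bit DBSR value. The names are hypothetical, and the bit positions assume that bit 32 of the register description is the most significant bit of the 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values of the two-bit MRR field (bits 34:35). */
enum dbsr_mrr {
    DBSR_MRR_NONE   = 0,  /* no reset since the bits were last cleared */
    DBSR_MRR_CORE   = 1,
    DBSR_MRR_CHIP   = 2,
    DBSR_MRR_SYSTEM = 3
};

#define DBSR_IDE        0x80000000u  /* bit 32 */
#define DBSR_UDE        0x40000000u  /* bit 33 */
#define DBSR_MRR_SHIFT  28u          /* bit 35 is 28 positions above bit 63 */
#define DBSR_MRR_MASK   0x3u

/* True if the recorded debug event is flagged as imprecise. */
bool dbsr_is_imprecise(uint32_t dbsr)
{
    return (dbsr & DBSR_IDE) != 0u;
}

/* Extracts the most recent reset type. */
enum dbsr_mrr dbsr_most_recent_reset(uint32_t dbsr)
{
    return (enum dbsr_mrr)((dbsr >> DBSR_MRR_SHIFT) & DBSR_MRR_MASK);
}
```

Because the MRR bits are undefined at power-up, software would need some other indication of a power-on reset before trusting the decoded value.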
8.5.5 Setting the DBSR Based on MSR[DE] and DBCR0[IDM]
Table 8-8 summarizes how the DBSR bits are set depending on the setting of MSR[DE] and DBCR0[IDM].
Table 8-8. Setting the DBSR based on MSR[DE] and DBCR0[IDM]

DE  IDM  UDE       IRPT                        ICMP  BRT  IAC       DAC       TRAP      RET
0   0    Set       Set                         Set   Set  Set       Set       Set       Set
0   1    Set, IDE  Noncritical only. Set, IDE  –     –    Set, IDE  Set, IDE  Set, IDE  rfi only. Set, IDE
1   0    Set       Set                         Set   Set  Set       Set       Set       Set
1   1    Set       Noncritical only. Set       Set   Set  Set       Set       Set       Set
8.5.6 Instruction Address Comparison 1 - 4 (IAC1 - IAC4)

Bits   Field Name  Description
32:61  ADDR        The address for matching.
62:63  Reserved
8.5.7 Setup Order for IACs, DACs, and DVCs
The following setup order is required for an IAC, DAC, or DVC to ensure that all signals are initialized in simulation.
mtdbcr0            # clear the dbcr0
mtdbsr 0xFFFF      # clear the dbsr
mtiacX
mtdacX
mtdvcX
mtdbcr1
mtdbcr2
isync
mtdbcr0 bits       # enable debug mode and IAC, DAC, DVC
isync
8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment
The PowerPC 476FP core provides RISCWatch and JTAG interfaces that can be used to:
• Reset the core
• Stop, halt, and start the core
• Debug
• Trace statuses of the processor and other devices
In addition, the PowerPC 476FP core provides additional debug and status-observation capabilities to debug
and monitor processor cores and other cores in an MP SoC environment.
The following capabilities, which are defined by the Power.org Common Debug Interface Technical
committee, are implemented:
• Observe individual DBSR status bits under JTAG control (using the Debug Bus Out Mask Register)
• Stop and run processors by JTAG control (using the Debug Bus Input Mask Register)
8.6.1 Debug Bus Out Mask Register (DBOMask)
Figure 8-1 illustrates how the DBOMask Register traces the status of processors (bits are individually
observed).
Figure 8-1. JTAG-Controlled MP DBSR Monitor Capability
[Figure: block diagram with the labels Debug Status Register (32 bits), Mask (32 bits), DBIMask Register, JTAG Input, and DBO (JTAG Pinout).]
The DBOMask Register is used to allow observation of an individual bit of a trace trigger event type.
Bits   Field Name  Description
0:13   DBOMask     This 14-bit field corresponds to C476_TRCTRGGEREVENTTYPE[0:13] to enable the appropriate
                   bit.
14:31  Reserved
8.6.2 Debug Input Mask Register (DBIMask)
Figure 8-2 illustrates the JTAG interface stop and run control capability (stop and run is controlled by JTAG).
Figure 8-2. JTAG-Controlled MP Stop and Run Control Capability
[Figure: block diagram with the labels Debug Interface Register, Mask, DBIMask Register, JTAG Inputs, JTAG Halt, Other Halts, OR, and Processor Halt.]
This register is used to stop and start the processor through JTAG control. When a DBIMask Register bit is
set and the corresponding debug event bit is set, the processor is stopped. When either the debug event bit
or the DBIMask bit is cleared, the processor starts running again.

Bits   Field Name  Description
0:4    SSMASK      When a bit of this field is set to ‘1’, a corresponding bit of the DBGC476SYSTEMSTATUS0-4 input
                   bus can stop the processor.
5      UCEMASK     When this bit is set to ‘1’, an asserted input signal XXC476UNCONDEVENT can stop the processor.
6      TEIMASK     When this bit is set to ‘1’, an asserted input signal XXC476TRIGGEREVENTIN can stop the processor.
7:31   Reserved
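The stop condition described for the DBIMask Register amounts to an AND of each mask bit with its input, followed by an OR across all bits. The C sketch below models only that logic. The function name and parameter layout are hypothetical, bit 0 is assumed to be the most significant bit of the 32-bit register, and system status line i is assumed to pair with SSMASK bit i.

```c
#include <stdbool.h>
#include <stdint.h>

#define DBIMASK_SSMASK   0xF8000000u  /* bits 0:4 */
#define DBIMASK_UCEMASK  0x04000000u  /* bit 5 */
#define DBIMASK_TEIMASK  0x02000000u  /* bit 6 */

/* system_status:    5-bit value, 0x10 = status line 0 ... 0x01 = status line 4.
 * uncond_event:     state of the unconditional event input.
 * trigger_event_in: state of the trigger event input.
 * Returns true if any masked-in input is asserted. */
bool jtag_halt_requested(uint32_t dbimask, unsigned system_status,
                         bool uncond_event, bool trigger_event_in)
{
    uint32_t inputs = ((uint32_t)(system_status & 0x1Fu) << 27)
                    | (uncond_event ? DBIMASK_UCEMASK : 0u)
                    | (trigger_event_in ? DBIMASK_TEIMASK : 0u);

    return (dbimask & inputs) != 0u;
}
```

Clearing either the mask bit or the input removes the halt request in this model, which matches the statement that the processor starts running again when either bit is cleared.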
9. Initialization
This section describes the initial state of the PowerPC 476FP processor core after a hardware reset, and
contains a description of the initialization software that is required to complete the initialization so that the
PowerPC 476FP processor core can begin executing application code. Initialization of other on-chip or off-chip system components might also be required.
9.1 Processor Core State after Reset
In general, the contents of registers and other facilities within the PowerPC 476FP processor core are undefined after a hardware reset. Reset is defined to initialize only the minimal resources required so that instructions can be fetched and run from the initial program-memory page, and so that repeatable, deterministic
behavior can be guaranteed if the correct software initialization sequence is followed. System software must
fully configure the remainder of the PowerPC 476FP processor core resources and the other facilities within
the chip or system.
The following list summarizes the processor state immediately after reset.
• All fields of the Machine State Register (MSR) are set to ‘0’, disabling all asynchronous interrupts, placing
the processor in supervisor mode, and specifying that instruction and data accesses are to the system
(as opposed to the application) address space.
• DBCR0[RST] is set to ‘0’, thereby ending any previous software-initiated reset operation.
• The Debug Status Register (DBSR) most-recent reset (MRR) field records the type of the just-ended
reset operation (core, chip, or system; see Reset Types on page 249).
• The Timer Control Register (TCR) watchdog-timer reset control (WRC) field is set to ‘00’, thereby disabling the watchdog timer reset operation.
• The Timer Status Register (TSR) watchdog timer reset status (WRS) records the type of the just-ended
reset operation, if the reset was initiated by the watchdog timer. Otherwise, this field is unchanged from its
prereset value.
• The Processor Version Register (PVR) is defined, after reset and otherwise, to contain a value that indicates the specific processor version number.
• The program counter (PC) is set to x‘FFFF FFFC’: the effective address (EA) of the last word of the
address space.
The memory management resources are set to values such that the processor is able to successfully fetch
and execute instructions and read (but not write) data within the 4 KB program memory page located at the
end of the 32-bit effective address space. Exactly how this is accomplished is implementation-dependent. For
example, a translation lookaside buffer (TLB) entry might be established in a manner that is visible to software that uses the TLB management instructions. Regardless of how the implementation enables access to
the initial program memory page, instruction execution starts at the effective address of x‘FFFF FFFC’. The
instruction at this address must be an unconditional branch backwards to the start of the initialization
sequence, which must lie somewhere within the first 4 KB program-memory page. The real address to which
the initial effective address is translated is also implementation- or system-dependent, as are the various
storage attributes of the initial program-memory page, such as the caching inhibited and endian attributes.
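The requirement that the instruction at x‘FFFF FFFC’ be a backward unconditional branch into the same 4 KB page can be checked when the boot image is built. The C sketch below encodes a relative PowerPC branch (I-form, primary opcode 18, AA = 0, LK = 0) from the reset address to a target and rejects targets outside the page. The function and macro names are hypothetical; only the instruction encoding and the addresses come from the architecture and the text above.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_PC         0xFFFFFFFCu  /* address of the first fetched instruction */
#define RESET_PAGE_BASE  0xFFFFF000u  /* start of the last 4 KB page */

/* Encodes "b target" for placement at RESET_PC.
 * Returns false if the target is not word-aligned or is not located
 * before RESET_PC within the initial 4 KB page. */
bool encode_reset_branch(uint32_t target, uint32_t *insn)
{
    if ((target & 3u) != 0u || target < RESET_PAGE_BASE || target >= RESET_PC)
        return false;

    uint32_t disp = target - RESET_PC;           /* negative, modulo 2^32 */
    *insn = 0x48000000u | (disp & 0x03FFFFFCu);  /* opcode 18, LI field */
    return true;
}
```

A branch to the start of the page, x‘FFFF F000’, encodes as x‘4BFFF004’ under this scheme.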
Note: In the PowerPC 476FP processor core, a single entry is established in the instruction shadow TLB
(ITLB) and data shadow TLB (DTLB) at reset with the properties described in Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244. Initialization software must insert an entry into the
unified translation lookaside buffer to cover this same memory region before performing any context-synchronizing operation (including causing any exceptions that might lead to an interrupt), because a context-synchronizing operation invalidates the shadow TLB entries.
Initialization software should consider all other resources within the PowerPC 476FP processor core to be
undefined after reset, in order for the initialization sequence to be compatible with other PowerPC implementations. However, additional resources are initialized by reset to guarantee correct and deterministic operation of the processor during the initialization sequence. Table 9-1 shows the reset state of all PowerPC 476FP
processor core resources that are defined to be initialized by reset. Although certain other register fields and
other facilities within the PowerPC 476FP processor core are affected by reset, this is neither an architectural
nor a hardware requirement, and software must treat those resources as undefined. Likewise, even those
resources that are included in Table 9-1 but which are not identified in the previous list as being architecturally required, should be treated as undefined by the initialization software.
During chip initialization, some chip control registers must be initialized to ensure correct chip operation.
Peripheral devices can also be initialized as appropriate for the system design.
Table 9-1. Reset Values of Registers and Other PowerPC 476FP Facilities

Resource  Field         Reset Value  Comment
MSR       WE            0            The wait state is disabled.
          CE            0            Asynchronous critical interrupts are disabled.
          EE            0            Asynchronous noncritical interrupts are disabled.
          PR            0            The processor is in supervisor mode.
          FP            0            The processor cannot execute floating-point instructions.
          ME            0            Machine-check interrupts are disabled.
          FE0           0            Floating-point enabled interrupts are disabled.
          DWE           0            Debug wait mode is disabled.
          DE            0            Debug interrupts are disabled.
          FE1           0            Floating-point enabled interrupts are disabled.
          IS            0            Instruction fetch access is to system-level virtual address space.
          DS            0            Data access is to system-level virtual address space.
          PMM           0            Gathering statistics for marked processes is disabled.

Note: TLB entry refers to an entry in the shadow instruction and data TLB arrays that is automatically configured by the PowerPC
476FP processor core to enable fetching and reading (but not writing) from the initial program memory page. The PowerPC 476FP processor core also automatically initializes one entry in the UTLB with the same data at reset to avoid accidental erasure of the initial TLB
entry by MP initialization coding.
Resource | Field | Reset Value | Comment
CCR0 | PRE | 0 | Semirecoverable parity mode enabled for the data cache.
CCR0 | CRPE | 0 | Disable parity information reads.
CCR0 | ICS | 0 | The icbi request size is set to 32 bytes.
CCR0 | DAPUIB | 0 | Enable broadcast of instruction data to auxiliary processor interface.
CCR0 | ICWRIDX[0:3] | 0000 | The index value to write to the instruction cache.
CCR0 | DTB | 0 | Enable broadcast of trace information.
CCR0 | FLSTA | 0 | No alignment exception occurs on integer storage access instructions, regardless of alignment.
CCR0 | DQWPM[0:1] | 00 | Data cache quadword prediction is disabled.
CCR0 | IQWPM[0:1] | 00 | Instruction cache quadword prediction mode is set to use EA[19].
CCR1 | GPRPEI | 00 | Does not record GPR parity errors.
CCR1 | FPRPEI | 00 | Does not record FPR parity errors.
CCR1 | ICDPEI | 00 | Records odd data parity errors in the instruction cache.
CCR1 | ICLPEI | 00 | Records odd LRU parity errors in the instruction cache.
CCR1 | ICTPEI | 00 | Records odd tag parity errors in the instruction cache.
CCR1 | DCTPEI | 00 | Records odd tag parity errors in the data cache.
CCR1 | DCDPEI | 00 | Records odd data parity errors in the data cache.
CCR1 | DCLPEI | 00 | Records odd LRU parity errors in the data cache.
CCR1 | MMUTPEI | 0 | Records odd tag parity errors in the memory management unit (MMU).
CCR1 | MMUDPEI | 0 | Records odd data parity errors in the MMU.
CCR1 | TSS | 0 | Selects the timer clock source.
CCR1 | DPC | 0 | Disables or enables parity checking in the L1 cache core.
CCR1 | TCS | 00 | Determines what clock frequency runs the timers.
CCR2 | DSTG | 00 | Stores to all bytes within an L1 halfline can be gathered into a single transfer.
CCR2 | DLFPD | 0 | Line fill match prediction is enabled.
CCR2 | DSTI | 0 | When context synchronization occurs, invalidate shadow TLBs (ITLB, DTLB).
CCR2 | PMUD | 0 | Enables or disables performance monitor unit (PMU) counting.
CCR2 | DCSTGW | 0 | Disables or enables cacheable store gathering with write-through.
CCR2 | STGCTR | 00 | Determines how long a store request remains in the store buffer queue (SBQ) before a write request is sent.
CCR2 | DISTG | 0 | Disables or enables cache inhibited store gathering.
CCR2 | SPC5C1 | 0 | Enables or disables AT field static branch predict.
CCR2 | MCDTO | 0 | Enables or disables DCR timeout machine check.
Resource | Field | Reset Value | Comment
DBCR0 | EDM | 0 |
DBCR0 | IDM | 0 | Disables or enables internal debug mode.
DBCR0 | RST | 00 | Software-initiated debug reset is disabled.
DBCR0 | ICMP | 0 | Instruction-completion debug events are disabled.
DBCR0 | BRT | 0 | Branch-taken debug events are disabled.
DBCR0 | IRPT | 0 | Interrupt debug events are disabled.
DBCR0 | TRAP | 0 | Disables or enables trap debug events.
DBCR0 | IAC1 | 0 | Instruction address compare 1 (IAC1) debug events are disabled.
DBCR0 | IAC2 | 0 | IAC2 debug events are disabled.
DBCR0 | IAC3 | 0 |
DBCR0 | IAC4 | 0 |
DBCR0 | DAC1R | 0 |
DBCR0 | DAC1W | 0 |
DBCR0 | DAC2R | 0 |
DBCR0 | DAC2W | 0 |
DBCR0 | RET | 0 | Return debug events are disabled.
DBCR0 | FT | 0 | Freeze timers are disabled.
DBSR | IDE | 0 | An imprecise debug event has not occurred.
DBSR | UDE | 0 | An unconditional debug event has not occurred.
DBSR | MRR | Reset-dependent | The MRR indicates the most recent type of reset: 00 = no reset since this field was last cleared by software; 01 = core reset; 10 = chip reset; 11 = system reset.
DBSR | ICMP | 0 | The instruction completion debug event has not occurred.
DBSR | BRT | 0 | The branch taken debug event has not occurred.
DBSR | IRPT | 0 | The interrupt debug event has not occurred.
DBSR | TRAP | 0 | The trap debug event has not occurred.
DBSR | IAC1 | 0 | The IAC1 debug event has not occurred.
DBSR | IAC2 | 0 |
DBSR | IAC3 | 0 |
DBSR | IAC4 | 0 |
Resource | Field | Reset Value | Comment
DBSR | DAC1R | 0 | The data address compare 1 (DAC1) read debug event has not occurred.
DBSR | DAC1W | 0 | The DAC1 write debug event has not occurred.
DBSR | DAC2R | 0 | The DAC2 read debug event has not occurred.
DBSR | DAC2W | 0 | The DAC2 write debug event has not occurred.
DBSR | RET | 0 | The return debug event has not occurred.
DBSR | IAC12ATS | 0 | Instruction address comparison 1/2 auto toggle status is disabled.
DBSR | IAC34ATS | 0 | Instruction address comparison 3/4 auto toggle status is disabled.
ESR | ISMC | 0 | The synchronous instruction machine-check exception has not occurred.
MCSR | MCS | 0 | The asynchronous instruction machine-check exception has not occurred.
MMUBE0 | VE0 | 0 |
MMUBE0 | VE1 | 0 |
MMUBE0 | VE2 | 0 |
MMUBE1 | VE3 | 0 |
MMUBE1 | VE4 | 0 |
MMUBE1 | VE5 | 0 |
SSPCR | ORD1 - ORD7 | 0000 (each field) | Only 4 KB pages are searched.
USPCR | ORD1 - ORD7 | 0000 (each field) | Only 4 KB pages are searched.
Resource | Field | Reset Value | Comment
ISPCR | ORD1 - ORD7 | 0000 (each field) |
PC | | x‘FFFF FFFC’ | After reset, the first instruction is fetched from the last word of the effective-address space.
PVR | OWN | System-dependent | The PVR[OWN] value (after reset and otherwise) is specified by core input signals.
PVR | PVN | System-dependent | The PVR[PVN] value (after reset and otherwise) is specified by core input signals.
RSTCFG | U0 | System-dependent | All Reset Configuration Register (RSTCFG) fields are specified by core input signals.
RSTCFG | U1 | System-dependent |
RSTCFG | U2 | System-dependent |
RSTCFG | U3 | System-dependent |
RSTCFG | E | System-dependent |
RSTCFG | ERPN | System-dependent |
Resource | Field | Reset Value | Comment
TLBentry (see footnote) | EPN[0:19] | x‘FFFFF’ | Matches the effective address of the initial reset instruction. EPN[20:21] are undefined. They are not compared to the EA because the page size is 4 KB.
TLBentry | V | 1 | The translation table entry for the initial program memory page is valid.
TLBentry | TS | 0 | The translation space (TS) is reset to ‘0’. The initial program-memory page is in system-level virtual address space.
TLBentry | SIZE | x‘1’ | The initial program-memory page size is 4 KB.
TLBentry | TID | x‘0000’ | The translation identifier (TID) is set to zero. The initial program-memory page is globally shared; no match against the Process Identifier Register (PID) is required.
TLBentry | RPN[0:21] | x‘FFFFF’ || ‘00’ | The initial program-memory page is mapped effective = real.
TLBentry | ERPN | System-dependent | The extended real-page number of the initial program memory page is specified by core input signals.
TLBentry | U[0:3] | System-dependent | The reset values of the user-definable storage attributes are specified by core input signals.
TLBentry | W | 0 | The write-through storage attribute is disabled.
TLBentry | I | 1 | The caching-inhibited storage attribute is enabled.
TLBentry | M | 0 | The memory-coherent storage attribute is disabled.
TLBentry | G | 1 | The guarded-storage attribute is enabled.
TLBentry | E | System-dependent | The reset value of the endian-storage attribute is specified by a core input signal.
TLBentry | SX | 1 | The supervisor execute (SX) access is enabled.
TLBentry | SW | 0 | The supervisor write (SW) access is disabled.
TLBentry | SR | 1 | The supervisor read (SR) access is enabled.
TCR | WRC | 00 | The watchdog-timer reset control is disabled.
TSR | WRS | TCR[WRC] | TCR[WRC] is copied into WRS if the reset is caused by a watchdog-timer timeout.
TSR | WRS | Unchanged | WRS is not changed if the reset is caused by other than a watchdog-timer timeout.
TSR | WRS | Undefined | WRS is undefined after a power-on event.
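The reset-time TLB entry listed in Table 9-1 can be modeled in a few lines. The following Python sketch is illustrative only: the names INITIAL_EPN, initial_entry_matches, initial_entry_allows, and initial_real_address_low32 are hypothetical, and the model ignores the system-dependent ERPN and storage attributes. It shows that only the upper 20 bits of a 32-bit effective address are compared for a 4 KB page, and that the entry permits supervisor execute and read accesses but not writes.

```python
# Illustrative model of the reset-time TLB entry from Table 9-1.
# All names here are hypothetical; they are not part of any IBM tool or API.

INITIAL_EPN = 0xFFFFF        # EPN[0:19] reset value
PAGE_SHIFT = 12              # 4 KB page: the low 12 EA bits are the page offset

# Supervisor permissions at reset: SX = 1, SW = 0, SR = 1
INITIAL_PERMISSIONS = {"execute": True, "write": False, "read": True}


def initial_entry_matches(ea: int) -> bool:
    """Return True if a 32-bit effective address falls in the initial page."""
    return ((ea & 0xFFFFFFFF) >> PAGE_SHIFT) == INITIAL_EPN


def initial_entry_allows(ea: int, access: str) -> bool:
    """Return True if the reset-time entry both matches and permits the access."""
    return initial_entry_matches(ea) and INITIAL_PERMISSIONS[access]


def initial_real_address_low32(ea: int) -> int:
    """Low 32 bits of the real address; the page is mapped effective = real."""
    if not initial_entry_matches(ea):
        raise ValueError("address is outside the initial program memory page")
    return ea & 0xFFFFFFFF
```

A store to the initial page is rejected by this model, which is one reason the initialization sequence later writes an architected TLB entry with SW enabled.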
9.2 Reset Types
The PowerPC 476FP processor core supports three reset types: core, chip, and system. The type of reset is
indicated by a set of core input signals. For each type of reset, the core resources are initialized as indicated
in Table 9-1 on page 244. Core reset is intended to reset the PowerPC 476FP processor core without necessarily resetting the rest of the on-chip logic. The chip reset operation is intended to reset the entire chip, but
off-chip hardware in the system is not informed of the reset operation. System reset is intended to reset the
entire chip, and also to signal the rest of the off-chip system that the chip is being reset. Whether the system
reset operation is used to reset the rest of the system board is established by the board designer.
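Table 9-1 lists how the DBSR[MRR] field records which of these reset types occurred most recently. As a minimal sketch, the decode can be written as a lookup in Python. The function name decode_mrr is hypothetical, and the function takes the already-extracted 2-bit field value rather than the whole DBSR, because the field's bit position is not given in this section.

```python
# Illustrative decode of the 2-bit DBSR[MRR] field, using the encodings
# listed in Table 9-1. The function name is hypothetical.

MRR_RESET_TYPES = {
    0b00: "no reset since MRR was last cleared by software",
    0b01: "core reset",
    0b10: "chip reset",
    0b11: "system reset",
}


def decode_mrr(mrr_field: int) -> str:
    """Map the 2-bit MRR field value to the most recent reset type."""
    if not 0 <= mrr_field <= 0b11:
        raise ValueError("MRR is a 2-bit field")
    return MRR_RESET_TYPES[mrr_field]
```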
9.3 Reset Sources
A reset operation can be initiated on the PowerPC 476FP processor core through the use of any of four separate mechanisms. The first is a set of three input signals to the core, one for each of the three reset types.
These signals can be asserted asynchronously by hardware outside the core to initiate a reset operation. The
second reset source is the TCR[WRC] field, which can be set up by software to initiate a reset operation upon
certain watchdog timer expiration events. The third reset source is the DBCR0[RST] field, which can be
written by software to immediately initiate a reset operation. The fourth reset source is the Joint Test Action
Group (JTAG) interface, which can be used by a JTAG-attached debug tool to initiate a reset operation asynchronously to program execution on the PowerPC 476FP processor core.
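The TSR rows of Table 9-1 describe how the reset source affects TSR[WRS]. The following Python sketch restates those three rows as a function. The function name and the cause labels ("watchdog", "power-on", "other") are hypothetical, and None is used here to stand for an undefined value.

```python
# Illustrative model of how TSR[WRS] is determined by a reset, following the
# TSR rows of Table 9-1. Names and cause labels are hypothetical.
from typing import Optional


def wrs_after_reset(cause: str, tcr_wrc: int, previous_wrs: Optional[int]) -> Optional[int]:
    """Return TSR[WRS] after a reset, or None where the value is undefined."""
    if cause == "power-on":
        return None                 # undefined after a power-on event
    if cause == "watchdog":
        return tcr_wrc & 0b11       # TCR[WRC] is copied into WRS
    if cause == "other":
        return previous_wrs         # unchanged by any other reset cause
    raise ValueError(f"unknown reset cause: {cause!r}")
```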
9.4 Initialization Software Requirements
After a reset operation occurs, the PowerPC 476FP processor core is initialized to a minimum configuration to
enable the fetching and execution of the software initialization code. The initialization also guarantees deterministic behavior of the core during the execution of this code. Initialization software is necessary to complete
the configuration of the processor core and the rest of the on-chip and off-chip system.
The system must provide nonvolatile memory (or memory initialized by some mechanism other than the
PowerPC 476FP processor core) at the real address corresponding to effective address x‘FFFF FFFC’ and at
the rest of the initial program memory page. The instruction at the initial address must be an unconditional
branch backwards to the beginning of the initialization software sequence.
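As a hedged illustration of the branch that must occupy the reset vector, the following Python sketch encodes a relative branch instruction word. The I-form layout used here (primary opcode 18, a 24-bit LI field, and the AA and LK bits) comes from the Power ISA rather than from this section, and the target address x‘FFFF F000’ is only an assumed example of where initialization code might begin. The function name encode_relative_branch is hypothetical.

```python
# Illustrative encoding of the relative branch placed at the reset vector.
# The I-form layout (primary opcode 18, 24-bit LI field, AA and LK bits) is
# taken from the Power ISA; the target address below is only an example.

RESET_VECTOR = 0xFFFFFFFC


def encode_relative_branch(from_address: int, to_address: int) -> int:
    """Encode 'b target' (AA = 0, LK = 0) as a 32-bit instruction word."""
    displacement = to_address - from_address
    if displacement % 4 != 0:
        raise ValueError("branch displacement must be word aligned")
    if not -(1 << 25) <= displacement < (1 << 25):
        raise ValueError("target is out of range for a relative branch")
    return (18 << 26) | (displacement & 0x03FFFFFC)


# Example: branch back to the start of the initial 4 KB page (an assumed layout).
word = encode_relative_branch(RESET_VECTOR, 0xFFFFF000)
print(f"{word:#010x}")
```

Because the displacement is negative, the LI field holds its two's-complement form, which is why the encoded word has the high-order LI bits set.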
The initialization software functions described in this section perform the configuration tasks required to
prepare the PowerPC 476FP processor core to start an operating system and subsequently execute an application program.
The initialization software must also perform functions associated with hardware resources that are outside
the PowerPC 476FP processor core. The additional initialization is beyond the scope of this document. This
section refers to some of these functions, but their full scope should be described in the user’s manual for the
specific chip or system implementation.
Initialization software should perform the following tasks to fully configure the PowerPC 476FP processor
core. For more information about the various functions referenced in the initialization sequence, see the
corresponding sections of this document. Proceed as follows:
1. Branch backwards from effective address x‘FFFF FFFC’ to the start of the initialization sequence.
2. Set up and clear the DBCR0 Register to disable all debug events.
Although the PowerPC 476FP processor core is defined to reset some of the debug-event enables during
the reset operation (as specified in Table 9-1 on page 244), this is not required by the architecture.
Therefore, the initialization software should not assume this behavior. Software should disable all debug
events to prevent nondeterministic behavior on the trace interface to the core.
3. Clear the DBSR to initialize all debug event status.
Although the PowerPC 476FP processor core is defined to reset the DBSR debug event status bits during the reset operation (as specified in Table 9-1), this is not required by the architecture. Therefore, the
initialization software should not assume this behavior. Software should clear all such status to prevent
nondeterministic behavior on the JTAG interface to the core.
4. Initialize the core configuration registers, CCR0, CCR1, and CCR2, as necessary. In most cases, these
bits can be left in the reset state. A thorough understanding of the implications of changing these register
fields must be a prerequisite for making any changes. Reserved fields must be left in the reset state.
5. Configure the memory management unit control registers (MMUBE0, MMUBE1, SSPCR, USPCR,
ISPCR) as appropriate.
6. Set up a TLB entry to cover the initial program memory page.
The PowerPC 476FP processor core only initializes an architecturally invisible shadow TLB entry and
one entry in the UTLB during the reset operation. All other shadow TLB entries are invalidated upon any
context synchronization, and all other UTLB entries except index-address-0 and one entry are undefined.
Because of these properties, special care must be taken during the initialization sequence until this step
is completed and an architected TLB entry has been established in the TLB.
a. Initialize the MMU Configuration Register (MMUCR). Complete the following steps:
(1) Specify the TID field to be written to TLB entries.
(2) Specify the TS field to be used for TLB searches.
(3) Specify the store miss allocation behavior.
b. Write a TLB entry for the initial program memory page. Complete the following steps:
(1) Specify the effective page number (EPN), the real page number (RPN), the extended real page
number (ERPN), and the SIZE as appropriate for the system.
(2) Set the valid bit.
(3) Specify TID = ‘0’ (disable comparison to the PID), or initialize the PID to a matching value.
(4) Specify TS = ‘0’ (system address space), or set MSR[IS,DS] to correspond to TS = ‘1’.
(5) Specify storage attributes (W, I, M, G, E, U0 - U3) as appropriate for the system.
(6) Enable supervisor execute (SX), supervisor read (SR), and supervisor write (SW) access.
c. Initialize the PID to match the TID field of the TLB entry (unless TID = ‘0’).
d. Set up for subsequent MSR[IS,DS] initialization to correspond to the TS field of the TLB entry. This is
necessary only if the TS field of the TLB entry is being set to ‘1’ (MSR[IS,DS] is already reset to ‘0’).
Complete the following steps:
(1) Write the new MSR value into SRR1.
(2) Write the address from which to continue execution into SRR0.
e. Set up for the subsequent change in the instruction fetch address. This is necessary only if the EPN
field of the TLB entry changed from the initial value (EPN[0:19] ≠ x‘FFFFF’). Complete the following
steps:
(1) Write the initial or new MSR value into SRR1.
(2) Write the address from which to continue execution into SRR0.
f. Fully initialize the TLB. Issue a tlbwe to all three words of each TLB entry; issuing tlbre to TLB
entries that are not fully initialized can result in parity exceptions. All unused TLB entries must be
invalidated by setting V-bit = '0'.
g. Perform a context synchronization to invalidate the shadow TLB contents and to cause the new TLB
contents to take effect.
• Use the isync instruction if neither the MSR contents nor the effective address of the rest of the
initialization sequence are being changed.
• Use the rfi instruction if the MSR is being changed to match the new TS field of the TLB entry.
SRR1 will be copied into the MSR, and program execution will resume at the address saved in
SRR0.
• Use the rfi instruction if the next instruction fetch address is being changed to correspond to the
new EPN field of the TLB entry. SRR1 will be copied into MSR, and program execution will
resume at the address saved in SRR0.
At this point in the initialization process, if the corresponding TLB entry has been set up with the caching
inhibited storage attribute set to ‘0’, the instruction and data caches begin to be used. Initialization software can now branch outside of the initial 4 KB memory region as controlled by the address and size of
the new TLB entry or any other TLB entries that have been set up.
7. Initialize the interrupt resources. Complete the following steps:
a. Initialize the Interrupt Vector Prefix Register (IVPR) to specify the high-order address of the interrupt
handling routines.
Ensure that the corresponding address region is covered by a TLB entry or entries.
b. Initialize the IVOR0 - IVOR15 Registers to set their individual interrupt vector addresses.
Ensure that the corresponding addresses are covered by a TLB entry or entries. Because the low-order 4 bits of IVOR0 - IVOR15 are reserved, those bits are ignored when the registers are written
and are read as zeros when an interrupt uses the register address values. Therefore, all interrupt
vector offsets are implicitly aligned on quadword boundaries. Software must ensure that all interrupt
handlers are quadword aligned.
c. Load the interrupt handling routines into program memory.
d. Synchronize any program memory changes as required. See Section 5.4 Self-Modifying Code on
page 140 for more information about the instruction sequence necessary to synchronize changes to
program memory before executing the new instructions.
8. Configure the debug facilities as required. Complete the following steps:
a. Write DBCR1 and DBCR2 to specify the instruction address compare (IAC) and data address compare (DAC) event conditions.
b. Clear the DBSR to initialize the IAC auto-toggle status.
c. Initialize the IAC1 - IAC4, DAC1 - DAC2, and DVC1 - DVC2 Registers to the required values.
d. If required, write to MSR[DWE] to enable the debug wait mode.
e. Write to DBCR0 to enable the required debug modes and events.
f. Perform a context synchronization by using an isync instruction to establish the new debug facility context.
9. Configure the timer facilities as required. The TSR is cleared. Complete the following steps:
a. Write zeros to the Time Base Lower (TBL) Register to prevent fixed-interval timer and watchdog timer
exceptions when the TSR is cleared and to prevent an incrementation carry into the Time Base
Upper (TBU) Register before full initialization is completed.
b. The TCS field of CCR1 can be initialized at this point or earlier with the rest of CCR1.
c. Write zeros to the TSR to clear all timer exception status.
d. Write the TCR to configure and enable timers as required. Note that software can enable the watchdog timer reset function, but only a reset can disable it.
e. Initialize the TBU value as required.
f. Initialize the TBL value as required.
g. If the decrementer auto-reload function is required, initialize the Decrementer Auto Reload Register
(DECAR) to the required value.
h. Initialize the DEC Register to the required value.
10. Initialize the L2 cache using Device Control Registers (DCRs). See the PowerPC 476FP L2 Cache Core
Databook.
11. Initialize PLB6 using DCRs. See the PLB6 Bus Controller Core Databook.
12. Initialize the MSR to enable interrupts as required. Complete the following steps:
a. Set MSR critical interrupt enable (CE) to enable or disable critical-input and watchdog-timer interrupts.
b. Set MSR external interrupt enable (EE) to enable or disable external-input, decrementer, and fixed-interval timer interrupts.
c. Set MSR debug interrupt enable (DE) to enable or disable debug interrupts.
d. Set MSR machine-check enable (ME) to enable or disable machine-check interrupts.
Software should first check the status of the Exception Syndrome Register (ESR) machine-check
interrupt (ISMC) field and Machine Check Syndrome Register (MCSR) machine check summary
(MCS) fields to determine whether any machine-check exceptions have occurred since these fields
were cleared by reset but before machine-check interrupts were enabled (by this step). Any such
exceptions would have set ESR[ISMC] or MCSR[MCS] to ‘1’, and this status can only be cleared
explicitly by software. After the MCSR[MCS] field is known to be clear, the MCSR status bits
(MCSR[1:8]) should be cleared by software to avoid possible confusion upon later service of a
machine-check interrupt. After MSR[ME] has been set to ‘1’, subsequent machine-check exceptions
result in a machine-check interrupt.
e. Perform a context synchronization by using an isync instruction to establish a new MSR context.
13. Initialize any other processor core resources as required by the system (General Purpose Registers
[GPRs], Special Purpose Registers for general use [SPRGs], and so on). Failure to initialize GPRs might
result in parity errors.
14. Initialize any other facilities outside the processor core as required by the system. Initialize system memory as required by the system software.
15. Synchronize any program memory changes as required. See Section 5.4 Self-Modifying Code on page 140
for more information about the instruction sequence necessary to synchronize changes to program memory before executing the new instructions.
16. Start the system software.
System software is generally responsible for initializing or managing the rest of the MSR fields, including
the following fields:
• MSR floating-point enable (FP) to enable or disable the execution of floating-point instructions.
• MSR[FE0,FE1] to enable or disable floating-point-enabled exception-type program interrupts.
• MSR problem state (PR) to specify user mode or supervisor mode.
• MSR[IS,DS] to specify application address space or system address space for instructions and data.
• MSR wait state enable (WE) to place the processor into wait state (to halt execution pending an interrupt).
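The interrupt vector setup in step 7 can be sketched as follows. This Python model rests on an assumption taken from the Book-E convention rather than from the text of this section: the high-order 16 bits of a 32-bit vector address come from the IVPR, and the remaining bits come from the IVOR, whose low-order 4 bits are reserved. The function names are hypothetical.

```python
# Illustrative model of interrupt vector formation (step 7 above).
# Assumption, following the Book-E convention rather than text in this
# section: the high-order 16 bits come from the IVPR and the remaining bits
# from the IVOR, whose low-order 4 bits are reserved and read as zero.

IVPR_MASK = 0xFFFF0000
IVOR_MASK = 0x0000FFF0   # low-order 4 bits are ignored


def interrupt_vector(ivpr: int, ivor: int) -> int:
    """Return the 32-bit effective address of an interrupt handler."""
    return (ivpr & IVPR_MASK) | (ivor & IVOR_MASK)


def is_valid_handler_address(address: int) -> bool:
    """Handlers must sit on a quadword (16-byte) boundary."""
    return address % 16 == 0
```

Under this model, any value written to an IVOR yields a quadword-aligned vector, so a handler linked at an unaligned address would never be entered at its first instruction.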
10. L2 Cache and UTLB Synchronous Interfaces
This section describes the level-2 (L2) cache and UTLB synchronous interfaces. See the PowerPC 476FP L2
Cache Core User’s Guide for more information.
See the PowerPC 476FP Core Support Manual for further details.
10.1 L2 Cache Interface
The PowerPC 476FP core interfaces directly to an L2 cache. The PowerPC 476FP core and the L2 cache
synthesizable core are implemented in different clock domains. Figure 10-1 illustrates the L2 cache interface.
Figure 10-1. L2 Cache and Interface Block Diagram
[Block diagram not reproduced. Labeled blocks: 476FP Core (1.6 - 2.0 GHz) containing the L1 I-Cache, L1 D-Cache, and UTLB; L2 Cache (400 - 800 MHz), a 4-way set-associative cache with ECC, with TAG and RAM arrays, LRU, write queues, snoop control, and snoop reservation; PLB6 Bus (400 - 800 MHz); Performance Monitor; DCR Arbiter; DCR Bus; Other Devices (Alternate DCR Implement). Labeled interfaces: I-Read, I-Snoop, RSV-Snoop, D-Write, D-Read, D-Snoop, TLB-Snoop, Read, Write. Labeled signals: MSR[PMM], PMUCC0[FAC], PM Events (see note 1). Labeled parameters: Frequency Ratio (2:1, 3:1, 4:1); Frequency Ratio (N:1); L2 Configuration (256 KB, 512 KB, 1 MB).]
Key:
DCR     Device control register
ECC     Error checking and correction
L1      Level 1
L2      Level 2
LRU     Least recently used
PM      Performance monitor
PLB6    Processor local bus 6
UTLB    Unified translation lookaside buffer
Note:
1. Performance monitor events include L1 cache hits, shadow TLB misses, and commitments.
10.2 L2 Cache Features
For detailed information regarding the features and capabilities of the L2 cache controller, see the PowerPC
476FP L2 Cache Controller User’s Guide.
10.2.1 L2 Cache Storage Reservation Management
The PowerPC 476FP core supports memory pages and requires memory coherence regardless of the page
attribute M state.
The L2 cache manages lwarx/stwcx. storage reservations to improve the performance of multiprocessor
(MP) reservation handling.
See Section 4.5.4 Memory Coherence Required (M) on page 117.
The L2 cache tracks the reservation status that is set by the lwarx instruction and cleared by the stwcx.
instruction and snoop operations.
The lwarx instruction broadcasts to the L2 cache using the read interface. If the lwarx is an L1 cache hit, the
L1 cache requests no data. If the lwarx is an L1 cache miss, the L2 cache returns data to the L1 cache,
similar to a normal load instruction. In both cases, the L2 cache updates the reservation granule with the
lwarx address and sets the reservation bit. Table 10-1 describes some special cases in which the lwarx
instruction must not set the reservation bit resulting from an incoming snoop. This is because the reservation
flag is in the L2 cache, and the L2 cache handles all snoops with regard to the reservation.
The stwcx. instruction broadcasts to the L2 cache using the write interface. The L1 pipeline guarantees that
the line referenced by the stwcx. is invalidated in the L1 cache and line fill buffers. The stwcx. is then sent to
the L2 cache. Meanwhile, the processor core does not update the appropriate CR bit field until receiving a
completion signal from the L2 cache. The bit is set to ‘0’ for successful completion of the stwcx. and to ‘1’ for
a failed stwcx..
The L1 cache does not send lwarx/stwcx. instructions that are faulty as a result of alignment or data storage
interrupt (DSI) exceptions to the L2 cache.
The L2 cache can handle multiple lwarx/stwcx. instructions. Most importantly, the processor core receives a
completion signal for each stwcx.. Table 10-1 shows the lwarx/stwcx. process.
Table 10-1. lwarx and stwcx. Actions in the L2 Cache and Processor Core
Instruction | L1 D-Cache | L1 D-Cache to L2 Cache | L2 Cache | Reservation Unit or Processor Core
lwarx | Hit | Read with data, L1 D-cache invalidated. lwarx request. | Hit | The L1 D-cache invalidates the cache line, and the L2 cache returns data to the L1 D-cache. The reservation will be set in the L2 cache.
lwarx | Hit | Read with data, L1 D-cache invalidated. lwarx request. | Miss | The L1 D-cache invalidates the cache line, and the L2 cache returns data to the L1 D-cache when the L2 cache returns the cache line. The reservation is set in the L2 cache after the L2 cache gains ownership.
lwarx | Miss | Read with data, lwarx request. Data returns to the L1 cache. | Hit | The reservation is set with the address.
lwarx | Miss | Read with data, lwarx request. Data returns to the L1 cache. | Miss | The reservation is set with the address after the L2 cache gets the data with shared or exclusive use.
stwcx. | Hit | Write with stwcx. data | Hit | L1 D-cache invalidates the line. If the reservation is set and the address matches, the reservation succeeds and the L2 cache performs the write operation and L1 CR[EQ] is set.
stwcx. | Hit | Write with stwcx. data | Miss | L1 D-cache invalidates the line. If stwcx. succeeds, the L2 cache performs the write after owning the line, and L1 CR[EQ] is set. Most likely L2 line is snooped, and the reservation is lost.
stwcx. | Miss | Write with stwcx. data | Hit | If stwcx. succeeds, the L2 cache performs the write and L1 CR[EQ] is set.
stwcx. | Miss | Write with stwcx. data | Miss | If stwcx. succeeds, the L2 cache performs the write after owning the line, and L1 CR[EQ] is set.
Note: Some cache operation effects follow:
1. A dcbtst for a line-matching reservation granule by a remote processor causes the reservation to be lost.
2. Any store and dcbz to a line-matching reservation granule by a remote processor causes the reservation to be lost.
3. A dcbst to a line-matching reservation granule by a remote processor does not affect the reservation.
4. A dcbf or dcbt to a line-matching reservation granule by a remote processor does not affect the reservation.
5. A dcbi is converted to a dcbf by a processor, and therefore, dcbi (or dcbf in the L2 cache) to the line-matching reservation granule by a remote processor does not affect the reservation.
6. An icbt or icbi to a line-matching reservation granule by a remote processor does not affect the reservation.
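The reservation behavior described in this section and in the notes to Table 10-1 can be summarized with a small behavioral model. The Python sketch below is not a description of the hardware implementation. It assumes a single reservation per core and a hypothetical granule size of 128 bytes, neither of which is specified here, and the class and method names are invented for illustration.

```python
# Illustrative model of L2-managed lwarx/stwcx. reservations.
# Assumptions: a single reservation per core, and a hypothetical
# reservation granule size; neither is specified in this section.

GRANULE_BYTES = 128  # assumed granule size, for illustration only

# Remote operations that cause the reservation to be lost (notes 1 and 2)
RESERVATION_KILLERS = {"store", "dcbz", "dcbtst"}
# Remote operations that leave the reservation intact (notes 3 to 6)
RESERVATION_SAFE = {"dcbst", "dcbf", "dcbt", "dcbi", "icbt", "icbi"}


class ReservationUnit:
    def __init__(self) -> None:
        self.granule = None  # None means no reservation is held

    @staticmethod
    def _granule_of(address: int) -> int:
        return address // GRANULE_BYTES

    def lwarx(self, address: int) -> None:
        """Record the reservation granule and set the reservation."""
        self.granule = self._granule_of(address)

    def stwcx(self, address: int) -> bool:
        """Return True if the conditional store succeeds; always clears."""
        success = self.granule is not None and self.granule == self._granule_of(address)
        self.granule = None
        return success

    def remote_snoop(self, operation: str, address: int) -> None:
        """Apply a snooped operation from a remote processor."""
        if operation not in RESERVATION_KILLERS | RESERVATION_SAFE:
            raise ValueError(f"unmodeled operation: {operation!r}")
        if operation in RESERVATION_KILLERS and self.granule == self._granule_of(address):
            self.granule = None
```

For example, a remote store to any address in the reserved granule between the lwarx and the stwcx. makes the stwcx. fail in this model, whereas a remote dcbf to the same granule does not.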
10.2.2 Performance Monitor
The PowerPC 476FP L2 cache core provides a performance monitor to report the number of occurrences for
a number of L2 cache events. See the PowerPC 476FP L2 Cache Core User’s Guide for additional information regarding these events.
The performance monitor can be controlled from the PowerPC 476FP core through the Performance Monitor
Unit Core Control Register 0 (PMUCC0), described in the following subsection.
10.2.2.1 Performance Monitor Unit Core Control Register 0 (PMUCC0)
Register layout (bits 32:63): FAC (bit 32), Reserved (bits 33:63)
Bit | Name | Description
32 | FAC | 0 = Performance monitor enabled. 1 = Freeze all performance monitor unit (PMU) counters.
33:63 | Reserved |
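Because the bit numbers in this register description count from the most-significant end, bit 32 is the high-order bit of the low 32-bit word. The following Python sketch shows one way to derive the FAC mask from its bit number. The helper names are hypothetical, and the sketch only computes values; it does not access hardware.

```python
# Illustrative bit-mask helper for PMUCC0[FAC].
# Power-style numbering counts bit 0 as the most-significant bit of a
# 64-bit value, so bits 32:63 form the low-order 32-bit word.

def bit_mask(bit: int) -> int:
    """Return the mask for a bit numbered 0 (MSB) to 63 (LSB)."""
    if not 0 <= bit <= 63:
        raise ValueError("bit number must be in the range 0..63")
    return 1 << (63 - bit)


PMUCC0_FAC = bit_mask(32)


def freeze_counters(pmucc0: int) -> int:
    """Return the register value with FAC set (freeze all PMU counters)."""
    return (pmucc0 | PMUCC0_FAC) & 0xFFFFFFFF


def enable_counters(pmucc0: int) -> int:
    """Return the register value with FAC cleared (performance monitor enabled)."""
    return pmucc0 & ~PMUCC0_FAC & 0xFFFFFFFF
```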
10.2.3 Cache Operations Handling
Some cache management instructions contain a CT field that is used to specify a cache level within a cache
hierarchy or a portion of a cache structure to which the instruction is to be applied. Table 10-2 shows the
correspondence between the CT value specified and the cache level.
Table 10-2. CT Field Value and Cache Level
CT Field Value | Cache Level
0 | Primary cache
2 | Secondary cache
Note: The CT values that are not shown can be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.
Any cache operations that generate exception conditions that are detected before commitment will not be
broadcast to the L2 cache. For instance, dcbz to a page that is marked write-through or cache inhibited
generates an alignment exception and never reaches the L2 cache.
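The CT mapping in Table 10-2 and the dcbz example above can be restated as two small functions. This Python sketch is illustrative; the function names are hypothetical, and the second function covers only the dcbz case mentioned in the text, not every exception condition.

```python
# Illustrative lookup for the CT field (Table 10-2) and the dcbz case above.
# Function names are hypothetical.

CT_CACHE_LEVEL = {0: "primary cache", 2: "secondary cache"}


def cache_level_for_ct(ct: int) -> str:
    """Return the cache level selected by a CT field value."""
    return CT_CACHE_LEVEL.get(ct, "implementation-dependent")


def dcbz_reaches_l2(write_through: bool, caching_inhibited: bool) -> bool:
    """A dcbz to a W = 1 or I = 1 page raises an alignment exception instead."""
    return not (write_through or caching_inhibited)
```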
Table 10-3 on page 259 shows how cache operations are handled.
Table 10-3. Cache Operations (Page 1 of 3)
Cache
Operation
CT Field
icbi
icbt
CT = 0
L1 Cache
dcbf (L = 0)
PLB6
Remote L2 Cache Remote L1 Cache
Invalid L1 I-cache. Pass through.
iKill. See the PowerPC PLB6 User’s Guide.
Pass-through.
L1 I-cache touch.
L2 cache touch.
Touch if L2 cache
miss.
L2 cache touch.
Touch if L2 cache
miss.
Write with flush if modified.
Write with flush if
modified.
CT = 2
dcba
L2 Cache
Invalid L1 I-cache.
No-op.
L1 D-cache line
invalidate four
lines.
Write with flush.
Possible snoop
invalid.
Note:
1. For dcbt, the TH field has a similar function to the CT field of other cache operations.
2. For dcbtst, the TH field has a similar function to the CT field of other cache operations.
Cache
Operation
CT Field
dcbi (privileged)
L1 Cache
L1 D-cache line
invalidate four
lines if L1 hit.
dcbst
dcbt1
dcbtst2
L1 D-cache touch. Touch, mark D-side.
Touch if L2 cache
miss.
TH = 2
Touch, mark D-side.
Touch if L2 cache
miss.
Write with clean.
Possible snoop
invalid.
TH = 2
Store-touch (exclusive
or modified), mark
D-side.
RWITM (allocate). Write with flush.
Possible snoop
invalid.
L1 D-cache zeros
four lines if L1 hit.
Zero the L2 cache line.
D-claim if W = I = 0 Snoop invalid.
Snoop invalid.
L1 I-cache touch
and lock.
Touch if miss.
Touch if L2 miss.
L2 touch and lock.
Touch if L2 miss.
CT = 0
CT = 0
L1 I-cache unlock
a line.
CT = 0
CT = 0
CT = 0
N/A
L2 cache unlock a line.
L1 D-cache touch
and lock.
L1 D-cache touch
and lock.
Touch if miss.
Touch if L2 cache
miss.
L2 touch and lock mark
with D-side.
Touch if L2 cache
miss.
Touch if miss.
Touch if L2 cache
miss.
L2 store-touch and lock
mark with D-side.
Store-touch if L2
cache miss.
L1 D-cache unlock No action.
line.
CT = 2
ici (privileged)
Possible snoop
invalid.
Read with intent to Write with flush.
modify (RWITM)
(allocate).
CT = 2
dcblc
(privileged)
Flush.
L1 D-cache touch. Store-touch (exclusive
or modified), mark
D-side.
CT = 2
dcbtstls
(privileged)
TH = 0
CT = 2
dcbtls
(privileged)
Flush if modified.
TH = 0
CT = 2
icblc
(privileged)
Flush if modified.
PLB6
Write with clean if modified.
Clean if modified.
dcbz
icbtls
(privileged)
L2 Cache
L2 unlock line.
CT = 0
Invalidate L1
I-cache.
CT <> 0
No-op.
Note:
Cache
Operation
CT Field
dci (privileged)
CT = 0
Invalidate L1
D-cache.
CT = 2
Invalidate L1
D-cache.
L1 Cache
icread
(privileged)
L1 I-cache debug
read.
dcread
(privileged)
L1 D-cache debug
read.
L2 Cache
PLB6
Invalidate L2 cache.
No action.
The local L2 cache must
invalidate all entries.
Note:
10.2.4 tlbivax, tlbsync, msync, mbar Handling
See Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on page 130 and Section 4.9 UTLB
Coherency on page 130 for information about tlbivax and tlbsync. See Section 5.5.18 Memory Barrier
Instructions on page 149 for information about msync, mbar, and lwsync.
10.3 L1 Cache UTLB Snoop Interface
The L1 cache unified translation lookaside buffer (UTLB) snoop interface maintains coherency for the TLB
between processors.
The PowerPC 476FP core is designed with hardware supported coherency. The tlbivax instruction invalidates the target entry in all processors. The tlbivax instruction will be treated as a tlbie instruction on the
PLB6. The tlbsync is fully implemented to provide an ordering function of tlbivax instructions. Both instructions are privileged.
See Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on page 130 for information about tlbivax
operations. See Section 4.9 UTLB Coherency on page 130 for information about tlbsync operations.
Software must ensure that ordering requirements are met, for example through use of the mbar and isync
instructions and through updates of the AS or PID fields. The hardware is not required to order the tlbivax operation.
Appendix A. Register Summary
This appendix provides an alphabetic listing of Special Purpose Registers and bit definitions for the registers
contained in the PowerPC 476FP processor core. These SPRs are accessed by mfspr and mtspr instructions. The access column in Table A-1 uses the following terms for different access types:
R/W     Readable and writable.
Read    Read only.
Write   Write only.
R/C     Read and clear.
Clear   Clear means that ‘1’ bits in the register are cleared when that SPR is accessed by an mtspr instruction.
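Because these SPRs are reached through mfspr and mtspr, it can help to see how an SPR address from Table A-1 ends up in the instruction word. The following Python sketch assumes the standard Power ISA XFX-form layout, in which the two 5-bit halves of the 10-bit SPR number are swapped inside the instruction. The function names are illustrative only and are not part of any IBM tool; the extended opcodes 339 and 467 are the values listed for mfspr and mtspr in Table B-2.

```python
# Illustrative sketch: encode mfspr/mtspr for an SPR address taken from Table A-1.
# Assumption: Power ISA XFX-form layout (the 5-bit halves of the SPR number are swapped).

MFSPR_XO = 339  # extended opcode of mfspr (Table B-2)
MTSPR_XO = 467  # extended opcode of mtspr (Table B-2)


def _spr_field(sprn: int) -> int:
    """Place the 10-bit SPR number in instruction bits 11:20 with its halves swapped."""
    if not 0 <= sprn < 1024:
        raise ValueError("SPR number must fit in 10 bits")
    low5 = sprn & 0x1F
    high5 = (sprn >> 5) & 0x1F
    return (low5 << 16) | (high5 << 11)


def encode_mfspr(rt: int, sprn: int) -> int:
    """Encode 'mfspr rt, sprn' as a 32-bit instruction word."""
    return (31 << 26) | ((rt & 0x1F) << 21) | _spr_field(sprn) | (MFSPR_XO << 1)


def encode_mtspr(sprn: int, rs: int) -> int:
    """Encode 'mtspr sprn, rs' as a 32-bit instruction word."""
    return (31 << 26) | ((rs & 0x1F) << 21) | _spr_field(sprn) | (MTSPR_XO << 1)


if __name__ == "__main__":
    print(hex(encode_mfspr(3, 0x008)))  # read LR (x'008') into r3
    print(hex(encode_mfspr(3, 0x13C)))  # read DAC1 (x'13C') into r3
```

The privileged column of Table A-1 is not modeled here; the sketch only shows how the address column maps into the instruction encoding.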
Table A-1. Register Categories (Page 1 of 3)
Category
Branch Control
Name
Short Name
Privileged
CTR
No
R/W
x‘009’
60
LR
No
R/W
x‘008’
59
DCRIPR
Yes
R/W
x‘37B’
74
DCDBTRH
Yes
Read
x‘39D’
153
DCDBTRL
Yes
Read
x‘39C’
152
DCESR
Yes
R/W
x‘352’
268
ICDBDR0
Yes
Read
x‘3D3’
137
ICDBDR1
Yes
Read
x‘3D4’
137
ICDBTRH
Yes
Read
x‘39F’
138
ICDBTRL
Yes
Read
x‘39E’
137
Instruction Cache Exception Syndrome Register
ICESR
Yes
R/W
x‘353’
140
IOCCR
Yes
R/W
x‘35C’
269
IOCR1
Yes
R/W
x‘35D’
269
IOCR2
Yes
R/W
x‘35E’
270
Counter Register
1
1
Link Register
Cache Debug
Access Address
Page
1. See the Power ISA V2.05 Specification for details.
2. This register is renamed to VRSAVE in the Power ISA V2.05 Specification.
Category
Debug
Name
Short Name
Privileged
Access Address
Page
Data Cache Address Compare 1 Register
DAC1
Yes
R/W
x‘13C’
266
Data Cache Address Compare 2 Register
DAC2
Yes
R/W
x‘13D’
266
Data Value Compare 1 Register
DVC1
Yes
R/W
x‘13E’
266
Data Value Compare 2 Register
DVC2
Yes
R/W
x‘13F’
266
DBCR0
Yes
R/W
x‘134’
235
DBCR1
Yes
R/W
x‘135’
236
DBCR2
Yes
R/W
x‘136’
237
Debug Data Register
DBDR
Yes
R/W
x‘3F3’
266
DBSR
Yes
R/C
x‘130’
239
Write
x‘330’
IAC1
Yes
R/W
x‘138’
240
IAC2
Yes
R/W
x‘139’
240
IAC3
Yes
R/W
x‘13A’
240
IAC4
Yes
R/W
x‘13B’
240
XER
No
R/W
x‘001’
64
Interrupts and Exceptions Critical Save and Restore Register 0
CSRR0
Yes
R/W
x‘03A’
175
Critical Save and Restore Register 1
CSRR1
Yes
R/W
x‘03B’
176
DEAR
Yes
R/W
x‘03D’
177
ESR
Yes
R/W
x‘03E’
179
IVOR0
through
IVOR15
Yes
R/W
x‘190’
through
x‘19F’
178
IVPR
Yes
R/W
x‘03F’
179
Machine Check Save and Restore Register 0
MCSRR0
Yes
R/W
x‘23A’
176
Machine Check Save and Restore Register 1
MCSRR1
Yes
R/W
x‘23B’
177
MCSR
Yes
R/W
x‘23C’
181
Yes
Clear
x‘33C’
181
Yes
Read
mfmsr
173
Write
mtmsr
Integer Processing
Fixed-Point Exception
Register1
Data Cache Exception Address Register
Interrupt Vector Offset Register 0 through 15
L2 Cache Performance
Monitor
MSR
Save and Restore Register 0
SRR0
Yes
R/W
x‘01A’
174
Save and Restore Register 1
SRR1
Yes
R/W
x‘01B’
175
PMUCC0
Yes
R/W
x‘35A’
259
No
Read
x‘34A’
259
PMU Core Control Register
Category
Memory Management
Name
Short Name
Privileged
ISPCR
Yes
MMU Bolted Entries 0 Register
MMUBE0
MMU Bolted Entries 1 Register
MMU Configuration Register
Invalidate Search Priority Configuration Register
123
Yes
x‘334’
121
MMUBE1
Yes
x‘335’
121
MMUCR
Yes
R/W
x‘3B2’
126
PID
Yes
R/W
x‘030’
120
RMPD
Yes
R/W
x‘339’
120
RSTCFG
Yes
Read
x‘39B’
126
SSPCR
Yes
R/W
x‘33E’
122
USPCR
Yes
R/W
x‘33F’
125
CCR0
Yes
R/W
x‘3B3’
69
Core Configuration Register1
CCR1
Yes
R/W
x‘378’
70
Core Configuration Register2
CCR2
Yes
R/W
x‘379’
73
PIR
Yes
Read
x‘11E’
68
PVR
Yes
Read
x‘11F’
68
SPR General 0
SPRG0
Yes
R/W
x‘110’
67
SPR General 1
SPRG1
Yes
R/W
x‘111’
67
SPR General 2
SPRG2
Yes
R/W
x‘112’
67
SPRG3
Yes
R/W
x‘113’
67
No
Read
x‘103’
67
Real Mode Page Description Register
Reset Configuration Register
SPR General
31
SPR General 4
SPRG4
Yes
R/W
x‘104’
67
SPR General 5
SPRG5
Yes
R/W
x‘105’
67
SPR General 6
SPRG6
Yes
R/W
x‘106’
67
SPR General 7
SPRG7
Yes
R/W
x‘107’
67
SPRG8
Yes
R/W
x‘25C’
N/A
USPRG0
No
R/W
x‘100’
67
DECAR
Yes
R/W
x‘036’
159
Decrementer Register
DEC
Yes
R/W
x‘016’
159
Time Base Lower Register
TBL
Yes
Write
x‘11C’
158
No
Read
x‘10C’
Yes
Write
x‘11D’
No
Read
x‘10D’
SPR General
81
User SPR General 0
Timers
R/W
Page
x‘33D’
Process ID Register
Processor Control
Access Address
2
Decrementer Auto-Reload Register
Time Base Upper Register
TBU
158
TCR
Yes
R/W
x‘154’
163
TSR
Yes
R/C
x‘150’
164
Write
x‘350’
A.1 Data Cache Address Compare 1 Register (DAC1)
Bits    Field Name  Description
32:63   DAC1        Data cache address compare 1.
                    This register holds the compare address for a data address compare event.
A.2 Data Cache Address Compare 2 Register (DAC2)
Bits    Field Name  Description
32:63   DAC2        Data cache address compare 2.
                    This register holds the compare address for a data address compare event.
A.3 Data Cache Value Compare 1 Register (DVC1)
Bits    Field Name  Description
32:63   DVC1        Data cache value compare 1.
                    This register holds the compare value for a data value compare event.
A.4 Data Cache Value Compare 2 Register (DVC2)
Bits    Field Name  Description
32:63   DVC2        Data cache value compare 2.
                    This register holds the compare value for a data value compare event.
A.5 Debug Data Register (DBDR)
Bits    Field Name  Description
32:63   DBDR        Debug information.
A.6 Data Cache Exception Syndrome Register (DCESR)
The DCESR is written upon a parity error in the data cache. After it is written, it is not updated on any
subsequent error until it is written by an mtspr instruction.
Bits    Field Name  Description
32:35   DCRDPE      Data cache read interface parity error.
36:39   Reserved
40:43   DCESPE      Data cache even set parity error.
                    0:3   The ways in set with (address[19] XOR address[26]) are equal to ‘0’.
                    [...] even though the request is for an odd set.
44:47   DCOSPE      Data cache odd set parity error.
                    Multiple bits can be set.
                    0:3   The ways in set with (addr[19] XOR addr[26]) are equal to ‘1’.
                    [...] even though the request is for an even set.
48:54   DCINDXPE    This field represents bits [20:26] of the real address. Bit 19 can be inferred from the
                    DCESPE and DCOSPE fields.
55      DCDAPE      Data cache data array parity error.
                    If set, the requested data has a parity error. If the request is a miss, no error is reported.
56      DCTAPE      Data cache tag array parity error.
                    If set, at least one of the tags associated with a way in either the “even” or “odd” set
                    has a parity error. The designation of which way is specified by the DCESPE and DCOSPE fields.
57      Reserved    Reserved.
58      DCLRUPE     Data cache LRU, valid, or lock parity error.
                    A parity error exists in either the even or odd LRU, valid, or lock field for the requested set.
59      DCSNPPE     Data cache snoop parity error.
60      DCDAHIT     Data cache data array hit.
                    This bit modifies the DCDAPE field when set. If both DCDAPE and DCDAHIT are set, there is a
                    data parity error on a load request that hits in the data cache. If only DCDAPE is set, the
                    parity error is because of a request that was serviced from the line fill buffers. If DCDAPE
                    is not set, this bit is ignored.
61      DCDAAPU     Data cache data array APU.
                    This bit modifies the DCDAPE field when set. If both DCDAPE and DCDAAPU are set, there is a
                    data parity error on a load request for the APU. If only DCDAPE is set, the parity error is
                    because of a processor core request. If DCDAPE is not set, this bit is ignored.
62:63   Reserved
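As an illustration of how the DCESR fields might be pulled apart in a debug script, the following Python sketch extracts each field by its bit range and interprets DCDAPE together with its DCDAHIT modifier. It assumes the register value has already been read (for example, with mfspr from x‘352’) and that bit 63 is the least-significant bit, as in the bit numbering used in this appendix. The function and dictionary names are hypothetical.

```python
# Illustrative DCESR decoder. Bit ranges follow the A.6 field list;
# assumption: bit 63 is the least-significant bit of the value passed in.

DCESR_FIELDS = {
    "DCRDPE": (32, 35),
    "DCESPE": (40, 43),
    "DCOSPE": (44, 47),
    "DCINDXPE": (48, 54),
    "DCDAPE": (55, 55),
    "DCTAPE": (56, 56),
    "DCLRUPE": (58, 58),
    "DCSNPPE": (59, 59),
    "DCDAHIT": (60, 60),
    "DCDAAPU": (61, 61),
}


def extract(value: int, first: int, last: int) -> int:
    """Return bits first:last of value, numbered so that bit 63 is the LSB."""
    width = last - first + 1
    return (value >> (63 - last)) & ((1 << width) - 1)


def decode_dcesr(value: int) -> dict:
    """Split a DCESR value into its named fields."""
    return {name: extract(value, first, last)
            for name, (first, last) in DCESR_FIELDS.items()}


def data_array_error_source(fields: dict) -> str:
    """Summarize DCDAPE and its DCDAHIT modifier as described in the field list."""
    if not fields["DCDAPE"]:
        return "no data array parity error"
    if fields["DCDAHIT"]:
        return "load hit in the data cache"
    return "request serviced from the line fill buffers"


if __name__ == "__main__":
    fields = decode_dcesr(0x108)  # DCDAPE (bit 55) and DCDAHIT (bit 60) set
    print(fields, data_array_error_source(fields))
```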
A.7 Instruction Opcode Compare Control Register (IOCCR)
Bits    Field Name  Description
0       IOCR1EN     IOCR1 enabled.
1       IOCR1M      IOCR1 mode.
                    0   Match primary and secondary opcodes.
                    1   Match primary opcode only.
2       IOCR2EN     IOCR2 enabled.
3       IOCR2M      IOCR2 mode.
                    0   Match primary and secondary opcodes.
                    1   Match primary opcode only.
4       IOCR1ME     IOCR1 mask enable.
                    0   Compare instruction to IOCR without mask.
                    1   Mask instruction and IOCR for comparison.
5       IOCR2ME     IOCR2 mask enable.
                    0   Compare instruction to IOCR without mask.
                    1   Mask instruction and IOCR for comparison.
6       IOCR1U      IOCR1 user mode.
                    0   Trap if match in user/privileged modes.
                    1   Trap if match only in user mode.
7       IOCR2U      IOCR2 user mode.
                    0   Trap if match in user/privileged modes.
                    1   Trap if match only in user mode.
8:31    Reserved
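The IOCCR control bits can be composed into a register value as in the following Python sketch. It assumes a 32-bit register in which bit 0 is the most-significant bit, matching the 0:31 numbering above. The names used for bits 2 and 3 (IOCR2EN, IOCR2M) are taken from their descriptions (IOCR2 enabled, IOCR2 mode), and the helper name is hypothetical.

```python
# Illustrative helper for building an IOCCR value from named control bits.
# Assumption: bit 0 is the most-significant bit of the 32-bit register.

IOCCR_BITS = {
    "IOCR1EN": 0,
    "IOCR1M": 1,
    "IOCR2EN": 2,   # name assumed from the description "IOCR2 enabled"
    "IOCR2M": 3,    # name assumed from the description "IOCR2 mode"
    "IOCR1ME": 4,
    "IOCR2ME": 5,
    "IOCR1U": 6,
    "IOCR2U": 7,
}


def ioccr_value(*fields: str) -> int:
    """Return the IOCCR value with the named single-bit fields set to 1."""
    value = 0
    for name in fields:
        value |= 1 << (31 - IOCCR_BITS[name])
    return value


if __name__ == "__main__":
    # Enable IOCR1 and match on the primary opcode only.
    print(hex(ioccr_value("IOCR1EN", "IOCR1M")))
```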
A.8 Instruction Opcode Compare Register 1 (IOCR1)
Bits    Field Name  Description
0:5     PRI         Primary opcode to trap.
6:15    SEC         Secondary opcode mask.
16:21   PRIMASK     Primary opcode mask.
22:31   SECMASK
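The IOCR1 and IOCR2 field boundaries lend themselves to a small packing helper. The Python sketch below packs and unpacks the four fields using only the bit ranges given above (PRI 0:5, SEC 6:15, PRIMASK 16:21, SECMASK 22:31, with bit 0 as the most-significant bit). It makes no assumption about the polarity of the mask fields, which this appendix does not state; the function names are hypothetical.

```python
# Illustrative packing of the IOCR1/IOCR2 fields by bit range only.
# Assumption: bit 0 is the most-significant bit of the 32-bit register.


def pack_iocr(pri: int, sec: int, primask: int, secmask: int) -> int:
    """Pack PRI (6 bits), SEC (10 bits), PRIMASK (6 bits), SECMASK (10 bits)."""
    for value, width in ((pri, 6), (sec, 10), (primask, 6), (secmask, 10)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its bit range")
    return (pri << 26) | (sec << 16) | (primask << 10) | secmask


def unpack_iocr(value: int) -> tuple:
    """Return (PRI, SEC, PRIMASK, SECMASK) from a 32-bit IOCR value."""
    return ((value >> 26) & 0x3F, (value >> 16) & 0x3FF,
            (value >> 10) & 0x3F, value & 0x3FF)


if __name__ == "__main__":
    # Primary opcode 31 with extended opcode 339 (mfspr in Table B-2).
    word = pack_iocr(31, 339, 0x3F, 0x3FF)
    print(hex(word), unpack_iocr(word))
```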
A.9 Instruction Opcode Compare Register 2 (IOCR2)
Bits    Field Name  Description
0:5     PRI         Primary opcode to trap.
6:15    SEC
16:21   PRIMASK     Primary opcode mask.
22:31   SECMASK
Appendix B. Instruction Summary
This appendix lists and describes the instructions that are unique to the PowerPC 476FP
processor. Standard instructions that are already covered in the PowerPC User Instruction Set Architecture
Book I are not described in this appendix.
Some instructions optionally end in a period (dot). The undotted instructions do not update the Condition
Register. The dotted instructions do update the Condition Register.
B.1 Instructions That Behave Differently from the Power ISA Specification
Table B-1 shows instructions for the PowerPC 476FP core that are different from the Power Instruction Set
Architecture V2.05 specification.
Table B-1. New Instructions in the PowerPC 476FP Core
Mnemonic        Instruction                        See Page
dcbt            Data cache block touch.            147
dcbtst          Data cache block touch for store.  148
dci             Data cache invalidate.             146
B.2 Unsupported Power ISA Instructions
The wait instruction (wait category) is not supported by the PowerPC 476FP processor. PowerPC 4xx
embedded processors provide a different method for putting the processor into the wait mode.
B.3 Integer Instructions in the PowerPC 476FP Processor
Table B-2 lists all of the integer instructions that are implemented in the PowerPC 476FP processor. The
opcodes and extended opcodes are shown in decimal.
Table B-2. Power ISA V2.05 Integer Instructions (Page 1 of 6)
Mnemonic        Opcode  Extended Opcode  Instruction
add[o][.]       31      266              Add.
addc[o][.]      31      10               Add carrying.
adde[o][.]      31      138              Add extended.
addi            14      —                Add immediate.
addic           12      —                Add immediate carrying.
addic.          13      —                Add immediate carrying and record.
addis           15      —                Add immediate shifted.
addme[o][.]     31      234              Add to minus one extended.
addze[o][.]     31      202              Add to zero extended.
and[.]          31      28               AND.
Mnemonic        Opcode  Extended Opcode  Instruction
andc[.]         31      60               AND with complement.
andi.           28      —                AND immediate.
andis.          29      —                AND immediate shifted.
b[a][l]         18      —                Branch.
bc[a][l]        16      —                Branch conditional.
bcctr[l]        19      528              Branch conditional to count register.
bclr[l]         19      16               Branch conditional to link register.
cmp             31      0                Compare.
cmpb            31      508              Compare bytes.
cmpi            11      —                Compare immediate.
cmpl            31      32               Compare logical.
cmpli           10      —                Compare logical immediate.
cntlzw[.]       31      26               Count leading zeros word.
crand           19      257              Condition register AND.
crandc          19      129              Condition register AND with complement.
creqv           19      289              Condition register equivalent.
crnand          19      225              Condition register NAND.
crnor           19      33               Condition register NOR.
cror            19      449              Condition register OR.
crorc           19      417              Condition register OR with complement.
crxor           19      193              Condition register XOR.
dcba            31      758              No-op. The ISA intention for this instruction is data cache block allocate.
dcbf            31      86               Data cache block flush.
dcbi            31      470              Data cache block invalidate.
dcblc           31      390              Data cache block lock clear.
dcbst           31      54               Data cache block store.
dcbt            31      278              Data cache block touch.
dcbtls          31      166              Data cache block touch and lock set.
dcbtst          31      246              Data cache block touch for store.
dcbtstls        31      134              Data cache block touch for store and lock set.
dcbz            31      1014             Data cache block set to zero.
dci             31      454              Data cache invalidate.
dcread          31      326              Data cache read (alternate encoding).
divw[o][.]      31      491              Divide word.
divwu[o][.]     31      459              Divide word unsigned.
dlmzb[.]        31      78               Determine left most zero byte.
eqv[.]          31      284              Equivalent.
Mnemonic        Opcode  Extended Opcode  Instruction
extsb[.]        31      954              Extend sign byte.
extsh[.]        31      922              Extend sign halfword.
icbi            31      982              Instruction cache block invalidate.
icblc           31      230              Instruction cache block lock clear.
icbt            31      22               Instruction cache block touch.
icbtls          31      486              Instruction cache block touch and lock set.
ici             31      966              Instruction cache invalidate.
icread          31      998              Instruction cache read.
isel            31      15               Integer select.
isync           19      150              Instruction synchronize.
lbz             34      —                Load byte and zero.
lbzu            35      —                Load byte and zero with update.
lbzux           31      119              Load byte and zero with update indexed.
lbzx            31      87               Load byte and zero indexed.
lha             42      —                Load halfword algebraic.
lhau            43      —                Load halfword algebraic with update.
lhaux           31      375              Load halfword algebraic with update indexed.
lhax            31      343              Load halfword algebraic indexed.
lhbrx           31      790              Load halfword byte-reversed indexed.
lhz             40      —                Load halfword and zero.
lhzu            41      —                Load halfword and zero with update.
lhzux           31      311              Load halfword and zero with update indexed.
lhzx            31      279              Load halfword and zero indexed.
lmw             46      —                Load multiple word.
lswi            31      597              Load string word immediate.
lswx            31      533              Load string word indexed.
lwarx           31      20               Load word and reserve indexed.
lwbrx           31      534              Load word byte-reversed indexed.
lwz             32      —                Load word and zero.
lwzu            33      —                Load word and zero with update.
lwzux           31      55               Load word and zero with update indexed.
lwzx            31      23               Load word and zero indexed.
macchw[o][.]    4       172              Multiply accumulate cross halfword to word modulo signed.
macchws[o][.]   4       236              Multiply accumulate cross halfword to word saturate signed.
macchwsu[o][.]  4       204              Multiply accumulate cross halfword to word saturate unsigned.
macchwu[o][.]   4       140              Multiply accumulate cross halfword to word modulo unsigned.
machhw[o][.]    4       44               Multiply accumulate high halfword to word modulo signed.
Mnemonic        Opcode  Extended Opcode  Instruction
machhws[o][.]   4       108              Multiply accumulate high halfword to word saturate signed.
machhwsu[o][.]  4       76               Multiply accumulate high halfword to word saturate unsigned.
machhwu[o][.]   4       12               Multiply accumulate high halfword to word modulo unsigned.
maclhw[o][.]    4       428              Multiply accumulate low halfword to word modulo signed.
maclhws[o][.]   4       492              Multiply accumulate low halfword to word saturate signed.
maclhwsu[o][.]  4       460              Multiply accumulate low halfword to word saturate unsigned.
maclhwu[o][.]   4       396              Multiply accumulate low halfword to word modulo unsigned.
mbar            31      854              Memory barrier.
mcrf            19      0                Move condition register field.
mcrxr           31      512              Move to condition register from Integer Exception Register (XER).
mfcr            31      19               Move from Condition Register.
mfdcr           31      323              Move from Device Control Register.
mfdcrux         31      291              Move from Device Control Register user-mode indexed.
mfdcrx          31      259              Move from Device Control Register indexed.
mfmsr           31      83               Move from Machine State Register.
mfocrf          31      19               Move from one Condition Register field.
mfspr           31      339              Move from Special Purpose Register.
mftb            31      371              This is a special instruction, provided in lieu of mftbu and mftbl.
msync           31      598              Synchronize.
mtcrf           31      144              Move to Condition Register fields.
mtdcr           31      451              Move to Device Control Register.
mtdcrux         31      419              Move to Device Control Register user-mode indexed.
mtdcrx          31      387              Move to Device Control Register indexed.
mtmsr           31      146              Move to Machine State Register.
mtocrf          31      144              Move to one Condition Register field.
mtspr           31      467              Move to Special Purpose Register.
mulchw[.]       4       168              Multiply cross halfword to word signed.
mulchwu[.]      4       136              Multiply cross halfword to word unsigned.
mulhhw[.]       4       40               Multiply high halfword to word signed.
mulhhwu[.]      4       8                Multiply high halfword to word unsigned.
mulhw[.]        31      75               Multiply high word.
mulhwu[.]       31      11               Multiply high word unsigned.
mullhw[.]       4       424              Multiply low halfword to word signed.
mullhwu[.]      4       392              Multiply low halfword to word unsigned.
mulli           7       —                Multiply low immediate.
mullw[o][.]     31      235              Multiply low word.
nand[.]         31      476              NAND.
Mnemonic        Opcode  Extended Opcode  Instruction
neg[o][.]       31      104              Negate.
nmacchw[o][.]   4       174              Negative multiply accumulate cross halfword to word modulo signed.
nmacchws[o][.]  4       238              Negative multiply accumulate cross halfword to word saturate signed.
nmachhw[o][.]   4       46               Negative multiply accumulate high halfword to word modulo signed.
nmachhws[o][.]  4       110              Negative multiply accumulate high halfword to word saturate signed.
nmaclhw[o][.]   4       430              Negative multiply accumulate low halfword to word modulo signed.
nmaclhws[o][.]  4       494              Negative multiply accumulate low halfword to word saturate signed.
nor[.]          31      124              NOR.
or[.]           31      444              OR.
orc[.]          31      412              OR with complement.
ori             24      —                OR immediate.
oris            25      —                OR immediate shifted.
popcntb         31      122              Population count bytes.
prtyw           31      154              Parity word.
rfci            19      51               Return from critical interrupt.
rfi             19      50               Return from interrupt.
rfmci           19      38               Return from machine check interrupt.
rlwimi[.]       20      —                Rotate left word immediate then mask insert.
rlwinm[.]       21      —                Rotate left word immediate then AND with mask.
rlwnm[.]        23      —                Rotate left then AND with mask.
sc              17      —                System call.
slw[.]          31      24               Shift left word.
sraw[.]         31      792              Shift right algebraic word.
srawi[.]        31      824              Shift right algebraic word immediate.
srw[.]          31      536              Shift right word.
stb             38      —                Store byte.
stbu            39      —                Store byte with update.
stbux           31      247              Store byte with update indexed.
stbx            31      215              Store byte indexed.
sth             44      —                Store halfword.
sthbrx          31      918              Store halfword byte-reversed indexed.
sthu            45      —                Store halfword with update.
sthux           31      439              Store halfword with update indexed.
sthx            31      407              Store halfword indexed.
stmw            47      —                Store multiple word.
stswi           31      725              Store string word immediate.
stswx           31      661              Store string word indexed.
Mnemonic        Opcode  Extended Opcode  Instruction
stw             36      —                Store word.
stwbrx          31      662              Store word byte-reversed indexed.
stwcx.          31      150              Store word conditional indexed.
stwu            37      —                Store word with update.
stwux           31      183              Store word with update indexed.
stwx            31      151              Store word indexed.
subf[o][.]      31      40               Subtract from.
subfc[o][.]     31      8                Subtract from carrying.
subfe[o][.]     31      136              Subtract from extended.
subfic          8       —                Subtract from immediate carrying.
subfme[o][.]    31      232              Subtract from minus one extended.
subfze[o][.]    31      200              Subtract from zero extended.
tlbivax         31      786              TLB invalidate virtual address indexed.
tlbre           31      946              TLB read entry.
tlbsx[.]        31      914              TLB search indexed.
tlbsync         31      566              TLB synchronize.
tlbwe           31      978              TLB write entry.
tw              31      4                Trap word.
twi             3       —                Trap word immediate.
wrtee           31      131              Write Machine State Register (MSR) external enable.
wrteei          31      163              Write MSR external enable immediate.
xor[.]          31      316              XOR.
xori            26      —                XOR immediate.
xoris           27      —                XOR immediate shifted.
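To show how the decimal opcode and extended opcode columns of Table B-2 relate to an instruction word, the following Python sketch splits a 32-bit word into its primary opcode (instruction bits 0:5) and a 10-bit extended opcode (instruction bits 21:30) and looks the pair up in a small dictionary. This assumes the Power ISA X-form and XL-form layouts, and the lookup table is a hand-picked subset of Table B-2 rather than a complete decoder. Instructions shown with "—" in the extended opcode column have no extended opcode, and the [o] forms use a narrower extended opcode field, so neither is handled here.

```python
# Illustrative split of an instruction word into the Table B-2 columns.
# Assumption: X-form/XL-form layout, extended opcode in instruction bits 21:30.

KNOWN = {
    (31, 339): "mfspr",
    (31, 467): "mtspr",
    (31, 598): "msync",
    (31, 566): "tlbsync",
    (31, 86): "dcbf",
    (19, 150): "isync",
}


def split_opcode(word: int) -> tuple:
    """Return (primary opcode, 10-bit extended opcode) of a 32-bit word."""
    return (word >> 26) & 0x3F, (word >> 1) & 0x3FF


def mnemonic(word: int) -> str:
    """Look up the mnemonic for the subset of Table B-2 held in KNOWN."""
    return KNOWN.get(split_opcode(word), "unknown")


if __name__ == "__main__":
    for word in (0x7C0004AC, 0x7C00046C, 0x4C00012C):
        print(hex(word), split_opcode(word), mnemonic(word))
```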
B.4 Floating-Point Instructions
Table B-3 lists all of the floating-point instructions that are implemented in the PowerPC 476FP processor.
Table B-3. Floating-Point Instructions (Page 1 of 3)
Mnemonic        Opcode  Extended Opcode  Instruction
fabs[.]         63      264              Floating absolute value.
fadd[.]         63      21               Float add.
fadds[.]        59      21               Float add single.
fcfid[.]        63      846              Floating convert from integer doubleword.
fcmpo           63      32               Floating compare ordered.
fcmpu           63      0                Floating compare unordered.
Mnemonic        Opcode  Extended Opcode  Instruction
fcpsgn[.]       63      8                Floating copy sign.
fctid[.]        63      814              Floating convert to integer doubleword.
fctidz[.]       63      815              Floating convert to integer doubleword with round toward zero.
fctiw[.]        63      14               Floating convert to integer word.
fctiwz[.]       63      15               Floating convert to integer word with round toward zero.
fdiv[.]         63      18               Float divide.
fdivs[.]        59      18               Float divide single.
fmadd[.]        63      29               Float multiply-add.
fmadds[.]       59      29               Float multiply-add single.
fmr[.]          63      72               Floating move register.
fmsub[.]        63      28               Float multiply-subtract.
fmsubs[.]       59      28               Float multiply-subtract single.
fmul[.]         63      25               Float multiply.
fmuls[.]        59      25               Float multiply single.
fnabs[.]        63      136              Float negative absolute value.
fneg[.]         63      40               Floating negate.
fnmadd[.]       63      31               Float negative multiply-add.
fnmadds[.]      59      31               Float negative multiply-add single.
fnmsub[.]       63      30               Float negative multiply-subtract.
fnmsubs[.]      59      30               Float negative multiply-subtract single.
fre[.]          63      24               Float reciprocal estimate.
fres[.]         59      24               Float reciprocal estimate single.
frim[.]         63      488              Floating round to integer minus.
frin[.]         63      392              Floating round to integer nearest.
frip[.]         63      456              Floating round to integer plus.
friz[.]         63      424              Floating round to integer toward zero.
frsp[.]         63      12               Floating round to single precision.
frsqrte[.]      63      26               Float reciprocal square root estimate.
frsqrtes[.]     59      26               Float reciprocal square root estimate single.
fsel[.]         63      23               Floating select.
fsqrt[.]        63      22               Float square root.
fsqrts[.]       59      22               Float square root single.
fsub[.]         63      20               Float subtract.
fsubs[.]        59      20               Float subtract single.
lfd             50      —                Load floating-point double.
lfdu            51      —                Load floating-point double with update.
lfdux           31      631              Load floating-point double with update indexed.
Mnemonic        Opcode  Extended Opcode  Instruction
lfdx            31      599              Load floating-point double indexed.
lfiwax          31      855              Load floating-point as integer word algebraic indexed.
lfs             48      —                Load floating-point single.
lfsu            49      —                Load floating-point single with update.
lfsux           31      567              Load floating-point single with update indexed.
lfsx            31      535              Load floating-point single indexed.
mcrfs           63      64               Move to Condition Register from FPSCR.
mffs[.]         63      583              Move from FPSCR.
mtfsb0[.]       63      70
mtfsb1[.]       63      38
mtfsf[.]        63      711              Move to FPSCR fields.
mtfsfi[.]       63      134              Move to FPSCR field immediate.
stfd            54      —                Store floating-point double.
stfdu           55      —                Store floating-point double with update.
stfdux          31      759              Store floating-point double with update indexed.
stfdx           31      727              Store floating-point double indexed.
stfiwx          31      983              Store floating-point as integer word indexed.
stfs            52      —                Store floating-point single.
stfsu           53      —                Store floating-point single with update.
stfsux          31      695              Store floating-point single with update indexed.
stfsx           31      663              Store floating-point single indexed.
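As a worked example of the opcode columns in Table B-3, the following Python sketch assembles a floating-point arithmetic instruction from its fields. It assumes the Power ISA A-form layout (FRT, FRA, FRB, FRC, a 5-bit extended opcode, and the Rc bit), which is how arithmetic entries with small extended opcodes such as fadd (63, 21) and fadds (59, 21) are encoded. Entries with larger extended opcodes, such as fabs (63, 264), use a different form and are not covered. The function name is hypothetical.

```python
# Illustrative A-form encoder for floating-point arithmetic instructions.
# Assumption: Power ISA A-form layout with a 5-bit extended opcode.


def encode_a_form(primary: int, frt: int, fra: int, frb: int,
                  frc: int, xo: int, rc: int = 0) -> int:
    """Assemble an A-form instruction word from its fields."""
    for reg in (frt, fra, frb, frc):
        if not 0 <= reg < 32:
            raise ValueError("register number out of range")
    if not 0 <= xo < 32:
        raise ValueError("A-form extended opcode must fit in 5 bits")
    return ((primary << 26) | (frt << 21) | (fra << 16) |
            (frb << 11) | (frc << 6) | (xo << 1) | (rc & 1))


if __name__ == "__main__":
    # fadd f1,f2,f3 and fadds f1,f2,f3 (opcodes from Table B-3).
    print(hex(encode_a_form(63, 1, 2, 3, 0, 21)))
    print(hex(encode_a_form(59, 1, 2, 3, 0, 21)))
```

Setting rc to 1 corresponds to the dotted form of the mnemonic, which updates the Condition Register.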
Appendix C. Instruction Execution Performance for Code Optimization
This appendix describes how the PowerPC 476FP processor core fetches, dispatches, issues, and executes
instructions. The instruction operation timing information provided in this appendix will help compiler developers and application programmers optimize their code. Though this appendix does not comprehensively
identify every microarchitectural characteristic that might have a potential impact on instruction execution
time within the PowerPC 476FP core, it provides a high-level overview of basic instruction operation and
pipeline performance. The information provided is sufficient to analyze the performance of code sequences to
a high degree of accuracy.
The overall design characteristics of the PowerPC 476FP core follow:
• Instruction predecode is performed outside of the 9-stage pipeline, before instructions are written to the instruction cache (I-cache).
• Two-cycle accesses for the I-cache and data cache (D-cache).
• A four-instruction submit (or a fetch group of four instructions) at a time.
• Four simultaneous instruction issues.
• Out-of-order instruction issue, execute, and complete, but in-order instruction commit (allowed to complete).
• Six-pipeline structure.
• Superpipelined floating-point (FP) execution.
• Hardware symmetrical multiprocessor (SMP) support.
C.1 PowerPC 476FP Pipeline Overview
The PowerPC 476FP is a superscalar processor core capable of issuing four instructions (three integer
instructions and one FP instruction) per cycle. The three integer instructions are issued to the branch execution pipeline, simple integer execution pipeline, and complex integer pipeline. The FP instruction is issued to
the FP execution pipeline.
C.1.1 PowerPC 476FP Integer Pipelines
The PowerPC 476FP integer unit has a nine-stage pipeline structure. Figure C-1 on page 280 illustrates the
integer pipeline structure.
Figure C-1. PowerPC 476FP Integer Pipeline Structure
[Figure not reproduced. Recoverable labels: Stages 1 - 3, fetch and instruction capture (ICRD, IST, ISD[0:3]), reading the I-cache (instructions and predecoded instruction controls); Stage 4, decode and issue (DISS[0:7]), with 4 instructions per cycle (3 integer, 1 FP), FP instructions going to the FP instruction queue (see Figure C-2 on page 283), and 3 instructions per cycle issued onward; Stage 5, register access and queue (BE0 and BPQ, LRACC/JRACC and JPQ, IRACC and IPQ, GPR); Stages 6 - 9, execution stages of the branch pipeline (B-pipe: BE1, BE2, BE3, BEW, branch correction), the simple integer pipeline (L-pipe and J-pipe: AGEN, CRD, DST, LWB, D-cache; JEXE1, JPGPR), and the complex integer pipeline (I-pipe and M-pipe: IEXE1, IPGPR, IEXE2, IEXE3; MEXE1, MEXE2, MEXE3; IWB/MWB, divide/multiply accumulate, SPR read).]
C.1.1.1 ICRD, IST, and ISD Pipeline Stages
In Figure C-1, the first three pipeline stages provide the fetch (instruction cache read [ICRD] and instruction
steering [IST]) and instruction decode (ISD). The first three stages are also common between the integer
pipelines and the floating-point pipeline.
The fetch and ISD are designed to access four instructions, or a half I-cache line (a fetch group), at a time.
The instructions of the fetch group are transmitted to both the integer dispatch/issue queue (decode and
issue [DISS] queue) and the FP dispatch/issue queue (instruction queue [INSTQ]), simultaneously. See
Figure C-2 on page 283 for more information about the INSTQ.
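The fetch-group description above can be illustrated with a little address arithmetic. The Python sketch below assumes 4-byte instructions, so a fetch group of four instructions is 16 bytes and an I-cache line is 32 bytes (since a fetch group is half a line). It also assumes that fetch groups are naturally aligned, which this section does not state explicitly. The function name is hypothetical.

```python
# Illustrative fetch-group arithmetic.
# Assumptions: 4-byte instructions and naturally aligned fetch groups.

INSTR_BYTES = 4
GROUP_INSTRS = 4
GROUP_BYTES = INSTR_BYTES * GROUP_INSTRS  # 16 bytes per fetch group
LINE_BYTES = 2 * GROUP_BYTES              # a fetch group is half an I-cache line


def fetch_group(addr: int) -> tuple:
    """Return (group base address, slot within group, half of the cache line)."""
    base = addr & ~(GROUP_BYTES - 1)
    slot = (addr - base) // INSTR_BYTES
    half = (addr % LINE_BYTES) // GROUP_BYTES
    return base, slot, half


if __name__ == "__main__":
    for addr in (0x1000, 0x1008, 0x101C):
        print(hex(addr), fetch_group(addr))
```

Under these assumptions, a branch target that falls in a late slot of its group leaves fewer useful instructions in that fetch, which is one reason compilers sometimes align hot branch targets.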
The I-cache contains both instructions and their associated predecoded instruction controls and designators.
These controls also are transmitted to the DISS and FP INSTQ to simplify DISS and INSTQ decoding.
C.1.1.2 DISS Stage
The DISS consists of eight entries and can hold up to eight instructions. Also, the DISS can issue up to three
instructions per cycle.
In the DISS stage, the three positions (DISS[0:2]) are eligible for issue to the register access (RACC) stage.
Typically, DISS[2] is the oldest instruction of the three. However, if the RACC stage has available resources,
DISS[2] can be issued to the RACC stage out-of-order.
The DISS stage and the predecoded designators determine which pipeline is used for the instruction execution. The details of this algorithm are proprietary to the PowerPC 476FP core and not exposed to the user.
C.1.1.3 RACC Stage
The next pipeline stage is the register access (RACC) stage. The RACC stage consists of branch execute 0
(BE0) in the branch pipe, load/store register access (LRACC) and J-pipe (simple integer) register access
(JRACC), and integer register access (IRACC). The RACC stage provides register access and dispatch of up
to three instructions per cycle. The IRACC stage is dedicated to the complex integer (I-pipe) pipeline. The
LRACC/JRACC stage is shared between the simple integer (J-pipe) and load/store (L-pipe) pipelines.
Within the RACC stage are queuing stages. These stages are the branch pipe queue (BPQ), J-pipe queue
(JPQ), and the I-pipe queue (IPQ). These queues reduce the timing constraint on the RACC stages from the
lower execution stages. The remaining pipeline stages are provided for the execution of instructions.
The BE0, LRACC, and IRACC pipeline stages are generally where instructions are held until their source
operands become ready. This is the case for all source operands, including Condition Register resources.
However, the data operands for store instructions are held in the address generation (AGEN) stage, and the
accumulate operands for integer multiply-accumulate instructions are held in the IEXE1 stage.
When the instruction source operands become ready, the instruction is dispatched to the first execution stage
of the corresponding pipeline. Dispatch refers specifically to the action of moving from one of the RACC
stages to the first execution stage of a pipeline: BE1, J-pipe execute 1 (JEXE1), AGEN, or IEXE1.
Instructions in the RACC stages can be dispatched out-of-order with respect to each other if the pipeline
required by the newer instruction becomes available before the pipeline required by the older instruction.
Because of the out-of-order issue capability from the DISS stage, there is no guaranteed relative order
between the two instructions in the RACC stage. However, the instructions contained within a
given execution pipeline (branch pipe, L-pipe, J-pipe, I-pipe, or multiply and divide pipe [M-pipe]) are
always guaranteed to be in order with respect to each other, which simplifies resource ordering.
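The ordering rule described here (no guaranteed order between pipelines, strict order within a pipeline) can be pictured with a deliberately simplified Python model. This is a toy sketch, not a description of the actual dispatch logic: it assumes each pipeline dispatches at most one instruction per cycle, ignores the shared LRACC/JRACC stage and all queue capacities, and takes each instruction's operand-ready cycle as a given input. All names are hypothetical.

```python
# Toy model: in-order dispatch within a pipeline, no ordering between pipelines.
# Assumptions: one dispatch per pipeline per cycle; operand-ready cycles are inputs.


def dispatch_cycles(program):
    """program: list of (name, pipe, ready_cycle) tuples in program order.

    Returns a list of (name, pipe, dispatch_cycle) in the same order.
    """
    last_dispatch = {}  # pipe name -> cycle of the most recent dispatch
    result = []
    for name, pipe, ready in program:
        if pipe in last_dispatch:
            # An instruction cannot pass an older instruction in the same pipe.
            cycle = max(ready, last_dispatch[pipe] + 1)
        else:
            cycle = ready
        last_dispatch[pipe] = cycle
        result.append((name, pipe, cycle))
    return result


if __name__ == "__main__":
    program = [
        ("divw", "I-pipe", 5),   # oldest, waits for its operands until cycle 5
        ("add", "J-pipe", 0),    # newer, but its pipe is free: dispatches first
        ("addi", "J-pipe", 0),
        ("mullw", "I-pipe", 2),  # ready early, but stays behind divw in the I-pipe
    ]
    for entry in dispatch_cycles(program):
        print(entry)
```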
C.1.1.4 Execution Pipeline Stages
The final four stages are the execution pipeline stages. In these stages, the I-pipe is divided into the I-pipe and
M-pipe, making five pipelines. The last four stages of the pipeline are unique for each of the five pipelines.
B-Pipe Execution Stages
In the B-pipe, the BE1 stage tests the CR bits, determines whether the branch instruction is predicted
correctly, and reports whether a branch correction is required. The remaining stages are provided to track the
branch instruction until it is allowed to complete.
L-Pipe Execution Stages
In the L-pipe, addresses are generated in the AGEN stage. Also, store instructions are held in the AGEN
stage until the store data operands become ready. The D-cache read (CRD) and data steering (DST) stages
are where the D-cache is accessed to determine whether the target location exists in the D-cache and to
obtain load data. The load write-back (LWB) stage is where load hit data is written back to the General
Purpose Register (GPR) file and where store hit data is written back to the D-cache.
J-Pipe Execution Stages
In the J-pipe, the JEXE1 stage is the first cycle of instruction execution. For most operations, the result is
available to be forwarded from the end of this stage to subsequent instructions requiring the result as a
source operand. The J-pipe pre-GPR (JPGPR) data holding stage holds up to four results, similar to the
I-pipe operation. This is to simplify and improve the execution bandwidth of the pipeline in case any subsequent resource hold conditions exist. The JPGPR can be written back to the GPR file when instructions are
committed or allowed to complete.
I-Pipe Execution Stages
In the I-pipe, the IEXE1 stage is the first cycle of instruction execution. For most operations, the result is available to be forwarded from the end of this stage to subsequent instructions requiring the result as a source
operand. The I-pipe pre-GPR (IPGPR) data holding stage holds up to four results. This is to simplify and
improve the execution bandwidth of the pipeline in case any subsequent resource hold conditions exist.
The IEXE2, IEXE3, and integer write-back (IWB) stages track the conditions of dot (.) instructions and other
miscellaneous instructions. Both IPGPR and IWB can be written back to the GPR file when instructions are
committed or allowed to complete.
M-Pipe Execution Stages
MEXE1 is also where integer multiply-accumulate instructions hold until the accumulate source operand
becomes ready.
Some operations (such as multiply and divide instructions) must continue to execute in MEXE2, MEXE3, and
IWB to fully calculate their results. Divide instructions reside in IWB for a variable number of cycles while they iteratively
calculate their result, at which point they write the result back to the GPR file.
The divider is based on a radix-2 SRT division algorithm. The first two stages, MEXE1 and MEXE2, are used
to prepare to compute the leading zeros of the dividend to reduce its execution iterations. The MEXE3 stage
is used for division computation.
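The benefit of counting the dividend's leading zeros can be seen in a minimal sketch. The function below is a plain radix-2 restoring (shift-and-subtract) divider, not the core's SRT implementation; the function name, the 32-bit width, and the returned iteration count are assumptions made for illustration only.

```python
def divide_radix2(dividend: int, divisor: int, width: int = 32):
    """Unsigned radix-2 restoring division that skips the dividend's leading zeros.

    Returns (quotient, remainder, iterations). Illustrative only.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not (0 <= dividend < (1 << width)) or not (0 < divisor < (1 << width)):
        raise ValueError("operands must fit in the given width")

    # Leading zero bits of the dividend can only produce zero quotient bits,
    # so the loop starts at the first significant bit.
    leading_zeros = width - dividend.bit_length()
    iterations = width - leading_zeros

    quotient = 0
    remainder = 0
    for bit in range(iterations - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, iterations
```

With a small dividend such as 100, the sketch runs 7 iterations instead of 32, which is the kind of saving the leading-zero preparation is meant to provide.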
C.1.2 PowerPC 476FP Floating-Point Pipelines
Floating-point instructions are issued from the ICRD, IST, and ISD pipeline stages in the integer pipeline (see
Section C.1.1 PowerPC 476FP Integer Pipelines on page 279) to the instruction queue (INSTQ) in the
floating-point execution unit.
Figure C-2 on page 283 illustrates the floating-point pipelines.
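Figure C-2. PowerPC 476FP Floating-Point Pipeline Structure
[Figure: block diagram of the floating-point unit. Labels recoverable from the diagram: instructions arrive from the ICU IST stage into the IDI subunit, which holds the INSTQ (entries 7 to 0) and the FA and FL decode logic. The RPM subunit holds the FARACC, FARACCQ, FLRACC, and FLRACCQ stages, the FPR logic and macro with read ports RDA-A, RDA-B, RDA-C, RDD-A, RDD-B, RDD-C, RDA-S, and RDD-S and write ports WRFA and WRFL, and the FACMPLX and FAUOP blocks. The execution stages shown are FAEXE1 through FAEXE6, FAWB, and FAWFPR, labeled with the ASE subunit. The load/store stages shown are FLEXE1, FLST, FLWFPR, and FLWB, labeled with the LSC subunit, together with an LDQ (entries 11 to 0, allocated in the lowest available entry), a 12-way LD data bank mux (A to L), load data from the DCU, store data (normal and denormal, with normalize and data convert blocks) sent to the DCU.]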
The PowerPC 476FP FP pipeline structure consists of two pipelines: one for FP execution, the other for FP
loads/stores. The INSTQ is similar to DISS of the integer unit and has eight entries. However, only
INSTQ[0:1] are eligible to issue instructions. The oldest instruction is always in position 0, second oldest in
position 1, and so on. The floating-point unit (FPU) has resources for only two instructions to be completed,
and thus, only INSTQ[0] and INSTQ[1] can issue an instruction on a given cycle. The FPU INSTQ is not
required to be synchronized with the PowerPC 476FP CPU queue because the FPU only executes FP
instructions, and the I-cache ISD sends only FP instructions with designators.
The FA-pipe RACC (FARACC) stage is an RACC stage of the FP execution pipe to access the Floating-Point
Register (FPR), and is similar to the IRACC of the integer unit. The FARACCQ is functionally similar to the
BPQ, JPQ, and IPQ stages of the integer unit in that it holds RACC stage instructions in case of resource
conflicts in the later stages.
The next seven stages are for FP instruction execution. They are superpipelined for higher frequency with
extended division stages and extended operation stages for denormal operation. The FA pipe execution[1:6]
(FAEXE[1:6]) and FLRACC stages function as follows:
• FAEXE1 and FAEXE2 are for both recoding and operand aligning.
• FAEXE3 and FAEXE4 are for alignment and addition.
• FAEXE5 is for normalization.
• FAEXE6 is for rounding, and FAWB is for CR update and FPSCR update.
• The FLRACC stage is an RACC stage for the FP load/store pipe. FLRACCQ is similar to FARACCQ.
The FP load data is fetched by the CPU L-pipe. Store data is written by the CPU L-pipe into memory, and
thus, the FPU has only queues for the load/store pipe.
The FLEXE1 stage allocates an entry in LDQ (load queue) for an FP load instruction. There are eight entries
to accommodate load data from the CPU because the CPU fetches the load operand. It is a first-in first-out
(FIFO) queue and operates asynchronously between the CPU and the FPU.
FP store data is transmitted to the CPU from FLEXE1 if the CPU L-pipe is ready to accept the data. Otherwise,
the FP L-pipe store stage is provided to hold the data an extra cycle.
C.2 Instruction Execution Latency and Penalty
The term latency refers to the number of cycles of execution required for a given instruction to produce its
result, typically the value to be written to the target GPR specified as part of the instruction. Most integer
instructions (such as the standard arithmetic and logical instructions) have one-cycle latency. One-cycle
latency means that their results are ready at the end of the first execution stage of the pipeline. Thus, the
results are available to be forwarded (delivered) to any subsequent instruction that might require that result as
one of the subsequent instruction source operands.
One significant exception to this is the load instruction category. These instructions have four-cycle latency
(assuming the target memory location is found in the D-cache). Their results become available at the end of
the fourth execution stage of the pipeline.
The term penalty refers to the number of processor cycles for which a given instruction cannot proceed
down the processor pipeline because of a dependency between itself and an immediately preceding instruction. In other words, if a source operand for a given instruction is the same as the target operand for the
preceding instruction, the given instruction might have to hold in the operand access pipeline stage for some
number of cycles waiting for its source operand to become ready. The length of the wait depends on the
latency of the preceding instruction. For example, assume a source operand for a given instruction is the
same as the target operand of an immediately preceding load instruction, which has four-cycle latency. There
is a three-cycle penalty associated with the given instruction. This penalty is because the instruction waits at
the operand access stage of the pipeline for three extra cycles for the load instruction to reach the fourth
execution stage and forward its result to the given instruction.
In contrast, if the earlier instruction has one-cycle latency, there is a zero-cycle penalty (no penalty) associated with the dependent instruction. This is because the dependent instruction can proceed down the pipeline
immediately after the earlier instruction, which forwards its result from the first execution stage of the pipeline.
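The relationship between latency and penalty in the two cases above can be written as a one-line calculation. This is a sketch that assumes a dependent instruction immediately following its producer and full result forwarding; the function name is illustrative.

```python
def dependency_penalty(producer_latency: int) -> int:
    """Stall cycles seen by an instruction that immediately follows its producer."""
    if producer_latency < 1:
        raise ValueError("latency is at least one cycle")
    # The dependent instruction waits for every producer cycle after the first.
    return producer_latency - 1
```

For a load with four-cycle latency this gives a three-cycle penalty, and for a one-cycle arithmetic instruction it gives zero.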
In the PowerPC 476FP core, the processor integer execution unit has a special data forwarding mechanism
that minimizes the penalty between an instruction N and a dependent instruction N + 1 (no penalty).
Because the PowerPC 476FP core has a four-issue microarchitecture, certain instruction sequences can
execute at a rate of more than one instruction per cycle, up to a maximum of five instructions per cycle
because of instruction latencies. Thus, the penalty associated with such an instruction stream can be viewed
as being less than zero, relative to the single-issue microarchitecture. Figure C-3 on page 286 through
Figure C-5 on page 289 illustrate these sequences of instructions.
Example One: Instruction Sequences Without a Dependency
An example of instruction sequences without a dependency is shown in Figure C-3 on page 286. In this
example add, sub, conditional branch (predicted correct), and fadd are simultaneously issued, and there are
no dependency penalties.
Figure C-3. Instruction Sequence Without a Dependency
[Figure: timing diagram over clocks 1 to 9. J-pipe: add1 passes through JRACC, JEXE1, and JPGPR, then writes to the GPR. I-pipe: sub1 passes through IRACC, IEXE1, and IPGPR, then writes to the GPR. B-pipe: bc1 passes through the branch stages up to BE4. FP-pipe: fadd1 passes through RACC, FEXE1 to FEXE6, and FWB, then writes to the FPR.]
Example Two: Instruction Sequence with a Dependency
The same instruction example is used; however, the sub1 operand is dependent on the add1 result.
All instructions are issued at the same time, but sub1 is held at the EXE1 stage until add1 is executed, and
the result is put into I-pipe pre-GPR Register 0 (IPGPR0).
Because multiple instructions are dispatched at the same time, the relative penalty on sub1 is zero cycles as
a result.
In the PowerPC 476FP integer execution unit, if the I-pipe is busy with other operations, the sub1 instruction
can be held at DISS instead of being issued to the IRACC. However, the relative penalty on sub1 is the same
(zero cycles). This is one of the advantages of the PowerPC 476FP core.
Figure C-4 on page 288 illustrates an instruction sequence with a dependency.
Figure C-4. Instruction Sequence with a Dependency
[Figure: timing diagram over clocks 1 to 9. J-pipe: add1 passes through JRACC, JEXE1, and JPGPR, then writes to the GPR. I-pipe: sub1 passes through IRACC, is held in IEXE1 (hold), then passes through IPGPR and writes to the GPR. B-pipe: bc1 passes through the branch stages up to BE4. FP-pipe: fadd1 passes through RACC, FEXE1 to FEXE6, and FWB, then writes to the FPR.]
Example Three: Load Instruction Followed by an add with a Dependency on the Load
To simplify this example, only the CPU integer unit is shown in Figure C-5 on page 289. However, the FPU
operates independently, in parallel, but asynchronously, with the CPU.
Because the J-pipe and the L-pipe share JRACC and LRACC, the J-pipe is not used in this stage. Thus, add
is issued to the I-pipe, branch is issued to the B-pipe, and load is issued to the L-pipe simultaneously. In this
example, one of the operands for the add instruction has a dependency on the load. Thus, it must wait until
the operand is fetched from the D-cache and returned on LWB. Then, the operand is forwarded to the stage
IEXE1 on the I-pipe. This is shown in Figure C-5.
Figure C-5. Load Instruction Followed by an add with a Dependency on the Load
[Figure: timing diagram over clocks 1 to 9. L-pipe: lwz1 passes through LRACC, AGEN, CRD, DST, and LWB. I-pipe: add1 enters IRACC, waits in IEXE1 over several clocks until the load result is forwarded, then passes through IPGPR and writes to the GPR. B-pipe: bc1 passes through the branch stages up to BE4.]
C.3 Instruction Fetch and Decode
An I-cache access consists of the ICRD stage, IST stage, and ISD stage (see Figure C-1 on page 280).
Furthermore, just before the ICRD stage, there is a pseudo stage that arbitrates instruction fetch addresses
such as interrupt vectors, branch correction addresses, branch target addresses, subsequent sequential
fetch group addresses, and so on. This section describes all of these stages and the instruction predecoding
stage that resides before the I-cache.
C.3.1 Instruction Fetch Address Arbitration and Fetch Process
The fetch unit contains the following functions:
• The instruction unit (IU) interface to provide fetch vectors for reset, refetch, interrupts, and branch mispredicts.
• The line-fill interface to provide addresses for reload dumps (line fills).
• Snoop addresses for remote I-cache entry invalidate (icbi) operations.
• Cache operation addresses and control (icbt, ici, and so on).
• Branch-predict-taken target addresses.
• Next instruction fetch addresses.
The fetch unit manages the instruction fetch addressing before the ICRD stage and sets up the priority of
instruction fetching. The following list shows the PowerPC 476FP processor fetcher priority:
1. Snoop addressing from the L2 cache.
This is a remote icbi (see Power ISA Version 2.05 for the icbi function). There are four entry snoop
queues to ensure I-cache and L2 snoop interface timing.
2. Reload dump addressing to transfer line-fill buffer instructions (eight instructions) for I-cache miss to
I-cache. There are two line-fill buffers provided.
3. I-cache operation addressing for icbi (local), icbt, ici, and icread.
4. Instruction fetch addressing from the IU, such as reset vector addresses, interrupt vector addresses,
branch correction addresses; and instruction refetch addresses, such as context switch cases, and synchronization refetch addresses.
5. Branch-predict-taken target fetch addressing.
6. Instruction translation lookaside buffer (ITLB) miss refetch addressing.
7. Snoop and reload dump-induced refetch addressing.
8. Next instruction fetch group addressing.
In the PowerPC 476FP core, the instruction fetch address is normally generated in increments of 16
bytes. This is because four instructions (or 16 bytes) are fetched and submitted every clock cycle. However, branch targets might not always be on 16-byte boundaries. Thus, the arbiter has to adjust the
instruction fetch address to a 16-byte-aligned address.
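The fetch priority list and the address adjustment can be sketched as a fixed-priority arbiter followed by an alignment step. This is a behavioral sketch, not the hardware design: the enum names, the request dictionary, and the reading of the adjustment as rounding down to a 16-byte boundary are assumptions.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class FetchSource(IntEnum):
    """Fetch address sources; a lower value means a higher priority."""
    SNOOP = 1
    RELOAD_DUMP = 2
    ICACHE_OP = 3
    IU_FETCH = 4
    BRANCH_PREDICT_TAKEN = 5
    ITLB_MISS_REFETCH = 6
    SNOOP_RELOAD_REFETCH = 7
    NEXT_FETCH_GROUP = 8


FETCH_GROUP_BYTES = 16  # four 4-byte instructions per fetch group


def arbitrate(requests: Dict[FetchSource, int]) -> Optional[Tuple[FetchSource, int]]:
    """Pick the highest-priority pending request, or None if there is none."""
    if not requests:
        return None
    winner = min(requests)  # smallest enum value wins
    return winner, requests[winner]


def align_fetch_address(address: int) -> Tuple[int, int]:
    """Return (16-byte-aligned address, slot of the first wanted instruction)."""
    aligned = address & ~(FETCH_GROUP_BYTES - 1)
    first_slot = (address - aligned) // 4
    return aligned, first_slot
```

A branch target at 0x2008, for example, is fetched from the group at 0x2000, and the instruction in slot 2 is the first one of interest; the IST stage then left-aligns the group.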
The instruction fetcher output is directly connected to I-cache tag, instruction data array, branch history table
(BHT), ITLB, branch target address cache (BTAC), and least recently used (LRU) array to fetch a group of
instructions. This is the ICRD stage. See also Appendix C.4 Branch Prediction and Branch Instruction
Processing on page 292 for branch prediction addressing.
The next stage, IST, is mainly occupied by instruction group steering; one of the I-cache hit ways is selected
and instructions are left aligned (in the case of branch target fetches). Up to four instructions are fetched
simultaneously (an instruction fetch group).
Each instruction accompanies the corresponding predecoded designators. This stage also includes steering
instructions from the line-fill buffers if the fetch group is an I-cache miss.
The ISD stage is provided to transfer instructions to DISS and FP INSTQ. This stage is more focused on
transfer distance and loading because both DISS and FP INSTQ are farther away, especially the FPU. This
stage is considered to be a submit stage; therefore, the instruction fetch group is sometimes called a submit
group.
C.3.2 Instruction Predecode, Instruction Field Adjust, and Endian Adjust
In the PowerPC 476FP core, the little-endian byte swapping, instruction field swapping based on instruction
types, and the instruction predecoding are performed at the L2 interface before instructions are stored into
the I-cache to alleviate the execution stage logic and timing.
For little-endian operation, refer to Power ISA Book III-E (embedded). The instructions stored in the I-cache are
all big-endian.
C.3.2.1 Instruction Field Adjust
To improve GPR accesses, the PowerPC 476FP core assigns GPR ports according to the hardware instruction operation rather than on a software code basis. The GPR is designed as a 3-read/3-write array that supports three simultaneous reads and three simultaneous writes. The instruction cache controller (ICC) preswaps these GPR address fields
before the I-cache to reduce the logic level needed at the DISS issue point.
The RA, RB, RS, RT, and IMMED fields of the following instruction classes are swapped:
• Fixed point compare instructions
• Fixed point trap instructions
• Fixed point logical instructions
• Fixed point shift/rotate instructions
• Move-to-SPR class instructions
• Some TLB instructions
• DST class instructions
• Special class instructions
C.3.3 Instruction Predecode
Certain attributes of instructions are predecoded before the instructions are written into the I-cache and are
stored along with the instruction. The ICC is responsible for decoding this information from the instructions it
receives from the L2 cache.
These information bits are available in the I-cache and transmitted to DISS and FP INSTQ with the corresponding instructions. There are eight bits per instruction stored in the I-cache. Table C-1 on page 292
describes the bit assignments:
Table C-1. Instruction Predecode Bit Definition
Columns: isdExtData[0:1] (selector; see note 1), followed by isdExtData[2] through isdExtData[7].

Selector 00: Special CPU (Must issue DISS[0])
    ApRdEn (note 2), BpRdEn (note 3)
    Synchronize Type[0:1]: 00 = other, 01 = str/multiple, 10 = stwcx, 11 = tlbivax
    Sync: 0 = I-pipe, 1 = L-pipe
    Store/Multiple: 0 = Load, 1 = Store

Selector 01: CPU Normal
    isdExtData[2]: TarWrEn (note 4)
    isdExtData[3]: ApRdEn
    isdExtData[4]: BpRdEn
    isdExtData[5]: SpRdEn (note 5)
    isdExtData[6:7]: PipeCtrl[0:1]: 00 = can use I-pipe or J-pipe, 01 = L-pipe only, 10 = I-pipe only, 11 = L-pipe only with RA update

Selector 10: FPU
    isdExtData[2]: FPU TarWrEn
    isdExtData[3]: Non L-pipe: FPU ApRdEn; L-pipe: CPU ApRdEn
    isdExtData[4]: Non L-pipe: FPU BpRdEn; L-pipe: CPU BpRdEn
    isdExtData[5]: Non L-pipe: FPU CpRdEn; L-pipe: FPU SpRdEn
    isdExtData[6:7]: PipeCtrl[0:1]: 00 = FPU pipe only, 01 = needs L-pipe, 10 = needs I-pipe, 11 = needs L-pipe with RA update

Selector 11: Branch
    Unconditional Branch, Static Predict Taken, Dynamic
    Branch Type[0:1]: 00 = other, 01 = blr, 10 = bcctr, 11 = bdnz
    ‘at’

Notes:
1. isdExtData: ISD extended data
2. ApRdEn: A-port read enable
3. BpRdEn: B-port read enable
4. TarWrEn: Target GPR write enable
5. SpRdEn: S-port read enable
C.4 Branch Prediction and Branch Instruction Processing
The PowerPC 476FP branch prediction mechanism includes a 4 K × 2-bit BHT, a 32-entry branch target
address content-addressable memory (CAM) (BTAC), link-stack, and Global History Register (GHR). This
section describes the details of these mechanisms and branch instruction processing.
Table C-2 on page 293 summarizes the branch-predict operations of branch instructions and BHT, BTAC,
GHR, and link-stack use.
Table C-2. Branch Prediction and BHT, GHR, and BTAC Use
Branch Instruction Type           | Operation                        | BHT, GHR, BTAC, and Link-Stack Used
b or ba type                      | Branch always                    | BTAC
bc with BO = 1z1zz                | Branch-predict-taken always      | BTAC
bc with BO = 1z1at                | Dynamic branch-predict-taken     | BHT/GHR and BTAC (note 1)
bc with BO = 10000 (bdnz type)    | Branch-predict-taken if CTR > 1  | BTAC
bc with other BO cases            |                                  | BHT/GHR and BTAC
bclr with BO = 1z1zz              | Static branch-predict-taken      | Link-stack
bclr with other than BO = 1z1zz   |                                  | BHT/GHR and Link-stack
bcctr type                        |                                  | BHT/GHR and BTAC
1. When the CCR2[SPC51] bit is set, the ‘at’ bit is honored.
The timing diagrams in Figure C-6, Figure C-7 on page 294, and Figure C-8 on page 294 illustrate the advantages of BTAC/BHT and link-stack use.
Figure C-6. Typical Branch-Predict-Taken Timing Diagram (Branch Target Address is Computed at ISD)
[Figure: timing diagram over clocks 1 to 9 showing fetch group groupA, the target fetch group tgtX, and the bc instruction moving through the ICRD, IST, ISD, DISS, BE0, BE1, BE3, and BE4 stages.]
Figure C-7. BTAC and BHT Based Branch-Predict-Taken Timing Diagram (BTAC Hit and BTAC Contains the
Branch Target Address)
[Figure: timing diagram over clocks 1 to 9 showing fetch group groupA, the target fetch group tgtX, and the bc instruction moving through the ICRD, IST, ISD, DISS, BE0, BE1, BE3, and BE4 stages.]
Figure C-8. Link-Stack Based Branch-Predict-Taken Timing Diagram (Link-Stack Pops the Branch Target
Address at Clock 3)
[Figure: timing diagram over clocks 1 to 9 showing fetch group groupA, the target fetch group tgtX, and the bc instruction moving through the ICRD, IST, ISD, DISS, BE0, BE1, BE3, and BE4 stages.]
C.4.1 Branch History Table Operation
The BHT is used to maintain dynamic prediction of branches; it holds a history of the code and branches actually
executed. It is implemented as a direct-mapped, shared 512 × (8 × 2) array that is indexed using a
combination of the branch address and a hash with the 6-bit GHR. The Gshare method of indexing is used
to reduce the aliasing of branches by keeping a history of branch activity. The history consists of a stream of
taken/not taken bits based on any branch in the instruction stream. A predetermined length of history is
XORed with the higher order bits of the BHT address index. The address index consists of enough lower
order bits of the fetch address to fully index into the chosen BHT size, or index into each instruction. That is,
the lower 3 bits of the branch instruction address are used to index into a memory of 2-bit predictors called a
branch history of the corresponding branch instruction. Each prediction entry in the BHT contains a saturating
2-bit counter that indicates whether a branch is recently taken or not taken. This prediction is read from the
BHT to speculatively determine whether to begin fetching instructions from the branch target address.
The most significant bit of the corresponding 2-bit counter (BHT entry) is used to decide whether the
branch is predict-taken. When the most significant bit of the counter is 1, the branch is predicted to be taken.
Thus, when the counter has a value less than b‘10’, the branch is predicted not to be taken.
During a branch pipeline operation, when a branch outcome is determined, the 2-bit counter is updated by
incrementing the counter if the branch was taken or decrementing it if the branch was not taken. When the
counter saturates with b‘00’ or b‘11’, it remains at the saturation value until branch correction takes place,
when it is altered to the opposite direction. This allows branches to absorb some of the aliasing that will occur
as a result of different branches accessing the same counter in the BHT. Thus, it takes two iterations of a
branch to alter prediction of an already saturated counter. Note that the 2-bit prediction counter value is
recorded as an entry in the branch information queue (BIQ). Additionally, whether the value read from the
BHT was actually used for the prediction is also recorded as an entry in the BIQ.
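The 2-bit saturating counter behavior described above can be sketched as two small functions. The function names are illustrative; the sketch models a single counter and leaves out the BIQ bookkeeping.

```python
def predict_taken(counter: int) -> bool:
    """A branch is predicted taken when the counter's most significant bit is 1."""
    return (counter & 0b10) != 0


def update_counter(counter: int, taken: bool) -> int:
    """Increment on taken, decrement on not taken, saturating at 0b00 and 0b11."""
    if taken:
        return min(counter + 1, 0b11)
    return max(counter - 1, 0b00)
```

Starting from a saturated 0b11, one not-taken outcome moves the counter to 0b10, which still predicts taken; only the second not-taken outcome flips the prediction.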
When the BHT is accessed or read while it is written, the output of the BHT is assumed to be all 1’s (or forced
to predict-taken). Because branch correction frequency is predicted to be low, or the BHT update frequency
to be low, its impact is expected to be low.
The reason for setting up the 2-bit counter to ‘11’ and counter saturation to be branch-predict-taken is to not
write to the BHT, so that the BHT contents are preserved for the next read access.
The static prediction is based on the branch instruction BO field.
C.4.2 Global History Register Operation
The Gshare branch prediction scheme uses a recent global branch outcome history and the branch instruction address to index into the BHT. A GHR is used to capture the outcome history of branch activity (a series
of taken/not taken bits) when a branch is in the instruction stream.
The GHR uses 6 bits to perform the Gshare branch prediction. It contains the results (branch taken/not taken)
of the last six determined branches. The 6-bit history is XORed with the higher order bits of the address
index, which consists of the lower order bits of the fetch address to fully index into the chosen 4 K BHT size.
Indexing the BHT with the XOR of the branch history reduces hot usage spots in the BHT and improves BHT
usage. The calculation to index into the BHT is shown in Figure C-9 on page 296. Note that this BHT index
value is recorded as an entry in the BIQ also for BHT write indexing.
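A minimal sketch of the Gshare index calculation follows. It assumes a 32-bit effective address with word-aligned instructions, so the 12-bit address index is taken from the bits just above the 2-bit byte offset; the constant and function names are illustrative, and the exact bit selection in the hardware may differ.

```python
GHR_BITS = 6
INDEX_BITS = 12   # 4 K two-bit counters in total
SLOT_BITS = 3     # eight counters per row of the 512-row array


def gshare_index(instruction_ea: int, ghr: int) -> int:
    """XOR the 6-bit history into the upper bits of the 12-bit address index."""
    address_index = (instruction_ea >> 2) & ((1 << INDEX_BITS) - 1)  # drop byte offset
    history = ghr & ((1 << GHR_BITS) - 1)
    return address_index ^ (history << (INDEX_BITS - GHR_BITS))


def split_index(index: int):
    """Return (row within the 512-row array, counter within the row)."""
    return index >> SLOT_BITS, index & ((1 << SLOT_BITS) - 1)
```

Two branches at the same address but with different recent histories map to different rows, which is how the XOR spreads accesses across the table.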
In an ideal setting, the GHR requires a minimum of 17 bits, which includes 6 bits (for the most recent and
valid determined branches) as previously discussed, and 11 bits (to track possible speculative/predicted
branch conditions in the (8)DISS, BE0, BPQ, and BE1 stages). The speculative bits contain the possible
prediction for each branch that is currently undetermined in the pipe. Typically these 11 bits are not used, but
they are necessary to provide support should an undetermined branch reside in every possible stage of the
pipeline.
However, in the PowerPC 476FP implementation, the GHR contains 6 bits of branch history for the last six
fetch groups that contain branches (determined branches). If a branch is predicted branch taken, a ‘1’ is
shifted into the GHR. If a branch is predicted branch not taken, a ‘0’ is shifted into the GHR. If a fetch group of
four instructions does not contain any branch instructions, the GHR is not changed or shifted. Additionally,
4 bits of branch history to track possible speculative or predicted branch conditions (because there are a
maximum of four entries in the BIQ) and 4 bits of branch-determined-taken history for a branch correction in
the GHR are shifted back (shifted left). If the speculative bits are not eventually determined, a branch correction updates (shifts back to the previously determined point) the GHR before the next fetch, for a minimum of
14 bits total. On reset, the GHR is set to all 1’s.
Figure C-9. GHR Use for BHT Lookup
[Figure: the 14-bit GHR (one shift per fetch group if there is at least one branch; reset to all 1’s) is divided into bits kept for branch correction and the Gshare bits for the most recently determined branches. The Gshare bits are XORed with instruction EA bits 18:23 to form index bits [0:5]; instruction EA bits 24:29 form index bits [6:11]. Index bits [0:8] are the BHT index (512 entries), and bits [9:11] are used to index into a word. Bits [0:9] are used for the BIQ entry because the BIQ covers 4 words.]
C.4.3 Branch Target Address CAM (BTAC) Operation
The BTAC provides another way to access branch target addresses and improve the target instruction
access latency by two cycles. Because of this advantage, many branch predictions are performed using both
BTAC and BHT dynamic prediction in the PowerPC 476FP core (see Table C-2 on page 293).
The BTAC and BHT are accessed when the I-cache is looked up at the ICRD stage. If the branch instruction
address matches a valid BTAC entry, and the BHT entry indicates branch-predict-taken, the corresponding target address of the entry is used to fetch the target instructions. This is strictly based on the
instruction EA. Therefore, the BTAC-entry-valid flags of all BTAC entries are cleared by any context-synchronizing instruction or operation. The BTAC-entry-valid flags are also cleared at core reset and
POR to ensure the integrity and correlation of the BTAC entries and branch instructions.
The PowerPC 476FP core implements a 32-entry BTAC, and its replacement is performed by using a round-robin method.
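The BTAC lookup, round-robin replacement, and bulk invalidation can be modeled as follows. This is a behavioral sketch: the class and method names are illustrative, and details such as updating a matching entry in place and resetting the replacement pointer on invalidation are assumptions.

```python
from typing import List, Optional, Tuple


class Btac:
    """32-entry fully associative target cache with round-robin replacement."""
    ENTRIES = 32

    def __init__(self) -> None:
        # Each entry is (branch EA, target EA), or None when the entry is invalid.
        self.entries: List[Optional[Tuple[int, int]]] = [None] * self.ENTRIES
        self.next_victim = 0

    def lookup(self, branch_ea: int) -> Optional[int]:
        """Return the stored target for a valid matching entry, else None."""
        for entry in self.entries:
            if entry is not None and entry[0] == branch_ea:
                return entry[1]
        return None

    def insert(self, branch_ea: int, target_ea: int) -> None:
        """Update a matching entry, or replace the round-robin victim."""
        for i, entry in enumerate(self.entries):
            if entry is not None and entry[0] == branch_ea:
                self.entries[i] = (branch_ea, target_ea)
                return
        self.entries[self.next_victim] = (branch_ea, target_ea)
        self.next_victim = (self.next_victim + 1) % self.ENTRIES

    def invalidate_all(self) -> None:
        """Clear every valid flag (context synchronization, core reset, POR)."""
        self.entries = [None] * self.ENTRIES
        self.next_victim = 0


def predicted_target(btac: Btac, branch_ea: int, bht_predicts_taken: bool) -> Optional[int]:
    """Use the BTAC target only on a hit that the BHT also predicts taken."""
    hit = btac.lookup(branch_ea)
    return hit if (hit is not None and bht_predicts_taken) else None
```

Inserting a thirty-third branch overwrites the entry that was written first, which is the round-robin behavior named in the text.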
C.4.4 Branch Link-Stack Operation
Code with a significant number of procedure calls, subroutine calls, or function calls uses bclr-type branch
instructions. To efficiently handle these nested calls and returns, and to improve the latencies of instruction accesses, a link-stack is implemented in the PowerPC 476FP core. The link-stack is a
last-in first-out (LIFO) buffer used to maintain the ordering of consecutive subroutine calls and returns. On a
subroutine call or a branch (an instruction with the branch and link form), the address of the next instruction
(C + 4) is pushed onto the stack. On a subroutine return (an instruction with the branch to link form), the
entry at the top of the stack (which is expected to contain the address of the instruction following the original
subroutine call) is popped from the stack and used as the branch target fetch address.
In the PowerPC 476FP core, a four-entry link-stack is implemented to improve call and return codes. In some
cases, the link-stack can become misaligned or corrupted during the process of speculative instruction
fetching and execution. This means that the link-stack pointer was moved to the wrong spot or in the wrong direction as a
result of a misprediction. Generally, any time one or more bclr instructions (branch conditional to link register)
are followed by one or more branch and link instructions in the speculated path, the link-stack becomes
corrupted if the speculation turns out to be predicted incorrectly. However, a corrupted link-stack can be
corrected after the first branch correction.
When the link-stack is empty, a copy of the LR value is used instead of popping the link-stack to calculate the
branch target address. Table C-3 illustrates the link-stack operation based on branch instruction types.
Table C-3. Link-Stack Operations
Instruction Type                   | Stack Entry         | Stack Valid             | Operation
bl, bla, bcl, bcla, bcctrl, bclrl  | CIA + 4 (next IAR)  | Validate                | Push
bclr                               | (none)              | Invalidate              | Pop
bclrl                              | CIA + 4 (next IAR)  | Invalidate, then Validate | Pop, then Push
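The link-stack operations in Table C-3 can be sketched as a small LIFO model. The class and method names are illustrative, and the overflow behavior (dropping the oldest entry when a fifth address is pushed) is an assumption, because the text does not describe it.

```python
from typing import List


class LinkStack:
    """Four-entry LIFO of predicted return addresses."""
    DEPTH = 4

    def __init__(self) -> None:
        self.entries: List[int] = []

    def push(self, cia: int) -> None:
        """Branch-and-link forms push CIA + 4 (the next instruction address)."""
        if len(self.entries) == self.DEPTH:
            self.entries.pop(0)  # assumption: the oldest entry is dropped when full
        self.entries.append(cia + 4)

    def pop(self, lr_copy: int) -> int:
        """bclr pops the top entry; an empty stack falls back to the LR copy."""
        if self.entries:
            return self.entries.pop()
        return lr_copy

    def pop_then_push(self, cia: int, lr_copy: int) -> int:
        """bclrl pops its predicted target, then pushes its own return address."""
        target = self.pop(lr_copy)
        self.push(cia)
        return target
```

Nested calls return in reverse order of the calls, and a return with an empty stack uses the LR copy, as the text describes.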
C.4.5 Branch Instruction Process
When a branch instruction is issued to the B-pipe (BE0 stage), the type of branch instruction (based on instruction decode), a branch-predict-taken or not-taken indicator, and a branch target address from the BIQ are
also sent to the execution unit. At the BE1 stage, the execution unit computes the branch target address
based on the branch instruction type (Link Register based, counter based, absolute address based, or relative address based), compares it against the branch target address given, and checks whether the predicted
address is correct.
If the branch prediction direction is correct and the predicted address is correct, no branch correction flush is
issued. If the predictions are incorrect, the execution unit generates a branch correction flush request to the
IU, and the IU broadcasts the request to the entire CPU and FPU.
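The check performed at BE1 can be summarized in a short sketch. The function and parameter names are illustrative, and comparing the target address only for taken branches is an assumption about how the two conditions combine.

```python
def needs_branch_correction(predicted_taken: bool, predicted_target: int,
                            actual_taken: bool, actual_target: int) -> bool:
    """Return True when a branch correction flush should be requested."""
    if predicted_taken != actual_taken:
        return True   # wrong direction
    if actual_taken and predicted_target != actual_target:
        return True   # right direction, wrong target address
    return False
```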
C.4.6 Branch Information Queue Operation
The BIQ is a 4-entry FIFO queue that maintains information regarding branch instructions. It contains the
following information:
• Branch instruction address
• Branch target address
• Branch instruction location in a submit group
• BHT index used
• BHT used indicator
• BTAC used indicator
• Branch-predict-taken indicator
The queue is split between the ICC and the IU, with the ICC portion containing the BHT data and the IU
portion containing the addressing. The ICC queue is considered the master in that it is responsible for the
controls for queue movement. The queue is split to reduce the wiring between units and to improve timing by
keeping addresses physically closer to the execution unit (EU).
Each entry in the BIQ represents an instruction submit group: a group of instructions submitted from the ISD
to the DISS. This group might contain as few as one instruction, or as many as four, with any combination of
branches. Only one predicted-taken branch can be present in an instruction submit group because all instructions newer than the branch-predict-taken branch are invalidated as the instruction stream is redirected.
An entry is written into the BIQ as the submission occurs from the ISD to the DISS. In the event that the BIQ
is full, the ICC must block all ISD valid bits. An entry is removed from the BIQ when the last branch being
tracked in an entry is determined.
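The BIQ write, block, and retire behavior can be sketched as a bounded FIFO. This is a simplified model: the entry holds only a few of the listed fields, the names are illustrative, and it assumes that only submit groups containing branches are written and that branches are determined in order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BiqEntry:
    branch_address: int
    branch_target: int
    pending_branches: int  # branches in the submit group not yet determined


class Biq:
    """Four-entry FIFO of per-submit-group branch information."""
    DEPTH = 4

    def __init__(self) -> None:
        self.queue = deque()

    def is_full(self) -> bool:
        return len(self.queue) == self.DEPTH

    def submit(self, entry: BiqEntry) -> bool:
        """Write an entry on an ISD-to-DISS submission; False means the ISD is blocked."""
        if self.is_full():
            return False
        self.queue.append(entry)
        return True

    def branch_determined(self) -> None:
        """Resolve one branch of the oldest entry; retire it after its last branch."""
        head = self.queue[0]
        head.pending_branches -= 1
        if head.pending_branches == 0:
            self.queue.popleft()
```

When four groups are outstanding, a fifth submission is refused until the oldest entry retires, which corresponds to the ICC blocking the ISD valid bits.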
C.5 Instruction Issue Operation
The PowerPC 476FP core can generally issue four instructions in any given cycle to the RACC stage of the
pipeline. The four oldest instructions in the issue queue stage of the pipeline (DISS0, DISS1, and DISS2 of
the CPU integer unit and INSTQ0 and INSTQ1 of the FPU) are examined to determine which RACC stage
they require (LRACC for the L-pipe or the J-pipe, IRACC for the I-pipe, BE0 for B-pipe, FARACC for the FP
FA-pipe, and FLRACC for the FP FL-pipe). If they require (or can use) different RACC stages, they may issue
together. Conversely, if both instructions require the same RACC stage, they must issue in separate cycles,
with generally the older instruction issuing first, though instructions can be issued out of order (or bypassed).
Certain instruction types must use the LRACC and the L-pipe (such as storage access instructions). Certain
instructions must use the IRACC and the I-pipe (such as multiply, divide, or SPR instructions). Branch instructions must be issued to the B-pipe, FP arithmetic instructions must be issued to the FA-pipe, and the rest of
the instructions can use either the LRACC/J-pipe or the IRACC/I-pipe. FP load/store instructions are issued
separately and share the operations between the integer unit and the FP unit because the GPR for operand
address generation is in the integer unit, and the FPR for FP operands is in the FPU.
This section summarizes the pipelines that may or must be used by each of the instruction categories and the
rules regarding the simultaneous issuing of instructions.
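The basic issue rule (instructions that need different RACC stages can issue together; instructions that need the same stage cannot) can be sketched as a greedy, oldest-first assignment. This is a simplification under stated assumptions: the function name and the candidate format are illustrative, and instructions that occupy two stages at once (such as update-form loads or stwcx.) are not modeled.

```python
from typing import List, Sequence, Set, Tuple


def select_issue(candidates: Sequence[Tuple[str, Sequence[str]]]) -> List[Tuple[str, str]]:
    """Assign each candidate, oldest first, to the first free RACC stage it can use.

    Each candidate is (instruction name, stages it may use in order of preference).
    A younger instruction may issue past an older one whose stage is taken.
    """
    busy: Set[str] = set()
    issued: List[Tuple[str, str]] = []
    for name, usable_stages in candidates:
        for stage in usable_stages:
            if stage not in busy:
                busy.add(stage)
                issued.append((name, stage))
                break
    return issued
```

In the sketch, a load, a multiply, and a branch issue together, while an add that could use either LRACC or IRACC waits because both stages are already taken in that cycle.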
C.5.1 L-Pipe Instructions
The following categories of instructions must use the LRACC dispatch stage and be executed by the L-pipe:
• Storage access, including integer and load/store string operations.
• Floating-point load/store instructions.
Note: Update forms of load/store instructions, which update the base address register used for the
load/store operation, are executed simultaneously by both the L-pipe and the J-pipe. For such instructions, the storage access operation uses the L-pipe, and the base address update operation uses the
J-pipe.
• Cache management instructions including I-cache management.
• Storage synchronization being shared with the I-pipe, such as msync, mbar, and lwsync.
• The stwcx instruction being shared with the I-pipe.
• TLB management operations, such as tlbivax and tlbsync, being shared with the I-pipe.
• Allocated cache management instructions.
• Allocated D-cache debug instructions.
C.5.2 I-Pipe Instructions
The following categories of instructions must use the IRACC dispatch stage and be executed by the I-pipe:
• Integer multiply
• Integer divide
• Allocated arithmetic (includes multiply-accumulate, negative multiply-accumulate, and multiply halfword)
• Integer trap
• Integer count leading zeros
• CR manipulating instructions
• Allocated logical (dlmzb)
• popcntb instruction
• TLB management operations, such as tlbivax and tlbsync, being shared with L-pipe
• Processor control (includes register management, system linkage)
• Storage synchronization, being shared with L-pipe, such as msync, mbar, and lwsync
• All instructions which use or update the CR (including integer and floating-point instructions)
Note: The stwcx. instruction is both a storage access and a CR-updating instruction, and hence it is
executed by both the L-pipe and the I-pipe. It simultaneously issues from DISS0 to both LRACC and
IRACC.
• All instructions that use or update the XER
Note: The load/store string indexed (lswx and stswx) instructions are exceptions to this rule in that they
use the XER[TBC] field but are executed by the L-pipe.
C.5.3 I-Pipe and J-Pipe Instructions
All other integer instructions can use either the IRACC dispatch stage and be executed by the I-pipe, or use
the LRACC dispatch stage and be executed by the J-pipe.
These instructions are as follows:
• Integer arithmetic instructions that do not use the CR or the XER, except for multiply and divide instructions (which must be executed by the I-pipe)
• Integer logical instructions that do not update the CR, except for count leading zeros instructions (which
must be executed by the I-pipe)
• Integer rotate instructions that do not update the CR
• Integer shift instructions that do not update the CR or the XER
C.5.4 B-Pipe Instructions
The following categories of instructions must use the BE0 dispatch stage and be executed by the B-pipe:
• Unconditional branch instructions
• Conditional branch instructions including bdnz
• Branch to LR instructions
• Branch to CTR instructions
C.5.5 FA-pipe Instructions
All FP arithmetic instructions must use the FPU and be executed by the FP FA-pipe.
C.5.6 FP FL-pipe Instructions
All FP load/store instructions must use the FPU and be executed by the FP FL-pipe.
All FP load/store instructions must be shared with the integer L-pipe because the operand addressing is done
in the L-pipe and the operand is accessed by the L-pipe.
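The pipe assignments in Sections C.5.1 through C.5.6 can be read as a small lookup table. The following Python sketch is illustrative only: the category names, the ALTERNATIVES table, and can_issue_together are hypothetical names, and the check covers RACC-stage contention alone (no operand dependencies, no issue-width limit, no DISS0 rules).

```python
from itertools import product

# Each category maps to a list of alternatives; each alternative is the set
# of RACC stages the instruction needs at the same time. Category names are
# hypothetical labels, not terms from the manual.
ALTERNATIVES = {
    "storage_access": [{"LRACC"}],
    "int_multiply_divide": [{"IRACC"}],
    "cr_or_xer_user": [{"IRACC"}],
    "stwcx": [{"LRACC", "IRACC"}],           # shared between L-pipe and I-pipe
    "simple_integer": [{"LRACC"}, {"IRACC"}],  # J-pipe via LRACC, or I-pipe
    "branch": [{"BE0"}],
    "fp_arithmetic": [{"FARACC"}],
    "fp_load_store": [{"LRACC", "FLRACC"}],  # shared with the integer L-pipe
}


def can_issue_together(categories):
    """Return True if every instruction can be given distinct RACC stages."""
    for choice in product(*(ALTERNATIVES[c] for c in categories)):
        used = set()
        conflict = False
        for stages in choice:
            if used & stages:
                conflict = True
                break
            used |= stages
        if not conflict:
            return True
    return False


if __name__ == "__main__":
    print(can_issue_together(
        ["storage_access", "simple_integer", "branch", "fp_arithmetic"]))
    print(can_issue_together(["storage_access", "storage_access"]))
```

Two simple integer instructions pass this check because one can take the LRACC/J-pipe path while the other takes the IRACC/I-pipe path; a stwcx. paired with any other integer instruction does not, because it occupies both stages.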
C.5.7 Special Issue Rules for System Synchronizing Instructions
Because of various architectural requirements regarding context synchronization, interrupt ordering, and
system synchronization, the following instructions are treated uniquely with regard to issuing:
• isync, mtmsr, rfi, rfci, rfmci, sc, wrtee, and wrteei
These instructions all must wait until they occupy DISS0 before being issued to the IRACC stage (they
occupy both the I-pipe and the L-pipe). Furthermore, these special instructions each block any subsequent instructions from issuing until the special instruction has completed execution, at which time all
subsequent instructions are flushed from the pipelines and refetched at the appropriate address according to the functional definition of the particular instruction. This behavior effectively increases the latency
associated with those instructions.
• msync, mbar, lwsync, and tlbsync
These instructions all must wait until they occupy DISS0 before being issued to the IRACC or LRACC
stage. Furthermore, these special instructions each block any subsequent instructions from issuing until
the special instruction has completed execution. The msync, mbar, and lwsync instructions are also
memory barrier instructions and ensure that a memory barrier is created. This behavior effectively
increases the latency associated with those instructions.
C.6 Instruction Execution and Penalties
As described previously, the PowerPC 476FP core is a superscalar processor core capable of issuing four
instructions, three integer instructions and one FP instruction, per cycle. The integer execution unit has a five-pipeline structure. The FP execution unit has a two-pipeline structure. The PowerPC 476FP core allows out-of-order issue, execute, and complete, but requires in-order commitment for instructions to complete.
In general, most sequences of four nondependent instructions that do not require the same RACC stage can
be simultaneously issued, dispatched, executed, and completed. This results in a net execution performance
of four instructions in one cycle, corresponding to a penalty of negative three cycles (or negative two cycles
in the integer unit alone) relative to the single-issue microarchitecture model of four instructions in four cycles. See the definition
of penalty in Section C.2 Instruction Execution Latency and Penalty on page 284.
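The sign convention for penalties can be checked with a one-line calculation. The sketch below assumes that penalty is measured against a single-issue baseline of one instruction per cycle; this formula is inferred from the worked numbers in Section C.6 (two instructions in two cycles is a zero-cycle penalty, two instructions issued together is a negative one-cycle penalty), not quoted from Section C.2. The function name is hypothetical.

```python
def penalty(cycles: int, instructions: int) -> int:
    """Cycles taken minus the single-issue baseline of one cycle per instruction.

    Assumption: this matches the convention used by the worked examples in
    Section C.6; the formal definition lives in Section C.2.
    """
    if cycles < 1 or instructions < 1:
        raise ValueError("cycles and instructions must be positive")
    return cycles - instructions


if __name__ == "__main__":
    print(penalty(2, 2))  # same RACC stage: two cycles for two instructions
    print(penalty(1, 2))  # issued together
    print(penalty(4, 2))  # load followed by a dependent instruction
```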
There are, however, many instruction sequence scenarios where such parallel instruction processing is not
possible because of various factors such as dependencies between the instructions, contention for the same
RACC stage or execution pipeline (or both), load miss penalties, and so on.
This section summarizes the exception cases: the instruction sequences for which simultaneous instruction
processing of one form or another is not possible, thereby leading to an increase in the number of cycles
required to process the instructions (a decrease in the instructions per cycle metric). These penalty cycles are
generally from one or the other of the four instructions having to hold in a given pipeline stage for more than a
cycle while a dependency or pipeline resource conflict is resolved.
For any given sequence of four instructions, if the sequence is not covered by one of the rules listed in this
section, it can be assumed that the four instructions can be processed simultaneously at the rate of four
instructions per cycle.
Exception: Calculating the total number of cycles to execute a sequence of greater than four instructions is
not simply a matter of adding up the number of cycles identified in these rules for each of the consecutive
instruction pairs in the sequence. Rather, the out-of-order issue, dispatch, execution, and completion capabilities of the PowerPC 476FP core make it possible, in most cases, for the cycles associated with any given
instruction pair to be overlapped to varying degrees with the cycles associated with other instruction pairs.
For example, consider a sequence of two nondependent load instructions followed by two nondependent
add instructions, which are in turn followed by two branch instructions. The two instructions in each pair
(the load pair, the add pair, and the branch pair) require the same execution pipeline, and thus each pair,
taken alone, effectively requires two cycles to
complete. However, because the first add instruction can be issued with the first load instruction and the first
branch instruction, the net throughput for these six instructions is two cycles, not six.
This example illustrates how the theoretical maximum execution rate of three instructions per cycle in the
integer unit can be maintained.
Figure C-10 on page 302 illustrates this example.
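The overlap in this example can be reproduced with a toy issue model. The sketch below is a simplification under stated assumptions: each RACC stage accepts one instruction per cycle, at most four instructions issue per cycle, every pending instruction is a candidate (the four-oldest issue window is not modeled), and there are no operand dependencies. The function name issue_cycles is hypothetical.

```python
def issue_cycles(stage_per_instruction, width=4):
    """Count cycles needed to issue instructions given their RACC stages.

    Assumptions: one instruction per stage per cycle, at most `width`
    instructions per cycle, older instructions get first claim on a stage.
    """
    pending = list(stage_per_instruction)
    cycles = 0
    while pending:
        cycles += 1
        busy = set()
        issued = 0
        remaining = []
        for stage in pending:
            if issued < width and stage not in busy:
                busy.add(stage)
                issued += 1
            else:
                remaining.append(stage)
        pending = remaining
    return cycles


if __name__ == "__main__":
    # ld1, ld2, add1, add2, bc1, bc2 with both adds sent to the I-pipe
    sequence = ["LRACC", "LRACC", "IRACC", "IRACC", "BE0", "BE0"]
    print(issue_cycles(sequence))
```

In this model, ld1, add1, and bc1 issue in the first cycle and ld2, add2, and bc2 in the second, which is the pairing the example describes.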
Figure C-10. Instruction Sequence Example with no Dependency on the Integer Unit
[Pipeline timing diagram not reproduced. It plots clock cycles 1 through 9 against the L-pipe stages (LRACC, AGEN, CRD, DST, LWB, write to GPR), the I-pipe stages (IRACC, IEXE1, IPGPR, write to GPR), and the B-pipe stages (BE1 through BE4) for the instructions ld1, ld2, add1, add2, bc1, and bc2.]
Also note that when considering the penalty associated with the execution of any given pair of instructions
where the second instruction has some form of dependency on the immediately preceding instruction, this
penalty can generally be reduced or avoided altogether by inserting other, nondependent instructions
between the pair.
For example, consider the previously mentioned case of a load instruction followed immediately by an
instruction dependent on the load result. These two instructions take four cycles to execute, for a penalty of
two cycles. As described previously, if the load instruction can be issued with the instruction preceding it, and
the second (dependent) instruction can be issued with the instruction following it, the net execution time is
four cycles for the four instructions, or a zero cycle penalty (a net of two cycles per two instructions, the same
as the single-issue microarchitecture default) in this example. However, if the software sequence is changed
such that four more nondependent instructions are inserted between the load and the dependent instruction,
the net execution performance is increased back to eight instructions for the four cycles (the load and its
preceding instruction, the four instructions after the load, the dependent instruction and its successor) in this
example.
Generally speaking, compilers should attempt to eliminate the penalties associated with the instruction pairings described in the following sections by inserting nondependent but useful instructions between the
penalty-inducing pair.
Note: Some of the execution performance rules described in the following subsections are related to CR
dependencies. When considering these dependencies, there is a special case that should be noted. Specifically, Condition Register logical instructions specify two bits of the CR as source operands and specify a third
bit of the CR as the target operand. However, because the PowerPC 476FP core updates the CR on a field
(as opposed to a bit) basis, the field containing the target bit operand is actually considered a source operand
for the sake of any of the CR-related dependency rules described in the following subsections. This is necessary to source the old value of the three bits of the target field that are not being updated by the condition register logical instruction.
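The field-granular treatment of Condition Register logical instructions can be sketched as follows. This is an illustration, not the core's dependency logic: it assumes the operands are given as the 5-bit BT, BA, and BB values (0 through 31, so that field 0 is CR0), and the function name is hypothetical.

```python
def cr_logical_dependencies(bt: int, ba: int, bb: int):
    """Return (source_fields, target_field) for a CR logical instruction.

    Assumption: bt, ba, bb are the 5-bit operand encodings (0..31), so the
    4-bit CR field index is bit // 4. The target field is also a source,
    because its three unmodified bits must be read.
    """
    for bit in (bt, ba, bb):
        if not 0 <= bit <= 31:
            raise ValueError("CR bit operands must be in 0..31")
    target_field = bt // 4
    source_fields = {ba // 4, bb // 4, target_field}
    return source_fields, target_field


if __name__ == "__main__":
    # crand 2,5,9: writes a bit in field 0, reads bits in fields 1 and 2
    print(cr_logical_dependencies(2, 5, 9))
```

For crand 2,5,9 the sketch reports fields 0, 1, and 2 as sources even though the instruction only names bits in fields 1 and 2 as inputs, which is the special case the note describes.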
Note: Some of the execution performance rules described in the following subsections are related to XER
dependencies. Specifically, various “o” form instructions update XER[SO,OV], and various other instructions
read XER[SO], XER[OV], or both as a source operand. These instructions that use XER[SO] or XER[OV] as
a source operand are: mfspr (with the XER specified as the source SPR), mcrxr, compare instructions
(which copy XER[SO] into CR[CR0][35]), and all record form instructions (which copy XER[SO] into
CR[CR0][35]).
C.6.1 Contention for the Same RACC Stage
If the two instructions require the same RACC stage, they must be issued in separate cycles, and thus their
effective throughput is two cycles for the two instructions. This corresponds to a penalty of zero cycles, one
cycle worse than the negative one-cycle penalty for the default, noncontention case where the two instructions can be issued together.
C.6.2 GPR Operand Dependency
If the second instruction has a GPR source operand that is the same as one of the first instruction GPR target
operands (that is, a GPR read-after-write [RaW] hazard), in general, the second instruction must be executed
at least one cycle after the first instruction. This requires at least two cycles to execute the two instructions
(zero cycles of penalty). Depending on the instruction type of the first instruction (and the pipeline stage at
which it finishes calculating its result), the second instruction might have to wait more than one cycle for its
source operand to become ready, thereby increasing the penalty for the two-instruction sequence even
further. The circumstances that result in such additional delay are described in rules that are listed later in this
section.
Two exceptions to this general rule are for integer store instructions and the allocated integer multiply-accumulate (MAC) instructions. The exceptions are as follows:
• Exception 1: The second instruction of the sequence is a store instruction, and the source GPR operand
of the second instruction that matches the target GPR operand of the first instruction is specifically the
store data operand (that is, the RS operand shown in the store instruction description).
• Exception 2: The second instruction of the sequence is a MAC instruction, and the source GPR operand
of the second instruction that matches the target GPR operand of the first instruction is specifically the
MAC accumulate operand (that is, the RT operand shown in the MAC instruction description, which is
both a source and a target for MAC instructions).
For both of these exceptions, the two instructions can generally still be executed and completed in parallel
such that the effective throughput is still generally one cycle for the two instructions, which is equivalent to the
nondependent case.
This parallel execution is generally possible because the store data operand and the MAC accumulate
operand are both accessed one cycle later in the execution pipeline than are other GPR source operands. As
is the case for the general GPR operand dependency rule described in the preceding paragraphs, however,
and depending on the instruction type of the first instruction, there might be additional delay in the calculation
of the first instruction result, and hence, additional penalty in the execution of the two-instruction sequence in
such cases, even for the special case of the second instruction being one of these two types. Again, the
circumstances under which this additional penalty applies are described in rules that are listed later in this
section. In general though, the GPR dependency-related penalty for the special case of the dependency
being for the store data or MAC accumulate operand is one cycle fewer than the standard GPR dependency-related penalty.
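The base rule and its two exceptions can be expressed as a short check. The sketch below returns only the best-case cycle count for a two-instruction pair, before any of the additional delays covered by later rules and ignoring RACC-stage contention; the role labels and the function name are hypothetical.

```python
def base_pair_cycles(first_targets, second_sources):
    """Best-case cycles for a pair under the general GPR RaW rule.

    first_targets:  set of GPR numbers written by the first instruction.
    second_sources: list of (gpr, role) pairs read by the second instruction,
                    with role in {"general", "store_data", "mac_accumulate"}.
    Assumption: store data and MAC accumulate operands are read one cycle
    later, so a match on those roles alone does not force a second cycle.
    """
    cycles = 1
    for gpr, role in second_sources:
        if gpr in first_targets and role == "general":
            cycles = 2
    return cycles


if __name__ == "__main__":
    # add r3,r4,r5 ; stw r3,0(r6): r3 is only the store data operand
    print(base_pair_cycles({3}, [(3, "store_data"), (6, "general")]))
    # add r6,r4,r5 ; stw r3,0(r6): r6 is the address operand
    print(base_pair_cycles({6}, [(3, "store_data"), (6, "general")]))
```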
C.6.3 General CR Operand Dependency
There is no need for a separate general rule for execution penalty associated with CR operand dependencies, corresponding to the general rule for GPR dependencies. This is because all instructions that use the
CR (either as a source or as a target operand) must issue to the IRACC pipeline stage and be executed in the
I-pipe. Therefore, the RACC contention rule described in Section C.6.1 Contention for the Same RACC Stage
on page 303 applies to all instruction sequences involving such CR dependencies, leading to a default base
execution rate of two cycles for the two instructions with such a dependency. For example, the sequence of a
compare instruction (which writes a field of the CR as a target) followed by a conditional branch (which reads
a bit of the CR as a source) takes two cycles to execute. This is true whether or not the branch instruction is actually
conditional upon a CR bit that was updated by the compare instruction.
For branch instructions, there are other considerations related to the predicted outcome of the branch and the
latency with which the instructions subsequent to the branch may be executed. Also, as is the case with GPR
dependencies, there are other special cases involving instructions that do not calculate their CR results in the
first cycle of execution (IEXE1 pipeline stage), and hence, introduce additional cycles of penalty when the
subsequent instruction is dependent on those CR results.
These special cases are covered in the rules listed later in this section.
C.6.4 Multiply Dependency
Multiply instructions (including the Power ISA 32-bit × 32-bit multiply instructions and the allocated 16-bit ×
16-bit multiply-halfword instructions) calculate their results in the IWB pipeline stage (including the GPR
result, and the CR result for record forms of multiply that update the CR, and the XER result for “o” forms of
multiply that update XER[SO,OV]). Therefore, instruction sequences consisting of a multiply followed immediately by an instruction that uses the multiply result (either the GPR, CR, or XER result) as an input operand
take five cycles to complete. This corresponds to a three-cycle penalty, or three cycles more than the penalty
for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303.
Also, if the dependency involved in the sequence is specifically a store data GPR operand, the penalty is one
cycle fewer, or a total execution time of four cycles, not five. The same is true if the first instruction is a Power
ISA 32-bit × 32-bit multiply instruction and the second instruction is a MAC instruction, with the only dependency between the two being the MAC accumulate GPR operand (the total execution time is four cycles).
However, unlike what is described in Section C.6.2 GPR Operand Dependency, if the first instruction in the
sequence is specifically a multiply-halfword instruction and the second instruction is a MAC instruction using
the GPR result of the multiply-halfword instruction as the accumulate operand, the penalty associated with
the sequence is two cycles, or a total execution time of four cycles for the two instructions, not six.
C.6.5 Multiply-Accumulate (MAC) Dependency
MAC instructions calculate their results in the IWB (or MWB) pipeline stage including the GPR result, the CR
result for record forms of MAC that update the CR, and the XER result for “o” forms of MAC that update
XER[SO,OV]. Therefore, instruction sequences consisting of a MAC instruction followed immediately by an
instruction that uses the MAC result (either the GPR, CR, or XER result) as an input operand generally take
five cycles to complete. This corresponds to a three-cycle penalty, or three cycles more than the penalty for
the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303. Also,
if the dependency involved in the sequence is specifically a store data GPR operand, the penalty is one cycle
fewer, or a total execution time of four cycles, not five.
Unlike what is described in Section C.6.2 GPR Operand Dependency, if the second instruction in the
sequence is another MAC instruction using the same GPR accumulate operand (and there is no XER[SO]
dependency between the instructions), the penalty associated with the sequence is one cycle, or a total
execution time of three cycles for the two instructions, not five. However, MAC instructions with the only
dependency between them being the GPR accumulate operand can be executed with single-cycle
throughput because of a special forwarding path within the execution pipeline.
Lastly, because of a write-after-read (WaR) hazard, instruction sequences consisting of a MAC instruction
preceded immediately by an instruction that updates the same GPR as the MAC instruction updates generally take three cycles to complete, which corresponds to a one-cycle penalty.
C.6.6 Divide Dependency
The divider is based on the radix-2 SRT division algorithm. The first two stages, MEXE1 and MEXE2, are used to
compute the leading zeros of the dividend in order to reduce the number of execution iterations. The MEXE3 stage is used
for the divide computation. Divide instructions reside in IWB for a variable number of cycles as they iteratively calculate
their results, including the GPR result, the CR result for “record” forms of divide that update the CR, and the
XER result for “o” forms of divide that update XER[SO,OV]. Therefore, instruction sequences consisting of a
divide followed immediately by an instruction that uses the divide result (either the GPR, CR, or XER result)
as an input operand take a variable number of cycles to complete. The average penalty is expected to be 10+ cycles,
that is, 10+ cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR
Operand Dependency.
Also note that as described in that section, if the dependency involved in the sequence is specifically a store
data or MAC accumulate GPR operand (and there is no XER[SO] dependency between the instructions), the
penalty is one cycle fewer, or a total execution time of (10+ cycles minus one) cycles.
Furthermore, because divide instructions occupy the IWB pipeline stage for a total of 10+ cycles (instead of
the standard one cycle), they impose an additional 10+ cycle penalty on any immediately succeeding instruction that also uses the I-pipe IWB stage. Otherwise, the I-pipe can be used to execute dot instructions. This is
one of the advantages of the IPGPR (temporary buffers) implementation in the PowerPC 476FP core.
On the other hand, instructions subsequent to the divide that use the L-pipe, J-pipe, or I-pipe (many
I-pipe instructions can be completed in IEXE1 and IEXE2) and that are not dependent on the result of the divide
can be executed and completed while the divide is iterating in the IWB pipeline stage.
C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency
Because of the nature of the mtcrf instruction, which can update any combination of the eight, 4-bit CR fields
at once, subsequent instructions that use any bit or field of the CR as a source must wait for the preceding
mtcrf instruction to complete before dispatching from the IRACC stage. Therefore, the total execution time
for a mtcrf instruction followed by an instruction using the CR as a source operand is five cycles, or a penalty
of three cycles. Note that this penalty applies whether or not the mtcrf instruction is actually updating any of
the CR bits or fields being used as source operands by the subsequent instruction.
The following instructions use the CR as a source operand and hence are subject to this three-cycle penalty
when they immediately follow a mtcrf instruction:
• bc, bclr, bcctr (with BO[0] = ‘0’)
• mfcr
• mcrf
• Condition Register logical instructions (crand, cror, crnand, crnor, crandc, crorc, crxor, creqv)
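The BO[0] = ‘0’ qualifier on the branch entry can be tested directly from the 5-bit BO field. The sketch below assumes Power ISA bit numbering, in which BO[0] is the most significant of the five bits and a value of 1 means the branch does not test a CR bit; the function name is hypothetical.

```python
def branch_reads_cr(bo: int) -> bool:
    """Return True if a bc, bclr, or bcctr with this BO field tests a CR bit.

    Assumption: bo is the 5-bit BO encoding and BO[0] is its most
    significant bit (big-endian bit numbering).
    """
    if not 0 <= bo <= 0b11111:
        raise ValueError("BO is a 5-bit field")
    return (bo >> 4) & 1 == 0


if __name__ == "__main__":
    print(branch_reads_cr(0b01100))  # branch if condition true
    print(branch_reads_cr(0b10000))  # bdnz: decrement CTR, no CR test
    print(branch_reads_cr(0b10100))  # branch always
```

Only branches for which this check is true use the CR as a source and are therefore subject to the penalty when they immediately follow a mtcrf instruction.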
C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency
Because of the nature of the stwcx. instruction, which conditionally performs a storage access in addition to
updating CR[CR0], subsequent instructions that use any bit of CR[CR0] as a source operand must wait for
the preceding stwcx. instruction to complete before dispatching from the IRACC stage. Therefore, the total
execution time for a stwcx. instruction followed by an instruction using any bit of CR[CR0] as a source
operand is >20 cycles, or a penalty of about 20 cycles.
This rather large latency is because of storage reservation being handled in L2 cache, and thus a minimum
latency of L2 cache access is added. This is the PowerPC 476FP storage reservation implementation that is
targeted for MP system performance. Also see Section C.6.19 lwarx and stwcx. Operations on page 311.
The following instructions potentially use CR[CR0] (either the whole field or a single bit of the field) as a
source operand and, if so, are subject to this 20-cycle penalty when they immediately follow a stwcx. instruction:
• bc, bclr, bcctr (with BO[0] = ‘0’)
• mfcr
• mcrf
• Condition Register logical instructions (crand, cror, crnand, crnor, crandc, crorc, crxor, creqv)
C.6.9 Move from Condition Register (mfcr) Instruction Dependency
Because the mfcr instruction reads all eight CR fields at once, and because there can be multiple
CR-updating instructions in execution at one time, the mfcr instruction must wait until all preceding CR
updates have completed before beginning execution. Therefore, any two-instruction sequence involving a
CR-updating instruction followed immediately by a mfcr instruction takes four cycles to execute, or a penalty
of two cycles.
See Section B Instruction Summary on page 271 for CR updating dot form instructions.
Note that the actual penalty for the sequence of mtcrf followed immediately by mfcr is three cycles not two,
as described in Section C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency on page 305.
Similarly, the penalty for the sequence of stwcx. followed immediately by mfcr is about 20 cycles, not two, as
described in Section C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency on page 306.
C.6.10 Move from Special Purpose Register (mfspr) Dependency
The mfspr instruction provides its result in the IEXE2 pipeline stage. Therefore, instruction sequences
consisting of a mfspr followed immediately by an instruction that uses the target GPR of the mfspr instruction as an input operand generally take six cycles to complete, which corresponds to a four-cycle penalty, or
four cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR
Operand Dependency on page 303. Also note that as described in that section, if the dependency involved in
the sequence is specifically a store data or MAC accumulate operand, the penalty is one cycle fewer, or a total
execution time of five cycles, not six.
In the PowerPC 476FP core, mfspr instructions have been given low priority because they are used
infrequently in general coding practice. However, this rule applies only to SPRs other than the LR, CTR,
or XER. For these three SPRs, the results of the mfspr instructions are available in the IEXE1 stage and
therefore the general GPR dependency rule of Section C.6.2 GPR Operand Dependency applies.
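The two mfspr cases can be summarized in a small helper. The sketch below only transcribes the cycle counts stated in this section and in Section C.6.2; the function name and the string SPR names are hypothetical, and the result is the nominal total for the two-instruction pair.

```python
# SPRs whose mfspr result is available in IEXE1, per this section.
FAST_SPRS = {"LR", "CTR", "XER"}


def mfspr_use_cycles(spr: str, late_operand: bool = False) -> int:
    """Nominal cycles for mfspr followed immediately by a dependent instruction.

    late_operand is True when the dependency is only on a store data or
    MAC accumulate operand, which saves one cycle in both cases.
    """
    if spr.upper() in FAST_SPRS:
        base = 2  # general GPR dependency rule of Section C.6.2
    else:
        base = 6  # result not available until IEXE2
    return base - 1 if late_operand else base


if __name__ == "__main__":
    print(mfspr_use_cycles("SRR0"))
    print(mfspr_use_cycles("LR"))
    print(mfspr_use_cycles("SRR0", late_operand=True))
```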
C.6.11 Move from Machine State Register (mfmsr) Dependency
The mfmsr instruction provides its result in the IEXE2 pipeline stage. Therefore, the same rule described in
Section C.6.10 Move from Special Purpose Register (mfspr) Dependency for mfspr applies to the mfmsr
instruction as well.
C.6.12 Move to Special Purpose Register (mtspr) Dependency
mtspr instructions occupy the IWB stage for a total of four cycles, and do not perform the write of the target
SPR until the fourth cycle to enforce various architectural rules regarding instruction ordering. Therefore,
instruction sequences consisting of a mtspr followed immediately by a mfspr that references the same SPR
take seven cycles to complete, which corresponds to a five-cycle penalty. However, this penalty does not
apply to the LR, CTR, or XER registers. Special handling within the execution pipeline allows a mtspr/mfspr
sequence that involves one of these three registers to operate in two cycles, thereby incurring only the zero-cycle penalty resulting from both instructions requiring the I-pipe.
Similarly, when a mtspr instruction that specifically targets the MMUCR is followed immediately by a tlbsx
instruction (which uses some fields of the MMUCR as input operands), the sequence also takes seven cycles
to complete.
Furthermore, because mtspr instructions occupy the IWB pipeline stage for a total of four cycles (instead of
the standard one cycle), they impose an additional three-cycle penalty on any immediately succeeding instruction that also uses the I-pipe, regardless of any dependency that might exist. That is, any instruction
sequence involving a mtspr instruction followed immediately by another instruction that uses the I-pipe takes
a minimum of five cycles to execute, or a total penalty of three cycles. However, this penalty again does not
apply to the LR, CTR, or XER, nor does it apply to the SPRG registers (SPRG0 - SPRG7 and USPRG0).
Special handling within the execution pipeline allows a mtspr instruction that targets one of these registers to
move through the pipeline in the normal fashion, occupying the IWB stage for only one cycle.
In the PowerPC 476FP core, mtspr instructions have been given low priority because they are used
infrequently in general coding practice. Also, instructions subsequent to the mtspr that use the L-pipe or
J-pipe can be executed and completed while the mtspr is continuing to occupy the IWB pipeline stage.
C.6.13 TLB Management Instruction Dependency
In addition to the dependency between a mtspr that targets the MMUCR and a subsequent tlbsx instruction,
for which the penalty is described in Section C.6.12 Move to Special Purpose Register (mtspr) Dependency
on page 307, there are four other special case dependencies involving TLB management instructions that
lead to execution penalties.
First, the tlbwe instruction occupies the IWB pipeline stage for a total of four cycles (similar to the mtspr
instruction). Therefore, any instruction sequence involving a tlbwe instruction followed immediately by
another instruction that uses the I-pipe takes a minimum of five cycles to execute, or a total penalty of three
cycles. However, instructions subsequent to the tlbwe that use the L-pipe or J-pipe can be executed and
completed while the tlbwe is continuing to occupy the IWB pipeline stage.
Second, instruction sequences involving a tlbre or tlbsx instruction followed immediately by a mfspr instruction (that targets any SPR except the LR, CTR, or XER) take five cycles to complete, corresponding to a
penalty of three cycles. This penalty is from conflicting use of pipeline resources between the two instructions.
Third, instruction sequences involving a tlbwe instruction followed immediately by a tlbre or tlbsx instruction
also take five cycles to complete, corresponding to a penalty of three cycles. This penalty is from conflicting
use of the TLB array between the two instructions.
Fourth, instruction sequences involving a tlbre or tlbsx instruction followed immediately by a load, store,
cache management (except dcba, which is a no-op on the PowerPC 476FP core), cache debug, or
storage synchronization instruction, take five cycles to complete, corresponding to a penalty of three cycles.
Similarly, if the first instruction is instead a tlbwe, the two-instruction sequence takes eight cycles to complete
because the tlbwe instruction is held in the IWB pipeline stage for one extra cycle. Conversely, if the order of
the two instructions is reversed, with the TLB management instruction coming immediately after a load, store,
cache management, or cache debug, the two-instruction sequence takes either four or ten cycles to complete
(it takes ten cycles if the first instruction is icbi, icbt, iccci, or icread, and four cycles otherwise). These
penalties are all from the potential for conflicting use of the TLB array or other pipeline resources between the
two instructions.
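The TLB-related pairings above can be collected into two lookups, one for each ordering. The sketch below only transcribes the cycle counts stated in this section; the function names and the second-instruction class labels ("i_pipe", "mfspr", "tlb_read", "storage") are hypothetical, and "storage" stands for load, store, cache management (except dcba), cache debug, and storage synchronization instructions.

```python
from typing import Optional

TLB_READ_OPS = {"tlbre", "tlbsx"}
ICACHE_OPS = {"icbi", "icbt", "iccci", "icread"}


def cycles_after_tlb_op(first: str, second_kind: str) -> Optional[int]:
    """Cycles for a TLB instruction followed by an instruction of second_kind.

    Returns None when this section states no rule for the pairing.
    """
    if first == "tlbwe":
        # i_pipe is a minimum; storage follows the longer tlbwe hold in IWB
        return {"i_pipe": 5, "tlb_read": 5, "storage": 8}.get(second_kind)
    if first in TLB_READ_OPS:
        # mfspr here means an SPR other than LR, CTR, or XER
        return {"mfspr": 5, "storage": 5}.get(second_kind)
    return None


def cycles_before_tlb_op(first: str) -> int:
    """Cycles for a load, store, cache management, or cache debug instruction
    followed immediately by a TLB management instruction."""
    return 10 if first in ICACHE_OPS else 4


if __name__ == "__main__":
    print(cycles_after_tlb_op("tlbwe", "storage"))
    print(cycles_after_tlb_op("tlbsx", "mfspr"))
    print(cycles_before_tlb_op("icbi"))
```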
C.6.14 DCR Register Managing Instruction Operation Dependency
Because the DCR managing instructions (mtdcr, mtdcrx, mtdcrux, mfdcr, mfdcrx, and mfdcrux) must
interact with the asynchronous, slower-clocked DCR interface of the PowerPC 476FP core, they stall temporarily within the I-pipe. Specifically, these instructions are held in the IEXE1 pipeline stage while they participate
in the asynchronous handshake protocol of the DCR interface. The number of cycles for which these instructions remain in the IEXE1 pipeline stage depends on how quickly the DCR device responds to the
transaction. In general, a DCR managing instruction occupies the IEXE1 pipeline stage for two cycles, plus
the number of CPU clock cycles associated with the DCR interface clock synchronization and the transaction
itself. The number of these extra cycles beyond the base two cycles depends on the relative clock frequencies of the CPU clock and the DCR interface clock, and on the number of cycles of the DCR transaction itself.
Assume a CPU:DCR clock ratio of R = C/D, where C is the CPU clock frequency and D is the DCR interface
clock frequency (both in MHz), C is always greater than D, and R is an integer.
Because there is DCR arbiter (arbitration) latency and DCR slave bus latency in addition to the DCR device transaction latency, the actual number can vary greatly, especially in an MP system. In 45 nm technology with MP
and many DCR devices in a system, far more than 100 cycles (the X factor, or XR) is expected.
The DCR managing instructions occupy the IEXE1 pipeline stage for XR cycles, thereby leading to a penalty
of XR cycles for any immediately subsequent instruction that must use the I-pipe.
On the other hand, instructions subsequent to a DCR managing instruction that use the L-pipe or J-pipe can
be executed and completed while the DCR managing instruction is continuing to occupy the IEXE1 pipeline
stage.
Furthermore, the mfdcr instruction cannot forward its GPR result to a subsequent instruction until the IEXE2
pipeline stage. Therefore, instruction sequences consisting of an mfdcr, mfdcrx, or mfdcrux instruction followed
immediately by an instruction that uses its target register as an input operand generally take XR cycles to
complete, which corresponds to an XR-cycle penalty.
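As a rough illustration of how the IEXE1 occupancy scales with the clock ratio, the following sketch assumes that each DCR interface clock spent on synchronization, arbitration, and the transaction costs R CPU clocks on top of the base two cycles. The function name, the dcr_clocks parameter, and the formula itself are assumptions made for this example; the manual does not state an exact expression.

```python
def dcr_iexe1_cycles(cpu_mhz: int, dcr_mhz: int, dcr_clocks: int) -> int:
    """Estimate CPU cycles a DCR managing instruction spends in IEXE1.

    cpu_mhz    -- CPU clock frequency C, in MHz
    dcr_mhz    -- DCR interface clock frequency D, in MHz
    dcr_clocks -- assumed number of DCR interface clocks consumed by
                  synchronization, arbitration, and the transaction
    """
    if dcr_mhz <= 0 or cpu_mhz <= dcr_mhz:
        raise ValueError("expected C > D > 0")
    if cpu_mhz % dcr_mhz != 0:
        raise ValueError("R = C/D is assumed to be an integer")
    ratio = cpu_mhz // dcr_mhz  # R in the text
    # Base two cycles plus the assumed R CPU clocks per DCR interface clock.
    return 2 + ratio * dcr_clocks
```

With hypothetical values of C = 800 MHz, D = 200 MHz, and 30 DCR clocks, the estimate is 122 CPU cycles, which is in line with the statement that well over 100 cycles can be expected in a large MP system.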
C.6.15 Processor Control Instruction Operation
Various processor control instructions require special handling within the PowerPC 476FP core because of
the context synchronization requirements of the Power ISA Version 2.05 Book III-E architecture. These
instructions include:
• sc
• mtmsr
• isync
• rfi
• rfci
• rfmci
Each of these instructions is issued from DISS0 and requires that the instruction stream be flushed and refetched immediately after the instruction executes, either at the next sequential address (for mtmsr and
isync), at the System Call interrupt vector location (for sc), or at the interrupt return address (for rfi, rfci,
and rfmci). Because of the instruction refetching requirement and other instruction processing requirements,
the minimum execution time for a two-instruction sequence involving one of these instructions as the first
instruction is thirteen cycles (for mtmsr, isync, sc, rfi, rfci, and rfmci).
Furthermore, none of these instructions can be issued together with any preceding instruction, which means
that the minimum execution time is two cycles (zero-cycle penalty) for any two-instruction sequence in which
the second instruction is one of these instructions.
The wrtee and wrteei instructions are also issued from DISS0 and hold off any subsequent instructions from being
issued until they are completed. These instructions are not context synchronizing instructions, and therefore,
they do not flush any fetched instructions.
The minimum execution time for a two-instruction sequence involving one of these instructions as the first
instruction is four cycles (for wrtee and wrteei).
A wrtee or wrteei instruction is confirmed in the IRACC stage, committed in the IEXE1 stage, and completed in the next cycle. Subsequent instructions can be issued in the next cycle.
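The timing rules of this section can be summarized as a small table. The sketch below is illustrative only; the names are hypothetical, and the values are taken directly from the text above.

```python
# Processor control instructions that are context synchronizing and force a
# flush and refetch of the instruction stream.
CONTEXT_SYNC_OPS = {"sc", "mtmsr", "isync", "rfi", "rfci", "rfmci"}

# Instructions that hold off later issue but do not flush fetched instructions.
EE_WRITE_OPS = {"wrtee", "wrteei"}

# Where fetching resumes after the flush, as described in the text.
REFETCH_TARGET = {
    "mtmsr": "next sequential address",
    "isync": "next sequential address",
    "sc": "System Call interrupt vector",
    "rfi": "interrupt return address",
    "rfci": "interrupt return address",
    "rfmci": "interrupt return address",
}


def min_cycles_as_first(op: str) -> int:
    """Minimum cycles for a two-instruction sequence that starts with op."""
    if op in CONTEXT_SYNC_OPS:
        return 13
    if op in EE_WRITE_OPS:
        return 4
    raise ValueError(f"not covered by this section: {op}")


def flushes_fetched_instructions(op: str) -> bool:
    """True only for the context synchronizing instructions listed above."""
    return op in CONTEXT_SYNC_OPS
```

For instance, min_cycles_as_first("isync") returns 13, while min_cycles_as_first("wrteei") returns 4 because wrtee and wrteei do not force a refetch.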
C.6.16 Load Instruction Dependency
Load instructions that obtain their data from the data cache generally provide their result in the LWB pipeline
stage. Therefore, instruction sequences consisting of a load instruction followed immediately by an instruction
that uses the target GPR of the load instruction as an input operand generally take five cycles to complete,
which corresponds to a three-cycle penalty, or three cycles more than the penalty for the general GPR
dependency rule described in Section C.6.2 GPR Operand Dependency on page 303. Also note that, as
described in that section, if the dependency involved in the sequence is specifically a store data or MAC
accumulate operand, the penalty is one cycle less, for a total execution time of four cycles, not five. The
dependency described in this section applies only to the target data operand of a load instruction. It does not
apply to the target address operand of a load with update instruction, for which the result is available from the
JEXE1 pipeline stage; in that case, only the general GPR dependency rule applies.
Note that there are many other factors that affect the performance of load and other storage access instructions (such as whether their target location is in the data cache).
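The load-use timing described in this section can be expressed as a short helper. This is a minimal sketch with hypothetical names; it covers only a data cache hit and only the dependency on the loaded data, not the updated address register of a load with update instruction.

```python
def load_use_cycles(operand_kind: str = "general") -> int:
    """Cycles for a load followed immediately by a consumer of its target GPR.

    operand_kind -- how the consumer uses the loaded value:
                    "general", "store_data", or "mac_accumulate"
    """
    if operand_kind == "general":
        return 5  # three-cycle penalty over the two-cycle baseline
    if operand_kind in ("store_data", "mac_accumulate"):
        return 4  # penalty is one cycle less for these operands
    raise ValueError(f"unknown operand kind: {operand_kind}")
```

A compiler or hand scheduler would typically try to place independent instructions between the load and its consumer to hide these cycles.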
C.6.17 Load/Store Operations
A load that depends on the result of a previous store must obtain the store data as required by the sequential
execution model (SEM). To handle this type of read-after-write (RaW) hazard, a read instruction must hold in
RACC until all of its operands are available (that is, the results of all previous writes to the read operands are
known). It is not necessary that all of these earlier writes have actually been performed in the GPR file, or that
these writes have even been committed by the CS. Rather, it is only required that the results be known and
available such that the read operation can proceed into execution with the correct values.
Similarly, a write-after-read (WaR) hazard must be handled such that the results of later writes are not erroneously forwarded
to earlier reads. Unlike the RaW hazard described previously, a write can leave a given RACC stage even if
the read is holding in the other RACC stage. Note that this behavior is identical to the LRACC WaR hazard
against a MAC in IRACC (described in Section C.6.5 Multiply-Accumulate (MAC) Dependency on page 305),
which has a zero-cycle penalty.
The stalls described previously are required to handle RaW and WaR hazards based on operand dependency between reads and writes. These stalls do not, however, handle various resource dependencies that
might exist lower in the pipe. One special case is described here. First, note that load/store hits can generally
flow through the pipe with an overall throughput of one cycle per instruction, assuming a load hit does not
immediately follow a store hit to the same address (a typical compiler will never emit such a scenario). In the
case that a load hit immediately follows a store hit to the same address, the load incurs a five-cycle penalty
(three cycles for the store to write to the RAM array and two cycles for the load to reaccess the cache). In the
special case of a load hit following two store hits where the load matches both stores, the load incurs a six-cycle penalty (four cycles of RAM writes plus two cycles for the load to reaccess the cache).
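The store-to-load penalties above decompose into a RAM write component and a cache reaccess component. The following sketch reproduces that arithmetic; the names are hypothetical, and cases with more than two matching stores are rejected because the text does not describe them.

```python
# RAM array write cycles quoted in the text, keyed by the number of
# immediately preceding store hits that the load hit matches.
STORE_RAM_WRITE_CYCLES = {1: 3, 2: 4}
LOAD_REACCESS_CYCLES = 2  # cycles for the load to reaccess the cache


def load_after_store_penalty(matching_stores: int) -> int:
    """Penalty cycles for a load hit that follows matching store hits."""
    if matching_stores == 0:
        return 0  # load/store hits flow at one cycle per instruction
    if matching_stores not in STORE_RAM_WRITE_CYCLES:
        raise ValueError("only one or two matching stores are described")
    return STORE_RAM_WRITE_CYCLES[matching_stores] + LOAD_REACCESS_CYCLES
```

One matching store gives 3 + 2 = 5 cycles, and two matching stores give 4 + 2 = 6 cycles, matching the figures in the text.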
C.6.18 String and Multiple Operations
String load/store and multiple load/store instructions are issued from DISS0 and are executed by replicating
the load or store, repeating the LRACC, AGEN, CRD, DST, and LWB stages.
To allow for simplifications in the hazard logic implementation for string/multiple operations, all load
string/multiple operations are assumed to update all registers (this simplification is necessary because it is not known which
registers the string/multiple will access until the final piece is in AGEN). Thus, a load string or multiple that is
in LRACC must hold until all older register reads are either past IRACC or are leaving IRACC this cycle
(except for MAC, which must be leaving IEXE1). Conversely, a newer register write in IRACC or LRACC must
wait until a store string or multiple has finished replicating and the last piece is leaving AGEN before it can
proceed. A further simplification in the PowerPC 476FP core avoids GPR hazards by causing newer load
string/multiple operations to hold in DISS0 until all older writes are completely gone from all pipelines. Thus,
the associated penalty for any load or store string/multiple depends on the operations that exist in the pipeline
at the time the load or store is in LRACC.
C.6.19 lwarx and stwcx. Operations
Because the PowerPC 476FP core is designed for SMP support, the storage reservation instructions, such
as lwarx and stwcx., behave differently from the lwarx and stwcx. instructions of previous PowerPC 4xx processors.
The lwarx instruction is issued to the L-pipe as a normal load, but it generates a D-cache miss by invalidating
the cache entry, even though the operand might hit in the L1 cache. The stwcx. instruction is issued to both the I-pipe
and the L-pipe through DISS0 and also invalidates the cache entry. In other words, both instructions operate
on the L2 cache.
The CR0 update by a stwcx. instruction, whether or not the reservation succeeds, occurs only after the response from
the L2 cache; thus, the CR0 update by a stwcx. instruction has a long latency of about 20 cycles. Moreover, a
storage reservation operation is a system-level, in-order operation. Thus, use storage reservation operations with care if performance is a concern.
C.6.20 Storage Ordering and Synchronizing Operations
The msync, mbar, and lwsync instructions go down the L-pipe and are confirmed in LRACC. However,
these instructions are storage-ordering and synchronization instructions. All preceding instructions are
completed before msync, mbar, or lwsync completes, and no subsequent instructions are initiated until then. This
is enforced in the IU after msync, mbar, or lwsync is issued from DISS0. The msync instruction is a heavyweight
instruction and waits until all other processors in the system acknowledge that they have processed or
completed all preceding instructions. The mbar instruction is handled similarly.
The lwsync instruction is a lighter version of msync and waits for the L2 cache to acknowledge that all
preceding operations are complete.
These instructions are performance limiting, especially msync and mbar; therefore, use these instructions with care.
C.6.21 Special TLB Managing Operations
The tlbivax and tlbsync instructions are system-level instructions. The tlbivax instruction is broadcast to all
processors in the system and invalidates a matching TLB entry in each processor. The tlbsync instruction
ensures both storage ordering and synchronization. The tlbivax instruction is issued from DISS0 to both the
L-pipe and the I-pipe and is broadcast to all processors through the PLB. This instruction, though, does not
hold off any subsequent instructions from being issued. On the other hand, the tlbsync instruction is issued
to the L-pipe only and holds off all subsequent instructions from being issued until all processors acknowledge that
they have completed all visible instructions and are context synchronized. The tlbsync instruction is the
heaviest instruction, even heavier than msync.
C.7 Interrupt Handling
In the PowerPC 476FP core, the process of taking an interrupt spans three cycles. This is required
for timing reasons and by the need to allow any outstanding, committed SPR updates to update
the SPR before any subsequent interrupt vector is taken.
During the first interrupt cycle, the interrupt logic detects the exception and latches the flush operation. In the
second cycle, the flush is sent out to each unit and the proper address is steered into SRR0, CSRR0, and
MCSRR0 if allowable (rfi, rfci, and rfmci do not update the save/restore registers). In the third cycle, the MSR context
is swapped and a fetch request for the interrupt vector is made.
Refetch and stop requests are similarly handled in three cycles, with the next fetch request occurring in the
third cycle. Because interrupts span three cycles, all new interrupt, refetch, and stop requests are blocked
during the second and third cycles of processing.
If another interrupt request exists and it is not disabled by the new MSR value, this three-cycle interrupt
sequence is repeated one cycle later. Thus, a new Instruction Address Register (IAR) value is captured in
SRR0, CSRR0, and MCSRR0.
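The three-cycle interrupt sequence can be summarized as a simple phase table. The sketch below is a descriptive model only, with hypothetical names; it does not attempt to model the hardware beyond what the text states.

```python
# What happens in each cycle of taking an interrupt, per the text.
INTERRUPT_PHASES = {
    1: "detect the exception and latch the flush operation",
    2: "send the flush to each unit and steer the address into the save/restore register",
    3: "swap the MSR context and request a fetch of the interrupt vector",
}


def new_requests_blocked(cycle: int) -> bool:
    """True if new interrupt, refetch, and stop requests are blocked."""
    if cycle not in INTERRUPT_PHASES:
        raise ValueError("the interrupt sequence has cycles 1 through 3")
    # Requests are blocked during the second and third cycles only.
    return cycle in (2, 3)
```

Iterating over INTERRUPT_PHASES in order yields the sequence described above, and new_requests_blocked shows which cycles reject new requests.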
Glossary
BD        Branch displacement
BHT       Branch history table
BI        Branch index
BO        Branch option
BT        Branch Taken
BWB       Branch write-back
CR        Condition Register
CRD       Cache read
CRPE      Cache read parity enable
CS        Central scrutinizer
CTR       Count Register
DBDR      Debug Data Register
DBSR
DCC       Data cache control
DCDBTRH
DCDBTRL
DCDTRH
DCDTRL
DCESR
DCLFD     Data cache line fill data
DCR       Device Control Register
DCRIPR    Device Control Register Immediate Prefix Register
DCU       Data cache unit
DEC       Decrementer
DS        Data space
DSI       Data storage interrupt
DTLB      Data TLB
DVC       Data value comparison
ENW       Enable next watchdog
EPN       Effective page number
ERPN
ESR
EU        Execution unit
EXP       Exponent
FIT       Fixed-interval timer
FP        Floating-point
FPR       Floating-Point Register
FPSCR     Floating-Point Status and Control Register
FPU       Floating-point unit
FT        Freeze timers
GPR       General Purpose Register
ICC       Instruction cache controller
ICDBTRH
ICDBTRL
ICESR     Instruction Cache Error Syndrome Register
ICMP      Instruction complete
ICU       Instruction cache unit
IDE       Imprecise debug event
IOCCR
IR        Intermediate result
IRPT      Interrupt
ITLB      Instruction translation lookaside buffer
IU        Instruction unit
IVOR      Interrupt Vector Table
IVPR
IWB       Integer write-back unit
LFB       Line-fill-buffers
LK        Link bit
LMQ       Load miss queue
LR        Link Register
LRU       Least recently used
LWB       Load write-back unit
MCSR
MIPS      Million instructions per second
MMU       Memory management unit
MMUCR     Memory Management Unit Configuration Register
MP        Multiprocessor
MRR       Most-recent reset
MSB       Most significant byte
MSR
NH        Next higher in magnitude
NL        Next lower in magnitude
OV        Overflow
OX        Overflow exception
PC        Program counter
PD        Physical design
PGM       Program exception
PGPR      Pre-General Purpose Register
PID       Process ID Register
PIR       Processor Identification Register
PLB       Processor local bus
PVR
RET       Return
RLD       Reload dump
RMPD      Real Mode Page Description Register
RPN       Real page number
RSTCFG    Reset Configuration
SBQ       Store buffer queue
SEM       Sequential execution model
SO        Summary Overflow
SPR       Special Purpose Register
SPRG      Special Purpose Register General
SR        Supervisor read
SSPCR
STID      Set translation ID
SW        Supervisor write
SX        Supervisor execute
TBC       Transfer Byte Count
TBL       Time Base Lower
TBU       Time Base Upper
TCR
TCS       Timer clock select
TID       Translation ID
TLB       Translation lookaside buffer
UDE       Unconditional debug event
USPCR
UTLB      Unified translation lookaside buffer
UX        Underflow exception
VX        Invalid operation exception
WIE       Watchdog timer interrupt enable
WP        Watchdog timer period
WRC       Watchdog timer reset control
WRS       Watchdog timer reset status
WS        Word select
XER
XI        Inexact exception
ZX        Zero divide exception
Index
A
addressing, 33
addressing modes, 35
data storage, 35
instruction storage, 35
Alignment interrupt, 192
alignment interrupts, 192
allocated instruction summary, 55
ANSI/IEEE Standard 754-1985, 85
arithmetic compare, 63
asynchronous interrupt class, 167
B
BI field on conditional branches, 57
big endian
defined, 36
structure mapping, 37
big endian mapping, 37
BO field on conditional branches, 57
branch instruction summary, 52
branch instructions, exception priorities for, 215
branch prediction, 58
branch processing, 56
branching control
BI field on conditional branches, 57
BO field on conditional branches, 57
branch addressing, 56
branch prediction, 58
registers, 59
byte ordering, 36
big endian, defined, 36
instructions, 38 , 39
little endian, defined, 36
structure mapping
big-endian mapping, 37
little endian mapping, 37
C
cache management instructions
summary
data cache, 144
caching inhibited, 116
CCR0, 69 , 150
code
self-modifying, 140
coherence
data cache, 144
compare
arithmetic, 63
logical, 63
computational instructions, 96
Condition Register. See also CR
context synchronization, 76
control
data cache, 144
CR, 60
defined
CR updating instructions, 61
instructions
integer
CR, 62
Critical Input interrupt, 185
critical interrupts, 169
Critical Save/Restore Register 0, 175 , 176
Critical Save/Restore Register 1, 176 , 177
CSRR0, 175 , 176
CSRR1, 176 , 177
CTR, 60
D
data addressing modes, 35
data cache
coherency, 144
data cache array organization and operation, 133
data cache controller. See DCC
Data Cache Unit Overview, 31
data storage addressing modes, 35
Data Storage interrupt, 188
data storage interrupts, 188
Data TLB Error interrupt, 199
data TLB error interrupts, 199
dcbt
functional description, 151
dcbt and dcbtst operation, 151
dcbtst
DCC (data cache controller)
control, 144
debug, 144
features, 141
operations, 142
DCDBTRH, 151
DCDBTRL, 151
dcread
DCRs
defined, 46
debug
debug cache, 144
Debug Interrupt, 201
debug interrupts, 201
Decrementer Interrupt, 197
decrementer interrupts, 197
device control registers, 46
Device Control Registers. See also DCRs
E
E storage attribute, 36 , 118
effective address
calculation, 34
endianness, 36 , 118
ESR, 179
exception
alignment exception, 192
critical input exception, 185
data storage exception, 188
external input exception, 191
floating-point, 85
illegal instruction exception, 193
inexact, 85 , 101
instruction storage exception, 190
instruction TLB miss exception, 201
machine check exception, 186
overflow, 85 , 101
privileged instruction exception, 193
program exception, 193
system call exception, 197
trap exception, 196
underflow, 85 , 101
zero divide, 85
exception priorities, 210
exception priorities for
all other instructions, 216
allocated load and store instructions, 212
branch instructions, 215
floating-point load and store instructions, 211
integer load, store, and cache management instructions, 211
other allocated instructions, 213
other floating-point instructions, 212
preserved instructions, 215
privileged instructions, 214
reserved instructions, 216
return from interrupt instructions, 215
system call instruction, 214
trap instructions, 214
Exception Syndrome Register, 179
exception syndrome register, 179
Exceptions, 167
execution synchronization, 78
External Input interrupt, 191
external input interrupts, 191
F
Facilities, Debug, 29
Facilities, Test, 29
features
DCC, 141
ICC, 134
Features, General, 26
Features, Power Control, 26
FEX, 87
fixed interval timer interrupt, 198
Fixed-Interval Timer interrupt, 198
floating-point unavailable interrupts, 196
floating-point
denormalized number, 91
infinity, 91
Not a Number, 92
sign, 93
zero, 91
floating-point compare and select instruction set index,
102
floating-point compare instructions
comparison sets, 102
floating-point load and store instructions, exception priorities for, 211
floating-point multiply-add instructions, 100
floating-point operands, 96
double precision format, 96
single format, 96
floating-point rounding and conversion instruction set
index, 101
floating-point status and control register, 102
instruction set index, 102
Floating-Point Unavailable interrupt, 196
Floating-Point Unit Overview, 30
freezing the timer facilities, 165
G
G storage attribute, 117
General, 26
General Purpose Registers. See also GPRs
GPRs
defined, 45
GPRs, illustrated, 63
guarded, 117
H, I, J, K
I storage attribute, 116
ICC (instruction cache controller)
features, 134
operations, 134
implemented instruction set summary, 49
implicit update, 62
imprecise interrupts, 168
instruction
partially executed, 172
instruction addressing modes, 35
instruction cache array organization and operation, 133
instruction cache controller. See ICC
Instruction Cache Overview, 30
instruction classes, 47
Instruction Set, 29
instruction set
brief summaries by category, 87
classes, 47
summary
allocated instructions, 55
branch, 52
cache management, 54
CR logical, 53
integer arithmetic, 50
integer compare, 51
integer logical, 51
integer rotate, 51
integer shift, 52
integer storage access, 50
integer trap, 51
processor synchronization, 54
register management, 53
system linkage, 53
TLB management, 54
instruction set summary, 49
instruction storage addressing modes, 35
Instruction Storage interrupt, 190
instruction storage interrupts, 190
Instruction TLB Error Interrupt, 201
instruction TLB error interrupts, 201
instructions
all other, exception priorities for, 216
allocated (other), exception priorities for, 213
allocated load and store, exception priorities for, 212
branch, exception priorities for, 215
by category, 96
byte ordering, 38 , 39
byte-reverse, 40
categories
allocated instruction summary, 55
branch, 52
integer, 49
processor control, 52
storage control, 54
storage synchronization, 55
classes
defined, 47 , 48
preserved, 48
computational, 96
CR updating, 61
data cache management instruction summary, 144
floating-point (other), exception priorities for, 212
floating-point load and store, exception priorities for,
211
implemented instruction set summary, 49
integer compare
CR update, 63
integer load, store, and cache management, exception
priorities for, 211
mfmsr, 173
mtmsr, 173
noncomputational, 96
preserved, exception priorities for, 215
privileged, 75
privileged instructions, exception priorities for, 214
reserved, exception priorities for, 216
return from interrupt, exception priorities for, 215
rfi, 174
system call, exception priorities for, 214
trap, exception priorities for, 214
integer instructions
arithmetic, 50
compare, 51
logical, 51
rotate, 51
shift, 52
storage access, 50
trap, 51
integer load, store, and cache management instructions,
exception priorities for, 211
integer processing, 63
interrupt
alignment interrupt, 192
data storage interrupt, 188
external input interrupt, 191
instruction
Instruction Storage, 190
instruction storage interrupt, 190
instruction TLB miss interrupt, 201
machine check interrupt, 186
masking, 207
guidelines for system software, 209
ordering, 207 , 209
guidelines for system software, 209
program interrupt, 193
illegal instruction exception, 193
privileged instruction exception, 193
trap exception, 196
system call interrupt, 197
type
Alignment, 192
Critical Input, 185
Data Storage, 188
Data TLB Error, 199
Debug, 201
Decrementer, 197
External Input, 191
Fixed-Interval Timer, 198
Floating-Point Unavailable, 196
Instruction TLB Error, 201
Machine Check, 186
Program interrupt, 193
System Call, 197
Watchdog Timer, 199
interrupt and exception handling registers
ESR, 179
interrupt classes
asynchronous, 167
critical and non-critical, 169
machine check, 169
synchronous, 168
interrupt vector, 170
Interrupts, 167
interrupts
definitions, 182
imprecise, 168
order, 209
ordering and masking, 207
ordering and software, 208
partially executed instructions, 172
precise, 168
registers, processing, 173
synchronous and imprecise, 168
synchronous and precise, 168
types
alignment, 192
data storage, 188
data TLB error, 199
debug, 201
decrementer, 197
definitions, 182
external inputs, 191
fixed interval timer, 198
floating point unavailable, 196
instruction storage, 190
instruction TLB error, 201
machine check, 186
program, 193
watchdog timer, 199
vectors, 170
invalid operation exception bit, 101
L
little endian
structure mapping, 37
little endian mapping, 37
little endian, defined, 36
load operations, 142
logical compare, 63
LR, 59
M
M storage attribute, 117
Machine Check, 169
Machine Check interrupt, 186
machine check interrupts, 169 , 186
Machine State Register. See also MSR
masking and ordering interrupts, 207
memory coherence required, 117
memory map, 33
memory organization, 33
mfmsr, 173
MSR, 173
defined, 46
mtmsr, 173
N
noncomputational instructions, 96
non-critical interrupts, 169
O
operands
storage, 33
operations
DCC, 142
ICC, 134
line flush, 143
load, 142
store, 142
ordering
storage access, 143
ordering and masking interrupts, 207
Overview, 25
Overview, Instruction Cache, 30
P
partially executed instructions, 172
PIR, 68
precise interrupts, 168
preserved instructions, exception priorities for, 215
priorities, exception, 210
privileged instructions, 75
privileged mode, 75
privileged operation, 75
privileged SPRs, 75
problem state, 75
processor control instruction summary, 52
processor control instructions
CR logical, 53
register management, 53
synchronization, 54
system linkage, 53
processor control registers, 66
Program interrupt, 193
program interrupts, 193
PVR, 68
R
register
CSRR0, 175 , 176
CSRR1, 176 , 177
ESR, 179
SRR0, 174
SRR1, 175
registers, 40 , 86
branching control, 59
CCR0, 69 , 150
CR, 46 , 60
CTR, 60
DCDBTRH, 151
DCDBTRL, 151
ESR, 179
GPRs, 63
interrupt processing, 173
LR, 59
MSR, 46 , 173
PIR, 68
processor control, 66
PVR, 68
RSTCFG, 74
SPRG0-SPRG7, 67
SPRG0-SPRG3, 67
TCR, 161
TSR, 162
types, 45 , 86
CR, 46
DCR, 46
GPR, 45
MSR, 46
SPR, 46
USPRG0, 67
XER, 64
registers, device control, 46
registers, summary, 40 , 86
requirements
software
interrupt ordering, 208
reserved instructions, exception priorities for, 216
return from interrupt instructions, exception priorities for,
215
rfi, 174
RSTCFG, 74
S
Save/Restore Register 0, 174
Save/Restore Register 1, 175
self-modifying code, 140
software
interrupt ordering requirements, 208
Special Purpose Registers. See also SPRs
speculative fetching, 76
SPRG0-SPRG7, 67
SPRG0-SPRG3, 67
SPRs
defined, 46
SRR0, 174
SRR1, 175
storage access ordering, 143
storage attributes
caching inhibited, 116
endian, 118
guarded, 117
Memory Coherence Required, 117
supported combinations, 118
user-definable (U0–U3), 118
write-through required, 116
storage control instruction summary, 54
storage control instructions
cache management, 54
TLB management, 54
storage operands, 33
storage synchronization, 78
storage synchronization instruction summary, 55
store gathering, 142
store operations, 142
structure mapping
big endian, 37
little endian, 37
supervisor state, 75
synchronization
architectural references, 76
context, 76
execution, 78
storage, 78
synchronous interrupt class, 168
system call instruction, exception priorities for, 214
System Call interrupt, 197
T
TCR, 161
Test, 29
Test and Debug Facilities, 29
time base
writing, 159
timers
freezing the timer facilities, 165
watchdog timer, 161
watchdog timer state machine, 163
trap instructions
exception priorities for, 214
TSR, 162
U, V, W
U0–U3 storage attributes, 118
user mode, 75
USPRG0, 67
W storage attribute, 116
Watchdog Timer interrupt, 199
watchdog timer interrupts, 199
write-through required, 116
writing the time base, 159
X
XER, 64
carry (CA) field, 66
overflow (OV) field, 65
summary overflow (SO) field, 65
transfer byte count (TBC) field, 66