ISBN: 3-540-64798-8
TITLE: Processor Architecture
AUTHOR: Silc, Jurij; Robic, Borut; Ungerer, Theo
TOC:

1. Basic Pipelining and Simple RISC Processors 1 
1.1 The RISC Movement in Processor Architecture 1 
1.2 Instruction Set Architecture 5 
1.3 Examples of RISC ISAs 10 
1.4 Basic Structure of a RISC Processor and Basic Cache MMU Organization 15
1.5 Basic Pipeline Stages 18 
1.6 Pipeline Hazards and Solutions 22 
1.6.1 Data Hazards and Forwarding 23 
1.6.2 Structural Hazards 27 
1.6.3 Control Hazards, Delayed Branch Technique, and Static Branch Prediction 28
1.6.4 Multicycle Execution 30 
1.7 RISC Processors 32 
1.7.1 Early Scalar RISC Processors 33 
1.7.2 Sun microSPARC-II 34 
1.7.3 MIPS R3000 38 
1.7.4 MIPS R4400 40 
1.7.5 Other Scalar RISC Processors 43 
1.7.6 Sun picoJava-I 46 
1.8 Lessons learned from RISC 53 
2. Dataflow Processors 55 
2.1 Dataflow Versus Control-Flow 55 
2.2 Pure Dataflow 58 
2.2.1 Static Dataflow 59 
2.2.2 Dynamic Dataflow 63 
2.2.3 Explicit Token Store Approach 72 
2.3 Augmenting Dataflow with Control-Flow 77 
2.3.1 Threaded Dataflow 78 
2.3.2 Large-Grain Dataflow 85 
2.3.3 Dataflow with Complex Machine Operations 88 
2.3.4 RISC Dataflow 90 
2.3.5 Hybrid Dataflow 93 
2.4 Lessons learned from Dataflow 95 
3. CISC Processors 99 
3.1 A Brief Look at CISC Processors 99 
3.2 Out-of-Order Execution 100 
3.3 Dynamic Scheduling 101 
3.3.1 Scoreboarding 101 
3.3.2 Tomasulo's Scheme 109 
3.3.3 Scoreboarding versus Tomasulo's Scheme 117 
3.4 Some CISC Microprocessors 118 
3.5 Conclusions 120 
4. Multiple-Issue Processors 123 
4.1 Overview of Multiple-Issue Processors 123 
4.2 I-Cache Access and Instruction Fetch 129 
4.3 Dynamic Branch Prediction and Control Speculation 130 
4.3.1 Branch-Target Buffer or Branch-Target Address Cache 132 
4.3.2 Static Branch Prediction Techniques 133 
4.3.3 Dynamic Branch Prediction Techniques 134 
4.3.4 Predicated Instructions and Multipath Execution 146 
4.3.5 Prediction of Indirect Branches 150 
4.3.6 High-Bandwidth Branch Prediction 151 
4.4 Decode 152 
4.5 Rename 153 
4.6 Issue and Dispatch 155 
4.7 Execution Stages 159 
4.8 Finalizing Pipelined Execution 164 
4.8.1 Completion, Commitment, Retirement and Write-Back 164 
4.8.2 Precise Interrupts 165 
4.8.3 Reorder Buffers 166 
4.8.4 Checkpoint Repair Mechanism and History Buffer 167 
4.8.5 Relaxing In-order Retirement 167 
4.9 State-of-the-Art Superscalar Processors 168 
4.9.1 Intel Pentium family 168 
4.9.2 AMD-K5, K6 and K7 families 175 
4.9.3 Cyrix M II and M 3 Processors 178 
4.9.4 DEC Alpha 21x64 family 178 
4.9.5 Sun UltraSPARC family 184 
4.9.6 HAL SPARC64 family 187 
4.9.7 HP PA-7000 family and PA-8000 family 190 
4.9.8 MIPS R10000 and descendants 195 
4.9.9 IBM Power family 199 
4.9.10 IBM/Motorola/Apple PowerPC family 199 
4.9.11 Summary 203 
4.10 VLIW and EPIC Processors 203 
4.10.1 TI TMS320C6x VLIW Processors 207 
4.10.2 EPIC Processors, Intel's IA-64 ISA and Merced Processor 212
4.11 Conclusions on Multiple-Issue Processors 217 
5. Future Processors to use Fine-Grain Parallelism 221 
5.1 Trends and Principles in the Giga Chip Era 221 
5.1.1 Technology Trends 221 
5.1.2 Application- and Economy-Related Trends 223 
5.1.3 Architectural Challenges and Implications 224 
5.2 Advanced Superscalar Processors 227 
5.3 Superspeculative Processors 231 
5.4 Multiscalar Processors 234 
5.5 Trace Processors 239 
5.6 DataScalar Processors 242 
5.7 Conclusions 245 
6. Future Processors to use Coarse-Grain Parallelism 247 
6.1 Utilization of more Coarse-Grain Parallelism 247 
6.2 Chip Multiprocessors 248 
6.2.1 Principal Chip Multiprocessor Alternatives 248 
6.2.2 TI TMS320C8x Multimedia Video Processors 252 
6.2.3 Hydra Chip Multiprocessor 254 
6.3 Multithreaded Processors 257 
6.3.1 Multithreading Approach for Tolerating Latencies 257 
6.3.2 Comparison of Multithreading and Non-Multithreading Approaches 260
6.3.3 Cycle-by-Cycle Interleaving 262 
6.3.4 Block Interleaving 269 
6.3.5 Nanothreading and Microthreading 280 
6.4 Simultaneous Multithreading 281 
6.4.1 SMT at the University of Washington 282 
6.4.2 Karlsruhe Multithreaded Superscalar 284 
6.4.3 Other Simultaneous Multithreading Processors 292 
6.5 Simultaneous Multithreading versus Chip Multiprocessor 293 
6.6 Conclusions 297 
7. Processor-in-Memory, Reconfigurable, and Asynchronous Processors 299
7.1 Processor-in-Memory 299 
7.1.1 The Processor-in-Memory Principle 299 
7.1.2 Processor-in-Memory approaches 303 
7.1.3 The Vector IRAM approach 305 
7.1.4 The Active Page model 306 
7.2 Reconfigurable Computing 307 
7.2.1 Concepts of Reconfigurable Computing 307 
7.2.2 The MorphoSys system 313 
7.2.3 Raw Machine 315 
7.2.4 Xputers and KressArrays 318 
7.2.5 Other Projects 321 
7.3 Asynchronous Processors 323 
7.3.1 Asynchronous Logic 325 
7.3.2 Projects 328 
7.4 Conclusions 333 
Acronyms 335 
Glossary 343 
References 361 
Index 379 
END
