Out-of-order executionIn computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.
Instruction-level parallelismInstruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. ILP must not be confused with concurrency. In ILP there is a single specific thread of execution of a process. On the other hand, concurrency involves the assignment of multiple threads to a CPU's core in a strict alternation, or in true parallelism if there are enough CPU cores, ideally one core for each runnable thread.
Execution unitIn computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) or graphics processing unit (GPU) that performs the operations and calculations forwarded from the instruction unit. It may have its own internal control sequence unit (not to be confused with the CPU's main control unit), some registers, and other internal units such as an arithmetic logic unit, address generation unit, floating-point unit, load–store unit, branch execution unit or some smaller and more specific components.
Floating-point unitA floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be very low, so that some systems prefer to compute these functions in software.
Single instruction, multiple dataSingle instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but each unit performs the exact same instruction at any given moment (just with different data).
Parallel computingParallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling.
Pentium ProThe Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It introduced the P6 microarchitecture (sometimes termed i686) and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively, the Pentium Pro contained 5.5 million transistors.
P6 (microarchitecture)The P6 microarchitecture is the sixth-generation Intel x86 microarchitecture, implemented by the Pentium Pro microprocessor that was introduced in November 1995. It is frequently referred to as i686. It was planned to be succeeded by the NetBurst microarchitecture used by the Pentium 4 in 2000, but was revived for the Pentium M line of microprocessors. The successor to the Pentium M variant of the P6 microarchitecture is the Core microarchitecture which in turn is also derived from P6.
Register renamingIn computer architecture, register renaming is a technique that abstracts logical registers from physical registers. Every logical register has a set of physical registers associated with it. When a machine language instruction refers to a particular logical register, the processor transposes this name to one specific physical register on the fly. The physical registers are opaque and cannot be referenced directly but only via the canonical names.
Very long instruction wordVery long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.