In computer architecture, register renaming is a technique that abstracts logical registers from physical registers. Every logical register has a set of physical registers associated with it. When a machine language instruction refers to a particular logical register, the processor transposes this name to one specific physical register on the fly. The physical registers are opaque and cannot be referenced directly but only via the canonical names. This technique is used to eliminate false data dependencies arising from the reuse of registers by successive instructions that do not have any real data dependencies between them. The elimination of these false data dependencies reveals more instruction-level parallelism in an instruction stream, which can be exploited by various and complementary techniques such as superscalar and out-of-order execution for better performance. In a register machine, programs are composed of instructions which operate on values. The instructions must name these values in order to distinguish them from one another. A typical instruction might say: “add and and put the result in ”. In this instruction, , and are the names of storage locations. In order to have a compact instruction encoding, most processor instruction sets have a small set of special locations which can be referred to by special names: registers. For example, the x86 instruction set architecture has 8 integer registers, x86-64 has 16, many RISCs have 32, and IA-64 has 128. In smaller processors, the names of these locations correspond directly to elements of a . Different instructions may take different amounts of time; for example, a processor may be able to execute hundreds of instructions while a single load from the main memory is in progress. Shorter instructions executed while the load is outstanding will finish first, thus the instructions are finishing out of the original program order. Out-of-order execution has been used in most recent high-performance CPUs to achieve some of their speed gains.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related courses (7)
CS-470: Advanced computer architecture
The course studies techniques to exploit Instruction-Level Parallelism (ILP) statically and dynamically. It also addresses some aspects of the design of domain-specific accelerators. Finally, it explo
CS-307: Introduction to multiprocessor architecture
Multiprocessors are a core component in all types of computing infrastructure, from phones to datacenters. This course will build on the prerequisites of processor design and concurrency to introduce
EE-490(b): Lab in EDA based design
The goal of this lab is to get a working knowledge on the use of industrial state-of-the-art EDA (Electronic Design Automation) tools and design kits for the design of analog and digital integrated ci
Show more
Related lectures (34)
Dynamic Scheduling
Explores dynamic scheduling in processor design to increase parallelism by executing instructions out of order, improving performance and efficiency.
Dynamic Scheduling: Extending Pipelining for Parallelism
Delves into dynamic scheduling to enhance pipelining and enable out-of-order execution for improved performance.
Show more
Related publications (38)

WR2A: A Very-Wide-Register Reconfigurable-Array Architecture for Low-Power Embedded Devices

David Atienza Alonso, Miguel Peon Quiros, Benoît Walter Denkinger

Edge-computing requires high-performance energy-efficient embedded systems. Fixed-function or custom accelerators, such as FFT or FIR filter engines, are very efficient at implementing a particular functionality for a given set of constraints. However, the ...
2022

Exploiting Flow Graph of System of ODEs to Accelerate the Simulation of Biologically-Detailed Neural Networks

Felix Schürmann, Michael Lee Hines, Bruno Ricardo Da Cunha Magalhães

Exposing parallelism in scientific applications has become a core requirement for efficiently running on modern distributed multicore SIMD compute architectures. The granularity of parallelism that can be attained is a key determinant for the achievable ac ...
IEEE2019

LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching

Babak Falsafi, Mario Paulo Drumond Lages De Oliveira, Hamid Sarbazi-Azad, Seyed Borna Ehsani

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, ...
2018
Show more
Related concepts (19)
Out-of-order execution
In computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.
Processor register
A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address e.g. DEC PDP-10, ICT 1900.
CPU cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1.
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.