Adding limited reconfigurability to superscalar processors

For the last thirty years, electronics, at first built with discrete components and then as Integrated Circuits (ICs), have brought diverse and lasting improvements to our quality of life. Examples include digital calculators, automotive and airplane control assistance, almost all electrical household appliances, and the almost ubiquitous Personal Computer. Application-Specific Integrated Circuits (ASICs) were traditionally used for their high performance and low manufacturing cost, and were designed specifically for a single application produced in large volumes. But as product lifetimes shortened and time-to-market pressure increased, the high design cost of ASICs pushed for their replacement by microprocessors. These processors, capable of implementing any functionality through a change in software, are thus often called General Purpose Processors. General purpose processors are used for everyday computing tasks and are found in all personal computers. They are also often used as building blocks for scientific supercomputers. Such superscalar processors need ever more processing power to run complex simulations, video games or versatile telecoms services. In embedded applications, e.g. portable devices, both performance and power consumption must be taken into account. When a processor is to be adapted to selected applications, fully reconfigurable logic can greatly improve its performance, since the logic is shaped for the best possible execution with the available resources. However, as reconfigurable logic is far slower than custom logic, this gain is possible only for some specific applications with large parallelism, and only after a detailed study of the algorithm. Even though this process can be automated, it still requires large computing resources and cannot be performed at run time.
To reduce the speed loss relative to custom logic, reconfigurability can be limited, which also broadens the range of applications where performance can be improved. However, as the application space grows, careful analysis and design of the reconfigurability are required to minimize the speed loss, notably when dynamic reconfiguration is considered. As a case study, we analyze the feasibility of adding limited reconfigurability to the Floating Point Units (FPUs) of a general purpose processor. These rather large units execute all floating point operations, and may also be used for integer multiplication. If an application contains few or infrequent instructions that must be executed by the FPU, this idle hardware only increases power consumption without enhancing performance. This is often the case in non-scientific applications, and even in many recent and detailed video games that make heavy use of hardware display accelerators for 3D graphics. In a fast multiplier, such as the one found in the FPU of a high performance processor, the multiplication logic is a large tree of compressors that adds all the partial products together. Logic can be added to allow part of this tree to be reconfigured as several extra Arithmetic and Logic Units (ALUs). This requires a detailed timing analysis of both the reconfigurable FPU and the extra ALUs, taking into account effects such as added wires and longer critical paths. Finally, the algorithm that decides when and how to reconfigure must be studied in terms of efficiency and complexity. Adding this limited reconfigurability to a mainstream superscalar processor, evaluated over a large set of compute-intensive benchmarks, shows gains of up to 56% in the best case, with an average gain of 11%.
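To make the compressor tree mentioned above concrete, the following is a minimal software sketch of an unsigned multiplier that reduces its partial products with 3:2 compressors (carry-save adders) and finishes with one ordinary addition. It is an illustration only, not the design studied in this work: the function names, the 8-bit default width and the grouping of rows are assumptions, and it models neither timing nor the reconfiguration of the tree into ALUs.

```python
def partial_products(a: int, b: int, width: int) -> list[int]:
    """One row per bit of b: a shifted copy of a if that bit is set, else 0."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def compress_3_2(x: int, y: int, z: int) -> tuple[int, int]:
    """Carry-save adder: reduce three rows to a sum row and a carry row."""
    sum_row = x ^ y ^ z                               # bitwise sum, no carry propagation
    carry_row = ((x & y) | (x & z) | (y & z)) << 1    # majority bits, shifted one place left
    return sum_row, carry_row


def tree_multiply(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Multiply two unsigned width-bit integers.

    Returns (product, number of compressor levels used).
    Hypothetical helper for illustration; not taken from the source.
    """
    rows = partial_products(a, b, width)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete groups of three
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(compress_3_2(*rows[i:i + 3]))
        next_rows.extend(rows[full:])         # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    # Final carry-propagate addition of the remaining rows.
    return sum(rows), levels


product, levels = tree_multiply(13, 11)
print(product, levels)
```

Each pass of the loop stands for one level of compressors working in parallel, so the number of levels grows slowly with the number of partial products. This is also why the tree is a large block of adder-like logic, which is the property the reconfiguration into extra ALUs relies on.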
Applying the same technique to an idealized, very large high-end processor still yields slightly positive average gains, as the limits of available parallelism are reached; these limits are set by both the application and many of the characteristics of the processor. In all cases, binary compatibility is maintained, allowing the re-use of all existing software. We show that adding limited reconfigurability to a general purpose superscalar processor can produce interesting gains over a wide range of applications while maintaining binary compatibility, and without large modifications to the original design. Limited reconfigurability is worthwhile as it increases the design space, allowing gains to apply to a larger set of applications. These gains are achieved through careful study and optimization of the reconfigurable logic and the decision algorithm.
