Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
The slowdown of Moore's law, which has been the driving force of the electronics industry over the last 5 decades, is causing serious problem to Integrated Circuits (ICs) improvements. Technology scaling is becoming more and more complex and fabrication costs are growing exponentially. Furthermore, the energy gains associated to technology scaling are slowing down. Meanwhile, the expected boom of Internet of Things (IoT) devices requires ultra-low power ICs to be able to operate for several years without any user intervention, and energy-efficient computing system on the server side to treat all the gathered data. Approximate computing has emerged as an alternative way to improve energy-efficiency of both, high-performance and low-power computing systems by tolerating small and occasional errors. This energy-accuracy tradeoff can be applied on a wide range of over-engineered applications, particularly those involving human senses such as video and image processing. This thesis first presents an approximate circuit design technique called Gate-Level Pruning, which consists in selectively removing logic gates from any conventional circuit in order to reduce energy consumption, critical path delay, and area occupied on silicon. A Computer Aided Design (CAD) tool has been developed and integrated in the standard digital flow and has been evaluated on several arithmetic circuits, achieving up to 78% energy-delay-area savings. It is then shown how this methodology can be applied on more complex systems made of multiple arithmetic blocks but also memory: the discrete Cosine Transform(DCT), which is a key building block for image and video processing applications. Then, the speculative adder technique is presented. It consists in cutting carry chains to significantly relax the circuit timing constraints', and therefore drastically reduce energy consumption, area and delay. It is shown that this technique leads to errors of different nature than those produced by gate-level pruning. It is therefore worth combining GLP and speculative adders to obtain even higher savings. This has been verified on IEEE-754 floating point units integrated in a 65nm process within a low-power multi-core processor. Silicon measurements show up to 27% power, 36% area and 53% power-area savings. The second part of this thesis introduces software techniques to achieve similar energy-accuracy tradeoffs on commercially available processors. By switching from double precision to single precision floating-point data type and by exploiting vectorization capabilities of modern processors, a factor 2 energy can be saved on a Newton method for solving nonlinear equations. To further investigate the origins of these savings, an energy model based on Energy Per Instructions (EPI) has been built. It turns out that less than 6% of the total energy is consumed by arithmetic operations and that savings are achieved mainly by reducing the amount of data transferred between registers, cache and main memory. One way to reduce those power-hungry data movements is to use application specific hardware accelerators. Unfortunately, a commercial processor cannot embark accelerators for all the possible applications. To that extent, hardware accelerators are implemented on a Field Programmable Gate Array (FPGA) interconnected with a general-purpose processor to further reduce the energy consumption.
Dirk Grundler, Thomas Yu, Ping Che, Qi Wang, Wei Zhang, Benedetta Flebus