# Approximate computing

Summary

Approximate computing is an emerging paradigm for energy-efficient and/or high-performance design. It covers a range of computation techniques that return a possibly inaccurate result rather than a guaranteed accurate one, and that can be used in applications where an approximate result is sufficient for the purpose. One example is a search engine, where no exact answer may exist for a given search query and hence many answers may be acceptable. Similarly, occasionally dropping some frames in a video application can go undetected because of the perceptual limitations of humans. Approximate computing is based on the observation that in many scenarios, although performing exact computation requires a large amount of resources, allowing bounded approximation can provide disproportionate gains in performance and energy while still achieving acceptable result accuracy. For example, in the k-means clustering algorithm, allowing only 5% loss in classification accuracy ca

Related publications (20)

Related courses (8)

ME-323: Chemical process control

Provide students with the basic knowledge needed to model and analyze dynamic systems. Teach them to design controllers and to analyze the performance of controlled systems.

CH-353: Introduction to electronic structure methods

Review of the basic concepts of quantum mechanics and the main numerical algorithms used in practical implementations. Basic principles of electronic structure methods: Hartree-Fock, many-body perturbation theory, configuration interaction, coupled-cluster theory, density functional theory.

EE-512: Applied biomedical signal processing

The goal of this course is twofold: (1) to introduce the physiological basis, signal acquisition solutions (sensors) and state-of-the-art signal processing techniques, and (2) to present concrete examples of applications for vital sign monitoring and diagnosis.

Mattia Cacciotti, Vincent Frédéric Camus, Christian Enz, Yu Jiang, Xun Jiao

Worst-case design is used in IoT devices and high-performance data centers to ensure reliability, leading to a loss of power efficiency. Recently, approximate computing has been proposed to trade off accuracy for efficiency. In this paper, we use Inexact Speculative Adders, which redesign the adder architecture to shorten its critical path and improve performance, but introduce controlled structural errors. Overclocking, on the other hand, is used to reduce conservative timing guardbands but would normally introduce catastrophic timing errors; we thus apply a supervised learning model to overclock speculative adders and predict their timing errors. We build a methodology to combine both structural and timing errors and analyze how they interplay with each other to limit the overall errors.
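
To make the idea of a shortened carry chain concrete, here is a minimal software sketch of a speculative adder. It is not the Inexact Speculative Adder from the paper (it has no error correction or reduction stage); it only assumes that the addition is split into fixed-size blocks and that each block guesses its carry-in from a few bits just below it. The names `speculative_add`, `block` and `spec` are hypothetical.

```python
def exact_add(a: int, b: int, width: int = 16) -> int:
    """Reference adder: exact sum truncated to `width` bits."""
    return (a + b) & ((1 << width) - 1)


def speculative_add(a: int, b: int, width: int = 16,
                    block: int = 4, spec: int = 2) -> int:
    """Add block by block, guessing each block's carry-in from only the
    `spec` bits just below it instead of the full carry chain.

    Simplifying assumption: no error detection or correction is applied.
    """
    if not (0 < spec <= block):
        raise ValueError("spec must satisfy 0 < spec <= block")
    block_mask = (1 << block) - 1
    spec_mask = (1 << spec) - 1
    result = 0
    for lo in range(0, width, block):
        if lo == 0:
            carry_in = 0
        else:
            # Speculate the carry from the `spec` bits below this block,
            # assuming no carry enters that short window.
            sa = (a >> (lo - spec)) & spec_mask
            sb = (b >> (lo - spec)) & spec_mask
            carry_in = (sa + sb) >> spec
        block_sum = ((a >> lo) & block_mask) + ((b >> lo) & block_mask) + carry_in
        # The block's own carry-out is dropped: blocks are independent.
        result |= (block_sum & block_mask) << lo
    return result & ((1 << width) - 1)


# Carry generated inside the speculation window: the guess is correct.
print(hex(speculative_add(0x000C, 0x0004)), hex(exact_add(0x000C, 0x0004)))
# Long carry chain starting below the window: the guess is wrong.
print(hex(speculative_add(0x00FF, 0x0001)), hex(exact_add(0x00FF, 0x0001)))
```

Because every block depends on at most `block + spec` input bits, the longest carry chain is bounded regardless of `width`. The price is a structural error whenever a carry has to propagate across more bits than the speculation window covers.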

Jérémy Lucien Maurice Schlachter

The slowdown of Moore's law, which has been the driving force of the electronics industry over the last five decades, is causing serious problems for the improvement of Integrated Circuits (ICs). Technology scaling is becoming more and more complex and fabrication costs are growing exponentially. Furthermore, the energy gains associated with technology scaling are slowing down. Meanwhile, the expected boom of Internet of Things (IoT) devices requires ultra-low-power ICs able to operate for several years without any user intervention, and energy-efficient computing systems on the server side to process all the gathered data. Approximate computing has emerged as an alternative way to improve the energy efficiency of both high-performance and low-power computing systems by tolerating small and occasional errors. This energy-accuracy tradeoff can be applied to a wide range of over-engineered applications, particularly those involving human senses such as video and image processing. This thesis first presents an approximate circuit design technique called Gate-Level Pruning (GLP), which consists in selectively removing logic gates from any conventional circuit in order to reduce energy consumption, critical path delay, and area occupied on silicon. A Computer-Aided Design (CAD) tool has been developed and integrated in the standard digital flow and has been evaluated on several arithmetic circuits, achieving up to 78% energy-delay-area savings. It is then shown how this methodology can be applied to more complex systems made of multiple arithmetic blocks as well as memory: the Discrete Cosine Transform (DCT), which is a key building block for image and video processing applications. Then, the speculative adder technique is presented. It consists in cutting carry chains to significantly relax the circuit's timing constraints, and therefore drastically reduce energy consumption, area and delay.
It is shown that this technique leads to errors of a different nature than those produced by gate-level pruning. It is therefore worth combining GLP and speculative adders to obtain even higher savings. This has been verified on IEEE-754 floating-point units integrated in a 65 nm process within a low-power multi-core processor. Silicon measurements show up to 27% power, 36% area and 53% power-area savings. The second part of this thesis introduces software techniques to achieve similar energy-accuracy tradeoffs on commercially available processors. By switching from double-precision to single-precision floating-point data types and by exploiting the vectorization capabilities of modern processors, a factor of 2 in energy can be saved on a Newton method for solving nonlinear equations. To further investigate the origins of these savings, an energy model based on Energy Per Instruction (EPI) has been built. It turns out that less than 6% of the total energy is consumed by arithmetic operations and that savings are achieved mainly by reducing the amount of data transferred between registers, cache and main memory. One way to reduce those power-hungry data movements is to use application-specific hardware accelerators. Unfortunately, a commercial processor cannot embed accelerators for all possible applications. To that end, hardware accelerators are implemented on a Field-Programmable Gate Array (FPGA) interconnected with a general-purpose processor to further reduce the energy consumption.
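
The accuracy side of the double-to-single precision switch can be illustrated with a small sketch. It assumes a toy nonlinear equation, f(x) = x² − 2, which is not taken from the thesis, and it emulates single precision in pure Python by rounding every intermediate result to binary32 with the standard `struct` module. It measures neither energy nor vectorization; the helper names `to_f32` and `newton_sqrt2` are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def newton_sqrt2(single: bool, tol: float, max_iter: int = 50):
    """Solve f(x) = x*x - 2 = 0 with Newton's method.

    If `single` is True, every intermediate result is rounded to
    binary32 to emulate single-precision arithmetic.
    Returns (approximate root, number of iterations used).
    """
    rnd = to_f32 if single else (lambda v: v)
    x = rnd(1.0)
    for i in range(1, max_iter + 1):
        fx = rnd(rnd(x * x) - 2.0)   # f(x)
        dfx = rnd(2.0 * x)           # f'(x)
        step = rnd(fx / dfx)
        x = rnd(x - step)
        if abs(step) < tol:
            return x, i
    return x, max_iter


x32, it32 = newton_sqrt2(single=True, tol=1e-6)
x64, it64 = newton_sqrt2(single=False, tol=1e-12)
print(f"single: x={x32!r} after {it32} iterations, error={abs(x32 - 2 ** 0.5):.3e}")
print(f"double: x={x64!r} after {it64} iterations, error={abs(x64 - 2 ** 0.5):.3e}")
```

The single-precision run can only reach an accuracy of roughly seven decimal digits, so the tolerance has to be relaxed accordingly. Whether that loss is acceptable depends on the application, which is the tradeoff the abstract describes.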

Density, speed and energy efficiency of integrated circuits have been increasing exponentially for the last four decades following Moore's law. However, power and reliability pose several challenges to the future of technology scaling. Approximate computing has emerged as a promising candidate to improve performance and energy efficiency beyond scaling. Approximate circuits explore a new trade-off by intentionally introducing errors to overcome the limitations of traditional designs. This paradigm has led to another opportunity to minimize energy at run time with precision-scalable circuits, which can dynamically configure their accuracy or precision. This thesis investigates several approaches for the design of approximate and precision-scalable circuits for multimedia and deep-learning applications.
This thesis first introduces architectural techniques for designing approximate arithmetic circuits, in particular two techniques called Inexact Speculative Adder (ISA) and Gate-Level Pruning (GLP). The ISA slices the addition operation into multiple shorter sub-blocks executed in parallel. It features a shorter speculative overhead and a novel error correction-reduction scheme. The second technique, GLP, consists of a CAD tool that removes the least-significant logic gates from a circuit in order to reduce energy consumption and silicon area. These conventional techniques have been successfully combined with each other or with overclocking.
The second part of this thesis introduces a novel concept to optimize approximate circuits by fabrication of false timing paths, i.e. critical paths that cannot be logically activated. Co-designing circuit timing together with functionality, this method proposes to monitor and cut critical paths to transform them into false paths. This technique is applied to an approximate adder, called the Carry Cut-Back Adder (CCBA), in which high-significance stages can cut the carry propagation chain at lower-significance positions, guaranteeing a high accuracy.
The third part of this thesis investigates approximate circuits within bigger datapaths and applications. The ISA concept is extended to a novel Inexact Speculative Multiplier (ISM). ISM, ISA and GLP techniques are then used to build approximate Floating-Point Units (FPUs) taped out in a 65 nm quad-core processor. The approximate FPU circuits are validated through a High-Dynamic-Range (HDR) image tone-mapping application. HDR imaging is a rapidly growing area in mobile phones and cameras that makes extensive use of floating-point computations. Results of the application show no visible quality loss, with image PSNR ranging from 76 dB using the pruned FPU to 127 dB using the speculative FPU.
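
For readers unfamiliar with the metric, the sketch below computes PSNR with the standard definition, 10·log10(peak² / MSE). The peak value and the image format used in the thesis are not stated here, so `peak` is left as a parameter and the flat sample lists are only an assumed stand-in for image data.

```python
import math


def psnr(reference, approximate, peak: float) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized
    sequences of sample values; `peak` is the maximum possible value."""
    if len(reference) != len(approximate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical 8-bit samples where every value is off by one level.
exact_pixels = [10, 20, 30, 40]
approx_pixels = [11, 21, 31, 41]
print(f"{psnr(exact_pixels, approx_pixels, peak=255):.2f} dB")
```

Inverting the formula gives a feel for the reported figures: relative to the peak value, 76 dB corresponds to an RMS error of about 1.6e-4 and 127 dB to about 4.5e-7.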
The final part of this thesis reviews and complements scalable-precision Multiply-Accumulate (MAC) accelerators for deep learning applications. Deep learning has come with an enormous computational need for billions of MAC operations. Fortunately, reduced precision has demonstrated benefits with minimal loss in accuracy. Many works have recently shown configurable MAC architectures optimized for neural-network processing, either with parallelization or bit-serial approaches. In this thesis, the most prominent ones are reviewed, implemented and compared in a fair way. A hybrid precision-scalable MAC design is also proposed. Finally, an analysis of power consumption and throughput is carried out to figure out the key trends for reducing computation costs in neural-network processors.
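
As a rough software illustration of the bit-serial approach mentioned above, the sketch below accumulates products one weight bit at a time, so that halving the weight precision halves the number of inner steps. It does not model any of the reviewed architectures or the proposed hybrid design; `quantize`, `bit_serial_mac` and the chosen bit widths are hypothetical.

```python
def quantize(x: float, bits: int) -> int:
    """Map x in [-1, 1) to a signed integer of `bits` bits, saturating
    at the two's complement range."""
    scale = 1 << (bits - 1)
    q = round(x * scale)
    return max(-scale, min(scale - 1, q))


def bit_serial_mac(acts, weights, w_bits: int) -> int:
    """Compute sum(a * w) by scanning each weight one bit per step.

    Weights are signed integers that fit in `w_bits` bits; the most
    significant bit carries a negative weight (two's complement).
    """
    acc = 0
    for a, w in zip(acts, weights):
        u = w & ((1 << w_bits) - 1)  # two's complement bit pattern
        for k in range(w_bits):
            if (u >> k) & 1:
                partial = a << k      # shift-and-add partial product
                acc += -partial if k == w_bits - 1 else partial
    return acc


acts = [3, -2, 5]
weights = [2, -3, 1]
print(bit_serial_mac(acts, weights, w_bits=4))  # same value as sum(a * w)

# The same real-valued weight at two precisions: fewer bits, fewer steps.
print(quantize(0.3, 8), quantize(0.3, 4))
```

In this model the number of inner-loop steps scales linearly with `w_bits`, which is the kind of precision-versus-cost knob the abstract refers to. Power and throughput of real designs depend on factors this sketch does not capture.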