
# Design of approximate and precision-scalable circuits for embedded multimedia and neural-network processing

Abstract

Density, speed and energy efficiency of integrated circuits have been increasing exponentially for the last four decades following Moore's law. However, power and reliability pose several challenges to the future of technology scaling. Approximate computing has emerged as a promising candidate to improve performance and energy efficiency beyond scaling. Approximate circuits explore a new trade-off by intentionally introducing errors to overcome the limitations of traditional designs. This paradigm has led to another opportunity to minimize energy at run time with precision-scalable circuits, which can dynamically configure their accuracy or precision. This thesis investigates several approaches for the design of approximate and precision-scalable circuits for multimedia and deep-learning applications.

This thesis first introduces architectural techniques for designing approximate arithmetic circuits, in particular two techniques called the Inexact Speculative Adder (ISA) and Gate-Level Pruning (GLP). The ISA slices the addition into multiple shorter sub-blocks executed in parallel, featuring a reduced speculation overhead and a novel error correction-reduction scheme. The second technique, GLP, consists of a CAD tool that removes the least significant logic gates from a circuit in order to reduce energy consumption and silicon area. These two techniques have been successfully combined with each other and with overclocking.
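The speculative-slicing idea behind the ISA can be illustrated with a bit-accurate behavioral model. This sketch is not the thesis's micro-architecture: the block size, the width of the carry-speculation window and all names are illustrative assumptions. Each sub-block adds its operand slice in parallel and guesses its carry-in from only a few lower-order bits of the preceding slice instead of waiting for the full carry chain.

```python
def speculative_add(a: int, b: int, width: int = 16,
                    block: int = 4, spec_bits: int = 2) -> int:
    """Behavioral model of a block-wise speculative adder (illustrative,
    not the exact ISA design). Each sub-block adds its slice in parallel
    and speculates its carry-in from `spec_bits` lower-order bits."""
    mask = (1 << block) - 1
    result = 0
    for pos in range(0, width, block):
        a_slice = (a >> pos) & mask
        b_slice = (b >> pos) & mask
        if pos == 0:
            carry_in = 0
        else:
            # Speculate the carry from a short window of lower-order bits
            # instead of the full ripple chain.
            w = (1 << spec_bits) - 1
            lo_a = (a >> (pos - spec_bits)) & w
            lo_b = (b >> (pos - spec_bits)) & w
            carry_in = (lo_a + lo_b) >> spec_bits
        result |= ((a_slice + b_slice + carry_in) & mask) << pos
    return result & ((1 << width) - 1)
```

Because the speculation window is short, a rare long carry chain (e.g. `0b1111 + 1`) is missed and produces an error; this is the accuracy-for-delay trade-off that the ISA's correction-reduction scheme is meant to manage.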

The second part of this thesis introduces a novel concept for optimizing approximate circuits through the fabrication of false timing paths, i.e., critical paths that can never be logically activated. By co-designing circuit timing together with functionality, this method monitors and cuts critical paths to transform them into false paths. The technique is applied to an approximate adder, the Carry Cut-Back Adder (CCBA), in which high-significance stages can cut the carry-propagation chain at lower-significance positions, guaranteeing high accuracy.
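The cut-back principle can be sketched as a much-simplified behavioral model. Everything here is an illustrative assumption, not the actual CCBA logic: a single fixed cut position, a small propagate-run detector standing in for the high-significance control, and a constant substituted for the cut carry.

```python
def carry_cutback_add(a: int, b: int, width: int = 16,
                      cut_pos: int = 4, window: int = 2) -> int:
    """Simplified behavioral model in the spirit of a carry cut-back
    adder (illustrative, not the thesis design). When higher-order
    stages would merely propagate an incoming carry, the carry chain is
    cut at a lower-significance position, bounding the error magnitude
    while shortening the critical path into a false path."""
    mask = (1 << width) - 1
    p = (a ^ b) & mask                      # propagate bits
    run = (1 << window) - 1
    cut = ((p >> cut_pos) & run) == run     # long propagate run above cut?
    lo_mask = (1 << cut_pos) - 1
    lo_sum = (a & lo_mask) + (b & lo_mask)
    carry = (lo_sum >> cut_pos) & 1
    if cut:
        carry = 0                           # cut: replace real carry by a constant
    hi = (a >> cut_pos) + (b >> cut_pos) + carry
    return ((hi << cut_pos) | (lo_sum & lo_mask)) & mask
```

In this toy model, `63 + 1` triggers the cut and loses the low-order carry (error of `2**cut_pos`), while operand pairs without a long propagate run add exactly.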

The third part of this thesis investigates approximate circuits within larger datapaths and applications. The ISA concept is extended to a novel Inexact Speculative Multiplier (ISM). The ISM, ISA and GLP techniques are then used to build approximate Floating-Point Units (FPUs) taped out in a 65 nm quad-core processor. The approximate FPU circuits are validated through a High-Dynamic-Range (HDR) image tone-mapping application; HDR imaging is a rapidly growing area in mobile phones and cameras that relies extensively on floating-point computations. The application shows no visible quality loss, with image PSNR ranging from 76 dB using the pruned FPU to 127 dB using the speculative FPU.
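The PSNR figures quoted above compare the output of an approximate unit against an exact reference. A minimal sketch of the metric follows; the signals and peak value below are made-up examples, not the thesis measurements.

```python
import math

def psnr(exact, approx, peak):
    """Peak signal-to-noise ratio in dB between a reference signal and
    its approximate version; infinite when the two are identical."""
    mse = sum((e - a) ** 2 for e, a in zip(exact, approx)) / len(exact)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)
```

For an 8-bit image (`peak = 255`), a mean squared error of 1 already yields about 48 dB, which puts the 76-127 dB range reported for the approximate FPUs well above the threshold of visible degradation.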

The final part of this thesis reviews and complements precision-scalable Multiply-Accumulate (MAC) accelerators for deep-learning applications. Deep learning comes with an enormous computational demand of billions of MAC operations. Fortunately, reduced precision has demonstrated benefits with minimal loss in accuracy. Many recent works have proposed configurable MAC architectures optimized for neural-network processing, based on either parallelization or bit-serial approaches. In this thesis, the most prominent ones are reviewed, implemented and compared on a fair basis, and a hybrid precision-scalable MAC design is also proposed. Finally, an analysis of power consumption and throughput identifies the key trends for reducing computation costs in neural-network processors.
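The subword-parallel flavor of precision scalability can be sketched behaviorally: at full precision the unit performs one wide MAC per cycle, while at reduced precision it splits each operand into independent subwords and accumulates one product per subword, raising throughput. This is a sketch under stated assumptions (unsigned operands, a loop standing in for a shared multiplier array); the names and parameters are not from any particular published design.

```python
def scalable_mac(acc: int, a: int, b: int,
                 precision: int = 8, width: int = 16) -> int:
    """Behavioral model of a subword-parallel precision-scalable MAC.
    With precision == width it performs a single wide multiply-accumulate;
    with precision < width it treats each operand as packed unsigned
    subwords and accumulates every subword product."""
    mask = (1 << precision) - 1
    for pos in range(0, width, precision):
        acc += ((a >> pos) & mask) * ((b >> pos) & mask)
    return acc
```

For example, with `precision=8` the packed operands `(2 << 8) | 3` and `(5 << 8) | 7` contribute two products, `3*7` and `2*5`, in one call.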

Official source

This page is generated automatically and may contain information that is not correct, complete, up to date, or relevant to your search. The same applies to all other pages on this site. Please verify the information against EPFL's official sources.



Related concepts (18)

Approximate computing

Approximate computing is an emerging paradigm for energy-efficient and/or high-performance design. It includes a plethora of computation techniques that return a possibly inaccurate result rather than…

Deep learning

Deep learning (also deep structured learning or hierarchical learning) is a subfield of artificial intelligence…

Silicon Valley

Silicon Valley (literally "valley of silicon") denotes the high-technology industrial hub located in the south-eastern part of the San Francisco Bay Area in the state of California…

Related publications (5)

The von Neumann architecture was first expressed in 1945 and has, in many variants and refinements, largely dominated computer science for more than half a century. Alternative architectures have always occupied only a marginal place, despite a growing need for new concepts and paradigms in computer science. Biologically-inspired engineering applies biological concepts to the design of novel computing machines and algorithms. This can lead to the creation of new machines endowed with properties usually associated with the living world: adaptation, evolution, growth and development, fault tolerance, self-replication or cloning, reproduction, etc. Most of these approaches are based on well-established theories such as artificial neural networks, evolutionary algorithms, and cellular automata.

The work presented in this thesis takes an alternative path and proposes concepts for novel and unconventional biologically-inspired machines. The approach is mainly motivated by the insight that tomorrow's computational substrates and environments might be very different from what we know today. Some of tomorrow's computers might be embedded in the paint that covers your desk or printed on a sheet of paper by means of a special ink. Most such pervasive-computing concepts share some common elements: (1) the computer's basic elements are very simple, identical, and available in huge numbers, (2) the interactions between the elements are purely local, (3) the elements as well as the interconnections are unreliable, and (4) there is no global control mechanism.

This thesis is mainly based on the unification of three domains of research: (1) amorphous computing, (2) membrane systems, and (3) blending. An amorphous computer is a massively parallel machine made up of myriads of simple, unreliable, and identical elements, distributed randomly on a surface and interconnected locally by unreliable connections. Membrane systems are theoretical models inspired by biochemistry, based on regions bounded by membranes. The hierarchical membrane structures contain artificial chemistries, consisting of objects and reactions, which allow computations to be performed. Blending is a framework of cognitive science that tries to explain how we deal with mental concepts and how creative thinking emerges.

First, an introduction to traditional bio-inspired machines and hardware is provided. This part also includes the presentation of a first implementation of a membrane system on reconfigurable hardware and a description of the cellular automata machine entitled BioWall, with its applications. Random boolean networks, together with several theoretical considerations and practical results, are then used to introduce irregular computational structures. The C-Blending approach represents a novel computational blending method intended for membrane systems and artificial chemistries. In order to implement membrane systems on amorphous computers, the Circuit Amorphous Computer as well as special membrane systems, termed aP and aB membrane systems, are proposed. The ultimate concept proposed and studied consists in a unification of membrane systems, amorphous computers, and computational C-Blending. This unification results in several interesting properties. The cellular structures make it possible to create dynamical hierarchies and growing systems, whereas the artificial chemistries represent an ideal means to compute on the potentially imperfect and irregular hardware of an amorphous computer. Finally, the computational blending proposed describes an inventive method to create, organize, and adapt membrane systems. The characteristics and limits of the proposed concepts are analyzed and validated using various examples and toy applications.

The thesis concludes with the definition of the Circuit Amorphous Computer and the Amorphon architecture, which might constitute the minimal element of tomorrow's computing machines.

Stéphane Bouquet, Tatjana Chavdarova, François Fleuret, Pascal Fua, Cijo Jose, Andrii Maksai

People detection methods are highly sensitive to occlusions between pedestrians, which are extremely frequent in many situations where cameras have to be mounted at a limited height. The reduction of camera prices allows for the generalization of static multi-camera set-ups, and using joint visual information from multiple synchronized cameras gives the opportunity to improve detection performance. In this paper, we present a new large-scale, high-resolution dataset. It has been captured with seven static cameras in a public open area, covering unscripted, dense groups of pedestrians standing and walking. Together with the camera frames, we provide an accurate joint (extrinsic and intrinsic) calibration, as well as 7 series of 400 annotated frames for detection at a rate of 2 frames per second. This results in over 40,000 bounding boxes delimiting every person present in the area of interest, for a total of more than 300 individuals. We provide a series of benchmark results using baseline algorithms published in recent months for multi-view detection with deep neural networks, and for trajectory estimation using a non-Markovian model.

We report on the use of deep learning algorithms to perform depth recovery in multiview imaging. We show that, given enough training data, a neural network such as a multilayer perceptron can be trained to recover the depth in multiview imaging as a regression problem. Such a method can replace camera calibration, since no knowledge of the camera configuration is required during training. Another advantage of deep learning for this problem is the speed of testing: typically a few microseconds per point in the scene, far better than state-of-the-art algorithms, which must solve a full optimization problem. In a second part, we study a related problem: detecting changes in the camera setting. We show that deep learning classifiers can distinguish among a few (4 or 5) camera settings based only on the projections of points onto the cameras, with less than 1% classification error. This is a promising step towards the SLAM problem.

2016