Publication

Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference

David Atienza Alonso, Giovanni Ansaloni, Alexandre Sébastien Julien Levisse, Marco Antonio Rios, Flavio Ponzina
2023
Article

Abstract

By supporting simultaneous access to multiple memory words, Bit-line Computing (BC) architectures allow bit-wise operations to be executed in parallel, in memory. At the array periphery, arithmetic operations are then derived with little additional overhead. This paradigm opens novel opportunities for Artificial Intelligence (AI) at the edge, thanks to the massive parallelism inherent in memory arrays and the extreme energy efficiency of computing in situ, which avoids data transfers. Previous works have shown that, when targeting AI workloads, BC brings disruptive gains in efficiency, a key metric in emerging edge AI scenarios. This manuscript builds on these findings by proposing an end-to-end framework that leverages BC-specific optimizations to enable high parallelism and aggressive compression of AI models. Our approach is supported by a novel hardware module performing real-time decoding, as well as new algorithms to enable BC-friendly model compression. Our hardware/software approach results in 91% energy savings (for a 1% accuracy degradation constraint) compared with state-of-the-art BC approaches.
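The first two sentences of the abstract can be illustrated with a small behavioural model. The sketch below is not taken from the paper: it assumes the behaviour commonly described for SRAM-based bit-line designs, where activating two word lines at once leaves AND on one bit-line and NOR on its complement, and it uses a simple ripple-carry adder as a stand-in for the periphery logic. The names `bitline_read`, `periphery_add`, and `WORD_BITS` are hypothetical.

```python
# Behavioural sketch only; real BC arrays do this in analog/digital hardware.
WORD_BITS = 8                      # assumed word width for the example
MASK = (1 << WORD_BITS) - 1


def bitline_read(word_a, word_b):
    """Model activating two word lines at once.

    Assumption: the bit-line stays high only where both cells store 1 (AND),
    and the complementary bit-line only where both store 0 (NOR).
    """
    and_bits = word_a & word_b
    nor_bits = ~(word_a | word_b) & MASK
    return and_bits, nor_bits


def periphery_add(word_a, word_b):
    """Derive an addition from the AND/NOR results, as periphery logic might."""
    and_bits, nor_bits = bitline_read(word_a, word_b)
    xor_bits = ~(and_bits | nor_bits) & MASK   # bits where exactly one input is 1
    result, carry = 0, 0
    for i in range(WORD_BITS):                 # ripple-carry, least significant bit first
        propagate = (xor_bits >> i) & 1
        generate = (and_bits >> i) & 1
        result |= (propagate ^ carry) << i
        carry = generate | (propagate & carry)
    return result, carry                       # sum modulo 2**WORD_BITS and carry-out
```

For instance, `periphery_add(11, 6)` returns `(17, 0)`. In this model every bit position of the AND/NOR step is computed at once, which mirrors the column-level parallelism the abstract attributes to memory arrays; only the carry chain is sequential.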

Official source

https://infoscience.epfl.ch/record/299208?ln=fr

Ontological proximity

Computer engineering

Hardware architecture: Embedded system

High-performance computing: Parallel computing

Computer hardware: Microprocessor

Psychology

Basic psychology: Cognitive psychology

Related concepts (38)

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.
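As a minimal illustration of the first form mentioned above, a library invoked from a sequential language, the sketch below uses Python's standard `concurrent.futures` module. The helper names `square` and `parallel_squares` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Work applied independently to each input."""
    return x * x


def parallel_squares(values, workers=4):
    """Map `square` over `values` using a pool of worker threads.

    The calling code stays sequential; the library hides thread creation,
    scheduling, and result collection. `map` preserves input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))
```

Calling `parallel_squares([1, 2, 3])` returns `[1, 4, 9]`. The program expresses only what is computed per element, while the library decides how the work is distributed.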

In computer architecture, 8-bit integers, memory addresses, or other data units are those that are 8 bits (one octet) wide. Likewise, 8-bit processor and arithmetic logic unit architectures are those based on registers, address buses, or data buses of that size. "8-bit" is also the term given to a generation of computers in which 8-bit processors were the norm.

Computer memory stores information, such as data and programs for immediate use in the computer. The term memory is often synonymous with the term primary storage or main memory. An archaic synonym for memory is store. Computer memory operates at a high speed compared to storage which is slower but less expensive and higher in capacity. Besides storing opened programs, computer memory serves as disk cache and write buffer to improve both reading and writing performance.

Related publications (49)

Related MOOCs (4)

EdgeAI-Aware Design of In-Memory Computing Architectures

2D Nanosystems: Applications of 2D Semiconductors for In-Memory Computing

How to Achieve Large-Area Ultra-Fast Operation of MoS2 Monolayer Flash Memories?
