
In-Memory Hardware and Architectural Extensions for Workloads Acceleration

William Andrew Simon
2022
EPFL Thesis
Abstract

Utilization of edge devices has exploded in the last decade, with use cases such as wearable devices, autonomous driving, and smart homes. As their ubiquity grows, so do expectations of their capabilities. At the same time, their form factor and use cases limit the power available to them. Improving performance while limiting area and power consumption is therefore paramount. In this vein, in-SRAM Computing (iSC) moves computation from the CPU into the SRAM memory hierarchy. This has two main benefits. First, reduced data movement lowers power consumption and latency. Second, the entire memory array can be used to perform hundreds of concurrent operations. This thesis exploits iSC while addressing the aforementioned challenges via a BitLine Accelerator for Devices on the Edge (BLADE). BLADE can be implemented in any SRAM system and uses local wordline groups to perform computations at a frequency 2.8x higher than state-of-the-art iSC architectures. BLADE is thoroughly simulated, fabricated, and benchmarked at the transistor, architecture, and software abstraction levels. Experimental results demonstrate performance and energy gains over an equivalent NEON-accelerated processor for a variety of edge device workloads, namely cryptography (4x performance gain/6x energy reduction), video encoding (6x/2x), and convolutional neural networks (3x/1.5x), while maintaining the highest frequency/energy ratio (up to 2.2 GHz at 1 V) of any conventional iSC architecture and a low area overhead of less than 8%. With BLADE implemented, the possibilities for enhancement are manifold; one example is approximate computing. To this end, a CArryless Partial Product InExact Multiplier (CAPPIEM) halves multiplication latency while incurring negligible area overhead. As a standalone multiplier, CAPPIEM reduces area by 73% and power-delay product by 43%. Further, CAPPIEM has the unique property of computing exact results when one input is a Fibonacci-encoded value.
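The abstract does not describe CAPPIEM's circuit, so the Python model below is only an illustration under a stated assumption: "carryless" is taken to mean that each adjacent pair of partial products is merged with a bitwise OR instead of an addition, so only half as many operands reach the adder. Under that assumption, the merge is exact whenever the multiplier has no two adjacent set bits, which is how "Fibonacci-encoded" is interpreted here (there are F(n+2) such n-bit values). All function names are hypothetical.

```python
def is_fibonacci_value(x: int) -> bool:
    """True if x >= 0 has no two adjacent set bits (e.g. 0b1010, not 0b0110)."""
    return x >= 0 and (x & (x >> 1)) == 0


def fibonacci_values(bits: int = 8) -> list[int]:
    """All unsigned `bits`-wide values with no adjacent ones (F(bits + 2) of them)."""
    return [v for v in range(1 << bits) if is_fibonacci_value(v)]


def exact_multiply(a: int, b: int, bits: int = 8) -> int:
    """Reference shift-and-add multiply: adds every partial product a << i."""
    return sum(a << i for i in range(bits) if (b >> i) & 1)


def carryless_pair_multiply(a: int, b: int, bits: int = 8) -> int:
    """Hypothetical carryless-pair model (an assumption, not the thesis circuit).

    Partial products for multiplier bits 2i and 2i+1 are merged with a bitwise
    OR (no carries), so only bits/2 operands are actually added. The OR equals
    the sum whenever at most one of the two bits is set.
    """
    total = 0
    for i in range(0, bits, 2):
        pp_lo = a << i if (b >> i) & 1 else 0
        pp_hi = a << (i + 1) if (b >> (i + 1)) & 1 else 0
        total += pp_lo | pp_hi  # inexact only if both bits are set
    return total


def quantize_to_fibonacci(w: int, bits: int = 8) -> int:
    """Round an unsigned weight to the nearest Fibonacci value (ties: smaller)."""
    return min(fibonacci_values(bits), key=lambda v: (abs(v - w), v))


print(carryless_pair_multiply(11, 5), exact_multiply(11, 5))  # prints 55 55
print(carryless_pair_multiply(11, 3), exact_multiply(11, 3))  # prints 31 33
print(quantize_to_fibonacci(6))  # prints 5
```

Here 11 × 5 is exact because 5 = 0b101 has no adjacent ones, while 11 × 3 yields 31 instead of 33 because the two overlapping partial products are OR-ed and their carries are lost. `quantize_to_fibonacci` hints at how weights could be snapped to such values before retraining; its rounding rule (nearest value, ties to the smaller) is likewise an assumption.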
CAPPIEM's exactness on Fibonacci-encoded inputs is exploited via a retraining strategy that quantizes neural network weights to Fibonacci values, ensuring exact computation during inference. Benchmarks on SqueezeNet 1.0, DenseNet-121, and ResNet-18 show accuracy degradations of only 0.4%, 1.1%, and 1.7%, respectively, while improving training time by up to 300x. A second BLADE enhancement is the use of Hybrid Caches (HCs) consisting of both SRAM and eNVRAM bitcells. HCs increase capacity and power savings thanks to eNVRAM's small area footprint and low leakage energy. However, eNVRAMs also incur long write latency and limited endurance. To mitigate these drawbacks, this thesis presents SHyCache, an HC architecture and supporting programming model. By explicitly allocating variables with high read/write access ratios to the eNVRAM array, SHyCache reduces access time, power consumption, and area overhead, while maintaining maximal utilization efficiency and ease of programming. Benchmarks on a range of cache hierarchy variations using three deep neural networks reveal a design space that can be exploited to optimize performance, power consumption, or endurance, with maximum performance gains of 1.7/1.4/1.3x and power consumption reductions of 5.1/5.2/5.4x.
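As a rough illustration of SHyCache's allocation idea, the sketch below greedily places the variables with the highest read/write ratio into a fixed-capacity eNVRAM region and leaves the rest in SRAM. The `Variable` record, the `allocate_hybrid` function, the greedy ordering, the `min_ratio` threshold, and the example profile are all hypothetical; the thesis describes an explicit programming model, and this heuristic is not claimed to be its actual policy.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    """Hypothetical per-variable access profile (e.g. gathered by profiling)."""
    name: str
    size_bytes: int
    reads: int
    writes: int

    @property
    def rw_ratio(self) -> float:
        # Read-only data (no writes) gets an infinite ratio: the best eNVRAM candidate.
        return float("inf") if self.writes == 0 else self.reads / self.writes


def allocate_hybrid(variables, envram_capacity, min_ratio=1.0):
    """Greedily assign read-dominated variables to eNVRAM, the rest to SRAM.

    Variables are visited in decreasing read/write ratio; each one whose ratio
    exceeds min_ratio and that still fits goes to eNVRAM. Returns two name lists.
    """
    envram, sram = [], []
    free = envram_capacity
    for v in sorted(variables, key=lambda var: var.rw_ratio, reverse=True):
        if v.rw_ratio > min_ratio and v.size_bytes <= free:
            envram.append(v.name)
            free -= v.size_bytes
        else:
            sram.append(v.name)  # write-heavy, or no eNVRAM space left
    return envram, sram


profile = [
    Variable("weights", 4096, reads=100_000, writes=1),
    Variable("activations", 2048, reads=5_000, writes=5_000),
    Variable("bias", 256, reads=10_000, writes=0),
    Variable("scratch", 1024, reads=100, writes=1_000),
]
envram, sram = allocate_hybrid(profile, envram_capacity=4352)
print("eNVRAM:", envram)
print("SRAM:  ", sram)
```

With a 4352-byte eNVRAM budget, the read-only biases and rarely written weights land in eNVRAM, while write-heavy activations and scratch data stay in SRAM. Keeping writes away from eNVRAM is what addresses its long write latency and limited endurance.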