Survey of Precision-Scalable Multiply-Accumulate Units for Neural-Network Processing

The current trend toward deep learning brings an enormous computational demand: billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced precision has demonstrated large benefits with low impact on accuracy, paving the way towards processing in mobile devices and IoT nodes. Precision-scalable MAC architectures optimized for neural networks have recently gained interest thanks to their subword-parallel or bit-serial capabilities. Yet it has been hard to judge their relative benefits fairly, as they have been implemented in different technologies and with different performance targets. In this work, run-time configurable MAC units from ISSCC 2017 and 2018 are implemented and compared objectively under diverse precision scenarios. All circuits are synthesized in a 28 nm commercial CMOS process with precision ranging from 2 to 8 bits. This work analyzes the impact of scalability and compares the different MAC units in terms of energy, throughput and area, aiming to identify the optimal architectures for reducing computation costs in neural-network processing.
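
The two precision-scaling styles named in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not a description of the surveyed circuits: the function names, the unsigned-operand restriction, and the 8-bit datapath width are all hypothetical choices made for illustration. The bit-serial version consumes one weight bit per step and shift-adds the partial sums; the subword-parallel version splits a fixed-width datapath into several low-precision lanes that work side by side.

```python
def bit_serial_mac(weights, activations, bits):
    """Compute sum(w * a) one weight bit at a time (unsigned weights assumed)."""
    acc = 0
    for b in range(bits):  # one step per weight bit
        # Add up the activations whose weight has bit b set.
        partial = sum(a for w, a in zip(weights, activations) if (w >> b) & 1)
        acc += partial << b  # weight the partial sum by 2**b
    return acc


def subword_parallel_mac(weights, activations, bits, width=8):
    """Compute sum(w * a) with a `width`-bit datapath split into lanes.

    Returns the result and the number of steps taken, where each step
    processes `width // bits` operand pairs in parallel (hypothetical model).
    """
    lanes = width // bits
    acc = 0
    steps = 0
    for i in range(0, len(weights), lanes):
        group = zip(weights[i:i + lanes], activations[i:i + lanes])
        acc += sum(w * a for w, a in group)
        steps += 1
    return acc, steps
```

In this model, lowering the precision from 8 to 2 bits cuts the bit-serial step count per operand from 8 to 2, while the subword-parallel unit processes four operand pairs per step instead of one. Both return the same dot product as a full-precision MAC as long as the operands fit in the chosen bit width.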
