Are you an EPFL student looking for a semester project?
Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.
The increasing size of Convolutional Neural Networks (CNNs) and the high computational workload required for inference pose major challenges for their deployment on resource-constrained edge devices. In this paper, we address them by proposing a novel In-Memory Computing (IMC) architecture. Our IMC strategy allows us to efficiently perform arithmetic operations based on bitline computing, enabling a high degree of parallelism while reducing energy-costly data transfers. Moreover, it features a hybrid memory structure, where a portion of each subarray, dedicated to storing CNN weights, is implemented as high-density, zero-standby-power Resistive RAM. Finally, it exploits an innovative method for storing quantized weights based on their value, named Weight Data Mapping (WDM), which further increases efficiency. Compared to state-of-the-art IMC alternatives, our solution provides up to 93% improvements in energy efficiency and up to 6x less run-time when performing inference on Mobilenet and AlexNet neural networks.
David Atienza Alonso, Tomas Teijeiro Campo, Lara Orlandic
Joshua Alexander Harrison Klein