Publication

Accurate deep neural network inference using computational phase-change memory

Abstract

In-memory computing using resistive memory devices is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to phase-change memory (PCM) devices. We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on CIFAR-10 and a top-1 accuracy of 71.6% on ImageNet benchmarks after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one-day period, where each of the 361,722 synaptic weights is programmed on just two PCM devices organized in a differential configuration. Designing deep learning inference hardware based on in-memory computing remains a challenge. Here, the authors propose a strategy to train ResNet-type convolutional neural networks which results in reduced accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (36)
Phase-change memory
Phase-change memory (also known as PCM, PCME, PRAM, PCRAM, OUM (ovonic unified memory) and C-RAM or CRAM (chalcogenide RAM)) is a type of non-volatile random-access memory. PRAMs exploit the unique behaviour of chalcogenide glass. In PCM, heat produced by the passage of an electric current through a heating element generally made of titanium nitride is used to either quickly heat and quench the glass, making it amorphous, or to hold it in its crystallization temperature range for some time, thereby switching it to a crystalline state.
Non-volatile memory
Non-volatile memory (NVM) or non-volatile storage is a type of computer memory that can retain stored information even after power is removed. In contrast, volatile memory needs constant power in order to retain data. Non-volatile memory typically refers to storage in semiconductor memory chips, which store data in floating-gate memory cells consisting of floating-gate MOSFETs (metal–oxide–semiconductor field-effect transistors), including flash memory storage such as NAND flash and solid-state drives (SSD).
Memory cell (computing)
The memory cell is the fundamental building block of computer memory. The memory cell is an electronic circuit that stores one bit of binary information and it must be set to store a logic 1 (high voltage level) and reset to store a logic 0 (low voltage level). Its value is maintained/stored until it is changed by the set/reset process. The value in the memory cell can be accessed by reading it. Over the history of computing, different memory cell architectures have been used, including core memory and bubble memory.
Show more
Related publications (33)

2D Nanosystems: Applications of 2D Semiconductors for In-Memory Computing

Guilherme Migliato Marega

Machine learning and data processing algorithms have been thriving in finding ways of processing and classifying information by exploiting the hidden trends of large datasets. Although these emerging computational methods have become successful in today's ...
EPFL2023

HetCache: Synergising NVMe Storage and GPU acceleration for Memory-Efficient Analytics

Anastasia Ailamaki, Periklis Chrysogelos, Hamish Mcniece Hill Nicholson, Syed Mohammad Aunn Raza

Accessing input data is a critical operation in data analytics: i) slow data access significantly degrades performance, and ii) storing everything in the fastest medium, i.e., memory, incurs high operational and hardware costs. Further, while GPUs offer in ...
2023

Chaosity: Understanding Contemporary NUMA-architectures

Anastasia Ailamaki, Viktor Sanca, Hamish Mcniece Hill Nicholson, Andreea Nica, Syed Mohammad Aunn Raza

Modern hardware is increasingly complex, requiring increasing effort to understand in order to carefully engineer systems for optimal performance and effective utilization. Moreover, established design principles and assumptions are not portable to modern ...
2023
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.