A Residual Neural Network (a.k.a. Residual Network, ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and approach better accuracy when going deeper. The identity skip connections, often referred to as "residual connections", are also used in the 1997 LSTM networks, Transformer models (e.g., BERT, GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system.
Residual Networks were developed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, which won the 2015 competition.
The AlexNet model developed in 2012 for ImageNet was an 8-layer convolutional neural network.
The neural networks developed in 2014 by the Visual Geometry Group (VGG) at the University of Oxford approached a depth of 19 layers by stacking 3-by-3 convolutional layers.
But stacking more layers led to a quick reduction in training accuracy, which is referred to as the "degradation" problem.
A deeper network should not produce a higher training loss than its shallower counterpart, if this deeper network can be constructed by its shallower counterpart stacked with extra layers. If the extra layers can be set as identity mappings, the deeper network would represent the same function as the shallower counterpart. It is hypothesized that the optimizer is not able to approach identity mappings for the parameterized layers.
In a multi-layer neural network model, consider a subnetwork with a certain number (e.g., 2 or 3) of stacked layers. Denote the underlying function performed by this subnetwork as , where is the input to this subnetwork.
The idea of "Residual Learning" re-parameterizes this subnetwork and lets the parameter layers represent a residual function .