**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of GraphSearch.

Concept# Residual neural network

Summary

A Residual Neural Network (a.k.a. Residual Network, ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and approach better accuracy when going deeper. The identity skip connections, often referred to as "residual connections", are also used in the 1997 LSTM networks, Transformer models (e.g., BERT, GPT models such as ChatGPT), the AlphaGo Zero system, the AlphaStar system, and the AlphaFold system.
Residual Networks were developed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, which won the 2015 competition.
The AlexNet model developed in 2012 for ImageNet was an 8-layer convolutional neural network.
The neural networks developed in 2014 by the Visual Geometry Group (VGG) at the University of Oxford approached a depth of 19 layers by stacking 3-by-3 convolutional layers.
But stacking more layers led to a quick reduction in training accuracy, which is referred to as the "degradation" problem.
A deeper network should not produce a higher training loss than its shallower counterpart, if this deeper network can be constructed by its shallower counterpart stacked with extra layers. If the extra layers can be set as identity mappings, the deeper network would represent the same function as the shallower counterpart. It is hypothesized that the optimizer is not able to approach identity mappings for the parameterized layers.
In a multi-layer neural network model, consider a subnetwork with a certain number (e.g., 2 or 3) of stacked layers. Denote the underlying function performed by this subnetwork as , where is the input to this subnetwork.
The idea of "Residual Learning" re-parameterizes this subnetwork and lets the parameter layers represent a residual function .

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related people (1)

Related concepts (11)

Related courses (49)

Related lectures (411)

Transformer (machine learning model)

A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper titled 'Attention Is All You Need' by Ashish Vaswani et al., Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variation has been prevalently adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of input sequence.

Residual neural network

A Residual Neural Network (a.k.a. Residual Network, ResNet) is a deep learning model in which the weight layers learn residual functions with reference to the layer inputs. A Residual Network is a network with skip connections that perform identity mappings, merged with the layer outputs by addition. It behaves like a Highway Network whose gates are opened through strongly positive bias weights. This enables deep learning models with tens or hundreds of layers to train easily and approach better accuracy when going deeper.

Vanishing gradient problem

In machine learning, the vanishing gradient problem is encountered when training artificial neural networks with gradient-based learning methods and backpropagation. In such methods, during each iteration of training each of the neural networks weights receives an update proportional to the partial derivative of the error function with respect to the current weight. The problem is that in some cases, the gradient will be vanishingly small, effectively preventing the weight from changing its value.

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple

EE-311: Fundamentals of machine learning

Ce cours présente une vue générale des techniques d'apprentissage automatique, passant en revue les algorithmes, le formalisme théorique et les protocoles expérimentaux.

PHYS-467: Machine learning for physicists

Machine learning and data analysis are becoming increasingly central in sciences including physics. In this course, fundamental principles and methods of machine learning will be introduced and practi

Neural Networks: Deep Neural NetworksMATH-212: Analyse numérique et optimisation

Explores the basics of neural networks, with a focus on deep neural networks and their architecture and training.

Neural Networks: Two Layers Neural NetworkMATH-212: Analyse numérique et optimisation

Covers the basics of neural networks, focusing on the development from two layers neural networks to deep neural networks.

Kernel Methods: Neural NetworksPHYS-467: Machine learning for physicists

Covers the fundamentals of neural networks, focusing on RBF kernels and SVM.