Summary
In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable models. They are Markov chains trained using variational inference. The goal of diffusion models is to learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space. In computer vision, this means that a neural network is trained to denoise images blurred with Gaussian noise by learning to reverse the diffusion process. A diffusion model consists of three major components: the forward process, the reverse process, and the sampling procedure. Three examples of generic diffusion modeling frameworks used in computer vision are denoising diffusion probabilistic models, noise-conditioned score networks, and stochastic differential equations.

Diffusion models were introduced in 2015 with a motivation from non-equilibrium thermodynamics. They can be applied to a variety of tasks, including image denoising, inpainting, super-resolution, and image generation. For example, an image generation model would start with a random noise image and, after having been trained to reverse the diffusion process on natural images, would be able to generate new natural images. Announced on 13 April 2022, OpenAI's text-to-image model DALL-E 2 is a recent example. It uses diffusion models both for the model's prior (which produces an image embedding given a text caption) and for the decoder that generates the final image.

Consider the problem of image generation. Let x represent an image, and let q(x) be the probability distribution over all possible images. If we had q itself, we could say for certain how likely any given image is; however, this is intractable in general. Most often, we are uninterested in the absolute probability of a particular image: when, if ever, do we care how likely an image is within the space of all possible images? Instead, we usually only care how likely an image is compared to its immediate neighbors: how much more likely is this image of a cat than some small variant of it? Is it more likely if the cat has two whiskers, or three, or if some Gaussian noise is added? Consequently, we are not really interested in q(x) itself, but in the gradient of its logarithm, ∇x log q(x), known as the score function.
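The components named above (forward process, reverse process, training objective) can be made concrete with a short sketch. The snippet below is a minimal illustration, not taken from the source: it implements the closed-form DDPM forward noising step q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I) and the standard noise-prediction loss under an assumed linear noise schedule; denoiser is a hypothetical stand-in for any network that predicts the added noise.

```python
# Minimal DDPM sketch (illustrative assumptions: T = 1000, linear beta schedule).
import torch

T = 1000                                    # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule beta_1..beta_T
alphas_bar = torch.cumprod(1.0 - betas, 0)  # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def forward_noise(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    noise = torch.randn_like(x0)
    abar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
    return xt, noise

def ddpm_loss(denoiser, x0):
    """Standard DDPM objective: the network predicts the noise added at a random step t."""
    t = torch.randint(0, T, (x0.shape[0],))
    xt, noise = forward_noise(x0, t)
    return torch.mean((denoiser(xt, t) - noise) ** 2)
```

Sampling then runs the learned reverse process: starting from pure Gaussian noise x_T, the network's noise prediction is used at each step to produce a slightly less noisy x_{t-1}, ending at a generated image. Up to a scale factor, the predicted noise is also an estimate of the score ∇x log q(x_t), which is what connects DDPMs to the score-based view sketched above.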
Related courses (6)
CS-552: Modern natural language processing
Natural language processing is ubiquitous in modern intelligent technologies, serving as a foundation for language translators, virtual assistants, search engines, and many more.
EE-608: Deep Learning For Natural Language Processing
The Deep Learning for NLP course provides an overview of neural network based methods applied to text. The focus is on models particularly suited to the properties of human language.
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques.
Related lectures (37)
Variational Auto-Encoders and NVIB
Explores Variational Auto-Encoders, Bayesian inference, attention-based latent spaces, and the effectiveness of Transformers in language processing.
Deep Learning for NLP
Introduces deep learning concepts for NLP, covering word embeddings, RNNs, and Transformers, emphasizing self-attention and multi-headed attention.
Model Finetuning for NLI
Covers the second assignment for the CS-552: Modern NLP course, focusing on transfer learning and data augmentation.
Related concepts (2)
Transformer (machine learning model)
A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper 'Attention Is All You Need' by Ashish Vaswani et al. of the Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and its later variants have been widely adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of the input sequence.
OpenAI
OpenAI is an American artificial intelligence (AI) research laboratory consisting of the non-profit OpenAI, Inc. and its for-profit subsidiary corporation OpenAI, L.P. OpenAI conducts research on artificial intelligence with the declared intention of developing "safe and beneficial" artificial general intelligence, which it defines as "highly autonomous systems that outperform humans at most economically valuable work".