Résumé
In machine learning, diffusion models, also known as diffusion probabilistic models or score-based generative models, are a class of latent variable models. They are Markov chains trained using variational inference. The goal of diffusion models is to learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space. In computer vision, this means that a neural network is trained to denoise images blurred with Gaussian noise by learning to reverse the diffusion process. It mainly consists of three major components: the forward process, the reverse process, and the sampling procedure. Three examples of generic diffusion modeling frameworks used in computer vision are denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. Diffusion models were introduced in 2015 with a motivation from non-equilibrium thermodynamics. Diffusion models can be applied to a variety of tasks, including , inpainting, super-resolution, and . For example, an image generation model would start with a random noise image and then, after having been trained reversing the diffusion process on natural images, the model would be able to generate new natural images. Announced on 13 April 2022, OpenAI's text-to-image model DALL-E 2 is a recent example. It uses diffusion models for both the model's prior (which produces an image embedding given a text caption) and the decoder that generates the final image. Consider the problem of image generation. Let represent an image, and let be the probability distribution over all possible images. If we have itself, then we can say for certain how likely a certain image is. However, this is intractable in general. Most often, we are uninterested in knowing the absolute probability that a certain image is — when, if ever, are we interested in how likely an image is in the space of all possible images? Instead, we are usually only interested in knowing how likely a certain image is compared to its immediate neighbors — how more likely is this image of cat, compared to some small variants of it? Is it more likely if the image contains two whiskers, or three, or with some Gaussian noise added? Consequently, we are actually quite uninterested in itself, but rather, .
À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.