# Topic Models

Description

This lecture covers the basics of topic models, starting with clustering and density estimation, then delving into Gaussian Mixture Models (GMM) and the Latent Dirichlet Allocation (LDA) model. The instructor explains the algorithms, learning processes, and limitations of GMM and LDA, as well as the concepts of Dirichlet distribution and variational inference. The lecture concludes with extensions to the LDA model and the application of variational inference to Gaussian Mixture Models.

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement…

Related concepts (224)

Latent Dirichlet allocation

In natural language processing, Latent Dirichlet Allocation (LDA) is a Bayesian network (and, therefore, a generative statistical model) that explains a set of observations through unobserved groups, each of which explains why some parts of the data are similar. LDA is an example of a Bayesian topic model: observations (e.g., words) are collected into documents, each word's presence is attributable to one of the document's topics, and each document contains only a small number of topics.
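
The generative view above can be exercised in a few lines. The following is a minimal sketch of fitting LDA on a toy corpus with scikit-learn; the corpus, the choice of two topics, and the random seed are illustrative assumptions, not part of the lecture material.

```python
# Minimal LDA sketch: toy corpus, two assumed topics.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "dog bone bark dog",
    "cat meow purr cat",
    "dog bark bone",
    "cat purr meow",
]

# LDA models word counts per document, so start from a bag-of-words matrix.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Two latent topics; each document gets a distribution over them.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Each row of doc_topics is a per-document topic distribution (sums to 1).
print(doc_topics.shape)
```

Inspecting `lda.components_` afterwards gives the per-topic word weights, which is how the "dog/bone" versus "cat/meow" separation described below would surface in practice.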

Topic model

In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden semantic structures in a text body. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats, and "the" and "is" will appear approximately equally in both.

Probability distribution

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). For instance, if X is used to denote the outcome of a coin toss ("the experiment"), then the probability distribution of X would take the value 0.5 (1 in 2 or 1/2) for X = heads, and 0.5 for X = tails.
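
The coin-toss distribution in that example is small enough to write out directly. A minimal sketch, using exact fractions to make the "probabilities sum to 1" requirement explicit:

```python
# The fair-coin distribution from the text: P(heads) = P(tails) = 1/2.
from fractions import Fraction

coin = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# A valid probability distribution assigns non-negative probabilities
# that sum to 1 over the whole sample space.
assert all(p >= 0 for p in coin.values())
assert sum(coin.values()) == 1

print(float(coin["heads"]))
```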

Dirichlet-multinomial distribution

In probability theory and statistics, the Dirichlet-multinomial distribution is a family of discrete multivariate probability distributions on a finite support of non-negative integers. It is also called the Dirichlet compound multinomial distribution (DCM) or multivariate Pólya distribution (after George Pólya). It is a compound probability distribution, where a probability vector p is drawn from a Dirichlet distribution with parameter vector α, and an observation is drawn from a multinomial distribution with probability vector p and number of trials n.
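
That two-stage compounding can be sketched directly with NumPy; the values of α and n below are illustrative assumptions:

```python
# One Dirichlet-multinomial draw: p ~ Dirichlet(alpha), counts ~ Multinomial(n, p).
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])  # Dirichlet parameter vector (assumed)
n = 10                             # number of multinomial trials (assumed)

p = rng.dirichlet(alpha)           # probability vector on the simplex
counts = rng.multinomial(n, p)     # category counts for one observation

# p lies on the simplex; the counts use up all n trials.
assert np.isclose(p.sum(), 1.0)
assert counts.sum() == n
```

Repeating the draw many times (a fresh p each time) samples from the Dirichlet-multinomial itself, which has heavier tails than a multinomial with fixed p.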

Mixture model

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population.
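
The "unidentified subpopulation" idea can be made concrete by sampling: first pick a component by its mixing weight, then draw from that component, and then discard the component label. The weights, means, and scales below are illustrative assumptions for a two-component Gaussian mixture:

```python
# Sampling from a two-component Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.3, 0.7])  # mixing proportions (sum to 1)
means = np.array([-2.0, 3.0])   # component means
scales = np.array([0.5, 1.0])   # component standard deviations

n = 1000
# Latent component assignment per observation; in real data this is unobserved.
z = rng.choice(2, size=n, p=weights)
x = rng.normal(means[z], scales[z])

print(x.shape)
```

An observer who only sees `x` faces exactly the mixture-model problem: recovering the weights, means, and scales (e.g. with the EM algorithm covered in the lecture) without access to `z`.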

Related lectures (425)

Topic Models: Latent Dirichlet Allocation (DH-406: Machine learning for DH)

Introduces Latent Dirichlet Allocation for topic modeling in documents, discussing its process, applications, and limitations.

Topic Models: Understanding Latent Structures (DH-406: Machine learning for DH)

Explores topic models, Gaussian mixture models, Latent Dirichlet Allocation, and variational inference in understanding latent structures within data.

Gaussian Mixture Models: Data Classification (COM-500: Statistical signal and data processing through applications)

Explores denoising signals with Gaussian mixture models and EM algorithm, EMG signal analysis, and image segmentation using Markovian models.

Quantum Random Number Generation (PHYS-758: Advanced Course on Quantum Communication)

Explores quantum random number generation, discussing the challenges and implementations of generating good randomness using quantum devices.

Probability and Statistics (MATH-232: Probability and statistics)

Covers p-quantile, normal approximation, joint distributions, and exponential families in probability and statistics.