
Lecture

# Document Analysis: Topic Modeling

Description

This lecture covers techniques for document analysis, focusing on topic modeling with mixtures of multinomials and Latent Dirichlet Allocation (LDA), and explains how these models generate new documents. It then discusses deep generative models, including autoencoders and how they can be used as generative models, and introduces Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) for generating data samples. Finally, it addresses the challenges posed by heterogeneous data and the importance of model selection and cross-validation in machine learning.
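To make the two generative stories concrete, here is a minimal sketch in Python (standard library only). The vocabulary, the two topics, the mixing weights, and the Dirichlet parameter `alpha` are hypothetical toy values, not taken from the lecture. In a mixture of multinomials each document draws a single topic and every word comes from it; in LDA each document first draws its own topic proportions, and then every word draws its own topic.

```python
import random

# Hypothetical toy vocabulary and topics, for illustration only.
VOCAB = ["manuscript", "archive", "letter", "pixel", "network", "layer"]
# Each topic is a distribution over VOCAB (each row sums to 1).
TOPICS = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "history" topic
    [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],  # a "deep learning" topic
]

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def mixture_of_multinomials_doc(n_words, mixing, rng):
    """Mixture of multinomials: draw ONE topic z per document, then every word from topic z."""
    z = rng.choices(range(len(TOPICS)), weights=mixing)[0]
    return [rng.choices(VOCAB, weights=TOPICS[z])[0] for _ in range(n_words)]

def lda_doc(n_words, alpha, rng):
    """LDA: draw theta ~ Dirichlet(alpha) for the document, then for each word
    draw a topic z_n ~ theta and the word from topic z_n."""
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(TOPICS)), weights=theta)[0]
        words.append(rng.choices(VOCAB, weights=TOPICS[z])[0])
    return words

rng = random.Random(0)
print(mixture_of_multinomials_doc(8, mixing=[0.5, 0.5], rng=rng))
print(lda_doc(8, alpha=[0.5, 0.5], rng=rng))
```

Because all words of a mixture-of-multinomials document share one topic, its output never mixes the two vocabularies, whereas an LDA document can.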

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

In course

DH-406: Machine learning for DH

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple…

Related concepts (414)

Artificial neural network

Artificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a class of machine learning models built using principles of neuronal organization discovered by connectionism in the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like a synapse in a biological brain, can transmit a signal to other neurons.

Deep learning

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods can be supervised, semi-supervised, or unsupervised.
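The layered structure of a neural network can be sketched as a forward pass through a small fully connected network, where each layer consumes the previous layer's output. This is a minimal illustration only: the layer sizes, the hand-picked weights, and the choice of ReLU and sigmoid activations are hypothetical, and no training is shown.

```python
import math

def dense(x, weights, biases, activation):
    """One fully connected layer: each unit sums its weighted inputs plus a bias,
    then applies a nonlinearity."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical weights: 2 inputs -> hidden layer of 3 units -> 1 output unit.
# In practice these would be learned, e.g. by backpropagation.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0], relu),
    ([[1.0, 1.0, 1.0]], [-1.0], sigmoid),
]

def forward(x, layers=LAYERS):
    """Stacking layers is what makes the network 'deep'."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

print(forward([2.0, 1.0]))
```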

Convolutional neural network

A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns its own features by optimizing filters (kernels). Vanishing and exploding gradients, seen during backpropagation in earlier neural networks, are mitigated by using regularized weights over fewer connections. For example, each neuron in a fully connected layer would require 10,000 weights to process an image of 100 × 100 pixels, whereas a convolutional filter with a 5 × 5 kernel needs only 25 weights, shared across all positions in the image.
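The 10,000-weight figure is simply one weight per input pixel of a 100 × 100 image. A minimal sketch of that arithmetic, contrasted with a convolutional filter; the 5 × 5 kernel size and the helper names are our own illustrative assumptions, and biases are ignored.

```python
def dense_weights_per_neuron(height, width, channels=1):
    """A fully connected neuron has one weight per input value (bias ignored)."""
    return height * width * channels

def conv_weights_per_filter(kernel_h, kernel_w, channels=1):
    """A convolutional filter has one weight per kernel entry, shared across
    every position of the image (bias ignored)."""
    return kernel_h * kernel_w * channels

print(dense_weights_per_neuron(100, 100))  # 10000
print(conv_weights_per_filter(5, 5))       # 25
```

The filter's weight count depends only on the kernel size, not on the image size, which is why convolutional layers use far fewer connections than fully connected ones.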

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space or learning the mapping itself (either from the high-dimensional space to the low-dimensional embedding or vice versa). These techniques can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
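For reference, the linear baseline that these methods generalize can be sketched in a few lines. The example below finds the first principal component of 2-D points and projects onto it. Note that it uses power iteration on the sample covariance matrix, a swapped-in alternative to computing PCA with a singular value decomposition; the function names and the iteration count are illustrative assumptions, and the sketch assumes the points are not all identical.

```python
import math

def first_principal_component(points, iters=200):
    """Return (mean, unit direction) of the top principal component, found by
    power iteration on the sample covariance matrix."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, direction):
    """1-D linear embedding: coordinate of each centred point along the component."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(p)))
            for p in points]

pts = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean, direction = first_principal_component(pts)
print(direction, project(pts, mean, direction))
```

A nonlinear method replaces the single straight direction found here with a curved low-dimensional manifold.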

Support vector machine

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories by Vladimir Vapnik with colleagues (Boser et al., 1992, Guyon et al., 1993, Cortes and Vapnik, 1995, Vapnik et al., 1997) SVMs are one of the most robust prediction methods, being based on statistical learning frameworks or VC theory proposed by Vapnik (1982, 1995) and Chervonenkis (1974).
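As a hedged sketch of how a linear SVM separates two classes, the code below minimizes the regularized hinge loss by plain full-batch subgradient descent. This is a deliberately simple, swapped-in training method: the standard SVM formulation is a quadratic program, typically solved in its dual form. The toy data and the hyperparameters `lam`, `lr`, and `epochs` are illustrative assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Soft-margin linear SVM trained by subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w . x_i + b)).
    Labels must be +1 or -1; the bias b is not regularized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    """Classify by the sign of the decision function w . x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Only points with margin below 1 contribute to the update, which mirrors the idea that an SVM's solution is determined by the support vectors near the decision boundary.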

Related lectures (1,000)

Introduction to Machine Learning: Supervised Learning | CS-233(a): Introduction to machine learning (BA3)

Introduces supervised learning, covering classification, regression, model optimization, overfitting, and kernel methods.

Neural Networks: Training and Activation | CIVIL-226: Introduction to machine learning for engineers

Explores neural networks, activation functions, backpropagation, and PyTorch implementation.

Machine Learning Fundamentals | DH-406: Machine learning for DH

Introduces fundamental machine learning concepts, covering regression, classification, dimensionality reduction, and deep generative models.

Data Representation: PCA | DH-406: Machine learning for DH

Covers data representation using PCA for dimensionality reduction, focusing on signal preservation and noise removal.

Machine Learning Review | DH-406: Machine learning for DH

Covers a review of machine learning concepts, including supervised learning, classification vs regression, linear models, kernel functions, support vector machines, dimensionality reduction, deep generative models, and cross-validation.
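Cross-validation, mentioned in the lecture description and in the review above, can be sketched as follows: split the data into k folds, train on k − 1 of them, measure the error on the held-out fold, and average. The two candidate models (predicting the mean versus fitting a straight line), the synthetic data, and all function names here are hypothetical illustrations of model selection, not material from the lecture; the sketch assumes k ≤ n.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once and split them into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_mse(fit, X, y, k=5, seed=0):
    """Average held-out mean squared error over k folds.
    `fit(X_train, y_train)` must return a prediction function."""
    errors = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        X_tr = [X[i] for i in range(len(X)) if i not in test]
        y_tr = [y[i] for i in range(len(X)) if i not in test]
        predict = fit(X_tr, y_tr)
        errors.append(sum((predict(X[i]) - y[i]) ** 2 for i in held_out) / len(held_out))
    return sum(errors) / k

def fit_mean(X, y):
    """Baseline model: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x: m

def fit_line(X, y):
    """Ordinary least squares for y ~ a*x + b with scalar inputs."""
    n = len(X)
    mx, my = sum(X) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    a = sum((x - mx) * (t - my) for x, t in zip(X, y)) / sxx
    b = my - a * mx
    return lambda x: a * x + b

# Synthetic data: y = 2x + 1 plus small Gaussian noise.
rng = random.Random(42)
X = [i / 10 for i in range(50)]
y = [2 * x + 1 + rng.gauss(0, 0.1) for x in X]
scores = {name: cross_val_mse(f, X, y) for name, f in [("mean", fit_mean), ("line", fit_line)]}
best = min(scores, key=scores.get)  # model with the lowest held-out error
print(scores, best)
```

Selecting the model by held-out error rather than training error guards against choosing a model that merely overfits the training data.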