**Êtes-vous un étudiant de l'EPFL à la recherche d'un projet de semestre?**

Travaillez avec nous sur des projets en science des données et en visualisation, et déployez votre projet sous forme d'application sur GraphSearch.

Publication# Mixture models for unsupervised and supervised learning

Thèse EPFL

Résumé

In a society which produces and consumes an ever increasing amount of information, methods which can make sense out of al1 this data become of crucial importance. Machine learning tries to develop models which can make the information load accessible. Three important questions one can ask when constructing such models are: What is the structure of the data? This is especially relevant for high-dimensional data which cannot be visualized anymore. Which features are most characteristic? How to predict whether a pattern belongs to one class or to another? This thesis investigates these three questions by trying to construct complex models from simple ones. The decomposition into simpler parts can also be found in the methods used for estimating the parameter values of these models. The algorithms for the simple models constitute the core of the algorithms for the complex ones. The above questions are addressed in three stages: Unsupervised learning This part deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs are the recently proposed mixtures of latent variable models. Examples of the latter are principal component analysis (PCA) and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with less parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform GMMs. To avoid having to choose a value for the dimension of the subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training and experiments illustrate its viability. Feature extraction PCA is also (and foremost) a classic method for feature extraction. However, PCA is limited to linear feature extraction by a projection onto a subspace. Kernel PCA is a recent method which allows non-linear feature extraction. Applying kernel PCA to a data set with N patterns requires finding the eigenvectors of an N×N matrix. An Expectation-Maximization (EM) algorithm for PCA which does not need to store this matrix is adapted to kernel PCA and applied to large data sets with more than 10,000 examples. The experiments confirm that this approach is feasible and that the extracted features lead to good performance when used as pre-processed data for a linear classifier. A new on-line variant of the EM algorithm for PCA is presented and shown to speed up the learning process. Supervised learning This part illustrates two ways of constructing complex models from simple ones for classification problems. The first approach is inspired by unsupervised mixture models and extends them to supervised learning. The resulting model, called a mixture of experts, tries to decompose a complex problem into subproblems treated by several simpler models. The division of the data space is effectuated by an input-dependent gating network. After a review of the model and existing training methods, different possible gating networks are proposed and compared. Unsupervised mixture models are one of the evaluated options. The experiments show that a standard mixture of experts with a neural network gate gives the best results. The second approach is a constructive algorithm called boosting which creates a committee of simple models by emphasizing patterns which have been frequently misclassified by the preceding classifiers. A new model has been developed which lies between a mixture of experts and a boosted committee. It adds an input-dependent combiner (like a gating network) to standard boosting. This has the advantage that with a considerably smaller committee results are obtained which are comparable to those of boosting. Finally, some of the investigated models have been evaluated on two problems of machine vision. The results confirm the potential of mixtures of latent variable models which lead to good performance when incorporated in a Bayes classifier.

Official source

Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.

Concepts associés

Chargement

Publications associées

Chargement

Concepts associés (33)

Publications associées (132)

Modèle de mélange

En statistiques, un modèle de mélange est un modèle statistique permettant de modéliser différentes sous-populations dans la population globale sans que ces sous-populations soient identifiées dans

Apprentissage

L’apprentissage est un ensemble de mécanismes menant à l'acquisition de savoir-faire, de savoirs ou de connaissances. L'acteur de l'apprentissage est appelé apprenant. On peut opposer l'apprentissag

Résolution de problème

vignette|Résolution d'un problème mathématique.
La résolution de problème est le processus d'identification puis de mise en œuvre d'une solution à un problème.
Méthodologie
Dans l'ind

Chargement

Chargement

Chargement

In a society which produces and consumes an ever increasing amount of information, methods which can make sense out of all this data become of crucial importance. Machine learning tries to develop models which can make the information load accessible. Three important questions one can ask when constructing such models are: - What is the structure of the data? This is especially relevant for high-dimensional data which cannot be visualized anymore. - Which features are most characteristic? -How to predict whether a pattern belongs to one class or to another? This thesis investigates these three questions by trying to construct complex models from simple ones. The decomposition into simpler parts can also be found in the methods used for estimating the parameter values of these models. The algorithms for the simple models constitute the core of the algorithms for the complex ones. The above questions are addressed in three stages: Unsupervised learning: This part deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs are the recently proposed mixtures of latent variable models. Examples of the latter are principal component analysis (PCA) and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with less parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform GMMs. To avoid having to choose a value for the dimension of the subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training and experiments illustrate its viability. Feature extraction: PCA is also (and foremost) a classic method for feature extraction. However, PCA is limited to linear feature extraction by a projection onto a subspace. Kernel PCA is a recent method which allows non-linear feature extraction. Applying kernel PCA to a data set with N patterns requires finding the eigenvectors of an N*N matrix. An Expectation-Maximization (EM) algorithm for PCA which does not need to store this matrix is adapted to kernel PCA and applied to large data sets with more than 10,000 examples. The experiments confirm that this approach is feasible and that the extracted features lead to good performance when used as pre-processed data for a linear classifier. A new on-line variant of the EM algorithm for PCA is presented and shown to speed up the learning process. Supervised learning: This part illustrates two ways of constructing complex models from simple ones for classification problems. The first approach is inspired by unsupervised mixture models and extends them to supervised learning. The resulting model, called a mixture of experts, tries to decompose a complex problem into subproblems treated by several simpler models. The division of the data space is effectuated by an input-dependent gating network. After a review of the model and existing training methods, different possible gating networks are proposed and compared. Unsupervised mixture models are one of the evaluated options. The experiments show that a standard mixture of experts with a neural network gate gives the best results. The second approach is a constructive algorithm called boosting which creates a committee of simple models by emphasizing patterns which have been frequently misclassified by the preceding classifiers. A new model has been developed which lies between a mixture of experts and a boosted committee. It adds an input-dependent combiner (like a gating network) to standard boosting. This has the advantage that with a considerably smaller committee results are obtained which are comparable to those of boosting. Finally, some of the investigated models have been evaluated on two problems of machine vision. The results confirm the potential of mixtures of latent variable models which lead to good performance when incorporated in a Bayes classifier.

In a society which produces and consumes an ever increasing amount of information, methods which can make sense out of all this data become of crucial importance. Machine learning tries to develop models which can make the information load accessible. Three important questions one can ask when constructing such models are: - What is the structure of the data? This is especially relevant for high-dimensional data which cannot be visualized anymore. - Which features are most characteristic? -How to predict whether a pattern belongs to one class or to another? This thesis investigates these three questions by trying to construct complex models from simple ones. The decomposition into simpler parts can also be found in the methods used for estimating the parameter values of these models. The algorithms for the simple models constitute the core of the algorithms for the complex ones. The above questions are addressed in three stages: Unsupervised learning: This part deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs are the recently proposed mixtures of latent variable models. Examples of the latter are principal component analysis (PCA) and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with less parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform GMMs. To avoid having to choose a value for the dimension of the subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training and experiments illustrate its viability. Feature extraction: PCA is also (and foremost) a classic method for feature extraction. However, PCA is limited to linear feature extraction by a projection onto a subspace. Kernel PCA is a recent method which allows non-linear feature extraction. Applying kernel PCA to a data set with N patterns requires finding the eigenvectors of an N*N matrix. An Expectation-Maximization (EM) algorithm for PCA which does not need to store this matrix is adapted to kernel PCA and applied to large data sets with more than 10,000 examples. The experiments confirm that this approach is feasible and that the extracted features lead to good performance when used as pre-processed data for a linear classifier. A new on-line variant of the EM algorithm for PCA is presented and shown to speed up the learning process. Supervised learning: This part illustrates two ways of constructing complex models from simple ones for classification problems. The first approach is inspired by unsupervised mixture models and extends them to supervised learning. The resulting model, called a mixture of experts, tries to decompose a complex problem into subproblems treated by several simpler models. The division of the data space is effectuated by an input-dependent gating network. After a review of the model and existing training methods, different possible gating networks are proposed and compared. Unsupervised mixture models are one of the evaluated options. The experiments show that a standard mixture of experts with a neural network gate gives the best results. The second approach is a constructive algorithm called boosting which creates a committee of simple models by emphasizing patterns which have been frequently misclassified by the preceding classifiers. A new model has been developed which lies between a mixture of experts and a boosted committee. It adds an input-dependent combiner (like a gating network) to standard boosting. This has the advantage that with a considerably smaller committee results are obtained which are comparable to those of boosting. Finally, some of the investigated models have been evaluated on two problems of machine vision. The results confirm the potential of mixtures of latent variable models which lead to good performance when incorporated in a Bayes classifier.

This paper deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs are the recently proposed mixtures of latent variable models. Examples of the latter are principal component analysis and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with less parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform various GMMs both in density estimation and Bayes classifiers. To avoid having to choose a value for the dimension of the latent subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training and experiments illustrate its viability.