Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space such that the low-dimensional representation retains some meaningful properties of the original data, ideally with a dimension close to the data's intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons: raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is often computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.
Methods are commonly divided into linear and nonlinear approaches. Approaches can also be divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
Feature selection
Feature selection approaches try to find a subset of the input variables (also called features or attributes). The three strategies are: the filter strategy (e.g. information gain), the wrapper strategy (e.g. search guided by accuracy), and the embedded strategy (selected features are added or removed while building the model based on prediction errors).
Data analysis such as regression or classification can sometimes be done more accurately in the reduced space than in the original space.
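As an illustration of the filter strategy, the sketch below scores each feature independently of any model and keeps the best-scoring ones. It uses the absolute Pearson correlation with the target as the score, which is a stand-in chosen for brevity and not the information gain criterion mentioned above. The function name `select_top_k_by_correlation`, the parameter `k`, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Filter-style selection: rank features by |Pearson correlation| with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)   # center each feature column
    yc = y - y.mean()         # center the target
    # Denominator of the Pearson correlation for every column at once.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get a score of 0 instead of a division by zero.
    scores = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    # Indices of the k highest-scoring features, best first.
    keep = np.argsort(scores)[::-1][:k]
    return keep, scores

# Synthetic data (assumed): only features 2 and 4 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

keep, scores = select_top_k_by_correlation(X, y, k=2)
X_reduced = X[:, keep]  # the data restricted to the selected features
```

Because the score is computed per feature, this approach is cheap but cannot detect features that are only informative in combination; wrapper and embedded strategies address that at a higher computational cost.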
Feature extraction
Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning.
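The linear case can be sketched with a minimal PCA computed from the singular value decomposition of the centered data. This is an illustrative sketch, not a reference implementation; the function name `pca`, its return values, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal directions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance captured along each retained direction.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    Z = Xc @ components.T    # coordinates in the reduced space
    return Z, components, explained_variance

# Synthetic data (assumed): 3-D points lying close to a 2-D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(300, 3))

Z, components, explained_variance = pca(X, n_components=2)
# Map the reduced coordinates back to the original space.
X_hat = Z @ components + X.mean(axis=0)
```

Here two components suffice because the data were generated near a plane, so `X_hat` differs from `X` only by roughly the size of the added noise.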
This course focuses on software security fundamentals, secure coding guidelines and principles, and advanced software security concepts. Students learn to assess and understand threats, learn how to d
Hands-on introduction to data science and machine learning. We explore recommender systems, generative AI, chatbots, graphs, as well as regression, classification, clustering, dimensionality reduction
This course provides in-depth understanding of the most fundamental algorithms in statistical pattern recognition or machine learning (including Deep Learning) as well as concrete tools (as Python sou
Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa) itself. These techniques can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
Canonical correlation analysis (CCA) compares two sets of quantitative variables measured on the same individuals. The goal is to determine whether the two sets describe the same phenomenon, in which case one of them can be dispensed with. A telling example is that of medical analyses carried out on the same samples by two different laboratories.
Multidimensional scaling is a set of statistical techniques used in information visualization to explore similarities in data. It is a special case of multivariate analysis. Typically, a multidimensional scaling algorithm starts from a matrix of similarities between all pairs of points and assigns each point a position in an N-dimensional space. For N = 2 or N = 3, the positions can be visualized on a plane or in a volume as a scatter plot.
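One concrete variant is classical (Torgerson) multidimensional scaling, sketched below. It assumes the input is a matrix of pairwise Euclidean distances (dissimilarities) and not similarities; the function name `classical_mds` and the synthetic points are assumptions made for this example.

```python
import numpy as np

def classical_mds(D, n_dims):
    """Embed n points in n_dims dimensions from an (n, n) distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding error.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Synthetic data (assumed): distances between 10 random points in the plane.
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

Y = classical_mds(D, n_dims=2)
# Distances between the embedded points, for comparison with D.
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

The recovered coordinates `Y` match the original points only up to rotation, reflection, and translation, which is why the comparison is made on distances and not on positions.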
Covers the Johnson-Lindenstrauss theorem, which embeds high-dimensional points into a lower-dimensional space while approximately preserving distances.
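A common way to realize such an embedding is a Gaussian random projection, sketched below under stated assumptions. The function names `random_projection` and `pairwise_distances`, the target dimension `k=500`, and the synthetic data are choices made for this example; the sketch does not compute the dimension bound given by the theorem.

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project the rows of X to k dimensions with a random Gaussian matrix."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(k) preserves squared lengths in expectation.
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

# Synthetic data (assumed): 20 points in 1000 dimensions, reduced to 500.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 1000))
Y = random_projection(X, k=500)

# Ratio of projected to original distance for every distinct pair of points.
mask = ~np.eye(20, dtype=bool)
ratios = pairwise_distances(Y)[mask] / pairwise_distances(X)[mask]
```

The ratios concentrate around 1, and the spread shrinks as `k` grows; the projection matrix does not depend on the data, which distinguishes this approach from PCA.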
Bayesian Optimization (BO) is typically used to optimize an unknown function f that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Even if provably asymptotically optimal BO algorith ...
The project introduces an innovative visual method for analysing libraries and archives, with a focus on Bibliotheca Hertziana’s library collection. This collection, which dates back over a century, is examined by integrating user loan data with deep mappi ...
2024
Predicting the evolution of systems with spatio-temporal dynamics in response to external stimuli is essential for scientific progress. Traditional equations-based approaches leverage first principles through the numerical approximation of differential equ ...