Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction: feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features.

Feature selection techniques are used for several reasons: to simplify models so they are easier for researchers and users to interpret, to shorten training times, to avoid the curse of dimensionality, to improve the data's compatibility with a class of learning models, and to encode inherent symmetries present in the input space. The central premise when using a feature selection technique is that the data contains some features that are either redundant or irrelevant, and can thus be removed without incurring much loss of information. Redundant and irrelevant are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).

A feature selection algorithm can be seen as the combination of a search technique for proposing new feature subsets with an evaluation measure that scores the different feature subsets. The simplest algorithm is to test every possible subset of features and keep the one that minimizes the error rate. This is an exhaustive search of the space: with n features there are 2^n candidate subsets, so it is computationally intractable for all but the smallest feature sets. The choice of evaluation metric heavily influences the algorithm, and it is these evaluation metrics that distinguish the three main categories of feature selection algorithms: wrappers, filters, and embedded methods. Wrapper methods use a predictive model to score feature subsets: each new subset is used to train a model, which is then tested on a hold-out set.
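As a concrete illustration, here is a minimal sketch of a wrapper method using greedy forward selection, a common heuristic that sidesteps the exhaustive 2^n search described above. The scikit-learn dataset and logistic-regression estimator are illustrative assumptions, not prescribed by the text; each candidate subset is scored by its accuracy on a hold-out set.

```python
# Wrapper-style feature selection: greedy forward selection scored on a
# hold-out set. The dataset and estimator are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

selected, remaining, best_score = [], list(range(X.shape[1])), 0.0
while remaining:
    # Score every one-feature extension of the current subset.
    scores = {}
    for f in remaining:
        cols = selected + [f]
        model = LogisticRegression(max_iter=5000).fit(X_train[:, cols], y_train)
        scores[f] = accuracy_score(y_hold, model.predict(X_hold[:, cols]))
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_score:
        break  # no remaining feature improves the hold-out score
    best_score = scores[f_best]
    selected.append(f_best)
    remaining.remove(f_best)

print(f"selected features: {selected}, hold-out accuracy: {best_score:.3f}")
```

A filter method would replace the inner model fit with a model-independent score (for example, mutual information between each feature and the target), while an embedded method folds selection into training itself, as with L1-regularized models.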

Related courses (30)
EE-613: Machine Learning for Engineers
The objective of this course is to give an overview of machine learning techniques used for real-world applications, and to teach how to implement and use them in practice. Laboratories will be done i ...
CH-242(b): Statistical thermodynamics
The course covers two topics: an introduction to interfacial chemistry, and statistical thermodynamics. The second part includes concepts like the Boltzmann distribution law, partition functions, ense ...
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and imple ...
Related lectures (157)
Statistical thermodynamics: density of states
Explores the density of states in statistical thermodynamics and the use of Heaviside functions for energy-level probabilities.
Statistical thermodynamics: particles and levels
Covers the statistical thermodynamics of particles and levels, permutations, and entropy in rubber-band elasticity.
Statistical thermodynamics: partition function and Stirling's approximation
Explores the partition function and Stirling's approximation in statistical thermodynamics, highlighting the importance of recognizing higher-order terms.
Related publications (543)

Reducing Annotation Efforts in Electricity Theft Detection Through Optimal Sample Selection

Wenlong Liao, Zhe Yang

Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and ...
Piscataway, 2024

GANDALF: Graph-based transformer and Data Augmentation Active Learning Framework with interpretable features for multi-label chest Xray classification

Informative sample selection in an active learning (AL) setting helps a machine learning system attain optimum performance with minimum labeled samples, thus reducing annotation costs and boosting performance of computer-aided diagnosis systems in the pres ...
Amsterdam, 2024

Euclid preparation XXXVII. Galaxy colour selections with Euclid and ground photometry for cluster weak-lensing analyses

Frédéric Courbin, Georges Meylan, Gianluca Castignani, Maurizio Martinelli, Matthias Wiesmann, Yi Wang, Richard Massey, Fabio Finelli, Marcello Farina

Aims. We derived galaxy colour selections from Euclid and ground-based photometry, aiming to accurately define background galaxy samples in cluster weak-lensing analyses. These selections have been implemented in the Euclid data analysis pipelines for gala ...
EDP Sciences S.A., 2024
Related concepts (23)
Feature (machine learning)
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern recognition, classification and regression. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition. The concept of "feature" is related to that of explanatory variable used in statistical techniques such as linear regression.
k-nearest neighbors method
In artificial intelligence, and more precisely in machine learning, the k-nearest neighbors method is a supervised learning method, abbreviated KPPV or k-PPV in French and, more frequently, k-NN or KNN, from the English "k-nearest neighbors". In this setting, one has a training database consisting of N input-output pairs. To estimate the output associated with a new input x, the k-nearest neighbors method takes into account (with equal weight) the k training samples whose inputs are closest to the new input x, according to a distance that must be defined.
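To make the procedure concrete, here is a minimal sketch of k-NN classification, assuming Euclidean distance as the distance to be defined and a simple majority vote (each of the k neighbors counted identically); the toy data is purely illustrative.

```python
# k-NN classification: majority vote over the k training samples whose
# inputs are closest (here, in Euclidean distance) to the new input.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to each sample
    nearest = np.argsort(dists)[:k]                  # indices of k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority label

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9])))  # -> 1
```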
Decision tree learning
Decision tree learning refers to a method based on the use of a decision tree as a predictive model, used notably in data mining and machine learning. In these tree structures, the leaves represent values of the target variable and the branches correspond to combinations of input variables that lead to those values. In decision analysis, a decision tree can be used to represent explicitly the decisions made and the processes leading to them.