Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of it with informative tags. For example, a data label might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, or whether a dot in an X-ray is a tumor.
Labels can be obtained by asking humans to make judgments about a given piece of unlabeled data. Labeled data is significantly more expensive to obtain than the raw unlabeled data.
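When several humans judge the same item, their annotations are typically aggregated, for example by majority vote. A minimal sketch (the data and function name here are hypothetical, purely for illustration):

```python
from collections import Counter

def majority_label(annotations):
    """Return the most common label among several human judgments."""
    label, _count = Counter(annotations).most_common(1)[0]
    return label

# Hypothetical example: three annotators tag the same photo.
photo_annotations = ["horse", "horse", "cow"]
print(majority_label(photo_annotations))  # -> "horse"
```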
In 2006 Fei-Fei Li, the co-director of the Stanford Human-Centered AI Institute, set out to improve the artificial intelligence models and algorithms for image recognition by significantly enlarging the training data. The researchers downloaded millions of images from the World Wide Web, and a team of undergraduates started to apply object labels to each image. In 2007 Li outsourced the data-labeling work to Amazon Mechanical Turk, an online marketplace for digital piecework. The 3.2 million images that were labeled by more than 49,000 workers formed the basis for ImageNet, one of the largest hand-labeled databases for object recognition.
After a labeled dataset has been obtained, a machine learning model can be trained on it; the trained model can then be presented with new, unlabeled data and predict a likely label for each item.
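As a minimal sketch of this train-then-predict workflow (using scikit-learn and a synthetic dataset standing in for hand-labeled data; not any specific system described above):

```python
# Supervised learning sketch: fit a model on labeled data,
# then predict labels for data the model has never seen.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labeled dataset: X holds features, y holds labels.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_labeled, X_unlabeled, y_labeled, _ = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_labeled, y_labeled)  # learn from labeled data
predicted = model.predict(X_unlabeled)                  # guess labels for new data
print(predicted[:5])
```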
Algorithmic decision-making is subject to programmer-driven bias as well as data-driven bias. Training on biased labeled data will produce prejudices and omissions in a predictive model, even if the machine learning algorithm itself is legitimate. The labeled data used to train a machine learning model needs to be a statistically representative sample of the population, or the results will be biased. Because the labeled data available to train facial recognition systems has often not been representative of the population, groups underrepresented in the labeled data are frequently misclassified.
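One simple diagnostic for representativeness is to compare group frequencies in the labeled training set against known population shares. A sketch, with hypothetical records and a hypothetical "group" attribute:

```python
from collections import Counter

def group_shares(records, key):
    """Fraction of records per group, e.g. to compare a labeled
    training set against the population it should represent."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Hypothetical training records tagged with a demographic group.
train = [{"group": "A"}, {"group": "A"}, {"group": "A"}, {"group": "B"}]
print(group_shares(train, "group"))  # {'A': 0.75, 'B': 0.25}
```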
This seminar course covers principles and recent advancements in machine learning methods that have the ability to solve multiple tasks and generalize to new domains in which training and test distributions ...
Deep learning offers the potential to transform biomedical research. In this course, we will cover recent deep learning methods and learn how to apply them to problems in the biomedical domain.
In this course, students learn to design and master algorithms and core concepts related to inference and learning from data, as well as the foundations of adaptation and learning theory, with applications.
Deep learning (also deep structured learning or hierarchical learning) is a subfield of artificial intelligence that uses neural networks to solve complex tasks through architectures composed of layered non-linear transformations. These techniques have enabled significant and rapid progress in the analysis of audio and visual signals, notably in facial recognition, speech recognition, computer vision, and natural language processing.
Pattern recognition is a set of techniques and methods aimed at identifying regularities in raw data in order to make a decision that depends on the category assigned to the pattern. It is considered a branch of artificial intelligence that draws heavily on machine learning techniques and on statistics.
Machine learning (also called statistical learning) is a field of study within artificial intelligence that uses mathematical and statistical approaches to give computers the ability to "learn" from data, that is, to improve their performance at solving tasks without being explicitly programmed for each one. More broadly, it concerns the design, analysis, optimization, development, and implementation of such methods.
Covers the basics of machine learning for physicists and chemists, with an emphasis on image classification and dataset labeling.
Explores self-supervised learning for autonomous vehicles, deriving labels from the data itself and discussing its applications and challenges.
Informative sample selection in an active learning (AL) setting helps a machine learning system attain optimum performance with minimum labeled samples, thus reducing annotation costs and boosting performance of computer-aided diagnosis systems in the pres ...
Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...
Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and ...