Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.
There are four common setups for similarity and metric distance learning.
Regression similarity learning
In this setup, pairs of objects $(x_i^1, x_i^2)$ are given together with a measure of their similarity $y_i \in \mathbb{R}$. The goal is to learn a function that approximates $f(x_i^1, x_i^2) \approx y_i$ for every new labeled triplet example $(x_i^1, x_i^2, y_i)$. This is typically achieved by minimizing a regularized loss $\min_w \sum_i \operatorname{loss}(w; x_i^1, x_i^2, y_i) + \operatorname{reg}(w)$.
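As a concrete illustration of this setup (not prescribed by the text above), the sketch below assumes a bilinear similarity $f_W(x^1, x^2) = (x^1)^\top W x^2$, a squared loss, and Frobenius-norm regularization, and fits $W$ by plain gradient descent; the function name and hyperparameters are illustrative choices.

```python
import numpy as np

def fit_regression_similarity(X1, X2, y, lam=0.1, lr=0.01, epochs=200):
    """Regression similarity learning sketch.

    X1, X2: (n, d) arrays of paired objects; y: (n,) real-valued similarity scores.
    Minimizes mean squared error of the bilinear score x1^T W x2 plus lam * ||W||_F^2.
    """
    n, d = X1.shape
    W = np.zeros((d, d))
    for _ in range(epochs):
        pred = np.einsum('ij,jk,ik->i', X1, W, X2)      # f_W(x1_i, x2_i) for all pairs
        resid = pred - y                                 # regression residuals
        grad = X1.T @ (resid[:, None] * X2) / n + lam * W
        W -= lr * grad
    return W
```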
Classification similarity learning
Given are pairs of similar objects $(x_i, x_i^+)$ and non-similar objects $(x_i, x_i^-)$. An equivalent formulation is that every pair $(x_i^1, x_i^2)$ is given together with a binary label $y_i \in \{0, 1\}$ that determines whether the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not.
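One simple way to instantiate this setup (an illustrative choice, not the only one) is to map each pair to a single feature vector and train an off-the-shelf binary classifier on the similar / dissimilar labels. In the sketch below the element-wise product of the two vectors serves as the pair representation; the helper names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(X1, X2):
    # One of many possible pair representations: element-wise interaction features.
    return X1 * X2

def train_pair_classifier(X1, X2, y):
    # y holds binary labels: 1 for similar pairs, 0 for dissimilar pairs.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(pair_features(X1, X2), y)
    return clf

# To decide whether a new pair (x1, x2) is similar:
# clf.predict(pair_features(x1.reshape(1, -1), x2.reshape(1, -1)))
```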
Ranking similarity learning
Given are triplets of objects $(x, x^+, x^-)$ whose relative similarity obeys a predefined order: $x$ is known to be more similar to $x^+$ than to $x^-$. The goal is to learn a function $f$ such that for any new triplet of objects $(x, x^+, x^-)$, it obeys $f(x, x^+) > f(x, x^-)$ (contrastive learning). This setup assumes a weaker form of supervision than regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications.
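A minimal sketch of this setup, assuming a bilinear similarity $f_W(x, z) = x^\top W z$ and a margin-based hinge loss on each triplet (in the spirit of OASIS-style online updates); the function name, learning rate, and margin are illustrative.

```python
import numpy as np

def fit_triplet_similarity(X, Xp, Xn, lr=0.1, margin=1.0, epochs=10):
    """X, Xp, Xn: (n, d) arrays of anchors, positives, and negatives."""
    n, d = X.shape
    W = np.eye(d)                                   # start from the plain dot product
    for _ in range(epochs):
        for x, xp, xn in zip(X, Xp, Xn):
            # Hinge loss: penalize triplets where the positive does not beat
            # the negative by at least the margin.
            if x @ W @ xp - x @ W @ xn < margin:
                W += lr * np.outer(x, xp - xn)      # gradient step on the violated triplet
    return W
```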
Locality sensitive hashing (LSH)
Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.
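As an illustration, the sketch below implements one standard LSH family for cosine similarity (random-hyperplane hashing): each item is reduced to a short bit string, and items whose vectors point in similar directions collide in the same bucket with high probability. The helper names and bit count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits=16):
    planes = rng.normal(size=(n_bits, dim))          # random hyperplanes
    def hash_item(x):
        bits = (planes @ x > 0).astype(int)          # which side of each hyperplane
        return ''.join(map(str, bits))               # bucket key
    return hash_item

# Usage: items with the same key land in the same bucket, so candidate
# nearest neighbors can be looked up without scanning the whole database.
# hash_item = make_hasher(dim=128)
# buckets.setdefault(hash_item(vector), []).append(item_id)
```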
A common approach for learning similarity is to model the similarity function as a bilinear form, $f_W(x, z) = x^\top W z$; in ranking similarity learning, for example, one aims to learn a matrix $W$ that parametrizes this similarity function.
Explores model evaluation with K-Nearest Neighbor, covering optimal k selection, similarity metrics, and performance metrics for classification models.
In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: in k-NN classification, the output is a class membership, decided by a plurality vote of the k nearest neighbors; in k-NN regression, the output is the average of the values of the k nearest neighbors.
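A minimal k-NN classification example with scikit-learn; the dataset and the choice k = 5 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Each test point is assigned the majority class among its k nearest
# training examples under the chosen distance metric.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))        # accuracy on held-out data
```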
In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of similarity exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. In broader terms, though, a similarity function may also satisfy metric axioms.
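For instance, cosine similarity is a widely used similarity function with exactly this behaviour: values near 1 for similar (nearly parallel) vectors and values near 0 or negative for dissimilar ones. A small sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    # Large for vectors pointing in the same direction, zero for orthogonal
    # vectors, negative for opposing ones.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))   # 1.0 (parallel)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0 (orthogonal)
```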