
Concept: Nearest neighbor search

Summary

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.
Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Donald Knuth in vol. 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points.
Most commonly, M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more commonly, M is taken to be the d-dimensional vector space, where dissimilarity is measured using the Euclidean distance, Manhattan distance, or another distance metric. However, the dissimilarity function can be arbitrary. One example is the asymmetric Bregman divergence, for which the triangle inequality does not hold.
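
The definition above maps directly onto a linear-scan baseline: compute the dissimilarity from q to every point of S and keep the k smallest. Below is a minimal sketch in Python with NumPy, assuming the Euclidean metric; the function and variable names are illustrative, not part of the formal definition.

```python
import numpy as np

def knn_linear_scan(S: np.ndarray, q: np.ndarray, k: int = 1) -> np.ndarray:
    """Return the indices of the k points of S closest to q.

    S has shape (n, d): n points in d-dimensional space M.
    q has shape (d,): the query point.
    Uses Euclidean distance; any dissimilarity function could be substituted.
    """
    dists = np.linalg.norm(S - q, axis=1)  # distance from q to each point of S
    return np.argsort(dists)[:k]           # indices of the k smallest distances

# Example: the 5 nearest of 1000 random points in R^3 to the origin.
rng = np.random.default_rng(0)
S = rng.standard_normal((1000, 3))
print(knn_linear_scan(S, np.zeros(3), k=5))
```

This baseline costs O(nd) per query; the data structures listed under related concepts below (k-d trees, locality-sensitive hashing) exist to beat it, either exactly or approximately.
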
The nearest neighbor search problem arises in numerous fields of application, including:

- Pattern recognition – in particular for optical character recognition
- Statistical classification – see k-nearest neighbor algorithm
- Computer vision – for point cloud registration
- Computational geometry – see Closest pair of points problem
- Cryptanalysis – for the lattice problem
- Databases – e.g. content-based image retrieval
- Coding theory – see maximum likelihood decoding
- Semantic search
- Data compression – see MPEG-2 standard
- Robotic sensing
- Recommendation systems – e.g. see collaborative filtering
- Internet marketing – see contextual advertising and behavioral targeting
- DNA sequencing
- Spell checking – suggesting correct spelling
- Plagiarism detection
- Similarity scores for predicting career paths of professional athletes


Related courses (20)

In this course we will define rigorous mathematical models for computing on large datasets, cover main algorithmic techniques that have been developed for sublinear (i.e. faster than linear time) data ...

This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement ...

A course on statistical machine learning for supervised and unsupervised learning

Related concepts (16)

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized.
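
To make the bucketing idea concrete, here is a hedged sketch of one classical LSH family: random-hyperplane hashing for cosine similarity. The class name, bit width, and dataset are our own illustrative choices; LSH families for other metrics use different hash functions, and real deployments use several hash tables to boost recall.

```python
import numpy as np
from collections import defaultdict

class RandomHyperplaneLSH:
    """Map d-dimensional vectors to n_bits-bit signatures.

    Vectors with a small angle between them agree on each bit with
    high probability, so similar vectors tend to share a bucket.
    """
    def __init__(self, d: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, d))  # one random hyperplane per bit

    def hash(self, v: np.ndarray) -> int:
        bits = self.planes @ v > 0                      # which side of each hyperplane
        return sum(1 << i for i, b in enumerate(bits) if b)

# Index 10,000 random vectors: similar items collide into the same bucket.
rng = np.random.default_rng(1)
data = rng.standard_normal((10_000, 64))
lsh = RandomHyperplaneLSH(d=64)
buckets = defaultdict(list)
for i, v in enumerate(data):
    buckets[lsh.hash(v)].append(i)

# A query scans only its own bucket instead of all 10,000 points.
query = data[42] + 0.01 * rng.standard_normal(64)
candidates = buckets[lsh.hash(query)]   # point 42 is very likely among these
```
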

In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches) and creating point clouds. k-d trees are a special case of binary space partitioning trees. The k-d tree is a binary tree in which every node is a k-dimensional point.
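
A compact sketch of both halves of that description, building a k-d tree by median splits and answering a nearest-neighbor query with plane-distance pruning; this is a plain illustrative implementation, not tied to any particular library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple            # the k-dimensional point stored at this node
    axis: int               # splitting dimension; cycles with tree depth
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points, depth=0):
    """Build a k-d tree by median split on one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, q, best=None):
    """Return the point in the tree closest to q (Euclidean)."""
    if node is None:
        return best
    dist2 = lambda p: sum((a - b) ** 2 for a, b in zip(p, q))
    if best is None or dist2(node.point) < dist2(best):
        best = node.point
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Only descend into the far subtree if the splitting plane is closer
    # to q than the best point found so far.
    if diff ** 2 < dist2(best):
        best = nearest(far, q, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))   # -> (8, 1)
```
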

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression: In k-NN classification, the output is a class membership.
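
Continuing the classification case: a minimal majority-vote k-NN classifier, again sketched in Python with NumPy. The helper name and toy dataset are ours, and ties fall to whichever label Counter encountered first.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Predict a label for x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training example
    nearest_labels = y_train[np.argsort(dists)[:k]]  # labels of the k closest
    return Counter(nearest_labels.tolist()).most_common(1)[0][0]

# Toy example: two clusters labeled "a" and "b".
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array(["a", "a", "b", "b"])
print(knn_classify(X, y, np.array([0.2, 0.1])))      # -> "a"
```
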

Related lectures (63)

Multiclass Classification

Covers the concept of multiclass classification and the challenges of linearly separating data with multiple classes.

Shared-Work Systems: Optimization and Execution Scalability

Explores scalability challenges in shared-work systems, emphasizing optimization and execution, experimental setups, data-query operators, and the impact of schema on learning.

Feature Expansion and Kernel Methods

Explores feature expansion, kernel methods, SVM, and nonlinear classification in machine learning.

Related publications (167)

Alfio Quarteroni, Francesco Regazzoni, Stefano Pagani

Predicting the evolution of systems with spatio-temporal dynamics in response to external stimuli is essential for scientific progress. Traditional equations-based approaches leverage first principles through the numerical approximation of differential equations ...

Mathieu Salzmann, Zheng Dang, Yu Guo

In this work, we tackle the task of estimating the 6D pose of an object from point cloud data. While recent learning-based approaches have shown remarkable success on synthetic datasets, we have observed them to fail in the presence of real-world data. We ...

Christoph Koch, Peter Lindner, Zhekai Jiang, Sachin Basil John

We present Sudokube, a novel system that supports interactive-speed querying on high-dimensional data using partially materialized data cubes. Given a storage budget, it judiciously chooses what projections to precompute and materialize during cube construction ...