K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Unfortunately, the initial random graph exhibits poor graph locality, leading to many unnecessary similarity computations. In this paper, we remove this drawback with Cluster-and-Conquer (C-2 for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to precluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speedups of up to ×4.42 while even improving KNN quality.
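To make the idea concrete, below is a minimal sketch (not the authors' implementation) of the two ingredients the abstract describes: a greedy, local KNN-graph refinement that starts from an initial graph and repeatedly explores neighbors-of-neighbors, and a toy random-hash preclustering step in the spirit of FastRandomHash that groups similar items before refinement begins. All function names, parameters, and the cosine-similarity choice are illustrative assumptions.

```python
import random
import numpy as np


def cosine_sim(a, b):
    # Similarity measure used by the sketch; the actual metric is dataset-dependent.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def random_hash_precluster(X, n_buckets=8, seed=0):
    """Toy preclustering (assumption, in the spirit of FastRandomHash):
    hash each item with random hyperplane signs so that similar items
    tend to land in the same bucket, at very low cost."""
    rng = np.random.default_rng(seed)
    n_bits = max(1, int(np.log2(n_buckets)))
    planes = rng.normal(size=(n_bits, X.shape[1]))
    buckets = {}
    for i, x in enumerate(X):
        key = tuple((planes @ x) > 0)          # sign pattern = bucket id
        buckets.setdefault(key, []).append(i)
    return list(buckets.values())


def greedy_knn(X, k=5, iters=10, seed=0, init_groups=None):
    """Incremental, local KNN-graph construction: start from a random graph
    (optionally restricted to precluster buckets) and repeatedly replace
    neighbors with better-scoring neighbors-of-neighbors."""
    rng = random.Random(seed)
    n = len(X)
    knn = []
    for i in range(n):
        pool = next((g for g in (init_groups or []) if i in g), range(n))
        cand = [j for j in pool if j != i] or [j for j in range(n) if j != i]
        knn.append(set(rng.sample(cand, min(k, len(cand)))))
    for _ in range(iters):
        changed = False
        for i in range(n):
            # Candidate set: neighbors of neighbors not already linked to i.
            cand = {c for j in knn[i] for c in knn[j] if c != i} - knn[i]
            best = sorted(knn[i] | cand,
                          key=lambda j: -cosine_sim(X[i], X[j]))[:k]
            if set(best) != knn[i]:
                knn[i] = set(best)
                changed = True
        if not changed:
            break
    return knn


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(200, 16))
    groups = random_hash_precluster(X, n_buckets=8)
    graph = greedy_knn(X, k=5, init_groups=groups)
    print("neighbors of item 0:", sorted(graph[0]))
```

The preclustering step only changes the *starting* graph: by seeding each node's initial neighbor list from its own bucket, the greedy refinement begins with far better graph locality than a uniformly random graph, which is the effect the paper exploits to cut unnecessary similarity computations.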