**Are you an EPFL student looking for a semester project?**

Work with us on data science and visualisation projects, and deploy your project as an app on top of Graph Search.

Concept# DBSCAN

Summary

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.
It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away).
DBSCAN is one of the most common, and most commonly cited, clustering algorithms.
In 2014, the algorithm was awarded the test of time award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD. , the follow-up paper "DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN" appears in the list of the 8 most downloaded articles of the prestigious ACM Transactions on Database Systems (TODS) journal.
The popular follow-up HDBSCAN* was initially published by Ricardo J. G. Campello, David Moulavi, and Jörg Sander in 2013, then expanded upon with Arthur Zimek in 2015. It revises some of the original decisions such as the border points and produces a hierarchical instead of a flat result.
In 1972, Robert F. Ling published a closely related algorithm in "The Theory and Construction of k-Clusters" in The Computer Journal with an estimated runtime complexity of O(n3). DBSCAN has a worst-case of O(n2), and the database-oriented range-query formulation of DBSCAN allows for index acceleration. The algorithms slightly differ in their handling of border points.
Consider a set of points in some space to be clustered. Let ε be a parameter specifying the radius of a neighborhood with respect to some point. For the purpose of DBSCAN clustering, the points are classified as core points, (directly-) reachable points and outliers, as follows:
A point p is a core point if at least points are within distance ε of it (including p).

Official source

This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.

Related courses (13)

CIVIL-226: Introduction to machine learning for engineers

Machine learning is a sub-field of Artificial Intelligence that allows computers to learn from data, identify patterns and make predictions. As a fundamental building block of the Computational Thinki

CS-401: Applied data analysis

This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the dat

MICRO-455: Applied machine learning

Real-world engineering applications must cope with a large dataset of dynamic variables, which cannot be well approximated by classical or deterministic models. This course gives an overview of method

Ontological neighbourhood

Related publications (44)

Related concepts (3)

Related people (4)

Related lectures (42)

K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.

Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: Agglomerative: This is a "bottom-up" approach: Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive: This is a "top-down" approach: All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Covers challenges and techniques for stock return prediction using supervised learning.

Explains k-means clustering, assigning data points to clusters based on proximity and minimizing squared distances within clusters.

Covers unsupervised learning focusing on clustering methods and the challenges faced in clustering algorithms like K-means and DBSCAN.

Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments with each cluster ...

This article focuses on spectral methods for recovering communities in temporal networks. In the case of fixed communities, spectral clustering on the simple time-aggregated graph (i.e., the weighted graph formed by the sum of the interactions over all tem ...

,

In this paper, the recommended implementation of the post-quantum key exchange SIKE for Cortex-M4 is attacked through power analysis with a single trace by clustering with the k-means algorithm the power samples of all the invocations of the elliptic curve ...