Summary
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories:

Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.

Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances. On the other hand, except for the special case of single-linkage distance, none of the algorithms (except exhaustive search in O(2^n)) can be guaranteed to find the optimum solution.

The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of O(n^3) and requires Ω(n^2) memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity O(n^2)) are known: SLINK for single-linkage and CLINK for complete-linkage clustering. With a heap, the runtime of the general case can be reduced to O(n^2 log n), an improvement on the aforementioned bound of O(n^3), at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable. Divisive clustering with an exhaustive search is O(2^n), but it is common to use faster heuristics to choose splits, such as k-means.

In order to decide which clusters should be combined (for agglomerative), or where a cluster should be split (for divisive), a measure of dissimilarity between sets of observations is required.
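To make the agglomerative ("bottom-up") strategy concrete, here is a minimal sketch using SciPy's hierarchical clustering routines. The toy data, the choice of single linkage (the case handled optimally by SLINK), and the cut into two flat clusters are illustrative assumptions, not part of the summary above.

```python
# Minimal sketch of agglomerative ("bottom-up") hierarchical clustering with SciPy.
# The toy data, the single-linkage choice, and the cut into two flat clusters are
# illustrative assumptions, not prescribed by the summary above.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy observations: two loose groups of points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Hierarchical clustering only needs pairwise dissimilarities, so we pass a
# condensed distance matrix; any valid dissimilarity measure could be used here.
D = pdist(X, metric="euclidean")

# Greedy agglomerative merging with single linkage (the case that SLINK solves
# optimally in O(n^2)). Z encodes the full merge hierarchy that a dendrogram
# would display.
Z = linkage(D, method="single")

# Cut the hierarchy to obtain a flat partition into two clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]
```

Replacing method="single" with "complete" gives the complete-linkage variant that CLINK addresses, and scipy.cluster.hierarchy.dendrogram(Z) would visualise the hierarchy; since only the distance matrix is passed to linkage, the observations themselves are never needed.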
Related courses (32)
PHYS-512: Statistical physics of computation
This course covers the statistical physics approach to computer science problems ranging from graph theory and constraint satisfaction to inference and machine learning. In particular the replica and ...
CS-401: Applied data analysis
This course teaches the basic techniques, methodologies, and practical skills required to draw meaningful insights from a variety of data, with the help of the most acclaimed software tools in the ...
DH-406: Machine learning for DH
This course aims to introduce the basic principles of machine learning in the context of the digital humanities. We will cover both supervised and unsupervised learning techniques, and study and implement ...
Related lectures (185)
Supervised Learning: k-NN and Decision Trees
Introduces supervised learning with k-NN and decision trees, covering techniques, examples, and ensemble methods.
Statistical Physics of Clusters
Explores the statistical physics of clusters, focusing on complexity and equilibrium behavior.
Graph Coloring II
Explores advanced graph coloring concepts, including planted coloring, rigidity threshold, and frozen variables in BP fixed points.
Related publications (286)
Interpret3C: Interpretable Student Clustering Through Individualized Feature Selection
Vinitra Swamy, Paola Mejia Domenzain, Julian Thomas Blackwell, Isadora Alves de Salles (2024)
Clustering in education, particularly in large-scale online environments like MOOCs, is essential for understanding and adapting to diverse student needs. However, the effectiveness of clustering depends on its interpretability, which becomes challenging ...
Related MOOCs (14)
Selected chapters form winterschool on multi-scale brain
Understanding the brain requires an integrated understanding of different scales of organisation of the brain. This Massive Open Online Course (MOOC) will take you through the latest data, models ...
Neuroscience Reconstructed: Cell Biology
This course will provide the fundamental knowledge in neuroscience required to understand how the brain is organised and how function at multiple scales is integrated to give rise to cognition and behaviour.