Cluster analysis: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
K-means clustering: k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster center or cluster centroid), serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
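The partitioning described above can be sketched with the standard iterative refinement (often called Lloyd's algorithm), which is one common heuristic for k-means rather than the only one. This is a minimal illustration, not a reference implementation: the function names `assign` and `kmeans`, the caller-supplied `init_centroids`, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for the example.

```python
def assign(points, centroids):
    """Return, for each point, the index of the centroid with the
    smallest squared Euclidean distance to it."""
    def sq_dist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
        for x in points
    ]


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and mean-update steps until labels stop changing.

    `init_centroids` is supplied by the caller (an assumption of this sketch);
    real implementations usually pick starting centroids more carefully.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assignments are stable, so the means will not move
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, labels
```

For example, calling `kmeans` on the six points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)` with starting centroids `(0, 0)` and `(10, 10)` assigns the first three points to one cluster and the last three to the other, with each centroid at the mean of its members.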
Star cluster: Star clusters are large groups of stars held together by self-gravitation. Two main types of star clusters can be distinguished: globular clusters are tight groups of ten thousand to millions of old stars which are gravitationally bound, while open clusters are more loosely clustered groups of stars, generally containing fewer than a few hundred members, and are often very young.
Open cluster: An open cluster is a type of star cluster made of tens to a few thousand stars that were formed from the same giant molecular cloud and have roughly the same age. More than 1,100 open clusters have been discovered within the Milky Way galaxy, and many more are thought to exist. They are loosely bound by mutual gravitational attraction and become disrupted by close encounters with other clusters and clouds of gas as they orbit the Galactic Center.
Globular cluster: A globular cluster is a spheroidal conglomeration of stars. Globular clusters are bound together by gravity, with a higher concentration of stars towards their centers. They can contain anywhere from tens of thousands to many millions of member stars. Their name is derived from Latin globulus (small sphere). Globular clusters are occasionally known simply as "globulars". Although one globular cluster, Omega Centauri, was observed in antiquity and long thought to be a star, recognition of the clusters' true nature came with the advent of telescopes in the 17th century.
Galaxy cluster: A galaxy cluster, or a cluster of galaxies, is a structure that consists of anywhere from hundreds to thousands of galaxies that are bound together by gravity, with typical masses ranging from 10^14 to 10^15 solar masses. They are the second-largest known gravitationally bound structures in the universe after galaxy filaments and were believed to be the largest known structures in the universe until the 1980s, when superclusters were discovered. One of the key features of clusters is the intracluster medium (ICM).
Correlation clustering: Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance. In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects.
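To make the "relationships are known, number of clusters is not" setting concrete, here is a small sketch. It assumes the commonly used disagreement-minimization objective (count similar pairs that are split apart plus dissimilar pairs that are grouped together), which the text above does not spell out. The names `disagreements`, `partitions`, and `best_clustering`, and the `similar` dictionary keyed by `frozenset` pairs, are hypothetical. The exhaustive search is only feasible for a handful of objects and is shown purely for illustration.

```python
from itertools import combinations


def disagreements(labels, similar):
    """Count pairs whose placement contradicts the known relationship.

    `similar[frozenset((i, j))]` is True if objects i and j are known to be
    similar and False if they are known to be dissimilar.
    """
    cost = 0
    for i, j in combinations(range(len(labels)), 2):
        together = labels[i] == labels[j]
        if similar[frozenset((i, j))] != together:
            cost += 1
    return cost


def partitions(n):
    """Yield every way to split n objects into clusters, as label lists.

    Labels follow the restricted-growth convention (a new label is always
    the next unused integer), so each partition appears exactly once.
    """
    def extend(labels, used):
        if len(labels) == n:
            yield list(labels)
            return
        for c in range(used + 1):
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def best_clustering(n, similar):
    """Brute-force search for a partition with the fewest disagreements."""
    return min(partitions(n), key=lambda labels: disagreements(labels, similar))
```

Note that the number of clusters is never passed in: it falls out of whichever partition has the lowest cost under the assumed objective.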
Mean squared error: In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.
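As a small illustration of "the average of the squares of the errors", the sketch below computes an empirical MSE from paired estimated and actual values. The function name `mean_squared_error` and the list-based inputs are assumptions for the example. It computes a sample average, not the expected value that defines MSE as a risk function.

```python
def mean_squared_error(estimates, actuals):
    """Average of squared differences between paired estimates and actual values."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return sum(squared_errors) / len(squared_errors)
```

With estimates `[2.0, 4.0, 6.0]` and actual values `[1.0, 4.0, 8.0]`, the squared errors are 1, 0 and 4, so the result is 5/3.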
Galaxy groups and clusters: Galaxy groups and clusters are the largest known gravitationally bound objects to have arisen thus far in the process of cosmic structure formation. They form the densest part of the large-scale structure of the Universe. In models for the gravitational formation of structure with cold dark matter, the smallest structures collapse first and eventually build the largest structures, clusters of galaxies. Clusters therefore formed relatively recently, between 10 billion years ago and now.
Single-linkage clustering: In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. This method tends to produce long thin clusters in which nearby elements of the same cluster have small distances, but elements at opposite ends of a cluster may be much farther from each other than two elements of other clusters.
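The bottom-up merging step can be sketched naively as follows. This is an illustrative, unoptimized version that rescans all cluster pairs at every step; the function name `single_linkage`, the use of Euclidean distance via `math.dist`, and stopping at a caller-chosen `num_clusters` are assumptions of the example rather than part of the definition above.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available since Python 3.8


def single_linkage(points, num_clusters):
    """Merge clusters bottom-up until `num_clusters` remain.

    The distance between two clusters is the distance between their
    closest pair of elements. Returns clusters as sorted lists of
    indices into `points`.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    while len(clusters) > num_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a in place
        del clusters[b]
    return [sorted(c) for c in clusters]
```

On one-dimensional points such as `(0,), (1,), (2,), (10,), (11,)` with `num_clusters=2`, the chain 0, 1, 2 is merged link by link into a single cluster even though its end points are two units apart, which illustrates the tendency toward long thin clusters.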