Cluster analysis: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
K-means clustering: k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (the cluster center or centroid), which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances (squared Euclidean distances), but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances.
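The assignment-to-nearest-mean idea described above can be sketched in a few lines of Python. This is a minimal illustration using the usual alternating heuristic (often called Lloyd's iteration, which the summary itself does not name); the function names `kmeans`, `squared_distance` and `mean_point`, the random initialization, and the `iterations` and `seed` parameters are all assumptions made for the example. It finds a local optimum only, not necessarily the best partition.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    labels = [
        min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points
    ]
    return centroids, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

Each centroid is a mean because the mean is the point that minimizes the summed squared distances to its cluster members, which is the quantity k-means reduces at every step.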
Determining the number of clusters in a data set: Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem. For a certain class of clustering algorithms (in particular k-means, k-medoids and the expectation–maximization algorithm), there is a parameter commonly referred to as k that specifies the number of clusters to detect.
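To see why choosing k is a separate problem from clustering itself, the following sketch computes the best achievable within-cluster sum of squares for every k on a tiny one-dimensional data set. It relies on the fact that, in one dimension, optimal k-means clusters are contiguous runs of the sorted values, so a brute-force search over cut positions is exact. The data values and the names `sse` and `best_wcss_1d` are made up for illustration.

```python
from itertools import combinations

def sse(values):
    """Sum of squared deviations of the values from their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_wcss_1d(values, k):
    """Smallest within-cluster sum of squares over all ways of cutting
    the sorted values into k contiguous, non-empty groups."""
    xs = sorted(values)
    n = len(xs)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(sse(xs[a:b]) for a, b in zip(bounds, bounds[1:]))
        best = min(best, total)
    return best

# Hypothetical data with three visually obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
curve = {k: best_wcss_1d(data, k) for k in range(1, len(data) + 1)}
for k, value in curve.items():
    print(k, round(value, 4))
```

The objective never increases as k grows and reaches zero when every point is its own cluster, so minimizing it alone cannot select k. One common heuristic, not described in the summary above, is to look for the k after which the decrease flattens out.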
Correlation clustering: Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance. In machine learning, correlation clustering or cluster editing operates in a scenario where the relationships between the objects are known instead of the actual representations of the objects.
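As a rough illustration of clustering from known pairwise relationships, the sketch below uses one standard formulation: each pair of objects is marked as similar or dissimilar, and a clustering is scored by the number of pairs it places in contradiction to that mark. The summary above does not state this objective, so treat it as an assumption. The exhaustive search over all partitions is only feasible for a handful of objects and is not how practical solvers work; the names `disagreements` and `all_partitions` and the example relations are hypothetical.

```python
from itertools import combinations

def disagreements(labels, similar):
    """Count pairs whose placement contradicts the known relationship."""
    cost = 0
    for i, j in combinations(range(len(labels)), 2):
        together = labels[i] == labels[j]
        if together != similar[frozenset((i, j))]:
            cost += 1
    return cost

def all_partitions(n):
    """Yield every partition of n objects as a list of cluster labels."""
    def extend(labels):
        if len(labels) == n:
            yield list(labels)
            return
        next_new = max(labels, default=-1) + 1
        # Reuse an existing label or open exactly one new cluster.
        for label in range(next_new + 1):
            yield from extend(labels + [label])
    yield from extend([])

# Hypothetical relations: objects 0, 1, 2 are mutually similar, as are 3 and 4.
positive = {(0, 1), (0, 2), (1, 2), (3, 4)}
similar = {frozenset(pair): pair in positive for pair in combinations(range(5), 2)}

best = min(all_partitions(5), key=lambda labels: disagreements(labels, similar))
print(best, disagreements(best, similar))
```

Note that the number of clusters is never passed in: it falls out of whichever partition has the fewest disagreements.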
Compact space: In mathematics, specifically general topology, compactness is a property that seeks to generalize the notion of a closed and bounded subset of Euclidean space. The idea is that a compact space has no "punctures" or "missing endpoints", i.e., it includes all limiting values of points. For example, the open interval (0,1) would not be compact because it excludes the limiting values of 0 and 1, whereas the closed interval [0,1] would be compact.
Tetrahedron: In geometry, a tetrahedron (plural: tetrahedra or tetrahedrons), also known as a triangular pyramid, is a polyhedron composed of four triangular faces, six straight edges, and four vertices. The tetrahedron is the simplest of all the ordinary convex polyhedra. The tetrahedron is the three-dimensional case of the more general concept of a Euclidean simplex, and may thus also be called a 3-simplex. The tetrahedron is one kind of pyramid, which is a polyhedron with a flat polygon base and triangular faces connecting the base to a common point.
Clustering high-dimensional data: Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional spaces of data are often encountered in areas such as medicine, where DNA microarray technology can produce many measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary.
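The point about text documents can be made concrete with a small sketch: once each document is turned into a word-frequency vector, every distinct word contributes one dimension. The three sample sentences, the whitespace tokenization, and the helper name `to_vector` are assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical toy corpus; real collections have vocabularies of many thousands of words.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

tokenized = [doc.split() for doc in docs]
vocabulary = sorted({word for tokens in tokenized for word in tokens})

def to_vector(tokens):
    """Word-frequency vector with one coordinate per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vectors = [to_vector(tokens) for tokens in tokenized]
print(len(vocabulary), [len(v) for v in vectors])
```

Even this tiny corpus yields vectors that are mostly zeros, which hints at why distance-based clustering becomes harder as the vocabulary grows.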
Locally compact space: In topology and related branches of mathematics, a topological space is called locally compact if, roughly speaking, each small portion of the space looks like a small portion of a compact space. More precisely, it is a topological space in which every point has a compact neighborhood. In mathematical analysis, locally compact spaces that are Hausdorff are of particular interest; they are abbreviated as LCH spaces. Let X be a topological space. Most commonly, X is called locally compact if every point x of X has a compact neighborhood.
Relatively compact subspace: In mathematics, a relatively compact subspace (or relatively compact subset, or precompact subset) Y of a topological space X is a subset whose closure is compact. Every subset of a compact topological space is relatively compact (since a closed subset of a compact space is compact). In an arbitrary topological space, every subset of a relatively compact set is relatively compact. Every compact subset of a Hausdorff space is relatively compact.
5-cell: In geometry, the 5-cell is the convex 4-polytope with Schläfli symbol {3,3,3}. It is a 5-vertex four-dimensional object bounded by five tetrahedral cells. It is also known as a C5, pentachoron, pentatope, pentahedroid, or tetrahedral pyramid. It is the 4-simplex (Coxeter's polytope), the simplest possible convex 4-polytope, and is analogous to the tetrahedron in three dimensions and the triangle in two dimensions. The 5-cell is a 4-dimensional pyramid with a tetrahedral base and four tetrahedral sides.
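Because the tetrahedron is the 3-simplex and the 5-cell is the 4-simplex, their element counts follow one pattern: every choice of k + 1 vertices of an n-simplex spans a k-dimensional face, giving C(n + 1, k + 1) such faces. The short sketch below checks the counts quoted in the two entries; the function name `simplex_face_counts` is made up for the example.

```python
from math import comb

def simplex_face_counts(n):
    """Number of k-dimensional faces of an n-simplex, for k = 0 .. n - 1."""
    return [comb(n + 1, k + 1) for k in range(n)]

tetrahedron = simplex_face_counts(3)  # vertices, edges, triangular faces
five_cell = simplex_face_counts(4)    # vertices, edges, triangles, tetrahedral cells
print(tetrahedron, five_cell)
```

The first list reproduces the four vertices, six edges and four faces of the tetrahedron, and the second shows the five vertices and five tetrahedral cells of the 5-cell.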