Cluster analysisCluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Measure-preserving dynamical systemIn mathematics, a measure-preserving dynamical system is an object of study in the abstract formulation of dynamical systems, and ergodic theory in particular. Measure-preserving systems obey the Poincaré recurrence theorem, and are a special case of conservative systems. They provide the formal, mathematical basis for a broad range of physical systems, and, in particular, many systems from classical mechanics (in particular, most non-dissipative systems) as well as systems in thermodynamic equilibrium.
Heavy-tailed distributionIn probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy. There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions.
Social dynamicsSocial dynamics (or sociodynamics) is the study of the behavior of groups that results from the interactions of individual group members as well to the study of the relationship between individual interactions and group level behaviors. The field of social dynamics brings together ideas from economics, sociology, social psychology, and other disciplines, and is a sub-field of complex adaptive systems or complexity science. The fundamental assumption of the field is that individuals are influenced by one another's behavior.
BiclusteringBiclustering, block clustering, Co-clustering or two-mode clustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix. The term was first introduced by Boris Mirkin to name a technique introduced many years earlier, in 1972, by John A. Hartigan. Given a set of samples represented by an -dimensional feature vector, the entire dataset can be represented as rows in columns (i.e., an matrix). The Biclustering algorithm generates Biclusters.
Pearson correlation coefficientIn statistics, the Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations.
Satellite laser rangingIn satellite laser ranging (SLR) a global network of observation stations measures the round trip time of flight of ultrashort pulses of light to satellites equipped with retroreflectors. This provides instantaneous range measurements of millimeter level precision which can be accumulated to provide accurate measurement of orbits and a host of important scientific data. The laser pulse can also be reflected by the surface of a satellite without a retroreflector, which is used for tracking space debris.
Determining the number of clusters in a data setDetermining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem. For a certain class of clustering algorithms (in particular k-means, k-medoids and expectation–maximization algorithm), there is a parameter commonly referred to as k that specifies the number of clusters to detect.
Fitness landscapeIn evolutionary biology, fitness landscapes or adaptive landscapes (types of evolutionary landscapes) are used to visualize the relationship between genotypes and reproductive success. It is assumed that every genotype has a well-defined replication rate (often referred to as fitness). This fitness is the "height" of the landscape. Genotypes which are similar are said to be "close" to each other, while those that are very different are "far" from each other.
Long tailIn statistics and business, a long tail of some distributions of numbers is the portion of the distribution having many occurrences far from the "head" or central part of the distribution. The distribution could involve popularities, random numbers of occurrences of events with various probabilities, etc. The term is often used loosely, with no definition or an arbitrary definition, but precise definitions are possible. In statistics, the term long-tailed distribution has a narrow technical meaning, and is a subtype of heavy-tailed distribution.