In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.
Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
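As a sketch of how this can be done in practice, the NumPy snippet below implements the double-centering of the Euclidean distance matrices and a simple permutation p-value. The function names and the default of 1000 permutations are illustrative choices, not taken from any particular library.

    import numpy as np

    def _centered_distances(z):
        # Pairwise Euclidean distances, double-centered by subtracting row means
        # and column means and adding back the grand mean.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    def distance_correlation(x, y):
        # Sample distance correlation between two samples of equal length n;
        # each observation may be a scalar or a vector.
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        a, b = _centered_distances(x), _centered_distances(y)
        dcov2 = (a * b).mean()      # squared sample distance covariance
        dvar2x = (a * a).mean()     # squared sample distance variance of x
        dvar2y = (b * b).mean()     # squared sample distance variance of y
        denom = np.sqrt(dvar2x * dvar2y)
        return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

    def permutation_pvalue(x, y, n_perm=1000, seed=None):
        # Compare the observed distance correlation with its distribution
        # under random shuffles of y.
        rng = np.random.default_rng(seed)
        observed = distance_correlation(x, y)
        exceed = sum(distance_correlation(x, rng.permutation(y)) >= observed
                     for _ in range(n_perm))
        return (1 + exceed) / (1 + n_perm)

A small p-value indicates that the observed distance correlation is unusually large compared with the shuffled data, i.e., evidence against independence.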
The classical measure of dependence, the Pearson correlation coefficient, is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables. Correlation = 0 (uncorrelatedness) does not imply independence, while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009. It was proved that distance covariance is the same as the Brownian covariance. These measures are examples of energy distances.
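A standard concrete illustration of this difference is Y = X² with X symmetric around zero: the Pearson correlation is essentially zero even though Y is a deterministic function of X, while the sample distance correlation (computed with the sketch above) stays clearly away from zero. The specific simulation below is only an example.

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = x ** 2                          # dependent, but uncorrelated in the Pearson sense
    print(np.corrcoef(x, y)[0, 1])      # close to 0
    print(distance_correlation(x, y))   # clearly positive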
The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.
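Written out, the analogy is direct (using the common shorthand dCov, dVar and dCor for distance covariance, distance variance and distance correlation):

    \operatorname{dCor}(X, Y) \;=\; \frac{\operatorname{dCov}(X, Y)}{\sqrt{\operatorname{dVar}(X)\,\operatorname{dVar}(Y)}}
    \qquad\text{mirrors}\qquad
    \rho(X, Y) \;=\; \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}},

with the convention that dCor is set to zero when the denominator vanishes.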
Let us start with the definition of the sample distance covariance. Let (X_k, Y_k), k = 1, 2, ..., n be a statistical sample from a pair of real-valued or vector-valued random variables (X, Y).
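In the standard construction due to Székely and Rizzo, one first forms the matrices of pairwise Euclidean distances and double-centers them, subtracting row means and column means and adding back the grand mean:

    a_{j,k} = \|X_j - X_k\|, \qquad b_{j,k} = \|Y_j - Y_k\|, \qquad j, k = 1, \dots, n,

    A_{j,k} = a_{j,k} - \bar a_{j\cdot} - \bar a_{\cdot k} + \bar a_{\cdot\cdot}, \qquad
    B_{j,k} = b_{j,k} - \bar b_{j\cdot} - \bar b_{\cdot k} + \bar b_{\cdot\cdot},

where \bar a_{j\cdot}, \bar a_{\cdot k} and \bar a_{\cdot\cdot} denote the j-th row mean, the k-th column mean and the grand mean of the matrix (a_{j,k}). The squared sample distance covariance is then the average product of the centered entries,

    \operatorname{dCov}_n^2(X, Y) = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{j,k} B_{j,k},

with the sample distance variance obtained as dVar_n²(X) = dCov_n²(X, X) and the sample distance correlation as the normalized ratio shown above.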
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data are widely scattered; when the variance is small, the data are tightly clustered. Dispersion is contrasted with location or central tendency, and together they are the most commonly used properties of distributions.
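For a quick numerical illustration of the three measures named above (the data values below are arbitrary):

    import numpy as np

    data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    variance = data.var(ddof=1)              # sample variance
    std_dev = data.std(ddof=1)               # sample standard deviation
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1                            # interquartile range
    print(variance, std_dev, iqr)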
In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).
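This equivalence is easy to check numerically; the sketch below uses SciPy's spearmanr, pearsonr and rankdata, and the simulated monotonic relationship is just an example.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = np.exp(x) + rng.normal(scale=0.1, size=100)   # monotonic but nonlinear in x

    rho, _ = stats.spearmanr(x, y)
    r_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
    print(np.isclose(rho, r_ranks))   # True: Spearman's rho equals Pearson's r on the ranks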
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values (that is, the variables tend to show similar behavior), the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other (that is, the variables tend to show opposite behavior), the covariance is negative.
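In symbols, for random variables X and Y with finite second moments, the covariance is the expected product of their deviations from their means:

    \operatorname{cov}(X, Y) = \operatorname{E}\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big],

so its sign records whether the two variables tend to deviate from their means in the same direction (positive) or in opposite directions (negative).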
Multivariate statistics focuses on inferring the joint distributional properties of several random variables, seen as random vectors, with a main focus on uncovering their underlying dependence structure ...
This course is an introduction to quantitative risk management that covers standard statistical methods, multivariate risk factor models, non-linear dependence structures (copula models), as well as p ...
This course provides an introduction to experimental statistics, including use of population statistics to characterize experimental results, use of comparison statistics and hypothesis testing to evaluate ...
The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically ...
We present an extended validation of semi-analytical, semi-empirical covariance matrices for the two-point correlation function (2PCF) on simulated catalogs representative of luminous red galaxies (LRGs) data collected during the initial 2 months of operat ...
Functional connectomes (FCs) containing pairwise estimations of functional couplings between pairs of brain regions are commonly represented by correlation matrices. As symmetric positive definite matrices, FCs can be transformed via tangent space projection ...