Concept

Distance correlation

Résumé
In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables. Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data. The classical measure of dependence, the Pearson correlation coefficient, is mainly sensitive to a linear relationship between two variables. Distance correlation was introduced in 2005 by Gábor J. Székely in several lectures to address this deficiency of Pearson's correlation, namely that it can easily be zero for dependent variables. Correlation = 0 (uncorrelatedness) does not imply independence while distance correlation = 0 does imply independence. The first results on distance correlation were published in 2007 and 2009. It was proved that distance covariance is the same as the Brownian covariance. These measures are examples of energy distances. The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient. Let us start with the definition of the sample distance covariance. Let (Xk, Yk), k = 1, 2, ..., n be a statistical sample from a pair of real valued or vector valued random variables (X, Y).
À propos de ce résultat
Cette page est générée automatiquement et peut contenir des informations qui ne sont pas correctes, complètes, à jour ou pertinentes par rapport à votre recherche. Il en va de même pour toutes les autres pages de ce site. Veillez à vérifier les informations auprès des sources officielles de l'EPFL.