Distance correlation

Summary

In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is zero if and only if the random vectors are independent. Thus, distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect linear association between two random variables.
Distance correlation can be used to perform a statistical test of dependence with a permutation test. One first computes the distance correlation (involving the re-centering of Euclidean distance matrices) between two random vectors, and then compares this value to the distance correlations of many shuffles of the data.
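The permutation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`distance_matrix`, `double_center`, `distance_correlation`, `permutation_test`), the number of shuffles, and the toy data are all chosen here for illustration, and the sample statistic is the simple biased version built from re-centered Euclidean distance matrices.

```python
import math
import random


def distance_matrix(points):
    """Pairwise Euclidean distances between observations (tuples of floats)."""
    return [[math.dist(p, q) for q in points] for p in points]


def double_center(d):
    """Re-center a symmetric distance matrix: subtract row and column means, add the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[d[j][k] - row_means[j] - row_means[k] + grand_mean for k in range(n)]
            for j in range(n)]


def mean_product(a, b):
    """Average of the entrywise products of two n x n matrices."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / (n * n)


def distance_correlation(xs, ys):
    """Sample distance correlation of two paired samples (biased version)."""
    a = double_center(distance_matrix(xs))
    b = double_center(distance_matrix(ys))
    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def permutation_test(xs, ys, n_permutations=199, seed=0):
    """Compare the observed statistic with its values on shuffled pairings."""
    rng = random.Random(seed)
    observed = distance_correlation(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks the pairing between xs and ys
        if distance_correlation(xs, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed pairing as one of the permutations.
    return observed, (exceed + 1) / (n_permutations + 1)


# Toy data (an assumption for illustration): y depends on x, but not linearly.
xs = [(-1.0 + 2.0 * i / 39,) for i in range(40)]
ys = [(x[0] ** 2,) for x in xs]
observed, p_value = permutation_test(xs, ys)
print(observed, p_value)
```

Each observation is a tuple, so the two samples may have different dimensions, as the definition allows. The sketch recomputes every distance matrix on each shuffle, which is simple but slow for large samples.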
The classical measure of dependence, the Pearson correlation coefficient, is mainly sensitive to a linear relationship between two variables. Gábor J. Székely introduced distance correlation in several lectures in 2005 to address a deficiency of Pearson's correlation: it can easily be zero for dependent variables. A Pearson correlation of zero (uncorrelatedness) does not imply independence, whereas a population distance correlation of zero does. The first results on distance correlation were published in 2007 and 2009. It was proved that distance covariance is the same as the Brownian covariance. These measures are examples of energy distances.
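A small numerical illustration of this deficiency, using made-up data: below, y is a deterministic function of x, yet the Pearson coefficient is zero. The helper `pearson` is written out here for the example and is not taken from any library.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]  # fully determined by x, but symmetric around zero
r = pearson(xs, ys)
print(r)
```

The positive and negative halves of the parabola cancel in the cross-product sum, so the linear measure sees nothing even though knowing x fixes y exactly.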
The distance correlation is derived from a number of other quantities that are used in its specification, specifically: distance variance, distance standard deviation, and distance covariance. These quantities take the same roles as the ordinary moments with corresponding names in the specification of the Pearson product-moment correlation coefficient.
Let us start with the definition of the sample distance covariance. Let (Xk, Yk), k = 1, 2, ..., n be a statistical sample from a pair of real valued or vector valued random variables (X, Y).
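For reference, the sample quantities are commonly written as follows. The symbols a, b, A and B are notation assumed here, and the norm is the Euclidean norm; this is a sketch of the usual formulation rather than a quotation of the original text.

```latex
\begin{aligned}
a_{j,k} &= \lVert X_j - X_k \rVert, \qquad
b_{j,k} = \lVert Y_j - Y_k \rVert, \qquad j, k = 1, \dots, n, \\
A_{j,k} &= a_{j,k} - \bar{a}_{j\cdot} - \bar{a}_{\cdot k} + \bar{a}_{\cdot\cdot}, \qquad
B_{j,k} = b_{j,k} - \bar{b}_{j\cdot} - \bar{b}_{\cdot k} + \bar{b}_{\cdot\cdot}, \\
\operatorname{dCov}_n^2(X, Y) &= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{k=1}^{n} A_{j,k}\, B_{j,k}, \\
\operatorname{dVar}_n^2(X) &= \operatorname{dCov}_n^2(X, X), \\
\operatorname{dCor}_n^2(X, Y) &= \frac{\operatorname{dCov}_n^2(X, Y)}
  {\sqrt{\operatorname{dVar}_n^2(X)\, \operatorname{dVar}_n^2(Y)}}.
\end{aligned}
```

Here \bar{a}_{j\cdot} is the mean of row j, \bar{a}_{\cdot k} the mean of column k, and \bar{a}_{\cdot\cdot} the grand mean of the distance matrix; the distance correlation is taken to be zero when the denominator vanishes.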



Statistical dispersion

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and interquartile range. For instance, when the variance of data in a set is large, the data is widely scattered. On the other hand, when the variance is small, the data in the set is clustered. Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

Spearman's rank correlation coefficient

In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not).
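The identity between Spearman's coefficient and the Pearson correlation of the ranks can be checked directly. The sketch below uses illustrative helper names (`average_ranks`, `pearson`, `spearman`) and made-up data, and assigns tied values the average of their positions.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman's coefficient as the Pearson correlation of the rank values."""
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]  # monotonic in x, but not linear
print(spearman(xs, ys), pearson(xs, ys))
```

Because cubing preserves order, both samples have the same ranks, so the rank-based coefficient reaches its maximum while the Pearson coefficient on the raw values stays below one.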

Covariance

In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values (that is, the variables tend to show similar behavior), the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, (that is, the variables tend to show opposite behavior), the covariance is negative.
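The sign behaviour can be seen on a few made-up numbers. The sketch below computes the sample covariance with the n − 1 divisor; the function name and the data are assumptions for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length samples (divisor n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


base = [1.0, 2.0, 3.0, 4.0]
similar = [2.0, 4.0, 5.0, 9.0]   # larger values go with larger values of base
opposite = [9.0, 5.0, 4.0, 2.0]  # larger values go with smaller values of base

cov_similar = sample_covariance(base, similar)
cov_opposite = sample_covariance(base, opposite)
print(cov_similar, cov_opposite)
```

The first covariance is positive and the second negative, matching the two cases described above.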

MATH-444: Multivariate statistics

Multivariate statistics focusses on inferring the joint distributional properties of several random variables, seen as random vectors, with a main focus on uncovering their underlying dependence struc

FIN-417: Quantitative risk management

This course is an introduction to quantitative risk management that covers standard statistical methods, multivariate risk factor models, non-linear dependence structures (copula models), as well as p

MICRO-110: Design of experiments

This course provides an introduction to experimental statistics, including use of population statistics to characterize experimental results, use of comparison statistics and hypothesis testing to eva

Explores Principal Component Analysis theory, properties, applications, and hypothesis testing in multivariate statistics.

Explores Principal Component Analysis, dimensionality reduction, data quality assessment, and error rate control.

Covers regression with exponential family responses using Generalised Linear Models.

Daniel Kuhn, Yves Rychener, Viet Anh Nguyen

The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically ...

2024

We present an extended validation of semi-analytical, semi-empirical covariance matrices for the two-point correlation function (2PCF) on simulated catalogs representative of luminous red galaxies (LRGs) data collected during the initial 2 months of operat ...

Functional connectomes (FCs) containing pairwise estimations of functional couplings between pairs of brain regions are commonly represented by correlation matrices. As symmetric positive definite matrices, FCs can be transformed via tangent space projecti ...