In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter $\rho$ (rho) or as $r_s$, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Intuitively, the Spearman correlation between two variables will be high when observations have a similar (or identical, for a correlation of 1) rank between the two variables, and low when observations have a dissimilar (or fully opposed, for a correlation of −1) rank between the two variables. Here, rank means the relative position label of an observation within its variable: 1st, 2nd, 3rd, etc.

Spearman's coefficient is appropriate for both continuous and discrete ordinal variables. Both Spearman's $\rho$ and Kendall's $\tau$ can be formulated as special cases of a more general correlation coefficient.

The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables. For a sample of size $n$, the $n$ raw scores $X_i, Y_i$ are converted to ranks $\operatorname{R}(X_i), \operatorname{R}(Y_i)$, and $r_s$ is computed as

$$r_s = \rho_{\operatorname{R}(X),\operatorname{R}(Y)} = \frac{\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))}{\sigma_{\operatorname{R}(X)}\,\sigma_{\operatorname{R}(Y)}},$$

where $\rho$ denotes the usual Pearson correlation coefficient, but applied to the rank variables, $\operatorname{cov}(\operatorname{R}(X), \operatorname{R}(Y))$ is the covariance of the rank variables, and $\sigma_{\operatorname{R}(X)}$ and $\sigma_{\operatorname{R}(Y)}$ are the standard deviations of the rank variables.

Only if all $n$ ranks are distinct integers can it be computed using the popular formula

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$

where $d_i = \operatorname{R}(X_i) - \operatorname{R}(Y_i)$ is the difference between the two ranks of each observation and $n$ is the number of observations.
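To make the definition concrete, here is a minimal sketch in plain Python that computes the coefficient both ways: as the Pearson correlation of the ranks, and with the shortcut formula based on squared rank differences. The function names (`average_ranks`, `pearson`, `spearman`, `spearman_shortcut`) and the sample data are illustrative assumptions, not part of the source. Tied values are given the mean of the positions they occupy, which is one common convention; the shortcut is only expected to agree with the rank-based definition when there are no ties.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_shortcut(x, y):
    """Shortcut 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); assumes no tied values."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((u - v) ** 2 for u, v in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Illustrative data: y is a monotone but nonlinear function of x.
x = [10, 20, 30, 40, 50]
y = [1, 8, 27, 64, 125]
monotone_spearman = spearman(x, y)  # ranks agree exactly
monotone_pearson = pearson(x, y)    # penalised by the nonlinearity

# Illustrative data with two swapped pairs and no ties.
u = [1, 2, 3, 4, 5]
v = [2, 1, 4, 3, 5]
by_definition = spearman(u, v)
by_shortcut = spearman_shortcut(u, v)
```

On the monotone example the rank-based coefficient reaches 1 while the Pearson coefficient of the raw values stays below 1, which illustrates the difference between monotonic and linear association. On the second example the two computations agree, as expected when all ranks are distinct.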
Consider a bivariate sample $(x_i, y_i)$, $i = 1, \dots, n$, with corresponding ranks $(\operatorname{R}(x_i), \operatorname{R}(y_i)) = (R_i, S_i)$.