Spearman's Rank Correlation Coefficient

A rank correlation is any of several statistics that measure an ordinal association—the relationship between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.

We have two random variables \(X\) and \(Y\), each with \(n\) observations:

  • \(X=\{x_1, x_2, x_3, ..., x_n\}\)
  • \(Y=\{y_1, y_2, y_3, ..., y_n\}\)

If \(Rank_X\) and \(Rank_Y\) denote the respective ranks of each data point, then Spearman's rank correlation coefficient, \(r_s\), is the Pearson correlation coefficient of \(Rank_X\) and \(Rank_Y\).
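
Written out, this definition is the usual Pearson formula applied to the ranks rather than to the raw values. The symbols \(\operatorname{cov}\) and \(\sigma\) (covariance and standard deviation) are notation introduced here for illustration:

$$r_s=\frac{\operatorname{cov}(Rank_X, Rank_Y)}{\sigma_{Rank_X}\,\sigma_{Rank_Y}}$$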

What does it mean?

Spearman's rank correlation coefficient is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Spearman

A Spearman correlation of 1 results when the two variables being compared are related by a monotonically increasing function, even if their relationship is not linear. This means that all data points with greater x-values than that of a given data point will have greater y-values as well. In contrast, such a relationship gives a perfect Pearson correlation only if it is also linear.
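
The contrast with Pearson's coefficient can be checked with a small sketch. The data here are made up for illustration (a cubic relationship, which is monotonic but not linear), and the helper names `pearson` and `ranks` are hypothetical, not taken from any library:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def ranks(values):
    """Ranks starting at 1; assumes the values contain no duplicates."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Illustrative data: y grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)                 # below 1: the relationship is not linear
spearman_r = pearson(ranks(x), ranks(y))  # 1: the rankings agree exactly

print(pearson_r, spearman_r)
```

Because both variables are ranked in the same order, the Spearman value is 1 while the Pearson value on the raw data stays below 1.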

Example

  • \(X=\{0.2, 1.3, 0.2, 1.1, 1.4, 1.5\}\)
  • \(Y=\{1.9, 2.2, 3.1, 1.2, 2.2, 2.2\}\)
$$ Rank_X \quad \begin{bmatrix} X: & 0.2 & 1.3 & 0.2 & 1.1 & 1.4 & 1.5 \\ Rank: & 1 & 3 & 1 & 2 & 4 & 5 \end{bmatrix} \quad $$

So \(Rank_X = \{1, 3, 1, 2, 4, 5\}\); the two tied values of 0.2 share rank 1, and the next distinct value takes rank 2.

Similarly, \(Rank_Y=\{2,3,4,1,3,3\}\).

\(r_s\) equals the Pearson correlation coefficient of \(Rank_X\) and \(Rank_Y\), which gives \(r_s=0.158114\).
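
The example can be reproduced with a short sketch. It assumes the ranking convention used in the table above, where tied values share a rank and the next distinct value gets the next integer (often called dense ranking). Many statistical libraries instead assign tied values the average of their positions, which would give a different result for this data. The helper names `dense_rank` and `pearson` are hypothetical:

```python
import math

def dense_rank(values):
    """Rank values from 1; ties share a rank, with no gaps afterwards."""
    distinct = sorted(set(values))
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

x = [0.2, 1.3, 0.2, 1.1, 1.4, 1.5]
y = [1.9, 2.2, 3.1, 1.2, 2.2, 2.2]

rank_x = dense_rank(x)  # [1, 3, 1, 2, 4, 5]
rank_y = dense_rank(y)  # [2, 3, 4, 1, 3, 3]

r_s = pearson(rank_x, rank_y)
print(round(r_s, 6))  # 0.158114
```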

Special case: \(X\) and \(Y\) don't contain duplicates

$$r_s=1-\frac{6\sum d_i^2}{n(n^2-1)}$$

where \(d_i\) is the difference between the ranks of the \(i\)-th observation in \(Rank_X\) and \(Rank_Y\), and \(n\) is the number of observations.
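
As a quick check of the shortcut formula, the sketch below applies it to made-up data without duplicates and compares the result with the Pearson correlation of the ranks. The data and the helper names (`ranks`, `spearman_no_ties`, `pearson`) are assumptions for illustration only:

```python
import math

def ranks(values):
    """Ranks starting at 1; assumes the values contain no duplicates."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Shortcut formula 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)); valid only without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Illustrative data with no repeated values in either variable.
x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 2, 5]

r_formula = spearman_no_ties(x, y)        # sum of d_i^2 is 10, so 1 - 60/120
r_pearson = pearson(ranks(x), ranks(y))   # general definition applied to the ranks

print(r_formula, r_pearson)  # both are 0.5 (up to floating-point rounding)
```

When either variable contains repeated values, the shortcut no longer matches the general definition, so the Pearson-on-ranks form should be used instead.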