Day 7 - Pearson and Spearman correlations

Pearson correlation coefficient

Problem

Given two n-element data sets, \(X\) and \(Y\), calculate the value of the Pearson correlation coefficient.

Python implementation

Using the formula

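$$\rho_{X,Y}=\frac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}$$

where

$$\operatorname{cov}(X,Y)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})\cdot(y_i-\bar{y})$$

and \(\sigma_X\), \(\sigma_Y\) are the population standard deviations, for example \(\sigma_X=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\).

n = 10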
X = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

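def cov(X, Y, n):
    # Population covariance: mean of the products of deviations from the means.
    x_mean = 1/n*sum(X)
    y_mean = 1/n*sum(Y)
    return 1/n*sum((x - x_mean)*(y - y_mean) for x, y in zip(X, Y))

def stdv(X, mu_x, n):
    # Population standard deviation of X around its mean mu_x.
    return (sum((x - mu_x)**2 for x in X) / n)**0.5

def pearson_1(X, Y, n):
    # Pearson coefficient: covariance divided by the product of the standard deviations.
    std_x = stdv(X, 1/n*sum(X), n)
    std_y = stdv(Y, 1/n*sum(Y), n)
    return cov(X, Y, n)/(std_x*std_y)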

pearson_1(X,Y,n)
0.6124721937208479
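
As a sanity check, the same value can be reproduced with the standard library's `statistics` module. This is only a sketch: `pearson_check` is a hypothetical helper name, not part of the original solution, and it assumes the population standard deviation (`pstdev`), matching `stdv` above.

```python
from statistics import mean, pstdev

X = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

def pearson_check(X, Y):
    # Hypothetical cross-check helper: population covariance over the
    # product of the population standard deviations.
    n = len(X)
    x_mean, y_mean = mean(X), mean(Y)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y)) / n
    return covariance / (pstdev(X) * pstdev(Y))

rho = pearson_check(X, Y)
print(round(rho, 3))  # 0.612
```

Taking `n` from `len(X)` rather than passing it in is a design choice of this sketch; it avoids the two getting out of sync.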

Python implementation

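Using the equivalent formula, with the covariance written out:

$$\rho_{X,Y}=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{n\sigma_X\sigma_Y}$$

def pearson_2(X, Y, n):
    # Same means and population standard deviations as in pearson_1.
    x_mean = 1/n*sum(X)
    y_mean = 1/n*sum(Y)
    std_x = stdv(X, x_mean, n)
    std_y = stdv(Y, y_mean, n)

    # Sum of the products of deviations, divided by n * sigma_X * sigma_Y.
    return sum((x - x_mean)*(y - y_mean) for x, y in zip(X, Y))/(n*std_x*std_y)

pearson_2(X, Y, n)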
0.6124721937208479
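
The two implementations agree because the second formula is the first with the covariance expanded. Assuming \(\sigma_X\) and \(\sigma_Y\) are population standard deviations (as computed by `stdv`), the factor \(n\) cancels entirely:

$$\rho_{X,Y}=\frac{\frac{1}{n}\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sigma_X\sigma_Y}=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{n\sigma_X\sigma_Y}=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i}(x_i-\bar{x})^2}\sqrt{\sum_{i}(y_i-\bar{y})^2}}$$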

Spearman's rank correlation coefficient

Problem

Given two \(n\)-element data sets, \(X\) and \(Y\), calculate the value of Spearman's rank correlation coefficient.

Python implementation

We know that in this case the values in each data set are unique, so we can use the formula

$$r_s=1-\frac{6\sum d_i^2}{n(n^2-1)}$$

where \(d_i\) is the difference between the ranks of \(x_i\) and \(y_i\).

n = 10
X = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

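def spearman_rank(X, Y, n):
    # Rank each value by its 1-based position in the sorted data
    # (valid only because the values are unique). Sort once, not per element.
    sorted_X = sorted(X)
    sorted_Y = sorted(Y)
    rank_X = [sorted_X.index(v)+1 for v in X]
    rank_Y = [sorted_Y.index(v)+1 for v in Y]

    # Squared rank differences d_i^2.
    d = [(rx - ry)**2 for rx, ry in zip(rank_X, rank_Y)]
    return 1-(6*sum(d))/(n*(n*n-1))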

spearman_rank(X, Y, n)
0.9030303030303031
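
The shortcut formula relies on the values being unique. When ties are possible, a common alternative is to give tied values the average of their ranks and then compute the Pearson coefficient of the two rank lists. The sketch below illustrates that approach; `average_ranks`, `pearson` and `spearman_general` are hypothetical helper names introduced here, not part of the original solution. On the tie-free data above it should reproduce the same result.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(a, b):
    # Pearson coefficient with the factor n cancelled out.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = (sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)) ** 0.5
    return num / den

def spearman_general(X, Y):
    # Spearman's coefficient as the Pearson coefficient of the ranks.
    return pearson(average_ranks(X), average_ranks(Y))

X = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
r_s = spearman_general(X, Y)
print(round(r_s, 6))  # 0.90303
```

For example, `average_ranks([1, 2, 2, 3])` assigns rank 2.5 to both tied values.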