Day 7 - Pearson and Spearman correlations
Pearson correlation coefficient
Problem
Given two n-element data sets, \(X\) and \(Y\), calculate the value of the Pearson correlation coefficient.
Python implementation
Using the formula
$$\rho_{X,Y}=\frac{cov(X,Y)}{\sigma_X\sigma_Y}$$
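where \(\sigma_X\) and \(\sigma_Y\) are the population standard deviations of \(X\) and \(Y\) (dividing by \(n\)), and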
$$cov(X,Y)=\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})\cdot(y_i-\bar{y})$$
n = 10
X = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
def cov(X, Y, n):
    # Population covariance: mean product of the deviations from each mean.
    x_mean = 1/n*sum(X)
    y_mean = 1/n*sum(Y)
    return 1/n*sum([(X[i]-x_mean)*(Y[i]-y_mean) for i in range(n)])

def stdv(X, mu_x, n):
    # Population standard deviation of X about the given mean mu_x.
    return (sum([(x - mu_x)**2 for x in X]) / n)**0.5

def pearson_1(X, Y, n):
    std_x = stdv(X, 1/n*sum(X), n)
    std_y = stdv(Y, 1/n*sum(Y), n)
    return cov(X, Y, n)/(std_x*std_y)
pearson_1(X,Y,n)
0.6124721937208479
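As a cross-check, the same coefficient can be computed with Python's standard `statistics` module. This is only a sketch: the helper name `pearson_check` is hypothetical (it is not part of the original solution), and it assumes Python 3.8 or later, where `statistics.fmean` is available.

```python
import statistics

X = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

def pearson_check(xs, ys):
    """Pearson correlation from population statistics (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Population covariance: average product of deviations from the means.
    covariance = statistics.fmean(
        (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
    )
    # pstdev is the population standard deviation (divides by n).
    return covariance / (statistics.pstdev(xs) * statistics.pstdev(ys))

r = pearson_check(X, Y)
print(round(r, 3))  # 0.612
```

Iterating with `zip` instead of indexing with `range(n)` removes the need to pass `n` separately and makes a length mismatch easier to catch.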
Python implementation
Using the equivalent formula with the covariance written out
$$\rho_{X,Y}=\frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{n\sigma_X\sigma_Y}$$
def pearson_2(X, Y, n):
    std_x = stdv(X, 1/n*sum(X), n)
    std_y = stdv(Y, 1/n*sum(Y), n)
    x_mean = 1/n*sum(X)
    y_mean = 1/n*sum(Y)
    # Sum of products of deviations, divided by n * sigma_X * sigma_Y.
    return sum([(X[i]-x_mean)*(Y[i]-y_mean) for i in range(n)])/(n*std_x*std_y)
pearson_2(X, Y,n)
0.6124721937208479
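The two implementations agree because they compute the same quantity. Substituting the definitions of the covariance and of the population standard deviation shows that the factors of \(1/n\) cancel, leaving a ratio of plain sums:

$$\rho_{X,Y}=\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

For the data above, \(\bar{x}=6.73\) and \(\bar{y}=37.8\), and working the sums by hand gives

$$\rho_{X,Y}=\frac{808.86}{\sqrt{57.241\cdot 30469.6}}\approx 0.6125$$

which matches the computed value.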
Spearman's rank correlation coefficient
Problem
Given two \(n\)-element data sets, \(X\) and \(Y\), calculate the value of Spearman's rank correlation coefficient.
Python implementation
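We know that in this case the values in each data set are unique. Hence we can use the following formula, where \(d_i\) is the difference between the rank of \(x_i\) in \(X\) and the rank of \(y_i\) in \(Y\):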
$$r_s=1-\frac{6\sum d_i^2}{n(n^2-1)}$$
n = 10
X = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
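def spearman_rank(X, Y, n):
    # Rank = 1-based position in ascending order (valid because values are unique).
    sorted_X, sorted_Y = sorted(X), sorted(Y)
    rank_X = [sorted_X.index(v)+1 for v in X]
    rank_Y = [sorted_Y.index(v)+1 for v in Y]
    # Squared rank differences d_i^2.
    d = [(rank_X[i]-rank_Y[i])**2 for i in range(n)]
    return 1-(6*sum(d))/(n*(n*n-1))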
spearman_rank(X, Y, n)
0.9030303030303031
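The shortcut formula relies on all values being unique. When ties are possible, a common alternative is to give tied values the average of their ranks and then compute the Pearson correlation of the two rank lists. The sketch below illustrates that approach; it is not part of the original solution, and the function names `average_ranks`, `pearson` and `spearman_with_ties` are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation as a ratio of sums (the 1/n factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    den_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return num / (den_x * den_y)

def spearman_with_ties(xs, ys):
    """Spearman's coefficient as the Pearson correlation of average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

X = [10, 9.8, 8, 7.8, 7.7, 1.7, 6, 5, 1.4, 2]
Y = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]
r_s = spearman_with_ties(X, Y)
print(round(r_s, 4))  # 0.903
```

On this data there are no ties, so the result agrees with the shortcut formula: the squared rank differences sum to 16, giving \(1-\frac{6\cdot 16}{10\cdot 99}=1-\frac{96}{990}\approx 0.9030\).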