Day 8 - Least Square Regression Line
Least Square Regression Line
Problem
A group of five students enrolls in Statistics immediately after taking a Math aptitude test. Each student's Math aptitude test score, \(x\), and Statistics course grade, \(y\), can be expressed as the following list of \((x,y)\) points:
- \((95, 85)\)
- \((85, 95)\)
- \((80, 70)\)
- \((70, 65)\)
- \((60, 70)\)
If a student scored an 80 on the Math aptitude test, what grade would we expect them to achieve in Statistics? Determine the equation of the best-fit line using the least squares method, then compute and print the value of \(y\) when \(x=80\).
X = [95, 85, 80, 70, 60]
Y = [85, 95, 70, 65, 70]
n = len(X)
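def cov(X, Y, n):
    # Population covariance of X and Y
    x_mean = 1/n*sum(X)
    y_mean = 1/n*sum(Y)
    return 1/n*sum([(X[i]-x_mean)*(Y[i]-y_mean) for i in range(n)])
def stdv(X, mu_x, n):
    # Population standard deviation of X around its mean mu_x
    return (sum([(x - mu_x)**2 for x in X]) / n)**0.5
def pearson_1(X, Y, n):
    # Pearson correlation coefficient: cov(X, Y) / (std_x * std_y)
    std_x = stdv(X, 1/n*sum(X), n)
    std_y = stdv(Y, 1/n*sum(Y), n)
    return cov(X, Y, n)/(std_x*std_y)
# Slope of the regression line: b = p * std_y / std_x
b = pearson_1(X, Y, n)*stdv(Y, sum(Y)/n, n)/stdv(X, sum(X)/n, n)
# Intercept: a = y_mean - b * x_mean
a = sum(Y)/n - b*sum(X)/n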
print(f"If a student scored 80 on the math test, he would most likely score a {round(a+80*b,3)} in statistics")
If a student scored 80 on the math test, he would most likely score a 78.288 in statistics
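As a cross-check, the slope can also be computed without going through the Pearson coefficient, because \(p\,\sigma_Y/\sigma_X\) simplifies to \(\sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2\). The sketch below is one possible way to write this; the function name `fit_line` is hypothetical and not part of the original notebook.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


a, b = fit_line([95, 85, 80, 70, 60], [85, 95, 70, 65, 70])
print(round(a + 80 * b, 3))  # prints 78.288
```

Both approaches should agree up to floating-point rounding, since the standard deviations cancel out of the slope formula.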
Pearson correlation coefficient
Problem
The regression line of \(y\) on \(x\) is \(3x+4y+8=0\), and the regression line of \(x\) on \(y\) is \(4x+3y+7=0\). What is the value of the Pearson correlation coefficient?
Mathematical explanation
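The initial equation system is:
\[
\begin{cases}
3x + 4y + 8 = 0 \\
4x + 3y + 7 = 0
\end{cases}
\]
Solving the first line for \(y\) and the second line for \(x\), we can rewrite the two lines this way:
\[
\begin{cases}
y = -\frac{3}{4}x - 2 \\
x = -\frac{3}{4}y - \frac{7}{4}
\end{cases}
\]
so the slopes are \(b_1=-\frac{3}{4}\) (for \(y\) on \(x\)) and \(b_2=-\frac{3}{4}\) (for \(x\) on \(y\)).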
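The two slopes can be read off the line coefficients mechanically. The following is a small sketch, assuming each line is given as \(ax + by + c = 0\); the helper names `slope_y_on_x` and `slope_x_on_y` are hypothetical. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def slope_y_on_x(a, b):
    """Slope of y in terms of x for a*x + b*y + c = 0, i.e. y = -(a/b)*x - c/b."""
    return Fraction(-a, b)


def slope_x_on_y(a, b):
    """Slope of x in terms of y for a*x + b*y + c = 0, i.e. x = -(b/a)*y - c/a."""
    return Fraction(-b, a)


b1 = slope_y_on_x(3, 4)  # from 3x + 4y + 8 = 0
b2 = slope_x_on_y(4, 3)  # from 4x + 3y + 7 = 0
print(b1, b2)  # prints -3/4 -3/4
```

The constant term \(c\) only shifts the intercept, so it does not appear in either helper.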
We now apply the formula that links each regression slope to the Pearson coefficient:
- let \(p\) be the Pearson coefficient
- let \(\sigma_X\) be the standard deviation of \(x\)
- let \(\sigma_Y\) be the standard deviation of \(y\)
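We hence have
\[
b_1 = p\,\frac{\sigma_Y}{\sigma_X}, \qquad b_2 = p\,\frac{\sigma_X}{\sigma_Y}
\]
By multiplying these two equations together, the standard deviations cancel and we get
\[
p^2 = b_1 b_2 = \left(-\frac{3}{4}\right)\left(-\frac{3}{4}\right) = \frac{9}{16}
\]
Finally we get \(p=-\frac{3}{4}\) or \(p=\frac{3}{4}\).
Since both slopes are negative, \(X\) and \(Y\) are negatively correlated, so \(p=-\frac{3}{4}\).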
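The last step can be sketched in code: the magnitude of \(p\) is the square root of the product of the two slopes, and its sign matches the sign of the slopes. The function name `pearson_from_slopes` is hypothetical, and the slope values are taken from the derivation above.

```python
import math


def pearson_from_slopes(b1, b2):
    """Pearson coefficient from the slopes of y-on-x (b1) and x-on-y (b2)."""
    product = b1 * b2
    if product < 0:
        # Both slopes equal p times a positive ratio, so they must share a sign
        raise ValueError("regression slopes must have the same sign")
    # |p| = sqrt(b1 * b2); p carries the sign of the slopes
    return math.copysign(math.sqrt(product), b1)


p = pearson_from_slopes(-0.75, -0.75)
print(p)  # prints -0.75
```

Taking the sign from `b1` is safe here because the check on `product` has already ruled out slopes with opposite signs.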