Least Square Regression Line
Linear Regression
If our data shows a linear relationship between \(X\) and \(Y\), then the straight line which best describes the relationship is the regression line. The regression line is given by \(\hat{Y}=a+bX\).
Finding the value of b
The value of \(b\) can be calculated using either of the following formulae:
- \(b=\frac{n\sum(x_iy_i)-(\sum x_i)(\sum y_i)}{n\sum(x_i^2)-(\sum x_i)^2}\)
- \(b=\rho\frac{\sigma_Y}{\sigma_X}\), where \(\rho\) is the Pearson correlation coefficient and \(\sigma_X\), \(\sigma_Y\) are the standard deviations of \(X\) and \(Y\), respectively. Both formulae give the same slope, as the short check below illustrates.
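A minimal NumPy sketch of the two slope formulae, using the same data as the scikit-learn example later in this section:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)
n = len(x)

# Formula 1: ratio of summations
b_sums = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)

# Formula 2: Pearson correlation times the ratio of standard deviations
# (np.std defaults to population standard deviations; either convention
# works as long as sigma_X and sigma_Y use the same one)
rho = np.corrcoef(x, y)[0, 1]
b_corr = rho * np.std(y) / np.std(x)

print(b_sums, b_corr)  # both print 0.8 (up to floating-point noise)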
Finding the value of a
\(a=\bar{y}-b\cdot\bar{x}\), where \(\bar{x}\) is the mean of \(X\) and \(\bar{y}\) is the mean of \(Y\).
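The same sketch extends to \(a\) (the slope is recomputed with the correlation formula so the snippet stands alone):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

b = np.corrcoef(x, y)[0, 1] * np.std(y) / np.std(x)  # slope, as above
a = np.mean(y) - b * np.mean(x)
print(a)  # 0.6, so the fitted line is Y-hat = 0.6 + 0.8*X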
Coefficient of determination (\(R^2\))
The coefficient of determination can be computed as \(R^2 = \frac{SSR}{SST}=1-\frac{SSE}{SST}\), where:
- \(SST\) is the total sum of squares: \(SST=\sum (y_i-\bar{y})^2\)
- \(SSR\) is the regression sum of squares: \(SSR=\sum (\hat{y}_i-\bar{y})^2\)
- \(SSE\) is the error sum of squares: \(SSE=\sum (y_i-\hat{y}_i)^2\)
If \(SSE\) is small relative to \(SST\) (equivalently, if \(R^2\) is close to 1), the fit is good.
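A short check on the running example; note that \(SSR+SSE=SST\), so the two expressions for \(R^2\) agree:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)
y_hat = 0.6 + 0.8 * x  # fitted line from the previous sections

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares: 10.0
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares: 6.4
sse = np.sum((y - y_hat) ** 2)         # error sum of squares: 3.6

print(ssr / sst, 1 - sse / sst)  # both give R^2 = 0.64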
Linear Regression in Python
We can use the fit method of the sklearn.linear_model.LinearRegression class.
from sklearn import linear_model
import numpy as np

xl = [1, 2, 3, 4, 5]
# scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
x = np.asarray(xl).reshape(-1, 1)
y = [2, 1, 4, 3, 5]
lm = linear_model.LinearRegression()
lm.fit(x, y)
# intercept_ holds a; coef_ holds the slope b
print(f'a = {lm.intercept_}')
print(f'b = {lm.coef_[0]}')
print("Where Y=a+b*X")
Output:
a = 0.5999999999999996
b = 0.8000000000000002
Where Y=a+b*X
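LinearRegression also provides a score method that returns \(R^2\) directly; on this data it matches the hand computation above:

print(f'R^2 = {lm.score(x, y)}')  # 0.64, the same R^2 = SSR/SST as before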