Multiple Linear Regression
If \(Y\) is linearly dependent only on \(X\), then we can use the ordinary least squares regression line, \(\hat{Y}=a+bX\).
However, if \(Y\) shows linear dependency on \(m\) variables \(X_1\), \(X_2\), ..., \(X_m\), then we need to find the value of \(a\) and \(m\) other constants (\(b_1\), \(b_2\), ..., \(b_m\)). We can then write the regression equation as:
\[\hat{Y}=a+b_1X_1+b_2X_2+...+b_mX_m\]
Matrix Form of the Regression Equation
Let's consider that \(Y\) depends on two variables, \(X_1\) and \(X_2\). We write the regression relation as \(\hat{Y}=a+b_1X_1+b_2X_2\). Consider the following matrix operation:
\[\begin{pmatrix}1 & X_1 & X_2\end{pmatrix}\cdot\begin{pmatrix}a\\ b_1\\ b_2\end{pmatrix}=a+b_1X_1+b_2X_2\]
We define two matrices, \(X\) and \(B\), as:
\[X=\begin{pmatrix}1 & X_1 & X_2\end{pmatrix},\qquad B=\begin{pmatrix}a\\ b_1\\ b_2\end{pmatrix}\]
Now, we rewrite the regression relation as \(\hat{Y}=X\cdot B\). This transforms the regression relation into matrix form.
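As a quick sanity check, the matrix form can be verified numerically. The sketch below uses NumPy with hypothetical coefficient and observation values (chosen only for illustration):

```python
import numpy as np

# Hypothetical coefficients and a hypothetical observation (illustration only)
a, b1, b2 = 2.0, 3.0, -1.0
x1, x2 = 5.0, 7.0

X = np.array([1.0, x1, x2])   # row vector [1, X1, X2]
B = np.array([a, b1, b2])     # coefficient vector [a, b1, b2]

y_hat = X @ B                 # the matrix form X . B
# This equals the scalar form a + b1*x1 + b2*x2
```

The leading 1 in \(X\) is what lets the intercept \(a\) be folded into the coefficient vector \(B\).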
Generalized Matrix Form
We will consider that \(Y\) shows a linear relationship with \(m\) variables, \(X_1\), \(X_2\), ..., \(X_m\). Let's say that we made \(n\) observations on different tuples \((x_1, x_2, ..., x_m)\):
- \(y_1=a+b_1\cdot x_{1,1} + b_2\cdot x_{2,1} + ... + b_m\cdot x_{m,1}\)
- \(y_2=a+b_1\cdot x_{1,2} + b_2\cdot x_{2,2} + ... + b_m\cdot x_{m,2}\)
- \(...\)
- \(y_n=a+b_1\cdot x_{1,n} + b_2\cdot x_{2,n} + ... + b_m\cdot x_{m,n}\)
Now, we can find the matrices:
\[Y=\begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix},\qquad X=\begin{pmatrix}1 & x_{1,1} & x_{2,1} & \cdots & x_{m,1}\\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{m,2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{1,n} & x_{2,n} & \cdots & x_{m,n}\end{pmatrix},\qquad B=\begin{pmatrix}a\\ b_1\\ \vdots\\ b_m\end{pmatrix}\]
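Such a design matrix, with a leading column of ones for the intercept, can be built in NumPy. A minimal sketch with hypothetical observation values:

```python
import numpy as np

# Hypothetical observations: n = 3 rows, each an (x1, x2) tuple, so m = 2
observations = np.array([[5.0, 7.0],
                         [6.0, 6.0],
                         [7.0, 4.0]])
n = observations.shape[0]

# Design matrix: prepend a column of ones so the intercept a is absorbed into B
X = np.column_stack([np.ones(n), observations])
print(X.shape)   # (3, 3): n rows, m + 1 columns
```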
Finding the Matrix B
We know that \(Y=X\cdot B\). Since \(X\) is generally not a square matrix, we cannot invert it directly. Instead, we multiply both sides by \(X^T\) and then by \((X^T\cdot X)^{-1}\):
\[X^T\cdot Y=X^T\cdot X\cdot B\]
\[B=(X^T\cdot X)^{-1}\cdot X^T\cdot Y\]
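The least-squares solution \(B=(X^T\cdot X)^{-1}\cdot X^T\cdot Y\) can be computed directly in NumPy. This sketch reuses the data from the sklearn example later in this section, so the result can be compared with that output:

```python
import numpy as np

# Data from the sklearn example in this section
x = np.array([[5, 7], [6, 6], [7, 4], [8, 5], [9, 6]], dtype=float)
y = np.array([10, 20, 60, 40, 50], dtype=float)

# Design matrix with a leading column of ones for the intercept a
X = np.column_stack([np.ones(len(y)), x])

# B = (X^T X)^(-1) X^T Y
B = np.linalg.inv(X.T @ X) @ X.T @ y
print(B)   # [a, b_1, b_2]
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is preferred over forming the inverse explicitly, since it is numerically more stable, but the line above mirrors the formula.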
Finding the Value of Y
Suppose we want to find the value of \(Y\) for some tuple \((x_1, x_2, ..., x_m)\). We form the row vector \(X=\begin{pmatrix}1 & x_1 & x_2 & \cdots & x_m\end{pmatrix}\) and compute \(\hat{Y}=X\cdot B\).
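A short sketch of this prediction step, assuming hypothetical coefficients \(B\) (for instance, values previously obtained from the normal equations) and a hypothetical new tuple:

```python
import numpy as np

# Hypothetical coefficient vector B = [a, b1, b2]
B = np.array([51.953, 6.651, -11.163])

# New tuple (x1, x2) = (6, 5); the leading 1 multiplies the intercept a
x_new = np.array([1.0, 6.0, 5.0])

y_hat = x_new @ B   # scalar prediction for Y
```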
Multiple Regression in Python
We can use the fit method of the sklearn.linear_model.LinearRegression class.
from sklearn import linear_model

# Each row of x is an (x1, x2) tuple; y holds the observed responses
x = [[5, 7], [6, 6], [7, 4], [8, 5], [9, 6]]
y = [10, 20, 60, 40, 50]

lm = linear_model.LinearRegression()
lm.fit(x, y)

# Intercept a and coefficient array [b_0, b_1]
a = lm.intercept_
b = lm.coef_
print(f"Linear regression coefficients between Y and X : a={a}, b_0={b[0]}, b_1={b[1]}")
Linear regression coefficients between Y and X : a=51.953488372092984, b_0=6.65116279069768, b_1=-11.162790697674419
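Once fitted, the same model object can predict \(Y\) for unseen tuples with its predict method, which applies \(\hat{Y}=X\cdot B\) internally. A brief sketch reusing the data above, with a hypothetical query tuple \((6, 5)\):

```python
from sklearn import linear_model

x = [[5, 7], [6, 6], [7, 4], [8, 5], [9, 6]]
y = [10, 20, 60, 40, 50]

lm = linear_model.LinearRegression()
lm.fit(x, y)

# Predict Y for a new (x1, x2) tuple; returns an array of predictions
pred = lm.predict([[6, 5]])
print(pred)   # one predicted value (about 36.05 for these data)
```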