Day 9 - Multiple Linear Regression

Problem

Here is a simple equation:

$$Y=a+b_1\cdot f_1+b_2\cdot f_2+\dots+b_m\cdot f_m$$
$$Y=a+\sum_{i=1}^m b_i\cdot f_i$$

for \((m+1)\) real constants \((a, b_1, b_2, ..., b_m)\). We can say that the value of \(Y\) depends on \(m\) features. We study this equation for \(n\) different feature sets \((f_1, f_2, ..., f_m)\) and record the respective value of \(Y\) for each.

If we have \(q\) new feature sets, and without accounting for the bias-variance trade-off, what is the value of \(Y\) for each of the sets?
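
The problem statement does not name an estimation method. One common reading, and the one assumed in this sketch, is ordinary least squares: writing \(f_{j,i}\) for the \(i\)-th feature of the \(j\)-th recorded observation and \(Y_j\) for its recorded value (this indexing is introduced here for illustration), the constants are estimated as

$$(\hat a, \hat b_1, \dots, \hat b_m)=\arg\min_{a, b_1, \dots, b_m}\sum_{j=1}^n \left(Y_j-a-\sum_{i=1}^m b_i\cdot f_{j,i}\right)^2$$

and the value for a new feature set \((f_1, f_2, ..., f_m)\) is then predicted as

$$\hat Y=\hat a+\sum_{i=1}^m \hat b_i\cdot f_i$$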

Python implementation

import numpy as np
m = 2
n = 7
x_1 = [0.18, 0.89]
y_1 = 109.85

x_2 = [1.0, 0.26]
y_2 = 155.72

x_3 = [0.92, 0.11]
y_3 = 137.66

x_4 = [0.07, 0.37]
y_4 = 76.17

x_5 = [0.85, 0.16]
y_5 = 139.75

x_6 = [0.99, 0.41]
y_6 = 162.6

x_7 = [0.87, 0.47]
y_7 = 151.77


q_1 = [0.49, 0.18]
q_2 = [0.57, 0.83]
q_3 = [0.56, 0.64]
q_4 = [0.76, 0.18]

With scikit-learn

X = np.array([x_1, x_2, x_3, x_4, x_5, x_6, x_7])
Y = np.array([y_1, y_2, y_3, y_4, y_5, y_6, y_7])
X_q = np.array([q_1, q_2, q_3, q_4])

from sklearn import linear_model
lm = linear_model.LinearRegression()
lm.fit(X, Y)

lm.predict(X_q)
array([105.21455835, 142.67095131, 132.93605469, 129.70175405])
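
To connect the fitted model back to the equation, the estimated constants can be read off the estimator and the predictions rebuilt by hand. The following is a minimal sketch that repeats the data from the listing above so it runs on its own; the names `a_hat`, `b_hat` and `Y_manual` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data and query points, copied from the listing above
X = np.array([[0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
              [0.85, 0.16], [0.99, 0.41], [0.87, 0.47]])
Y = np.array([109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77])
X_q = np.array([[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]])

lm = LinearRegression().fit(X, Y)

a_hat = lm.intercept_  # estimate of the constant a
b_hat = lm.coef_       # estimates of b_1, ..., b_m, shape (m,)

# Rebuild the predictions from Y = a + sum_i b_i * f_i
Y_manual = a_hat + X_q @ b_hat

print(a_hat, b_hat)
print(np.allclose(Y_manual, lm.predict(X_q)))
```

The last line should confirm that `predict` is nothing more than the equation evaluated with the fitted constants.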

Without scikit-learn (but with NumPy)
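
The code below relies on a standard property of least squares with an intercept: the fitted plane passes through the point of means. Writing \(\bar X\) for the row vector of column means of \(X\) and \(\bar Y\) for the mean of \(Y\) (notation introduced here), centering removes the intercept from the problem, so the slopes follow from the normal equations on the centered data:

$$X_R=X-\bar X,\qquad Y_R=Y-\bar Y,\qquad B=(X_R^T X_R)^{-1}X_R^T Y_R$$

Predictions for new feature sets are obtained by centering them with the same training means and adding \(\bar Y\) back:

$$\hat Y_{new}=\bar Y+(X_{new}-\bar X)\,B$$

The intercept of the original equation, if needed, is \(\hat a=\bar Y-\bar X B\).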

from numpy.linalg import inv

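# center the features and the target around their training means
X_R = X - np.mean(X, axis=0)
a = np.mean(Y)  # mean of Y, used as the offset (not the intercept a of the equation)
Y_R = Y - a

# slope coefficients b_1, ..., b_m from the normal equations on centered data
B = inv(X_R.T @ X_R) @ X_R.T @ Y_R


# predict: center the new feature sets with the training means,
# apply the slopes, then add the mean of Y back
X_new_R = X_q - np.mean(X, axis=0)
Y_new_R = X_new_R @ B
Y_new = Y_new_R + a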

Y_new
array([105.21455835, 142.67095131, 132.93605469, 129.70175405])
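
As an alternative to centering and inverting \(X_R^T X_R\) explicitly, the intercept can be estimated together with the slopes by prepending a column of ones and calling `np.linalg.lstsq`, which avoids forming the inverse. This is a sketch of that variant, not the method used above; the names `A`, `a_hat`, `b_hat` and `Y_lstsq` are introduced here, and the data is repeated so the block runs on its own.

```python
import numpy as np

# Training data and query points, copied from the listing above
X = np.array([[0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
              [0.85, 0.16], [0.99, 0.41], [0.87, 0.47]])
Y = np.array([109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77])
X_q = np.array([[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]])

# Design matrix: a leading column of ones carries the intercept
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ coef ~ Y
coef, residuals, rank, sing_vals = np.linalg.lstsq(A, Y, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

# Evaluate Y = a + sum_i b_i * f_i on the new feature sets
Y_lstsq = a_hat + X_q @ b_hat
print(Y_lstsq)
```

Both routes solve the same least-squares problem, so the predictions should agree with `Y_new` up to floating-point rounding.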