
Reading Time: 5 Minutes
In this post I aim to translate the complete logic and math of Linear Regression into code, following the concepts discussed in the previous post on sapiencespace. This will help you develop a deeper understanding of the fundamentals and become well versed in choosing the right approach for a given Machine Learning problem.

The Program/Code
First, we import the necessary libraries and load our dataset using pandas’ .read_csv() method. The dataset I used for this project is available here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
ad = pd.read_csv('data.csv')
df = ad[['TV','Sales']]
df.head()
|   | TV    | Sales |
|---|-------|-------|
| 0 | 230.1 | 22.1  |
| 1 | 44.5  | 10.4  |
| 2 | 17.2  | 12.0  |
| 3 | 151.5 | 16.5  |
| 4 | 180.8 | 17.9  |
Let’s rename the columns to x and y for a simpler view.
df = df.rename(columns={'TV':'x','Sales':'y'})
Here the table is expanded with the terms that are plugged into the final equations for the least squares estimators (slope and intercept), which we derived in the last post.
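For reference, here are those estimators written in the same notation as the table columns below, where x' and y' denote the means of x and y:

m = sum((y-y')*(x-x')) / sum((x-x')^2)
c = y' - m*x'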
df["x-x'"] = df.x - np.mean(df.x)
df["y-y'"] = df.y - np.mean(df.y)
df["(y-y')*(x-x')"] = df["y-y'"]*df["x-x'"]
df["(x-x')^2"] = df["x-x'"]**2
df.head()
|   | x     | y    | x-x'      | y-y'    | (y-y')*(x-x') | (x-x')^2     |
|---|-------|------|-----------|---------|---------------|--------------|
| 0 | 230.1 | 22.1 | 83.0575   | 6.9695  | 578.869246    | 6898.548306  |
| 1 | 44.5  | 10.4 | -102.5425 | -4.7305 | 485.077296    | 10514.964306 |
| 2 | 17.2  | 12.0 | -129.8425 | -3.1305 | 406.471946    | 16859.074806 |
| 3 | 151.5 | 16.5 | 4.4575    | 1.3695  | 6.104546      | 19.869306    |
| 4 | 180.8 | 17.9 | 33.7575   | 2.7695  | 93.491396     | 1139.568806  |
m = sum(df["(y-y')*(x-x')"])/sum(df["(x-x')^2"]) # slope estimator
m
0.055464770469558805
c = np.mean(df.y) - m*np.mean(df.x) # intercept estimator
c
6.974821488229896
So our final equation is y ≈ 6.974 + 0.0554*x.
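As an aside, the whole manual calculation can be wrapped into a single reusable function. This is just a minimal sketch restating the steps above; the name fit_least_squares is my own, not from the original post:

def fit_least_squares(x, y):
    # deviations from the means
    x_dev = x - np.mean(x)
    y_dev = y - np.mean(y)
    m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)  # slope estimator
    c = np.mean(y) - m * np.mean(x)                 # intercept estimator
    return m, c

m, c = fit_least_squares(df.x, df.y)  # matches the values computed above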
This is the code to plot the line we found. Here I have generated an array of 100 evenly spaced values between 1 and 10, inclusive, to serve as inputs for obtaining y (the dependent variable). The same line can also be plotted over the training data to visualize the effectiveness of the model.
x = np.linspace(1,10,100)
plt.plot(x, 6.974+0.0554*x)
plt.show()
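Before moving on, it is worth quantifying the fit rather than only plotting it. A standard measure is the coefficient of determination R^2; this check is my addition, not part of the original walkthrough:

# R^2 = 1 - (residual sum of squares / total sum of squares)
y_pred = c + m * df.x                          # predictions from the fitted line
ss_res = np.sum((df.y - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((df.y - np.mean(df.y)) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot)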

Now that we have understood the math and its code, we come to the fun part. Instead of doing all the calculation manually and implementing the math step by step, we can directly import a library that performs linear regression in Python: statsmodels.api.
First, we need to split the dataset into train and test sets, so that the model is trained only on the training data and its accuracy can be measured on the unseen test data.
X = ad['TV']
y = ad['Sales']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 30% is test and 70% is train
import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)
lr = sm.OLS(y_train, X_train_sm).fit() # OLS = Ordinary Least Squares
lr.params
const 6.926214
TV 0.055278
dtype: float64
We can see that these values are very close to the results we obtained by manual calculation.
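For a more detailed diagnostic report (standard errors, t-statistics, R^2, and more), statsmodels provides a summary() method on the fitted model:

print(lr.summary())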
# plotting the model's predictions on the test dataset
sns.set_style('whitegrid')
plt.scatter(X_test, y_test)
plt.plot(X_test, 6.926+0.055*X_test, 'r')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.title('Linear Regression')
plt.show()
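Finally, to score the model on the held-out data, we can generate predictions with the fitted model and compare them against y_test. Here is a minimal sketch using sklearn’s standard metrics:

from sklearn.metrics import mean_squared_error, r2_score

X_test_sm = sm.add_constant(X_test)  # add the intercept column, just like for training
y_pred = lr.predict(X_test_sm)       # predictions on the unseen test data

print('R^2:', r2_score(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))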

That’s a wrap: we have covered the math and the code behind linear regression using the OLS method. If you have any questions or need further clarification, please feel free to ask in the comment section below. Your curiosity and engagement are highly valued.
Thank you for reading all along; subscribe to sapiencespace and enable notifications to get regular insights.
Click here to view all the concepts related to machine learning.
Cover picture and title image credits – unsplash content creators
Want to dive deeper into Linear Regression? Here are some advanced concepts: