
Reading Time: 5 Minutes
In this post I aim to translate the complete logic and math of Linear Regression into code, following the concepts discussed in the previous post on sapiencespace. This will help you develop a deeper understanding of the fundamentals and become well versed in choosing the right approach for a given Machine Learning problem.

The Program/Code
First, we import the necessary libraries and load our dataset using pandas’ .read_csv() method. The dataset I used for this project is available here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
ad = pd.read_csv('data.csv')
df = ad[['TV','Sales']]
df.head()
|   | TV    | Sales |
|---|-------|-------|
| 0 | 230.1 | 22.1  |
| 1 | 44.5  | 10.4  |
| 2 | 17.2  | 12.0  |
| 3 | 151.5 | 16.5  |
| 4 | 180.8 | 17.9  |
Let’s rename the columns to x and y for a simpler view.
df = df.rename(columns={'TV':'x','Sales':'y'})
Here the table is expanded with the terms that are plugged into the final equations for the least squares estimators (slope and intercept), which we derived in the last post.
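For reference, here are those estimators written in the same notation as the table columns below, where x' and y' denote the means of x and y:

m = sum((y-y')*(x-x')) / sum((x-x')^2)
c = y' - m*x'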
df["x-x'"] = df.x - np.mean(df.x)
df["y-y'"] = df.y - np.mean(df.y)
df["(y-y')*(x-x')"] = df["y-y'"]*df["x-x'"]
df["(x-x')^2"] = df["x-x'"]**2
df.head()
|   | x     | y    | x-x'      | y-y'    | (y-y')*(x-x') | (x-x')^2     |
|---|-------|------|-----------|---------|---------------|--------------|
| 0 | 230.1 | 22.1 | 83.0575   | 6.9695  | 578.869246    | 6898.548306  |
| 1 | 44.5  | 10.4 | -102.5425 | -4.7305 | 485.077296    | 10514.964306 |
| 2 | 17.2  | 12.0 | -129.8425 | -3.1305 | 406.471946    | 16859.074806 |
| 3 | 151.5 | 16.5 | 4.4575    | 1.3695  | 6.104546      | 19.869306    |
| 4 | 180.8 | 17.9 | 33.7575   | 2.7695  | 93.491396     | 1139.568806  |
m = sum(df["(y-y')*(x-x')"])/sum(df["(x-x')^2"]) # slope estimator
m
0.055464770469558805
c = np.mean(df.y) - m*np.mean(df.x) # intercept estimator
c
6.974821488229896
So our final equation is y ≈ 6.974 + 0.0554*x.
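As an aside, the whole manual calculation can be wrapped into a single reusable function. This is just a minimal sketch restating the steps above; the name fit_least_squares is my own, not from the original post:

def fit_least_squares(x, y):
    # deviations from the means
    x_dev = x - np.mean(x)
    y_dev = y - np.mean(y)
    m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)  # slope estimator
    c = np.mean(y) - m * np.mean(x)                 # intercept estimator
    return m, c

m, c = fit_least_squares(df.x, df.y)  # matches the values computed above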
This is the code to plot the line we found. Here I have generated an array of 100 evenly spaced values between 1 and 10, inclusive, to serve as inputs for obtaining y (the dependent variable). The same line can also be plotted over the training data to visualize the effectiveness of the model.
x = np.linspace(1,10,100)
plt.plot(x, 6.974+0.0554*x)
plt.show()
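Before moving on, it is worth quantifying the fit rather than only plotting it. A standard measure is the coefficient of determination R^2; this check is my addition, not part of the original walkthrough:

# R^2 = 1 - (residual sum of squares / total sum of squares)
y_pred = c + m * df.x                          # predictions from the fitted line
ss_res = np.sum((df.y - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((df.y - np.mean(df.y)) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot)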

Now that we have understood the math and its code, we come to the fun part. Instead of doing all the calculation manually and implementing the math step by step, we can directly import a library that performs linear regression in Python: statsmodels.api.
First, we need to split the dataset into train and test sets, so that the model is trained only on the training data and its accuracy can be measured on the unseen test data.
X = ad['TV']
y = ad['Sales']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 30% is test and 70% is train
import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)
lr = sm.OLS(y_train, X_train_sm).fit() # OLS = Ordinary Least Squares
lr.params
const 6.926214
TV 0.055278
dtype: float64
We can see that these values are very close to the results we obtained by manual calculation.
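For a more detailed diagnostic report (standard errors, t-statistics, R^2, and more), statsmodels provides a summary() method on the fitted model:

print(lr.summary())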
# plotting the model's predictions on the test dataset
sns.set_style('whitegrid')
plt.scatter(X_test, y_test)
plt.plot(X_test, 6.926+0.055*X_test, 'r')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.title('Linear Regression')
plt.show()
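Finally, to score the model on the held-out data, we can generate predictions with the fitted model and compare them against y_test. Here is a minimal sketch using sklearn’s standard metrics:

from sklearn.metrics import mean_squared_error, r2_score

X_test_sm = sm.add_constant(X_test)  # add the intercept column, just like for training
y_pred = lr.predict(X_test_sm)       # predictions on the unseen test data

print('R^2:', r2_score(y_test, y_pred))
print('MSE:', mean_squared_error(y_test, y_pred))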

That’s a wrap: we have covered the math and the code behind linear regression using the OLS method. If you have any questions or need further clarification, please feel free to ask in the comment section below. Your curiosity and engagement are highly valued.
Thank you for reading all along; subscribe to sapiencespace and enable notifications to get regular insights.
Click here to view all the concepts related to machine learning.
Cover picture and title image credits – unsplash content creators
Want to dive deeper into Linear Regression? Here are some advanced concepts: