Spelling out Linear Regression in Code – a 101 guide

Reading Time: 5 Minutes

In this post, I translate the complete logic and math of Linear Regression, as discussed in the previous post on sapiencespace, into code so that you can implement the concept yourself. This will help you develop a deeper understanding of the fundamentals and become well versed in choosing the right framework for a given Machine Learning problem.

The Program/Code

First, we import the necessary libraries and load our dataset using pandas' read_csv() function. The dataset I used for this project will be available here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

ad = pd.read_csv('data.csv')
df = ad[['TV','Sales']]
df.head()

      TV  Sales
0  230.1   22.1
1   44.5   10.4
2   17.2   12.0
3  151.5   16.5
4  180.8   17.9

Let's rename the columns to x and y to keep the notation simple.

df = df.rename(columns={'TV':'x','Sales':'y'})

Next, we add a column for each term that appears in the final equations for the least squares estimators (slope and intercept), which we derived in the last post.
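For reference, the two estimators can be written as below, where \bar{x} and \bar{y} are the sample means (written x' and y' in the column names that follow), m is the slope and c is the intercept. This is the standard closed form for simple least squares and is assumed to match the derivation in the previous post.

```latex
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```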

df["x-x'"] = df.x - np.mean(df.x)
df["y-y'"] = df.y - np.mean(df.y)

df["(y-y')*(x-x')"] = df["y-y'"]*df["x-x'"]
df["(x-x')^2"] = df["x-x'"]**2
df.head()

       x     y      x-x'    y-y'  (y-y')*(x-x')      (x-x')^2
0  230.1  22.1   83.0575  6.9695     578.869246   6898.548306
1   44.5  10.4 -102.5425 -4.7305     485.077296  10514.964306
2   17.2  12.0 -129.8425 -3.1305     406.471946  16859.074806
3  151.5  16.5    4.4575  1.3695       6.104546     19.869306
4  180.8  17.9   33.7575  2.7695      93.491396   1139.568806

m = sum(df["(y-y')*(x-x')"])/sum(df["(x-x')^2"])
m

0.055464770469558805

c = np.mean(df.y - m*(np.mean(df.x)))
c

6.974821488229896

So our final equation, with the coefficients truncated, is y = 6.974 + 0.0554*x
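A quick way to sanity-check the manual steps is to compare them with NumPy's built-in line fit. The sketch below uses only the five rows shown in the table above, so its coefficients will differ from the full-dataset values of 6.974 and 0.0554; the variable names m_manual, c_manual, m_fit and c_fit are chosen here for illustration.

```python
import numpy as np

# First five rows of the dataset, copied from the table above (illustration only).
x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
y = np.array([22.1, 10.4, 12.0, 16.5, 17.9])

# Manual least squares estimators, same steps as the column-by-column version.
dx = x - x.mean()
dy = y - y.mean()
m_manual = np.sum(dx * dy) / np.sum(dx ** 2)
c_manual = y.mean() - m_manual * x.mean()

# np.polyfit with degree 1 fits a straight line and returns [slope, intercept].
m_fit, c_fit = np.polyfit(x, y, 1)

print(m_manual, c_manual)
print(m_fit, c_fit)
```

Both print statements should show the same slope and intercept up to floating point rounding, which suggests the manual formula and the library routine solve the same problem.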

The code below plots the line we have found. Here I generate an array of 100 evenly spaced values between 1 and 10, inclusive, and feed them into the equation to obtain y (the dependent variable). The line can also be plotted over the training data to visualize how well the model fits.

x = np.linspace(1,10,100)

plt.plot(x, 6.974+0.0554*x)
plt.show()

[Figure: linear regression]
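To draw the line over the actual observations, the x values can span the range of the data instead of a fixed interval. The helper below, line_points, is a hypothetical name introduced for this sketch, and it uses only the five TV values shown earlier rather than the full dataset.

```python
import numpy as np

def line_points(x_values, slope, intercept, num=100):
    """Return evenly spaced x values spanning the data and the fitted y for each."""
    xs = np.linspace(np.min(x_values), np.max(x_values), num)
    ys = intercept + slope * xs
    return xs, ys

# TV values from the first five rows of the dataset (illustration only).
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])

# Coefficients from the manual calculation above.
xs, ys = line_points(tv, slope=0.0554, intercept=6.974)
print(xs[0], xs[-1])
```

With the full data frame, calling plt.scatter(df.x, df.y) followed by plt.plot(xs, ys, 'r') would then place the line on top of the points it was fitted to.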

Now that we have understood the math and its code, we come to the fun part. Instead of doing all the calculations manually and implementing the math step by step, we can directly import a Python library that performs linear regression: statsmodels.api.

First, we need to split the dataset into train and test sets, so that training happens only on the train data and the accuracy of the model can be calculated on the unseen test data.

X = ad['TV']
y = ad['Sales']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) # 30% is test and 70% is train
import statsmodels.api as sm
X_train_sm = sm.add_constant(X_train)
lr = sm.OLS(y_train, X_train_sm).fit() # OLS is ordinary least squares

lr.params

const    6.926214
TV       0.055278
dtype: float64

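We can see that these values are very close to the results of the manual calculation above. The small difference arises because the manual calculation used the entire dataset, while this model was fitted on only the 70% training split.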
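The add_constant step matters because OLS only estimates an intercept if the input contains a column of ones. A minimal NumPy sketch of the same idea is shown below, again on the first five rows only, so the numbers will not match lr.params; the names A, const and slope are chosen here for illustration.

```python
import numpy as np

# First five rows of the dataset (illustration only, not the 70% training split).
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
sales = np.array([22.1, 10.4, 12.0, 16.5, 17.9])

# Design matrix: a column of ones for the intercept, then the TV values.
# This mirrors the extra 'const' column that sm.add_constant prepends.
A = np.column_stack([np.ones_like(tv), tv])

# Solve the least squares problem A @ [const, slope] ~ sales.
coef, residuals, rank, singular_values = np.linalg.lstsq(A, sales, rcond=None)
const, slope = coef
print(const, slope)
```

Leaving out the column of ones would force the fitted line through the origin, which is why the constant is added before calling sm.OLS.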

# plotting the model's predictions on the test dataset

sns.set_style('whitegrid')
plt.scatter(X_test, y_test)
plt.plot(X_test, 6.926+0.055*X_test, 'r')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.title('Linear Regression')
plt.show()

[Figure: linear regression]
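The plot gives a visual impression of the fit; to put a number on it, two common choices are the root mean squared error (RMSE) and R squared. The sketch below applies the coefficients reported by lr.params to the first five rows of the dataset, which only stand in for a test set here and may well have been part of the training split, so the printed values are not a real test score.

```python
import numpy as np

# Coefficients reported by lr.params above.
const, slope = 6.926214, 0.055278

# First five rows of the dataset, standing in for a held-out test set.
tv = np.array([230.1, 44.5, 17.2, 151.5, 180.8])
sales = np.array([22.1, 10.4, 12.0, 16.5, 17.9])

# Predictions from the fitted line and their errors.
predicted = const + slope * tv
residuals = sales - predicted

# RMSE: typical size of a prediction error, in the units of Sales.
rmse = np.sqrt(np.mean(residuals ** 2))

# R squared: share of the variance in Sales explained by the line.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((sales - sales.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(rmse, r2)
```

With the variables from this post, the same calculation could use lr.predict(sm.add_constant(X_test)) for the predictions and y_test for the observed values.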

That's a wrap! We have covered the math and the code behind linear regression using the OLS method. If you have any questions or need further clarification, please feel free to ask in the comment section below. Your curiosity and engagement are highly valued.

Thank you for reading. Subscribe to sapiencespace and enable notifications to get regular insights.

Click here to view all the concepts related to machine learning.

Cover picture and title image credits – unsplash content creators
