Discover the Magic of Decision Trees in Code – 101


In the 101 post explaining decision trees, we delved into the need for decision trees and the nitty-gritty of the logic and steps required to build one. As promised in that post, we will now leverage a module that does the heavy lifting of all the manual calculations required to build a decision tree.

So let’s get straight into the code.


The Program/Code

1. Import the library required to handle the dataframe. Since I created the dataset table in Excel, I saved it in CSV format and imported it into the .ipynb file as a dataframe.
import pandas as pd

2. Load the file and view it

df = pd.read_csv('/Users/adityabharathi/Desktop/decision_tree.csv')
df.head()
   Age     Income  Student Credit Rating Buys Computer
0  Youth   High    No      Fair          No
1  Youth   High    No      Excellent     No
2  Middle  High    No      Fair          Yes
3  Senior  Medium  No      Fair          Yes
4  Senior  Low     Yes     Fair          Yes

3. Since the Decision Tree Classifier cannot process text, we need to convert the text in each column to numbers. This process is known as labelling or encoding. We can either do it manually using the replace function (a sketch of that follows the code below) or use a library from sklearn:

from sklearn.preprocessing import LabelEncoder
df[['Age','Income','Student', 'Credit Rating', 'Buys Computer']] = df[['Age','Income','Student', 'Credit Rating', 'Buys Computer']].apply(LabelEncoder().fit_transform)
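
If you prefer the manual route, here is a minimal sketch using pandas' replace() on the raw, unencoded dataframe. The integer codes below mirror what LabelEncoder assigns (alphabetical order within each column), but treat them as an illustrative assumption; any consistent coding works.

df_raw = pd.read_csv('/Users/adityabharathi/Desktop/decision_tree.csv')
# map each category to an integer by hand
df_raw = df_raw.replace({
    'Age': {'Middle': 0, 'Senior': 1, 'Youth': 2},
    'Income': {'High': 0, 'Low': 1, 'Medium': 2},
    'Student': {'No': 0, 'Yes': 1},
    'Credit Rating': {'Excellent': 0, 'Fair': 1},
    'Buys Computer': {'No': 0, 'Yes': 1},
})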

4. Generally in data science, we split the dataset into train and test sets so we can build the model on the training data and check its performance on unseen test data. Before that, we split the attributes into inputs and target.

inputs = df[['Age','Income','Student', 'Credit Rating']]
target = df[['Buys Computer']]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(inputs, target, test_size=0.2)
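
One caveat worth noting: train_test_split shuffles the rows randomly, so the score you get will vary from run to run. If you want reproducible results, you can fix the random seed; the value 42 below is an arbitrary choice, not a magic number.

X_train, X_test, y_train, y_test = train_test_split(
    inputs, target, test_size=0.2, random_state=42  # fixed seed for a reproducible split
)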

Now comes the core part: here we import the tree class from sklearn (scikit-learn) to implement our decision tree.

from sklearn import tree

model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)
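
As an optional extra, sklearn can draw the fitted tree, which makes the learned splits easy to inspect. This sketch assumes matplotlib is installed and that LabelEncoder mapped the target alphabetically (No=0, Yes=1).

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
tree.plot_tree(
    model,
    feature_names=['Age', 'Income', 'Student', 'Credit Rating'],
    class_names=['No', 'Yes'],  # assumes No=0, Yes=1 from the alphabetical encoding
    filled=True,  # colour nodes by majority class
)
plt.show()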

The score comes out to be 0.3333333333333333, which implies the model is only 33% accurate, i.e. wrong in 67% of the test cases. This doesn't imply that our model is performing badly; the dataset is simply tiny, so the 20% test split holds only about 3 rows, and a single score on 3 samples tells us very little.
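
One way to see how noisy such a tiny test split is: cross-validation scores the model on several different train/test partitions instead of one. A minimal sketch (cv=3 is an arbitrary choice so each fold still has a few rows):

from sklearn.model_selection import cross_val_score

# score a fresh tree on 3 different partitions of the 14 rows
scores = cross_val_score(tree.DecisionTreeClassifier(), inputs, target.values.ravel(), cv=3)
print(scores, scores.mean())  # the spread shows how unstable a single score is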

To justify this, I will use the breast cancer dataset from Kaggle, which has 569 rows.

df = pd.read_csv('breast_cancer_ds.csv')

# deleting the unwanted columns: the id and the empty 'Unnamed: 32' column
df.drop(columns=['id', 'Unnamed: 32'], inplace=True) 

# transforming the text in diagnosis column
df[['diagnosis']] = df[['diagnosis']].apply(LabelEncoder().fit_transform)

inputs = df.drop(columns=['diagnosis']) # all the attributes except our target variable
target = df[['diagnosis']]

X_train, X_test, y_train, y_test = train_test_split(inputs, target, train_size=0.8)
model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)

The output is 0.9473684210526315, which implies that the model is correct roughly 95 times out of 100; comparatively, that is a great score. It can be optimised further, or a RandomForest can be used to get higher scores (a sketch follows below).
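
Since RandomForest was mentioned, here is a minimal sketch of trying it on the same split. A random forest is an ensemble of many decision trees; the hyperparameters below are sklearn's defaults, not a tuned configuration, so treat the result as a starting point.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees, fixed seed
rf.fit(X_train, y_train.values.ravel())  # ravel() avoids a column-vector shape warning
print(rf.score(X_test, y_test))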

That concludes our discussion. I hope the decision tree series has given you a comprehensive understanding of decision trees in the realm of machine learning, along with the fundamental insights that underpin them. I encourage you to work through the problem outlined in this post yourself to gain a deeper understanding of decision trees, and to extend the program with some Exploratory Data Analysis (EDA) to better grasp the data. Your dedication to furthering your knowledge in this domain is commendable.


If you have any questions or need further clarification, please feel free to ask in the comment section below. Your curiosity and engagement are highly valued. Click here to view all the concepts related to machine learning.

Thank you for reading all along. Subscribe to sapiencespace and enable notifications to get regular insights.

