Decision Trees Simplified


Reading Time: 8 Minutes

In the vast realm of machine learning, decision trees stand out as a versatile and comprehensible tool that mirrors human thinking capabilities. They provide a structured approach to problem-solving and are a valuable addition to the data scientist’s toolkit. In this blog post, we’ll embark on a journey to uncover the inner workings of decision trees, demystifying the process and terminology associated with this powerful algorithm.


The Theory and Logic

The Building Blocks of Decision Trees:

At its core, a decision tree is a supervised machine learning technique, primarily used for classification tasks. It takes labeled data, where categories are known, and predicts the target variable. The magic happens by constructing a decision tree where internal nodes represent features, branches convey decision rules, and leaf nodes yield outcomes.

The Essence of Decision Trees:

Imagine decision trees as a way to build a graph that captures all possible solutions to a problem based on given conditions. The tree is constructed by asking yes/no questions and further branching into subtrees until we reach a definitive answer. This method is not just robust, but it’s also incredibly transparent and comprehensible.

The Power of Simplicity:

One of the most significant advantages of decision trees is their simplicity. They provide a direct route to solutions by mimicking human decision-making. In an age of complex algorithms and black-box models, decision trees offer clarity and transparency. This simplicity makes them a valuable tool for both beginners and experts in machine learning.

Key Terminologies:

Let’s familiarise ourselves with some essential terminologies linked to decision trees:

  1. Root Node: This is the top node of the tree, representing the entire dataset.
  2. Internal Nodes: These nodes split the data into subsets based on a particular feature.
  3. Branches: Decision rules that guide us from one node to another.
  4. Leaf Nodes: The final outcomes or predictions reside in these nodes.

Attribute Selection Method (ASM):

Now, the pressing question: How do we choose attributes at each level of the tree? Here, we introduce the Attribute Selection Method (ASM), a crucial step in constructing a decision tree. ASM helps us decide which features are the most informative for making decisions. Common ASM techniques include Gini impurity, information gain, and gain ratio, among others.

In this post, we will be using entropy and information gain.

Entropy is a fundamental concept in machine learning that measures the amount of disorder in a dataset. In this context, it represents the impurity of a given attribute: the more categories the examples are spread across (and the more evenly they are spread), the higher the entropy; the more they concentrate in a single category, the lower it gets. For a two-class target it lies in the range 0 to 1.

Information Gain quantifies the change in entropy after a dataset is split on a particular attribute. In essence, it measures how much useful information an attribute provides about the classification of data points into their respective classes.

With this framework in mind, our primary objective revolves around the selection of the most suitable attribute as the parent node for decision tree construction. This selection is guided by the pursuit of the attribute that promises the highest Information Gain, thereby maximising the informational content it contributes to the classification process.
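To make these two measures concrete before we build anything, here is a minimal Python sketch; the function names and the toy labels ('A'/'B') are my own illustration, not part of the original walkthrough.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the class proportions p_i
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(attribute_values, labels):
    # entropy before the split minus the weighted entropy of each subset
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum((len(group) / total) * entropy(group)
                   for group in groups.values())
    return entropy(labels) - weighted

print(entropy(['Yes', 'No', 'Yes', 'No']))                                  # 1.0, maximum disorder
print(information_gain(['A', 'A', 'B', 'B'], ['Yes', 'Yes', 'No', 'No']))   # 1.0, a perfectly informative attribute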

Constructing a Decision Tree:

The process of building a decision tree involves several steps, with ASM being a pivotal one. Here's a simplified algorithm to construct a decision tree (a minimal code sketch follows this list):

  1. Start with the entire dataset at the root node.
  2. Select the best attribute (using ASM) to split the data into subsets.
  3. Create child nodes for each outcome of the selected attribute.
  4. Recursively repeat steps 2 and 3 for each child node until a stopping condition is met (e.g., a maximum depth or purity threshold).
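To connect these four steps to code, here is a minimal ID3-style sketch. The function names, the dict-based tree representation, and the stopping rules (only pure nodes and running out of attributes) are my own illustrative choices, not something from the original post.

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # step 2: pick the attribute whose split gives the highest information gain
    def gain(attribute):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attribute], []).append(label)
        weighted = sum(len(group) / len(labels) * entropy(group)
                       for group in groups.values())
        return entropy(labels) - weighted
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    # rows: list of dicts like {'Age': 'Youth', 'Student': 'No', ...}
    # labels: matching list of target values ('Yes' / 'No')
    if len(set(labels)) == 1:
        return labels[0]                                  # pure node: return the class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]       # no attributes left: majority class
    attribute = best_attribute(rows, labels, attributes)
    branches = {}
    remaining = [a for a in attributes if a != attribute]
    for value in set(row[attribute] for row in rows):
        # step 3: create a child node for each outcome of the chosen attribute
        sub_rows = [r for r in rows if r[attribute] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[attribute] == value]
        # step 4: recurse on the subset
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return {attribute: branches}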

Now, let's take an example to understand the logic behind a decision tree.

In this case, our goal is to predict whether a person buys a computer, given details such as Age, Income, Student status, and Credit Rating. Let's understand the math and calculation behind forming the decision tree.

Entropy(S) = − Σ pᵢ · log₂(pᵢ)

Here, pᵢ is the probability of a certain category occurring within an attribute; you will get a more detailed understanding when we get to the calculation section.

Initially, we need to find the entropy of our target variable, which is Buys Computer; let's call it BC for short.

The sum of probabilities is quite intuitive, but you may wonder why the logarithm is used: logarithms help quantify and compare the amount of information and uncertainty in different situations.

Entropy(BC) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.9402

(Of the 14 rows, 9 are "Yes" and 5 are "No".)

To understand what this 0.9402 actually means, we need to go back to the definition of entropy given above. As the entropy value increases from 0 to 1, it indicates a higher level of uncertainty and impurity in the dataset.

This means that the elements are distributed among multiple categories or classes, and there is no clear majority class. An entropy of 1 (1.00) represents a state of maximum impurity, where elements are equally distributed among different categories. If there were 7 yes and 7 no, then the entropy would have turned out to be 1. You can do the math if you want to verify it, log(7/14) to the base 2 is -1.
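To check these numbers yourself, a few lines of Python reproduce both cases; the 9 "Yes" / 5 "No" split is the one that gives the value quoted above.

from math import log2

# 9 Yes and 5 No out of 14 rows
p_yes, p_no = 9 / 14, 5 / 14
print(-(p_yes * log2(p_yes) + p_no * log2(p_no)))   # ~0.940

# perfectly balanced case: 7 Yes and 7 No
print(-(0.5 * log2(0.5) + 0.5 * log2(0.5)))         # 1.0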

Now that we have the entropy of the target variable, let's calculate the information gain of each attribute in the dataset to decide which one should become the root node.

Information Gain(S, A) = Entropy(S) − Σ (|Sᵥ| / |S|) · Entropy(Sᵥ), summed over every value v of attribute A

For Age, the 14 rows split into 5 Youth, 4 Middle and 5 Senior:

Gain(BC, Age) = 0.940 − [ (5/14)·0.971 + (4/14)·0 + (5/14)·0.971 ] ≈ 0.246

Information gain for Age = 0.246

So if we perform the same procedure for the rest of the attributes, we'll get the following values (a short snippet to verify them is given below):

Information gain for Income = 0.029

Information gain for Student = 0.151

Information gain for Credit Rating= 0.048
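If you would rather not grind through each attribute by hand, a short pandas sketch can reproduce these numbers from the same table once it is saved as a CSV. The helper names and the file name here are my own assumptions; use whatever name you saved the table under.

import numpy as np
import pandas as pd

def entropy(series):
    probs = series.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(frame, attribute, target='Buys Computer'):
    weighted = sum(len(subset) / len(frame) * entropy(subset[target])
                   for _, subset in frame.groupby(attribute))
    return entropy(frame[target]) - weighted

df = pd.read_csv('decision_tree.csv')   # the table above, saved as a CSV
for column in ['Age', 'Income', 'Student', 'Credit Rating']:
    print(column, round(information_gain(df, column), 3))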

Therefore, the attribute with the highest information gain, Age, becomes the root node; the overall ranking is Age > Student > Credit Rating > Income. So this is how the tree would look:

Now, middle-aged people buy a computer in every case in the given dataset, so we stop expanding the Middle branch and label it 'Yes'.

[Figure: the tree after the first split, with Age at the root branching into Youth, Middle (Yes), and Senior]

Now, for Youth, we take only the rows where Age is Youth and work with the remaining attributes.

[Table: the subset of rows where Age = Youth]

Now, taking Age as the main attribute, we calculate the entropy of this Youth subset and then the information gain of the other attributes with respect to it, just as we did above.
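Using the information_gain helper from the pandas sketch above, this step amounts to filtering the Youth rows and comparing the remaining attributes (again just an illustrative check, not part of the original walkthrough):

youth = df[df['Age'] == 'Youth']            # keep only the rows where Age is Youth
for column in ['Income', 'Student', 'Credit Rating']:
    print(column, round(information_gain(youth, column), 3))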

If we do the calculations, we find that the Student attribute has the highest information gain, ahead of Income and Credit Rating. So the tree would look like this:

[Figure: the tree after splitting the Youth branch on Student]

At last, for the Senior branch, a similar approach is followed and the best attribute is chosen; in this case it turns out to be Credit Rating.

[Figure: the complete decision tree, with the Senior branch split on Credit Rating]

So this is how the decision tree looks once we sketch it out based on our information gain findings.

Now, for real-world applications, we don't need to hand-code functions that calculate entropy and information gain for every node. Fortunately, sklearn provides a module called tree, from which we can import DecisionTreeClassifier and build our decision tree.


The Program/Code

Here is the code, with explanations:

  1. Import the library required to handle the dataframe. Since I created the table above in Excel, I saved it as a CSV file and imported it into the ipynb file as a dataframe.
import pandas as pd

2. Load the file and view it

df = pd.read_csv('/Users/adityabharathi/Desktop/decision_tree.csv')
df.head()
   Age     Income  Student  Credit Rating  Buys Computer
0  Youth   High    No       Fair           No
1  Youth   High    No       Excellent      No
2  Middle  High    No       Fair           Yes
3  Senior  Medium  No       Fair           Yes
4  Senior  Low     Yes      Fair           Yes

3. Since the Decision Tree Classifier cannot process text, we need to convert the text in each column into numbers. This process is known as label encoding; we can either do it manually using the replace function (see the sketch after the sklearn version below) or use a library from sklearn.

from sklearn.preprocessing import LabelEncoder
df[['Age','Income','Student', 'Credit Rating', 'Buys Computer']] = df[['Age','Income','Student', 'Credit Rating', 'Buys Computer']].apply(LabelEncoder().fit_transform)
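For reference, the manual route mentioned in step 3 would look something like the sketch below, applied to the original text columns (i.e. instead of, not after, the LabelEncoder step). The numeric codes are arbitrary illustrative choices and not necessarily the ones LabelEncoder would assign.

# manual alternative: map each category to a number yourself with replace()
df_manual = df.replace({
    'Age': {'Youth': 0, 'Middle': 1, 'Senior': 2},
    'Income': {'Low': 0, 'Medium': 1, 'High': 2},
    'Student': {'No': 0, 'Yes': 1},
    'Credit Rating': {'Fair': 0, 'Excellent': 1},
    'Buys Computer': {'No': 0, 'Yes': 1},
})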

4. Generally in data science, we split the dataset into train and test sets to check the performance of the model. Our dataset here is very small, so the test set ends up tiny, but we will still hold a few rows out so we can report a score. First, we separate the attributes into inputs and target.

inputs = df[['Age','Income','Student', 'Credit Rating']]
target = df[['Buys Computer']]

from sklearn import tree
from sklearn.model_selection import train_test_split

# hold out a small test set so we can report a score
X_train, X_test, y_train, y_test = train_test_split(inputs, target, train_size=0.8)

model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)
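One side note: by default, DecisionTreeClassifier splits on Gini impurity rather than entropy; if you want the classifier to mirror the entropy-based calculations in this post, you can pass the criterion explicitly:

model = tree.DecisionTreeClassifier(criterion='entropy')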

The score comes out to be 0.3333333333333333, which implies that it is only about 33% accurate and wrong in the remaining test cases. This doesn't mean our model is performing badly; it is simply because the dataset (and therefore the test set) is very small.

So to justify this, I will use the breast cancer dataset from Kaggle.

df = pd.read_csv('breast_cancer_ds.csv')

# deleting the unwanted columns
df.drop(columns=['id', 'Unnamed: 32'], inplace=True) 

# transforming the text in diagnosis column
df[['diagnosis']] = df[['diagnosis']].apply(LabelEncoder().fit_transform)

inputs = df.drop(columns=['diagnosis']) # all the attributes except our target variable
target = df[['diagnosis']]

X_train, X_test, y_train, y_test = train_test_split(inputs, target, train_size=0.8)
model = tree.DecisionTreeClassifier()
model.fit(X_train, y_train)
model.score(X_test, y_test)

The output is 0.9473684210526315, which implies that the model is correct about 94 times out of 100; comparatively, that is a great score. This can be optimised further, or a RandomForest can be used to get even higher scores.
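As a quick sketch of that last suggestion, swapping in a random forest is only a couple of extra lines; the hyperparameter values below are common defaults rather than tuned choices.

from sklearn.ensemble import RandomForestClassifier

# an ensemble of decision trees usually generalises better than a single tree
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train.values.ravel())   # ravel() flattens the single-column target
rf.score(X_test, y_test)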

That concludes our discussion. I trust that this presentation has provided you with a comprehensive understanding of decision trees in the realm of machine learning, as well as the fundamental insights that underpin them. I encourage you to work out the problem outlined in this post by yourself to gain a deeper comprehension of decision trees. Additionally, consider implementing the program by conducting Exploratory Data Analysis (EDA) to better grasp the data. Your dedication to furthering your knowledge in this domain is commendable.


If you have any questions or need further clarification, please feel free to ask in the comment section below. Your curiosity and engagement are highly valued. Click here to view all the concepts related to machine learning.

Thank you for reading all along, subscribe to sapiencespace and enable notifications to get regular insights.
