
Reading Time: 8 minutes
The first part of this post helps you build a neural network in Python that translates the math and logic of the previous post directly into code. The later part builds the intuition needed to transition to a more robust and efficient neural network, again with Python code.
Let’s take the classification problem we saw in the previous post, where the network predicts which class (0 or 1) a given input belongs to.

First, we import the basic libraries and define the elementary functions that are required for the network:
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)
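One detail worth noting: sigmoid_derivative expects the sigmoid output rather than the raw input, because for s = sigmoid(x) the derivative is s * (1 - s). A quick standalone check using the functions above:

s = sigmoid(0.0)              # 0.5
print(sigmoid_derivative(s))  # 0.25, i.e. 0.5 * (1 - 0.5)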
Just to stay on track initially, I’ll use the same weights and biases as in the diagram above instead of generating them randomly. We will explore random initialisation in the later part of the post, where we build a more dynamic and robust model.
class NeuralNetwork:
    def __init__(self):
        self.h2_error, self.h1_error = None, None
        self.o2_error, self.o1_error = None, None
        self.o2_out, self.o2_in = None, None
        self.o1_out, self.o1_in = None, None
        self.h2_in, self.h2_out = None, None
        self.h1_out, self.h1_in = None, None
        self.input = np.array([0.05, 0.10])
        self.output = np.array([0.01, 0.99])
        # Initialize weights and biases
        self.weights1 = np.array([[0.15, 0.20],
                                  [0.25, 0.30]])  # weights from input to hidden layer
        self.weights2 = np.array([[0.40, 0.45],
                                  [0.50, 0.55]])  # weights from hidden to output layer
        self.bias1 = np.array([0.35, 0.35])  # biases for hidden layer neurons
        self.bias2 = np.array([0.60, 0.60])  # biases for output layer neurons
Forward Propagation:
Since we’ll be carrying out the computations with matrix operations instead of handling each element of the input, weights and biases individually, you should be comfortable with matrices. If you’re not, I highly recommend brushing up on the basics (representation, operations and rules) before continuing.
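As a quick refresher (a standalone snippet, not part of the network code), here is what the column extraction and dot product used below actually compute:

import numpy as np

inputs = np.array([0.05, 0.10])
weights = np.array([[0.15, 0.20],
                    [0.25, 0.30]])
col = weights[:, 0]         # first column: [0.15, 0.25]
print(np.dot(inputs, col))  # 0.05*0.15 + 0.10*0.25 = 0.0325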
def forward_propagation(self):
    # Weighted sum of inputs for hidden layer neurons
    # np.dot performs matrix multiplication
    self.h1_in = np.dot(self.input, self.weights1[:, 0]) + self.bias1[0]  # [:, 0] extracts the 1st column of the weight matrix
    self.h1_out = sigmoid(self.h1_in)
    self.h2_in = np.dot(self.input, self.weights1[:, 1]) + self.bias1[1]  # [:, 1] extracts the 2nd column of the weight matrix
    self.h2_out = sigmoid(self.h2_in)
    # Weighted sum of hidden layer outputs for output layer neurons
    self.o1_in = np.dot([self.h1_out, self.h2_out], self.weights2[:, 0]) + self.bias2[0]
    self.o1_out = sigmoid(self.o1_in)
    self.o2_in = np.dot([self.h1_out, self.h2_out], self.weights2[:, 1]) + self.bias2[1]
    self.o2_out = sigmoid(self.o2_in)
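You can sanity-check the forward pass by instantiating the class as defined so far and inspecting the intermediate values (a quick check, not part of the final script). With the fixed weights above, h1_in works out to 0.05*0.15 + 0.10*0.25 + 0.35 = 0.3825:

nn = NeuralNetwork()
nn.forward_propagation()
print(nn.h1_in)              # 0.3825
print(nn.h1_out)             # sigmoid(0.3825), roughly 0.594
print(nn.o1_out, nn.o2_out)  # untrained outputs, roughly 0.76 and 0.77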
Backward Propagation:
def back_propagation(self):
    learning_rate = 0.1
    self.o1_error = (self.o1_out - self.output[0]) * sigmoid_derivative(self.o1_out)
    self.o2_error = (self.o2_out - self.output[1]) * sigmoid_derivative(self.o2_out)
    # Hidden layer error
    self.h1_error = sigmoid_derivative(self.h1_out) * \
        (self.o1_error * self.weights2[0, 0] + self.o2_error * self.weights2[0, 1])
    self.h2_error = sigmoid_derivative(self.h2_out) * \
        (self.o1_error * self.weights2[1, 0] + self.o2_error * self.weights2[1, 1])
    # Update weights from hidden to output layer
    self.weights2[0, 0] -= learning_rate * self.o1_error * self.h1_out  # weight 5
    self.weights2[1, 0] -= learning_rate * self.o1_error * self.h2_out  # weight 6
    self.weights2[0, 1] -= learning_rate * self.o2_error * self.h1_out  # weight 7
    self.weights2[1, 1] -= learning_rate * self.o2_error * self.h2_out  # weight 8
    # Update weights from input to hidden layer
    self.weights1[0, 0] -= learning_rate * self.h1_error * self.input[0]  # weight 1
    self.weights1[1, 0] -= learning_rate * self.h1_error * self.input[1]  # weight 2
    self.weights1[0, 1] -= learning_rate * self.h2_error * self.input[0]  # weight 3
    self.weights1[1, 1] -= learning_rate * self.h2_error * self.input[1]  # weight 4
    # Update biases
    self.bias1[0] -= learning_rate * self.h1_error
    self.bias1[1] -= learning_rate * self.h2_error
    self.bias2[0] -= learning_rate * self.o1_error
    self.bias2[1] -= learning_rate * self.o2_error
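To tie this back to the math: each update is the chain rule written out term by term. For weight 5, for instance, o1_error already bundles (o1_out - target1) from the loss with o1_out * (1 - o1_out) from the sigmoid, and multiplying by h1_out accounts for the activation that this weight scales, so the update term is simply o1_error * h1_out. Every other weight follows the same pattern with its own error term and its own incoming activation.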
Once the forward and backward propagation logic is in place, it’s time to train the model by calling both functions in a loop for a fixed number of epochs.
def train(self, iterations):
    for epoch in range(iterations):
        self.forward_propagation()
        self.back_propagation()
This is a very basic approach to training the network. You can also visualise how the loss changes over the epochs by adding a few lines of code, after defining an evaluation function such as mean squared error.
def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))
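For example (a standalone check with made-up numbers), if the targets are [0.0, 1.0] and the network outputs [0.5, 0.5], both squared errors are 0.25, so the mean squared error is 0.25:

print(mean_squared_error(np.array([0.0, 1.0]), np.array([0.5, 0.5])))  # 0.25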
import matplotlib.pyplot as plt

def train(self, iterations, print_interval=500):
    losses = []
    for epoch in range(iterations):
        self.forward_propagation()
        self.back_propagation()
        # Calculate the loss
        loss = mean_squared_error(self.output, [self.o1_out, self.o2_out])
        losses.append(loss)
        # Print the loss at regular intervals
        if epoch % print_interval == 0:
            print(f'Epoch {epoch}, Loss: {loss}')
    # Plot the loss over time
    plt.plot(losses)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Loss over time')
    plt.show()
Finally, we define a predict method that takes an input and returns the predicted result. It’s a very simple task: we just perform forward propagation with the tuned parameters.
def predict(self, new_input):
    h1_in = np.dot(new_input, self.weights1[:, 0]) + self.bias1[0]
    h1_out = sigmoid(h1_in)
    h2_in = np.dot(new_input, self.weights1[:, 1]) + self.bias1[1]
    h2_out = sigmoid(h2_in)
    o1_in = np.dot([h1_out, h2_out], self.weights2[:, 0]) + self.bias2[0]
    o1_out = sigmoid(o1_in)
    o2_in = np.dot([h1_out, h2_out], self.weights2[:, 1]) + self.bias2[1]
    o2_out = sigmoid(o2_in)
    return o1_out, o2_out
# Initialize and train the neural network
nn = NeuralNetwork()
nn.train(10000)
# Test the neural network with a new input
new_input = np.array([0.05, 0.10])
print("Predicted output for input [0.05, 0.10]:", nn.predict(new_input))
This is the output when I run the code:
Predicted output for input [0.05, 0.10]: (0.06266065363539204, 0.9424850053177263)
When you run the altered train function, the plot will look like this:

The output is really good, and the loss decreases steadily, which is a good sign. The output can be improved further by training for more than 10000 epochs; doing so gives:
Predicted output for input [0.05, 0.10]: (0.019674851657783092, 0.9805816157177215)
Now that we have understood how to translate our logic into code, let’s dive straight into generalising the code and removing the redundancies: the current version performs element-wise operations on individual rows and columns instead of leveraging fast matrix multiplication, and it hard-codes the weights and biases instead of initialising them randomly.
First of all, we define the basic activation function, its derivative and the loss function, as we saw above.
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1 - x)

def mean_squared_error(y_true, y_pred):
    return np.mean(np.power(y_true - y_pred, 2))
Then, instead of hard-coding the weights and biases, we use NumPy’s rand function to generate them randomly.
class NeuralNetwork:
    def __init__(self, learning_rate=0.1):
        self.h2_error, self.h1_error = None, None
        self.o2_error, self.o1_error = None, None
        self.o2_out, self.o2_in = None, None
        self.o1_out, self.o1_in = None, None
        self.h2_in, self.h2_out = None, None
        self.h1_out, self.h1_in = None, None
        self.input, self.output = None, None
        self.learning_rate = learning_rate
        # Initialize weights and biases
        # rand(n, m) creates an n x m matrix filled with random numbers
        self.weights1 = np.random.rand(2, 2)  # input to hidden layer
        self.weights2 = np.random.rand(2, 2)  # hidden to output layer
        self.bias1 = np.random.rand(1, 2)  # biases for hidden layer neurons
        self.bias2 = np.random.rand(1, 2)  # biases for output layer neurons
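Because the parameters are now random, every run starts from a different point and the loss curve will look slightly different each time. If you want reproducible runs while experimenting, you can seed NumPy’s random number generator once at the top of your script (optional; 42 is just an arbitrary choice):

np.random.seed(42)  # any fixed seed makes the random initial parameters repeatable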
As you may have noticed, we no longer initialise the input and output in the __init__ method. Instead, the input is passed to forward_propagation and the expected output to back_propagation as parameters, so that forward_propagation can be called directly for inference.
def forward_propagation(self, input_data):
    self.input = input_data
    # the result is a 1x2 matrix: the first element is h1_in, the second is h2_in
    self.h1_in = np.dot(self.input, self.weights1) + self.bias1
    self.h1_out = sigmoid(self.h1_in)
    # similarly, o1_in holds both output neurons' weighted sums
    self.o1_in = np.dot(self.h1_out, self.weights2) + self.bias2
    self.output = sigmoid(self.o1_in)
    return self.output
The input data is a NumPy array instead of a Python list. Instead of extracting each column of the input and weight matrices to perform a dot product as before, we take the dot product of the full input and weight matrices for computational efficiency.
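If the shapes feel abstract, this tiny standalone check (separate from the class) shows how one (1, 2) input row multiplied by a (2, 2) weight matrix yields both hidden pre-activations in one go:

import numpy as np

x = np.array([[0.05, 0.10]])     # shape (1, 2): one example, two features
w = np.random.rand(2, 2)         # shape (2, 2)
b = np.random.rand(1, 2)         # shape (1, 2)
print((np.dot(x, w) + b).shape)  # (1, 2): both hidden pre-activations at once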
def back_propagation(self, expected_output):
    self.error = expected_output - self.output
    self.o1_error = self.error * sigmoid_derivative(self.output)
    self.h1_error = np.dot(self.o1_error, self.weights2.T) * sigmoid_derivative(self.h1_out)
    # Update weights and biases
    self.weights2 += self.learning_rate * np.dot(self.h1_out.T, self.o1_error)
    self.bias2 += self.learning_rate * np.sum(self.o1_error, axis=0, keepdims=True)
    self.weights1 += self.learning_rate * np.dot(self.input.T, self.h1_error)
    self.bias1 += self.learning_rate * np.sum(self.h1_error, axis=0)
Here o2_error and h2_error follow the same logic as explained above; the output-layer errors are squashed into a single 1×2 matrix, o1_error, and likewise the hidden-layer errors into h1_error. Also note that the sign convention has flipped: the error is now computed as expected_output - self.output, so the parameters are updated with += instead of the -= used earlier; the two forms are mathematically equivalent.
The .T attribute gives the transpose of a matrix, i.e. rows become columns and vice versa. We need it because the weights leading into a given neuron are stacked one over the other (as a column) in our weight matrix, so transposing lines them up correctly for the dot product. This becomes clear when you debug the back_propagation method above by adding these print statements:
print('e: ', self.error, self.error.shape)
print('o1e: ', self.o1_error, self.o1_error.shape)
print('w2: ', self.weights2, self.weights2.shape)
print('w2 transposed:', self.weights2.T, self.weights2.T.shape)

# output:
e:  [[-0.79935138  0.15214919]] (1, 2)
o1e:  [[-0.1233413   0.02067051]] (1, 2)
w2:  [[0.21990302 0.3773528 ]
 [0.79445817 0.83928505]] (2, 2)
w2 transposed: [[0.21990302 0.79445817]
 [0.3773528  0.83928505]] (2, 2)
The parameters axis=0 and keepdims=True in the np.sum function are used to control the dimensions of the output array. Here’s an explanation of what they do and why they were included:
axis=0: This parameter specifies that the sum should be performed along the first axis (i.e., the rows). This is useful when you have multiple examples in your batch and you want to sum the errors across all examples for each neuron.
keepdims=True: This parameter ensures that the output array has the same number of dimensions as the input array. For example, if the input array has a shape of (batch_size, num_neurons), setting keepdims=True ensures the output will have the shape (1, num_neurons) instead of (num_neurons,).
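Here is a quick standalone illustration of the difference (a small check, separate from the network code):

import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.sum(a, axis=0))                 # [4. 6.], shape (2,)
print(np.sum(a, axis=0, keepdims=True))  # [[4. 6.]], shape (1, 2)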
Finally, we define the training and inference methods in our NeuralNetwork class.
def train(self, input_data, output_data, iterations, print_interval=100):
    losses = []
    for epoch in range(iterations):
        self.forward_propagation(input_data)
        self.back_propagation(output_data)
        loss = mean_squared_error(output_data, self.output)
        losses.append(loss)
        # Print the loss at regular intervals
        if epoch % print_interval == 0:
            print(f'Epoch {epoch}, Loss: {loss}')
    # Plot the loss over time
    plt.plot(losses)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Loss over time')
    plt.show()

def predict(self, input_data):
    return self.forward_propagation(input_data)
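To train and test the generalised network on the same example as before, note that the data now needs an explicit batch dimension, i.e. shape (1, 2) rather than (2,). A minimal usage sketch, assuming the class defined above:

nn = NeuralNetwork(learning_rate=0.1)
X = np.array([[0.05, 0.10]])  # one training example
y = np.array([[0.01, 0.99]])  # its expected output
nn.train(X, y, 10000)
print(nn.predict(X))

Because the weights start from random values, the exact numbers will differ from run to run, but the predictions should move towards the targets as training progresses.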
This is what the loss plot looks like (we get the same result as above with fewer lines of code, while also taking computational efficiency into account):

Okay, now that we are confident building a neural network with backpropagation, in the upcoming post we will scale this network to a more complex problem. With the aim of taking one step at a time towards mastering neural networks, we will choose a mild problem: predicting the class of a flower. If you are into data science and ML, there is a high chance you have come across the dataset known as Iris, which has 4 independent variables (Sepal Length, Sepal Width, Petal Length and Petal Width) and a dependent variable, the class of the species.
There are multiple methods to predict the species, such as K Nearest Neighbours, clustering algorithms, etc., but we will devise a simple yet effective neural network to predict the class of the flower. So play with the neural network you’ve built in this post by tweaking its parameters and trying different activation functions. Stay tuned for the next post; until then, happy coding.
Subscribe to sapiencespace and enable notifications to get regular insights.