
Reading Time: 10 minutes
In this post I’ll guide you through the inner workings of a neural network, with a complete explanation of the logic and a Python program that you will build along with me to create a simple neural network.
In the Neural network 101 post, we delved into some of the most basic concepts associated with deep learning and covered the core topics that form the basis of artificial neural networks.
I highly recommend going through the previous post on neural networks if you have not seen it yet, as it gives you a good foundational perspective. If you already have an idea of how neural networks work, that’s absolutely fine; you can continue reading this post.
So, first things first. Interconnected neurons form the network, but what actually gives a neural network its capacity to learn are the weights and biases. I initially found this very confusing: most introductions cover a few basic concepts and the internal workings, and then jump straight into code built on top of TensorFlow or PyTorch, which can feel overwhelming.
But the coding itself is very simple: you import a few libraries, adjust the hyper-parameters, and that’s it, your network works. And as the problem gets more complex or your network needs to be more powerful, there are libraries built on top of these that do the job in a simple and efficient manner.
Now, I am not saying that the libraries mentioned above make you lazy or that you should always program the entire logic yourself; in practical use cases you should use those libraries. But I was very curious to understand the internal mechanism of how the weights and biases get updated and what the activation function logically does to the calculations. That’s when I decided to learn the maths and framework behind neural networks. I went through a lot of blogs and videos, and while the task was difficult and frustrating at times, I finally connected the dots and built a decent understanding.
This post is a collection of everything I understood along that small journey. I will try my best to make the concepts as easy to grasp and as simple to implement as possible.

Linearly Separable vs Non-Linearly Separable data
Linearly separable data refers to a scenario in which two groups of data points can be clearly divided using a single straight line. Imagine plotting points on a graph, where each point belongs to one of two categories. If you can draw a line on this graph that separates all points of one category from all points of the other without any overlap, then the data is considered linearly separable.
If the two classes are mixed together so that no single straight line can separate them (non-linearly separable data), then a more complex network with multiple layers is required to model the data. This is where back-propagation and the concepts of gradient descent come into play.
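As a quick made-up illustration (it is not part of the examples later in this post): the OR gate’s truth table is linearly separable, while XOR’s is not, which is exactly why XOR needs a hidden layer.

import numpy as np

# The four possible binary input combinations
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or = np.array([0, 1, 1, 1])    # OR labels
y_xor = np.array([0, 1, 1, 0])   # XOR labels

# Test one candidate separating line: x1 + x2 - 0.5 = 0
line = X.sum(axis=1) - 0.5
print((line > 0).astype(int) == y_or)    # all True: this single line separates the OR classes
print((line > 0).astype(int) == y_xor)   # fails on [1, 1]; in fact no single straight line separates XOR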
I know a lot of questions are running through your mind; this is just a general perspective on the type of neural network we need to build for a given problem. We’ll walk through both cases in detail to understand them clearly.
Perceptron Training Rule – Linearly separable data – Forward Propagation
The Perceptron training rule is used when our data is linearly separable and the neural network consists of only two layers, the input and output layers, with no hidden layers. This is also why no back-propagation is needed to train the network: it is relatively simple. Here the error, i.e. the deviation of the calculated output from the expected output, is multiplied by the input values and a learning rate to update the weights.
Let’s go through two examples to understand the concept mentioned above.
In the first example, we delve into a simple 2-layer neural network with two nodes (perceptrons) in the input layer and one node in the output layer.
The goal here is to train the model to predict the binary output (0 or 1) for a given binary input (we are training the model to learn the OR gate logic).

Of course this task can be done using simple if-else conditions, but our objective here is to understand the functioning of a simple neural network in the easiest way possible. In the next example we will look at a network with more complex inputs and hidden layers.
import numpy as np

class Perceptron:
    def __init__(self, input_size):
        # Initialize weights and bias randomly
        self.weights = np.random.rand(input_size)
        self.bias = np.random.rand()

    def predict(self, inputs):
        # Threshold (step) activation function
        weighted_sum = np.dot(inputs, self.weights) + self.bias
        if weighted_sum >= 0:
            return 1
        else:
            return 0
First we initialise random weights and a random bias, and then define an activation function in which we multiply the inputs by the weights and add the bias to the result. If this value is greater than or equal to 0, the output is one (a threshold); otherwise it is zero.
Then during the training process, we basically perform two operations:
- Each weight is updated to the previous weight + (learning_rate x error x corresponding input), where the error is the difference between the actual value and the predicted value.
- The bias is updated to the previous bias + (learning_rate x error).
By repeating this process over a given set of inputs and outputs with a chosen learning rate, the model finally ends up with weights and a bias that perform the task we need.
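To make the rule concrete, here is one hypothetical update step with made-up numbers (these values are for illustration only and are not taken from the training run below):

import numpy as np

weights = np.array([0.2, 0.5])   # made-up current weights
bias = 0.1                       # made-up current bias
learning_rate = 0.1

x = np.array([1, 0])             # current input sample
target, prediction = 1, 0        # expected output vs. what the perceptron predicted
error = target - prediction      # 1 - 0 = 1

weights += learning_rate * error * x   # [0.2 + 0.1*1*1, 0.5 + 0.1*1*0] = [0.3, 0.5]
bias += learning_rate * error          # 0.1 + 0.1*1 = 0.2

The full training loop below simply applies this same update across every training sample for a number of epochs: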
def main():
    # Training data (example: OR gate)
    inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    targets = np.array([0, 1, 1, 1])  # OR gate output
    input_size = inputs.shape[1]

    # Create a perceptron with the specified input size
    perceptron = Perceptron(input_size)

    # Train the perceptron using the perceptron learning rule
    epochs = 10
    learning_rate = 0.1
    for epoch in range(epochs):
        for i in range(len(inputs)):
            # Forward propagation
            prediction = perceptron.predict(inputs[i])
            # Update weights and bias based on the perceptron learning rule
            error = targets[i] - prediction
            perceptron.weights += learning_rate * error * inputs[i]
            perceptron.bias += learning_rate * error

    # Test the trained perceptron
    test_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for test_input in test_data:
        prediction = perceptron.predict(test_input)
        print(f"Input: {test_input}, Predicted Output: {prediction}")

if __name__ == "__main__":
    main()
You can think of each element of the inputs array as the values fed to the two input-layer neurons, with the corresponding expected output stored in the targets array.
If you print the initial and updated values of the weights and bias, you can see the changes ->
Initial value of weights and bias: [0.86988607 0.16502525] 0.8461224595426543
Updated value of weights and bias: [0.86988607 0.16502525] -0.05387754045734561
While this is cool and easy to implement, tasks such as natural language understanding, reasoning, and problem-solving often require more sophisticated approaches than simple feedforward computation. In some tasks, the input data may be dynamic or evolving over time, requiring models that can adapt to changing input distributions. Forward propagation typically assumes static input-output mappings and may struggle with dynamic inputs.
So, to deal with this bottleneck, we have the “back-propagation” technique. It is a more general and powerful algorithm used for training multi-layer neural networks, and it can handle both linear and non-linear problems.
- Learning Process: Back-propagation involves two phases: a forward pass and a backward pass. In the forward pass, inputs are passed through the network to get the output. During the backward pass, the error between the predicted output and the actual output is calculated, and this error is propagated back through the network to update the weights. This involves the use of the chain rule to compute gradients efficiently (a minimal single-neuron sketch follows this list).
- Applicability: Back-propagation can train networks with multiple hidden layers (deep neural networks), allowing them to learn complex patterns and solve non-linear problems, which is not possible with the simple forward-only perceptron rule.
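To give a feel for what one forward and backward pass looks like, here is a minimal, self-contained sketch of a single gradient-descent update for one sigmoid neuron trained with a mean-squared-error loss. All numbers are made up, and this is deliberately far simpler than a real multi-layer network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])   # one input sample with two features
y = 1.0                     # target output
w = np.array([0.1, 0.4])    # current weights
b = 0.0                     # current bias
lr = 0.1                    # learning rate

# Forward pass: compute the prediction
z = np.dot(w, x) + b
y_hat = sigmoid(z)

# Backward pass: chain rule for the gradients of loss = 0.5 * (y_hat - y)**2
d_loss_d_yhat = y_hat - y
d_yhat_d_z = y_hat * (1 - y_hat)          # derivative of the sigmoid
grad_w = d_loss_d_yhat * d_yhat_d_z * x   # dLoss/dw
grad_b = d_loss_d_yhat * d_yhat_d_z       # dLoss/db

# Gradient-descent update
w -= lr * grad_w
b -= lr * grad_b
print(w, b)

In a real multi-layer network, the same chain-rule idea is applied layer by layer; that is exactly what the Keras example later in this post delegates to the library.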
Summary of Differences:
| Aspect | Perceptron Learning | Backpropagation |
|---|---|---|
| Problem Types | Linearly separable binary classification | Linear and non-linear, including multi-class classification |
| Network Depth | Single-layer (handles only very simple problems) | Multi-layer (deep networks; handles complex problems) |
| Weight Update | Directly based on error | Based on gradient descent and the chain rule |
| Learning Process | Forward pass only, with direct weight update | Forward pass and backward pass (error propagation) |
Now let’s walk through a back-propagation example that predicts the iris flower class from petal length, petal width, sepal length, and sepal width. This code is not built from scratch; I will guide you through a program that leverages the Keras library to perform the prediction using two simple layers, so that you get a bird’s-eye view of what happens inside the back-propagation technique. In the next post of this series, I will give a complete, detailed explanation of back-propagation with code from scratch, explaining all the intricacies.
Sample data from iris dataset ->

Code to create a neural network for predicting the iris flower class ->
# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Encode labels to categorical variables
encoder = OneHotEncoder(sparse_output=False)  # on scikit-learn versions older than 1.2, use sparse=False instead
y = encoder.fit_transform(y.reshape(-1, 1))
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Here we one-hot encode the class labels, since neural networks work with numerical values in tensors or arrays. We also split the dataset so we can later test the model’s performance on unseen data; that is, we train the network only with X_train and y_train.
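As a quick sanity check (this snippet is not part of the original program, and the exact train/test rows depend on the random split), the encoding and split should look roughly like this:

# Each integer label becomes a one-hot row:
#   0 (setosa)     -> [1., 0., 0.]
#   1 (versicolor) -> [0., 1., 0.]
#   2 (virginica)  -> [0., 0., 1.]
print(y[:3])                         # the first three encoded labels
print(X_train.shape, y_train.shape)  # (120, 4) (120, 3) for an 80/20 split of 150 samples
print(X_test.shape, y_test.shape)    # (30, 4) (30, 3)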
# Define the neural network model
model = Sequential([
    # Hidden layer with 10 neurons; the input layer (4 features) is defined via input_shape
    Dense(10, input_shape=(4,), activation='relu'),
    # Output layer with 3 neurons for the 3 classes. Softmax turns the raw scores into
    # a probability for each class, which is what we want for multi-class classification.
    Dense(3, activation='softmax')
])
# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=5, verbose=1)  # verbose=1 prints the training progress for each epoch in the console
- adam: This specifies the optimisation algorithm used to minimise the loss function during training. Adam is a popular gradient-based optimisation algorithm that adjusts the learning rate during training for each weight, which helps with faster convergence. It’s well suited to problems that are large in terms of data and/or parameters.
- categorical_crossentropy: This is the loss function that the model aims to minimise. Categorical cross-entropy is commonly used for multi-class classification problems. It measures the dissimilarity between the true label distribution and the predictions (the closer the predictions are to the true labels, the lower the loss).
- accuracy: Metrics are used to evaluate the performance of your model. Accuracy is one of the most common evaluation metrics, used for classification problems. It calculates the proportion of correctly predicted labels over all predictions. Note that metrics are not used during the training process to update the model (that’s the role of the loss function); instead, they are used to monitor the model’s performance.
- relu: ReLU is an activation function that is commonly used in the hidden layers of neural networks. It outputs the input directly if it is positive; otherwise, it outputs zero. ReLU tends to offer better performance and convergence in practice for deep networks.
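If you are curious what softmax and categorical cross-entropy actually compute, here is a tiny standalone sketch for a single 3-class sample. The logits and label are made up for illustration and are not outputs of the model above.

import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # raw scores from the last Dense layer
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: non-negative values that sum to 1
true_label = np.array([1.0, 0.0, 0.0])           # one-hot target (class 0)
loss = -np.sum(true_label * np.log(probs))       # categorical cross-entropy
print(probs)  # approximately [0.659 0.242 0.099]
print(loss)   # smaller when the probability assigned to the true class is larger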
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")
# Example: Make a prediction
new_data = np.array([[5.9, 3.0, 5.1, 1.8]]) # New sample data
prediction = model.predict(new_data)
print(prediction)
predicted_class = np.argmax(prediction, axis=1)
print(predicted_class)
print(f"Predicted class: {iris.target_names[predicted_class][0]}")
Output:
Test accuracy: 0.9667
1/1 [==============================] – 0s 29ms/step
[[2.3528730e-04 2.5498772e-01 7.4477702e-01]]
[2]
Predicted class: virginica
NOTE: Here each number in the array represents the probability of the sample belonging to a specific class. Since the largest probability is about 0.744 (roughly 74.5%), the predicted class is virginica.
A test accuracy of 0.9667 means the model classifies about 96.7% of the unseen test samples correctly, i.e. roughly 97 out of every 100.
Thank you for joining me on this journey. I hope that you’ve gained some insightful knowledge from this post. I encourage you to experiment further—play with the network’s parameters, explore different activation functions, and see what new outcomes you can discover. I’d love to hear about your experiments, findings, and any insights you might have, so please share them in the comments below.
Stay tuned for an upcoming post dedicated entirely to back-propagation. It promises to be an enlightening read, offering deeper dives into the mechanics and nuances of complex neural networks.
Subscribe to sapiencespace and enable notifications to get regular insights.
Click here to view similar insights.