A 101 of Neural Networks in AI

Reading Time: 10 minutes

In the world of deep learning, where complex data and patterns are unraveled, neural networks are the cornerstone of understanding. Imagine a neural network as a digital replica of the human brain, where every neuron is akin to a computational node, and these nodes are connected through links that act like synapses.

These connections carry weights that are learned by minimising a loss function, and they play a pivotal role in information processing and decision-making within the network. As we explore further, we’ll delve into the fascinating journey of neural networks: from their basic building block, the perceptron, to the intricacies of multi-layer networks and the indispensable process of backward propagation.

It’s a journey that encompasses not just the model’s creation but also its refinement: testing and adjusting weights, biases, and functions to achieve remarkable results.

Some of the well-known domains that have adopted neural networks are:

  1. Healthcare:
    • Diagnosis of Diseases: In radiology, convolutional neural networks (CNNs) are used to detect and diagnose diseases like cancer from medical images. IBM’s Watson for Oncology, powered by deep learning, assists doctors in providing more accurate treatment recommendations.
  2. Language Translation:
    • Google Translate: Google’s neural machine translation model, GNMT, uses recurrent neural networks (RNNs) and attention mechanisms to provide accurate translations between languages.
  3. Autonomous Vehicles:
    • Self-Driving Cars: Companies like Tesla and Waymo use neural networks for object detection, path planning, and decision-making, making autonomous vehicles safer and more reliable.
  4. Entertainment:
    • Content Recommendation: Streaming platforms like Netflix and Spotify leverage neural networks to recommend personalised content to users based on their preferences and viewing history.
  5. Agriculture:
    • Crop Monitoring: Satellite imagery and deep learning help farmers monitor crop health, predict yields, and optimise irrigation and fertilisation.

A neural network is a computer model based on the human brain’s structure. It’s composed of interconnected nodes (neurons) organised into layers. Information flows from one layer to the next, with each neuron performing a simple mathematical operation: it multiplies its inputs by weights, adds a bias, and passes the result through a function known as an activation function.
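That single-neuron computation can be sketched in a few lines of plain Python. This is an illustrative toy, not any particular library’s API; the sigmoid activation used here is introduced below.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus the bias term
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function (sigmoid here) squashes the sum into (0, 1)
    return 1 / (1 + math.exp(-z))

# Example: a neuron with two inputs
output = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
print(round(output, 3))  # → 0.574
```

A full network is just many of these neurons wired together, layer by layer.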


Without an activation function, the model cannot capture the complexities in the data and reduces to a simple linear regression model.

The data is fed into the input layer, which passes it on through any number of hidden layers.

The activation can be linear (output proportional to input) or non-linear (to capture the nuances in the data).

These networks are trained using data, adjusting the strengths of connections (weights) between neurons to learn patterns, make predictions, or solve tasks. The process involves forward propagation (information passing through the network) and backward propagation (adjusting weights to minimise errors).
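As a toy illustration of this training loop, here is a single linear neuron fitted by repeated forward and backward passes; the gradient formulas follow from the mean squared error loss, and the function name is made up for this sketch:

```python
# One gradient-descent step on a single linear neuron, y_hat = w*x + b,
# trained with squared error. Forward pass computes the prediction;
# backward pass adjusts w and b against the error.
def train_step(w, b, x, y, lr=0.1):
    y_hat = w * x + b                 # forward propagation
    error = y_hat - y
    grad_w = 2 * error * x            # d(loss)/dw
    grad_b = 2 * error                # d(loss)/db
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(100):
    w, b = train_step(w, b, x=2.0, y=4.0)   # learn to map 2 -> 4
print(round(w * 2.0 + b, 2))  # → 4.0
```

Real networks repeat exactly this idea across millions of weights, using the chain rule to push error gradients back through every layer.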

An activation function is a gate that decides whether a neuron should “fire” or be activated based on the input it receives. It takes the weighted sum of the inputs, adds a bias (a learned offset that shifts the activation threshold), and then applies a mathematical function to this sum. The result of this function determines whether the neuron should be activated.

Types of Activation function:

  1. Binary step function: the neuron is activated (outputs 1) only when its input crosses a specified threshold; otherwise it outputs 0.
  2. Linear function: there is a problem associated with step functions: their gradient is zero everywhere, so a loss function (which aims at minimising the errors in prediction) cannot propagate any useful signal back through them during training. A linear function such as y = 2x avoids this, since its output is simply proportional to its input.
  3. ReLU (Rectified Linear Unit): it outputs the maximum of its input and 0, effectively zeroing out any negative values fed into it. It is widely used in image processing.
  4. Logistic function / sigmoid function: it transforms the input into a value between 0 and 1, which can also be interpreted as a probability.
  5. Tanh function: unlike the sigmoid, its range lies between -1 and 1, making the function symmetric around 0.
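Each of the activation functions above fits in a line or two of Python (an illustrative sketch):

```python
import math

def binary_step(x, threshold=0.0):
    return 1 if x >= threshold else 0    # fires only past the threshold

def linear(x, slope=2.0):
    return slope * x                     # e.g. y = 2x

def relu(x):
    return max(0.0, x)                   # zeroes out negative inputs

def sigmoid(x):
    return 1 / (1 + math.exp(-x))        # output in (0, 1)

def tanh(x):
    return math.tanh(x)                  # output in (-1, 1), symmetric at 0

print(binary_step(0.5), relu(-3.0), round(sigmoid(0.0), 1))  # → 1 0.0 0.5
```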

There are many other transformations used as activation functions, each serving a specific purpose and suiting the requirements of the model. For instance, suppose a model needs to predict which of five categories an input falls into. Here we can’t use the sigmoid function, as it is optimised for binary classification; instead we choose something like softmax, which gives us a probability for each category.
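A minimal softmax sketch in plain Python (for illustration):

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability,
    # then exponentiate and normalise so the outputs sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores for five categories -> one probability per category
probs = softmax([2.0, 1.0, 0.5, 0.5, 0.1])
print([round(p, 2) for p in probs])
```

The highest-scoring category gets the highest probability, and the whole output always sums to 1, which is exactly what a multi-class prediction needs.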

Now that we have understood the most basic concepts in a neural network, let’s delve into some deeper segments of this world: the architectures used in real-life applications and the concepts that support those systems.


Architectures

There are many neural network architectures, each suited to a specific task: feedforward neural networks (used for classification and regression tasks), convolutional neural networks (CNNs) for image processing, and recurrent neural networks (RNNs) for sequential data.


Feedforward Neural Networks (FNNs)

FNNs are the most basic structure, used for classification and regression. They consist of an input layer, one or more hidden layers, and an output layer, and use functions like ReLU, sigmoid, or tanh to introduce non-linearity.
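A forward pass through such a network might look like this. The layer sizes are made up for the sketch, and the weights are random rather than trained, so the output is meaningless; the point is the layer-to-layer flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# A tiny FNN: 3 inputs -> 4 hidden units -> 2 outputs.
# In practice these weights would be learned from data.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(x):
    hidden = relu(x @ W1 + b1)   # input layer -> hidden layer
    return hidden @ W2 + b2      # hidden layer -> output layer

out = forward(np.array([1.0, 0.5, -0.2]))
print(out.shape)  # → (2,)
```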

Limitation of FNNs – no feedback: unlike recurrent networks, FNNs have no connections that loop back, so they cannot keep track of sequential patterns in data.

Convolutional Neural Networks

They are used specifically for processing images and audio spectrograms, which are essentially grid-like data.

The logic here is to identify the spatial and temporal patterns in the data fed into it. Primarily, three types of layers form the basis of a CNN: convolutional layers, pooling layers, and fully connected layers.
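The core convolution operation can be sketched with a naive NumPy implementation (for illustration only; real CNNs learn their kernels from data and use heavily optimised routines):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; each output value is the sum of
    # the elementwise products between the kernel and the patch beneath it
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-made vertical-edge kernel applied to a 4x4 "image"
# whose left half is dark (0) and right half is bright (1)
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[1., -1.],
                   [1., -1.]])
# The output is non-zero only where the kernel straddles the edge
print(convolve2d(image, kernel))
```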


This is just a glimpse of what a CNN does in its layers; a post covering all the detailed aspects, along with a practical example, will be uploaded soon.

Recurrent Neural Networks

Used to handle sequential data like text and speech, RNNs appear in NLP (Natural Language Processing) tasks like language modelling, machine translation, and speech recognition, as well as in time-series analysis and weather forecasting.

Unlike feedforward neural networks, RNNs have connections that loop back on themselves, allowing them to maintain a form of memory and process sequences of data.
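That loop-back idea can be sketched as a single recurrent cell applied step by step. As in the FNN sketch, the weights here are random and untrained, so only the mechanics matter: the hidden state h is fed back in at every step, acting as the network’s memory:

```python
import numpy as np

rng = np.random.default_rng(0)

# A minimal recurrent cell with made-up sizes: 3 input features, 5 hidden units
Wx = rng.normal(size=(3, 5))   # input -> hidden
Wh = rng.normal(size=(5, 5))   # hidden -> hidden (the recurrent loop)
b = np.zeros(5)

def rnn_step(x, h):
    # New state depends on the current input AND the previous state
    return np.tanh(x @ Wx + h @ Wh + b)

h = np.zeros(5)                         # initial (empty) memory
sequence = rng.normal(size=(4, 3))      # 4 time steps, 3 features each
for x in sequence:
    h = rnn_step(x, h)                  # same weights reused at every step
print(h.shape)  # → (5,)
```

Note that the same Wx and Wh are reused at every time step; only the hidden state changes as the sequence unfolds.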

Loss Functions

As the name suggests, a loss function measures the error between the actual values and the predicted values; training a neural network amounts to minimising it. Commonly used loss functions include mean squared error (MSE) and cross-entropy.
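Both losses are short enough to write out directly (a plain-Python sketch; the binary form of cross-entropy is shown):

```python
import math

def mse(y_true, y_pred):
    # Average of squared differences; penalises large errors heavily
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    # Penalises confident but wrong probability predictions
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([1.0, 2.0], [1.5, 2.0]))                         # → 0.125
print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 3))  # → 0.105
```

MSE suits regression, where outputs are continuous; cross-entropy suits classification, where outputs are probabilities.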


Overfitting

Overfitting can be one of the most challenging situations when developing a neural network based model: the model performs very well on the training data but poorly on new data. It happens due to several factors, such as:

  • Complex Model: Using a model that is too complex for the given dataset can lead to overfitting. The model may have too many parameters, allowing it to fit the training data too closely.
  • Insufficient Data: When the training dataset is small, the model may memorise the data rather than learning generalisable patterns. This can result in overfitting.
  • Noisy Data: Data with errors, outliers, or inconsistencies can mislead the model and lead to overfitting.

The techniques used to overcome this issue include regularisation (prevents the model from giving too much importance to certain features), dropout (random neurons are dropped during training to increase robustness), and early stopping (training is halted once performance on the validation data stops improving).
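Dropout, for instance, can be sketched in a few lines. This illustrates the common “inverted dropout” variant, where surviving activations are scaled up during training so that nothing changes at inference time (an illustrative sketch, not a library implementation):

```python
import random

def dropout(activations, rate=0.5, training=True):
    # During training, randomly zero out neurons and scale up the rest;
    # at inference time, pass the values through unchanged.
    if not training:
        return activations[:]
    return [0.0 if random.random() < rate else a / (1 - rate)
            for a in activations]

random.seed(0)
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5))
```

Because each training pass sees a different random subset of neurons, no single neuron can be relied on too heavily, which is exactly what combats overfitting.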


Preprocessing

In the world of machine learning, textual data, images, and audio cannot be directly understood by the models used, so we convert the data into numerical tensors (a step commonly known as encoding). Some of the most commonly used techniques include normalisation, scaling, and one-hot encoding.
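Two of these techniques, one-hot encoding and min-max scaling, can be sketched as (illustrative helpers, not a library API):

```python
def one_hot(label, categories):
    # Represent a category as a vector with a single 1 at its position
    return [1 if c == label else 0 for c in categories]

def min_max_scale(values):
    # Rescale numeric values into the range [0, 1]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(one_hot("cat", ["cat", "dog", "bird"]))   # → [1, 0, 0]
print(min_max_scale([10.0, 20.0, 30.0]))        # → [0.0, 0.5, 1.0]
```

One-hot encoding turns categories into vectors a network can consume; scaling keeps all features on a comparable range so no single feature dominates training.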


Validation and Testing

It is one of the most important phases of developing a neural network based model: it helps create an unbiased, efficient model that performs well on new data inputs.

For example, in Large Language Models, performance on generated output is measured using metrics such as:

  • ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation) measures the number of unigrams (single words) that match between the reference and the generated output.
  • Precision measures the accuracy of positive predictions; here, the fraction of generated unigrams that also appear in the reference.
  • F1 considers both false positives and false negatives, providing a single value that represents the model’s overall performance: F1 = 2 × (precision × recall) / (precision + recall).

For example:

Reference: It is rainy in the city
Generated output: It is extremely rainy in the city

ROUGE-1 (recall) = unigram matches / unigrams in reference = 6/6 = 1
Precision = unigram matches / unigrams in output = 6/7 ≈ 0.857
F1 = 2 × (0.857 × 1) / (0.857 + 1) ≈ 0.923

These numbers can be used to alter the weights of the network and prune away unnecessary layers or nodes.
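The three metrics can be computed with a short script (a simplified sketch that ignores duplicate words; a real ROUGE implementation clips repeated-word counts):

```python
def rouge1(reference, generated):
    # Split into lowercase unigrams and count the overlaps
    ref = reference.lower().split()
    gen = generated.lower().split()
    matches = sum(1 for w in gen if w in ref)
    recall = sum(1 for w in ref if w in gen) / len(ref)
    precision = matches / len(gen)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

r, p, f1 = rouge1("It is rainy in the city",
                  "It is extremely rainy in the city")
print(round(r, 3), round(p, 3), round(f1, 3))  # → 1.0 0.857 0.923
```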

Ethical Considerations

In the realm of neural networks, ethical considerations occupy a crucial and evolving space. The deployment of these models increasingly requires careful scrutiny to ensure adherence to ethical principles.

With a growing emphasis on ethical AI, a multitude of stakeholders, from data scientists to policymakers, work together to steer neural network-based models along paths that align with moral standards.

Neural networks must be programmed to act ethically, delivering results that are genuinely helpful, truthful, and devoid of any harm. The pursuit of ethical AI not only safeguards against the misuse of technology but also fosters trust and credibility, vital for the widespread acceptance and responsible application of neural networks in diverse sectors.

That’s a wrap! Thank you for reading all along. We have just scratched the surface of neural networks, and there is a lot more to uncover, so stay tuned and subscribe to sapiencespace, enabling notifications to get regular insights.
