
Reading Time: 3 Minutes
We all know the logarithm as a math function with a distinctive curved graph, but most of us don’t know what its purpose is and how it serves as a crucial tool in so many fields. In this post, we’ll delve into the logic behind logarithms and how they can specifically be used in AI.

This is a graph of the natural log – the log with base e (≈2.71828) – for inputs from 0 to infinity. At first glance it looks like just another transformation function, but there is more to it.
Logarithms help us answer questions like:
- How many times do I have to multiply a base n by itself to get m?
- Example: how many times do I multiply 2 by itself to get 8? The answer is 3 (log₂(8) = 3). The lesson here is that a big number can be compressed and represented as a smaller number.
- log₂(32) is basically asking how many times I should multiply 2 by itself to get 32; the answer is 5.
- Shrinking big numbers: log₁₀(1,000,000) = 6.
- They turn multiplication into addition: log(n × m) = log(n) + log(m), which reduces computational cost and makes processing more efficient (see the short sketch below).
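Here’s a quick sketch of these properties using Python’s built-in math module:

```python
import math

# Logarithms compress big numbers into small ones.
print(math.log2(8))           # 3.0  (2 * 2 * 2 = 8)
print(math.log2(32))          # 5.0
print(math.log10(1_000_000))  # 6.0

# Logarithms turn multiplication into addition: log(n * m) = log(n) + log(m)
n, m = 1234.0, 5678.0
print(math.log(n * m))            # ~15.76
print(math.log(n) + math.log(m))  # ~15.76 (same value)
```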
From the points above and the nature of log, you might come to the conclusion that negative numbers and zero can’t be given as input, since a base raised to a power can’t be negative. Or can it?
What about (-2)³ = -8? Then log₋₂(-8) = 3. Does this mean negative bases are valid in general? Unfortunately, no. Logarithms are defined over the real numbers, and something like (-2)^0.5 = √(-2) is an imaginary number. So the base must be positive, and the input values lie in the domain (0, +∞).
Some common uses of logarithms in the real world are:
- Richter Scale for Earthquakes – Earthquake energy varies hugely, and the Richter scale compresses that using base-10 logarithms. A magnitude 6 quake is 10 times stronger than a magnitude 5 in terms of ground movement.
- Algorithm efficiency: Big-O notation uses logs for certain algorithms like binary search, which is O(log n) (see the sketch after this list).
- Photography and Light Exposure: The exposure value (EV) system in cameras is based on powers of 2, and logarithms help adjust shutter speed, aperture, or ISO in consistent steps.
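To make the binary search point concrete, here’s a minimal sketch (the `binary_search` helper below is just for illustration): each comparison halves the search space, so a million sorted items need only about 20 comparisons, roughly log₂(1,000,000).

```python
import math

def binary_search(sorted_items, target):
    """Find target in a sorted list, halving the search space each step: O(log n)."""
    low, high, steps = 0, len(sorted_items) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps

items = list(range(1_000_000))
index, steps = binary_search(items, 999_999)
print(steps)                             # about 20 comparisons
print(math.ceil(math.log2(len(items))))  # 20
```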

How do logarithms come into use in AI?
Before we go into the terminologies and techniques in AI that use logarithms, I want you to develop an intuition for how logarithms are used in a simple game.
Let’s take the example of a guessing game where there are three colored balls (red, blue and green) and the goal is to predict the color of a ball randomly chosen by your friend.

Conditions of the game
- Guessing with Confidence:
- If you are very confident that the ball will be red, you might guess red with 90% confidence, blue with 5% confidence, and green with 5% confidence.
- If you are not very sure, you might guess red with 40% confidence, blue with 30% confidence, and green with 30% confidence.
- Winning and Losing Points:
- If the ball is indeed red, you win points. The more confident you were (higher percentage), the more points you win.
- If the ball is blue or green, you lose points. The more confident you were about the wrong colour (higher percentage), the more points you lose.
Log(1) is 0, so the more confidence you place on the correct color, the closer your score gets to 0 (the best possible). But suppose you make a wrong prediction with high confidence: say you claim the chosen ball is green with 90% surety, red with 5% and blue with 5%.
Now, after revealing, the actual ball is red. Since the confidence you placed on red is just 5%, your score is log(0.05) ≈ -3.0 (a very bad score).
But if you were 90% confident that the ball is red, which is true, then the score would have been log(0.9) ≈ -0.11, which is much closer to 0 -> a very good score.
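Here’s that scoring rule as a tiny sketch (the `log_score` helper and the hard-coded confidences are just for illustration):

```python
import math

def log_score(confidences, actual_color):
    """Log of the confidence assigned to the color that was actually drawn.
    Closer to 0 = better; large negative = confidently wrong."""
    return math.log(confidences[actual_color])

# Confidently wrong: 90% on green, but the ball turns out to be red.
print(log_score({"red": 0.05, "blue": 0.05, "green": 0.90}, "red"))  # ≈ -3.0

# Confidently right: 90% on red, and the ball is red.
print(log_score({"red": 0.90, "blue": 0.05, "green": 0.05}, "red"))  # ≈ -0.11
```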
So, now that the game is finished, how can this help in AI? If you look closely, you’ll notice that correct predictions made with high confidence get good scores and wrong predictions get bad scores. This is exactly what AI uses to penalise a model during training: the loss is minimised and the gradients are updated to move towards a local minimum.
A very common use case of logarithms in AI is Categorical Cross Entropy: a loss function whose gradient updates make the AI model learn useful information based on the patterns and contextual relevance in the training data.
The word ‘categorical’ is obvious, as we are using this loss function to predict categorical features. Cross-Entropy is a measure of the difference between two probability distributions:
- The predicted distribution (from your model — softmax output)
- The true distribution (ground truth — usually one-hot encoded).
Let’s derive the mathematical formula for Categorical Cross Entropy ourselves, using the game we have just played.
Rules:
- We convert the colors into one-hot encoded vectors: [Red, Blue, Green] -> if the actual ball is red, then the vector is [1, 0, 0], and if the actual ball is blue, then the vector is [0, 1, 0]. (You’ll get to know why we do this soon.)
- Here we don’t blindly make the prediction; rather, we use information from the input features to make the decision.
- The predicted output is a set of logits (raw scores that have model parameters attached to them; optimizing those parameters through gradient updates is the key to training our model!). These are fed into softmax to convert the raw scores into probabilities, i.e. how confident we are about each color (see the sketch after this list).
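Here’s a minimal sketch of the softmax step, with made-up logits for [red, blue, green]:

```python
import numpy as np

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    exp_scores = np.exp(logits - np.max(logits))  # subtract the max for numerical stability
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for [red, blue, green] produced by a model.
logits = np.array([2.0, 0.5, 0.1])
probs = softmax(logits)
print(probs)        # ≈ [0.73, 0.16, 0.11] -> most confident about red
print(probs.sum())  # 1.0
```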
Now that the rules are clear, let’s convert our game into one of the most important applications in AI!
We perform the same operation as before: we run the prediction through a log function to get the penalization. Suppose the actual ball is blue but the predicted one is red.
So we need a way to leverage the known information and then create the penalty. The best way is to one-hot encode the actual label, so that when the confidence/predicted scores from the softmax are multiplied by it, everything except the entry for the actual ball is zeroed out; the remaining score can then go through a log transformation.
Here’s the above process spelled out in a small formula:
Categorical Cross Entropy = −∑ yᵢ · log(ŷᵢ)
ŷᵢ -> predicted probability (softmax output)
yᵢ -> actual output (one-hot label)
A minus sign is included because the output of log for inputs between 0 and 1 (our prediction confidence from 0% to 100%) is always negative; you can see that in the graph at the start of the page.
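Translating the formula directly into code, here’s a minimal NumPy sketch that reuses the ball example (the predicted probabilities are made up for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """-sum(y_i * log(y_hat_i)): the one-hot y_true zeroes out every term
    except the log-probability assigned to the actual class."""
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([1.0, 0.0, 0.0])     # the actual ball is red (one-hot)
y_pred = np.array([0.05, 0.90, 0.05])  # model was 90% sure it is blue
print(categorical_cross_entropy(y_true, y_pred))  # ≈ 3.0 (heavy penalty)

y_pred = np.array([0.90, 0.05, 0.05])  # model was 90% sure it is red
print(categorical_cross_entropy(y_true, y_pred))  # ≈ 0.11 (small penalty)
```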
In the real world, developers don’t code the formula manually for training a model; libraries like PyTorch and TensorFlow directly provide the Categorical Cross Entropy loss.
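For instance, in PyTorch the built-in nn.CrossEntropyLoss applies softmax internally, so you pass it raw logits and the index of the actual class:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()  # softmax + categorical cross entropy in one step

# Raw logits for one sample over the classes [red, blue, green].
logits = torch.tensor([[2.0, 0.5, 0.1]])
target = torch.tensor([0])       # index 0 -> the actual ball is red

print(loss_fn(logits, target))   # ≈ 0.32, a small loss since the model already favours red
```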
That’s a wrap! Here’s a summary of what we have learnt in this post:
- Basics of logarithms: their evolution and real-world uses.
- Use of logarithms in AI: leveraging the penalty nature of log for teaching AI models.
- Explored the AI application through an example of Categorical Cross Entropy.
Reference for logarithm applications in the real world: geeksforgeeks.org/applications-of-logarithms/
To learn more about logarithms for AI, the practical implementation of Categorical Cross Entropy is covered in my neural networks series on sapiencespace. Do visit this link to learn more ->
Click here for more such interesting AI concepts.
cover and title image credits: unsplash content creators