
Reading Time: 15 minutes
This post is part 3 of this data structures series, and it focuses purely on exploring hash tables in depth, finally demonstrating the logic with Python code.
In this series on abstract data types we will be diving into concepts like trees, graph traversals, stacks, queues, linked lists, hash tables and more, in the Python programming language. The series requires a basic understanding of Python: classes and object-oriented programming, functions and scope, and data types such as lists, tuples, strings, integers and dictionaries.
Those who are beginners in the world of programming and have explored only the basic data types often assume that all use cases and requirements related to data can be handled with them. I had the same mindset when I started to learn data structures; I felt I could handle data of any volume and complexity with basic types such as lists and dictionaries.
But when I started diving deep into the world of data structures, I began to understand their different aspects, how they help formulate complex relationships between data, and how they improve code efficiency overall when we are dealing with large and complex data.
A hash table is a data structure that stores data as key-value pairs. A natural question may arise: what, then, makes hash tables different from the regular dictionary data type?
The answer is not one small advantage that hash tables possess; rather, it is the underlying fact that dictionaries are built on top of hash tables and presented to the end user in a hassle-free form, much as the list type hides the underlying structure it is built on.
With dictionaries, we create a dictionary with pre-defined key-value pairs, or update it in iterations by assigning to keys, and to read a value we call the dictionary with the corresponding key.
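As a minimal illustration (with made-up keys and values), the dictionary workflow described above looks like this:

```python
# Create a dictionary with pre-defined key-value pairs
marks = {"alice": 92, "bob": 85}

# Update it in iterations by assigning to a key
marks["carol"] = 78

# Access a value by specifying its key
print(marks["alice"])  # 92
```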
In hash tables, the basic idea is to store data, in the form of a string or a number, without assigning an index manually; here the index is generated through a function called a hash function and mapped to the given value.
So we don't need the key's full name to get the value; the index generated by the hash function for a value is enough to access the elements efficiently.
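As a quick sketch of that idea (assuming a hypothetical table of size 10 and a plain modulo hash), the function turns each value into its own index:

```python
def hash_fn(value, size=10):
    # The index is derived from the value itself, not assigned manually
    return value % size

table = [None] * 10
for value in (42, 17, 23):
    table[hash_fn(value)] = value

# The hash key alone is enough to reach the element again
print(table[hash_fn(42)])  # 42
```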
And it doesn't stop here; there is more that makes hash tables a versatile data structure. It is quite intuitive that in some cases a hash function can generate the same key for two or more values fed into it, while the basic principle remains that one key cannot hold two values.
Let's think of solutions to this problem of collision. One simple way to mitigate the situation would be to design a hash function that always produces new keys and never repeats a key, using a randomising function together with a check (for example, via recursion) on whether the key already exists in the hash table.
But this will increase the time complexity and reduce efficiency when we are working with huge amounts of data. So we have two efficient strategies to deal with the issue:
- Open Addressing - further classified into linear probing, quadratic probing and double hashing
- Chaining
Before we actually delve into the intricacies of the strategies mentioned above, let's see a simple example where we create a hash table and insert values into it.
1. First we create a class named HashTable under which we define the basic methods (they are functions, but in the context of classes we refer to them as methods). The basic methods include the hash function, insertion and display.
class HashTable:
    def __init__(self, size):
        self.size = size
        self.arr = [0 for i in range(self.size)]

    def hash_fn(self, element):
        return element % self.size

    def insert(self, element):
        hash_key = self.hash_fn(element)
        self.arr[hash_key] = element

    def display(self):
        for i in range(len(self.arr)):
            print(i, "->", self.arr[i])
2. By calling the methods, we get an output like this:
table = HashTable(5)
table.insert(10)
table.insert(11)
table.insert(12)
table.insert(13)
table.insert(14)
table.display()
Output:
0 -> 10
1 -> 11
2 -> 12
3 -> 13
4 -> 14
3. If we try a different input, like inserting 10, 11, 12, 13, 22, we would get an output like this:
Output:
0 -> 10
1 -> 11
2 -> 22
3 -> 13
4 -> 0
As you can see, 12 was initially assigned to index 2, then 22 also got the hash key 2 and replaced 12. This problem is called a collision, and in order to solve it we resort to methods like open addressing.
Let's start with open addressing's linear probing technique. In linear probing the hash function is the same as in the code above, but the main change is that we increment the hash value by 1 whenever a collision occurs. Let's understand it with code.

Linear Probing
1. Creating a basic hash table with the same hash function
class Linear_Probing:
    def __init__(self, size):
        self.size = size
        self.arr = [0 for i in range(self.size)]

    def isFull(self):
        # True means the table is full; False means there is still space
        return 0 not in self.arr

    def hash_fn(self, element):
        print("Element: {} | hash_key: {}".format(element, element % self.size))
        return element % self.size
2. The main difference is in the insertion part: we increment the hash value by 1 whenever a collision occurs.
    def insert(self, element):
        if self.isFull():
            print("The hash table is full...")
            return False
        isStored = False
        position = self.hash_fn(element)
        if self.arr[position] == 0:  # the current index is empty and ready to take values
            self.arr[position] = element
            print("Element " + str(element) + " is inserted in position: " + str(position))
            print("Hash table: ", self.arr)
            isStored = True
        else:
            print("Collision has occurred for element " + str(element) + " at position "
                  + str(position) + " finding new Position...")
            while self.arr[position] != 0:
                position = position + 1
                if position >= self.size:
                    position = 0
            self.arr[position] = element
            print("Element " + str(element) + " is inserted in position: " + str(position))
            print("Hash table: ", self.arr)
            isStored = True
        return isStored
3. Finally, to wrap up the class, we define a display method in which we simply loop over the self.arr object and print each element with its index.
    def display(self):
        for i in range(len(self.arr)):
            print(i, "->", self.arr[i])
Implementation of the code:
table1 = Linear_Probing(9)
# storing elements in table
table1.insert(12)
table1.insert(26)
table1.insert(31)
table1.insert(17)
table1.insert(90)
table1.insert(28)
table1.insert(88)
table1.insert(40)
table1.insert(77)
# displaying the Table
table1.display()
Output:
Element: 12 | hash_key: 3
Element 12 is inserted in position: 3
Hash table: [0, 0, 0, 12, 0, 0, 0, 0, 0]
Element: 26 | hash_key: 8
Element 26 is inserted in position: 8
Hash table: [0, 0, 0, 12, 0, 0, 0, 0, 26]
Element: 31 | hash_key: 4
Element 31 is inserted in position: 4
Hash table: [0, 0, 0, 12, 31, 0, 0, 0, 26]
Element: 17 | hash_key: 8
Collision has occurred for element 17 at position 8 finding new Position...
Element 17 is inserted in position: 0
Hash table: [17, 0, 0, 12, 31, 0, 0, 0, 26]
Element: 90 | hash_key: 0
Collision has occurred for element 90 at position 0 finding new Position...
Element 90 is inserted in position: 1
Hash table: [17, 90, 0, 12, 31, 0, 0, 0, 26]
Element: 28 | hash_key: 1
Collision has occurred for element 28 at position 1 finding new Position...
Element 28 is inserted in position: 2
Hash table: [17, 90, 28, 12, 31, 0, 0, 0, 26]
Element: 88 | hash_key: 7
Element 88 is inserted in position: 7
Hash table: [17, 90, 28, 12, 31, 0, 0, 88, 26]
Element: 40 | hash_key: 4
Collision has occurred for element 40 at position 4 finding new Position...
Element 40 is inserted in position: 5
Hash table: [17, 90, 28, 12, 31, 40, 0, 88, 26]
Element: 77 | hash_key: 5
Collision has occurred for element 77 at position 5 finding new Position...
Element 77 is inserted in position: 6
Hash table: [17, 90, 28, 12, 31, 40, 77, 88, 26]
0 -> 17
1 -> 90
2 -> 28
3 -> 12
4 -> 31
5 -> 40
6 -> 77
7 -> 88
8 -> 26
This sounds like a great way to solve basic hashing, right? Well, if you look closely, you can spot that the time complexity will grow badly if the hash key has to be incremented all the way to the last empty space. For context, think of a hash table thousands of slots long where every slot up to the second-to-last is filled: even if the computed hash key lands somewhere in the middle, the probe has to walk slot by slot to the end, costing a lot of time. This tendency of occupied slots to pile up in long contiguous runs is commonly known as primary clustering.
A similar problem arises when we want to compute the index of an element in order to search for it or delete it, in cases where we don't want to iterate over the entire table with loops, since that would increase time and space complexity.
So to solve this we have quadratic probing. In this technique, we don't increment the computed hash value by one each time; we slightly alter the hash function to (element + self.inc**2) % self.size,
where self.inc is initially 0. To give a general sense, the probe sequence unfolds as hash(x)+0^2, hash(x)+1^2, hash(x)+2^2, hash(x)+3^2, ... (each reduced modulo the table size).
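That probe sequence can be sketched in a few lines (using a size-5 table, as in the worked example that follows):

```python
def quadratic_probes(element, size, attempts=4):
    # hash(x) + i^2 for i = 0, 1, 2, ..., reduced modulo the table size
    return [(element + i ** 2) % size for i in range(attempts)]

# For element 45 in a table of size 5, the probes visit 0, 1, 4, 4, ...
print(quadratic_probes(45, 5))  # [0, 1, 4, 4]
```

Note that the probe positions jump by growing gaps (1, 3, 5, ...) instead of sliding one slot at a time, which is exactly what breaks up primary clustering.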

Quadratic Probing
class Quadratic_Probing:
    def __init__(self, size):
        self.inc = 0
        self.size = size
        self.arr = [0 for i in range(self.size)]

    def hash_fn(self, element):
        print("received: ", element, " computed: ", (element + self.inc**2) % self.size)
        print("self.inc: ", self.inc)
        return (element + self.inc**2) % self.size

    def isFull(self):
        print(self.arr)
        # True means the table is full; False means there is still space
        return 0 not in self.arr

    def insert(self, element):
        if self.isFull():
            return "Hash table is full"
        hash_key = self.hash_fn(element)
        print("Init: ", hash_key)
        if self.arr[hash_key] == 0:  # this specific slot is empty
            self.arr[hash_key] = element
            print("Hash: ", hash_key, " element: ", element)
            self.inc = 0  # reset the increment for the next element
            return "inserted at " + str(hash_key)
        else:
            print("Else part")
            print("Hash: ", hash_key, " element: ", element)
            self.inc = self.inc + 1
            return self.insert(element)

    def display(self):
        for i in range(len(self.arr)):
            print(i, "->", self.arr[i])
Implementation of the code
table = Quadratic_Probing(5)
print(table.insert(12))
print(table.insert(20))
print(table.insert(80))
print(table.insert(72))
print(table.insert(45))
print(table.insert(90))
table.display()
Output:
[0, 0, 0, 0, 0]
received: 12 computed: 2
self.inc: 0
Init: 2
Hash: 2 element: 12
inserted at 2
[0, 0, 12, 0, 0]
received: 20 computed: 0
self.inc: 0
Init: 0
Hash: 0 element: 20
inserted at 0
[20, 0, 12, 0, 0]
received: 80 computed: 0
self.inc: 0
Init: 0
Else part
Hash: 0 element: 80
[20, 0, 12, 0, 0]
received: 80 computed: 1
self.inc: 1
Init: 1
Hash: 1 element: 80
inserted at 1
[20, 80, 12, 0, 0]
received: 72 computed: 2
self.inc: 0
Init: 2
Else part
Hash: 2 element: 72
[20, 80, 12, 0, 0]
received: 72 computed: 3
self.inc: 1
Init: 3
Hash: 3 element: 72
inserted at 3
[20, 80, 12, 72, 0]
received: 45 computed: 0
self.inc: 0
Init: 0
Else part
Hash: 0 element: 45
[20, 80, 12, 72, 0]
received: 45 computed: 1
self.inc: 1
Init: 1
Else part
Hash: 1 element: 45
[20, 80, 12, 72, 0]
received: 45 computed: 4
self.inc: 2
Init: 4
Hash: 4 element: 45
inserted at 4
[20, 80, 12, 72, 45]
Hash table is full
0 -> 20
1 -> 80
2 -> 12
3 -> 72
4 -> 45
Now quadratic probing seems like a good deal: there is no primary clustering and the time complexity drops. So why would we even need a method like double hashing? Well, if you look closely, a form of clustering still persists in quadratic probing: all elements that start from the same initial hash key follow the exact same probe sequence, so they keep competing for the same set of slots. If many elements share a starting key, filling the remaining spaces becomes a worst-case scenario.
So this is where double hashing comes in. As the name suggests, the hashing is done twice with different hash functions, which removes this shared-sequence clustering. In addition, double hashing relies on the hash table's size being a prime number: by choosing a prime size, we ensure that no step size divides it evenly, guaranteeing that the probe sequence eventually examines every cell.
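A small sketch of that probe sequence, (h1 + i*h2) % size, shows why the prime size matters. With the size fixed at 11 and the divisors 11 and 9 used in the example that follows, the probes for a single key (assuming h2 is non-zero) visit every cell exactly once:

```python
def double_hash_probes(key, size, primary, secondary):
    # h1 picks the starting cell; h2 fixes the step between probes
    h1 = key % primary
    h2 = key % secondary
    return [(h1 + i * h2) % size for i in range(size)]

probes = double_hash_probes(26, 11, 11, 9)
print(probes)                             # starts at 26 % 11 = 4
print(sorted(probes) == list(range(11)))  # True: every cell is visited once
```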

Double Hashing
The code below shows a very basic implementation of double hashing, where two pre-defined hash divisors are taken as input and a single dynamic increment, self.inc, is placed in the logic to deal with collisions.
class Double_Hashing:
    def __init__(self, primary_hash, secondary_hash):
        self.size = 11
        self.table = [0 for _ in range(self.size)]
        self.inc = 0
        self.primary_hash = primary_hash
        self.secondary_hash = secondary_hash

    def is_full(self):
        # True means the table is full; False means there is still space
        return 0 not in self.table

    def hash_function(self, key):
        h1 = key % self.primary_hash
        h2 = key % self.secondary_hash
        hashed_key = (h1 + self.inc * h2) % self.size
        return hashed_key

    def insert(self, key):
        if self.is_full():
            print("The hash table is full...")
            return False
        hashed_key = self.hash_function(key)
        if self.table[hashed_key] == 0:
            self.table[hashed_key] = key
            self.inc = 0  # reset the increment for the next key
            return True
        else:
            # Collision occurred: increase self.inc and probe again
            self.inc += 1
            return self.insert(key)

    def display(self):
        for i in range(self.size):
            print(i, "->", self.table[i])
Implementation of the code:
h = Double_Hashing(11,9)
h.insert(76)
h.insert(26)
h.insert(37)
h.insert(59)
h.insert(21)
h.insert(65)
h.display()
Output:
0 -> 0
1 -> 65
2 -> 21
3 -> 0
4 -> 26
5 -> 37
6 -> 0
7 -> 0
8 -> 0
9 -> 59
10 -> 76
If you want to take the size as input and process the table, you can define a simple function like this to find the closest prime number greater than or equal to the given size.
def is_prime(n):
    flag = 0
    for i in range(2, n):
        if n % i == 0:
            flag = 1
            break
    return flag == 0

def closest_prime(n):
    new = n
    val = False
    while not val:
        chk = is_prime(new)
        if not chk:
            new = new + 1
        else:
            val = True
    return new
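A quick usage sketch of that helper (restated compactly here so the snippet runs on its own): a requested size of 10 gets bumped up to the next prime, 11.

```python
def is_prime(n):
    # A number is prime when no i in 2..n-1 divides it evenly
    return n > 1 and all(n % i != 0 for i in range(2, n))

def closest_prime(n):
    # Walk upward from n until a prime is reached
    while not is_prime(n):
        n += 1
    return n

print(closest_prime(10))  # 11
print(closest_prime(11))  # 11
```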

Hashing with Chaining
This is one of the easiest algorithms to implement compared with the others mentioned above: the idea is to chain the elements with the same hash value into a list and store that list at the same hash key. Here is a simple implementation.
class Chaining:
    def __init__(self):
        self.MAX = 10
        self.arr = [[] for i in range(self.MAX)]

    def get_hash(self, element):
        return element % self.MAX

    def insert(self, element):
        h = self.get_hash(element)
        self.arr[h].append(element)

    def display(self):
        for i in range(len(self.arr)):
            print(i, "->", self.arr[i])
table = Chaining()
table.insert(12)
table.insert(20)
table.insert(80)
table.insert(72)
table.insert(45)
table.insert(90)
table.display()
Output :
0 -> [20, 80, 90]
1 -> []
2 -> [12, 72]
3 -> []
4 -> []
5 -> [45]
6 -> []
7 -> []
8 -> []
9 -> []
Throughout the post I have tried to keep the logic very simple while implementing the concepts, but you can modify the hash function and the other methods according to your requirements, so the sky is the limit. Experiment with the techniques, and I highly recommend writing code for searching and deleting elements in each of the methods; it is a very good exercise for getting the complete picture, because you will write those functions based on the insertion logic.
That's a wrap. Thank you for reading all along; your curiosity and engagement are highly valued. Post your comments or doubts in the comment section below, and subscribe to sapiencespace and enable notifications to get regular insights.