HashTable: One Data Structure To Rule Them All
project photo
author photo
Iftekhar AhmedNov 28, 2022

What is a HashTable?

HashTable is one of the most powerful and well-known data-structure. The reason hash tables are so important is that they offer constant time O(1) lookup/insert/delete. It stores data in key-value pairs. For example, let’s say you want to store the students' marks according to their student_Id.

project photo

Here we can say Student_Id is the key and Marks is the value. One thing to notice is that a key must be unique where values can be repeated.

Keys can be of any type not just string and numbers but also objects. However, the key needs to be hashable. Now, what does hashable mean? Before that, to be able to understand how mapping is constructed between key and value we first need to talk about hash functions.

Hash Function:

A hash function is a function that maps a key ‘x’ to a whole number in a fixed range. For example, we want to hash a string where the hash value should be less than 50.

1function hashFunction(str) {
2    let sumOfAscii = 0;
3    for (let i = 0; i < str.length; i++) {
4        let currentCharAscii = str.charCodeAt(i);
5        sumOfAscii += currentCharAscii;
6    }
7    let hashNumber = sumOfAscii % 50;
8    return hashNumber;
9}

here,

hashFunction(”ABC”) → 48

hashFunction(”BB”) → 32

hashFunction(”abg”) → 48

so, here we can see a key might have multiple hash values ( which is a problem, later we will discuss how to solve this particular problem).

Properties of Hash Function:

If we have a hash function and hashFunction(x) == hashFunction(y) then x and y might be equal. but if hashFunction(x) != hashFunction(y) then x and y certainly not equal.

So how can we use that property? Let us consider we have two huge files. File_1 and File_2. We have to figure out whether these two files are identical. So, If we precomputed hashFunction(File_1) and hashFunction(File_2), rather than compare these two files we can compare their hash value which is super fast O(1). If they are not equal we can easily say that these two files are not the same.

N.B: For file checking, OS uses some sophisticated hash functions like cryptographic hash functions/checksum.

Another property of a hash function is that it has to be deterministic. This means if hashFunction(x) produces 123 then hashFunction(x) must always produce 123.

To make the hash function more robust, we always should be trying to make uniform hash functions to minimize the number of hash collisions ( A hash collision is when two objects x, and y hash to the same value. hashFunction(x) == hashFunction(y) )

Now, to the previous question what does hashable mean?

To make our hash function deterministic, the key of our hash table must be immutable means we can't change the keys when after they are used. Like immutable strings, integers, etc. So, a key is hashable if it has a hash value that does not change during its entire lifetime.

How does a hash table work?

Suppose we are inserting (integer, string) key-value pairs into the table representing Student_Id and Student_Name. To insert (101, Alex) we hash the key (101) with a hash function and find where it goes in the table.

project photo

so here the hash value of key 101 is 6 so we put (101, Alex) into the 6th position and so on. So if we want to find the student with id 101 we just hash(101) and lookup hash(101)’th index which is O(1) constant time. But we didn’t talk about what we do if multiple keys have the same hash value. To fix this hash collision we can use one of many hash collision resolution techniques, the two most popular ones are Separate Chaining and Open Addressing.

Separate Chaining: It deals with a hash collision by maintaining a data structure ( usually a linked list ) to hold all the different values which hashed to a particular value.

Open Addressing: It deals with hash collisions by finding another place within the hash table for the object to go by offsetting it from the position to which it hashed.

Time Complexity

The time complexity of a hashtable entirely depends on its hash function. If the hash function is robust and sophisticated then operations like insert, delete, and search becomes O(1).

project photo