Hashing and Hash Maps

Ryan Syed
3 min read · Aug 30, 2018
We are definitely not gonna talk about that :P

Say you wanted to code up an implementation of a dictionary of English words. How would you approach this problem?

Approach 1:

Using a LinkedList or an Array

In order to check the definition or presence of a word, we would have to iterate over all the words sequentially, and the time complexity would be O(n).
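A minimal sketch of this approach in Python (a plain list of (word, definition) pairs standing in for the array or linked list) might look like this:

```python
# A minimal sketch of Approach 1: a flat list of (word, definition) pairs.
words = [
    ("hello", "a greeting"),
    ("bye", "a farewell"),
    ("bingo", "a game of chance"),
]

def lookup(word):
    # Worst case we touch every entry, so lookup is O(n).
    for w, definition in words:
        if w == word:
            return definition
    return None

print(lookup("bye"))     # a farewell
print(lookup("pickle"))  # None
```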

Approach 2:

Using a Balanced Binary Search Tree

In the case of a balanced binary search tree, the deepest node sits at a depth of about log n, so a lookup would take O(log n) time.
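A full balanced BST implementation is a bit long for this post, but a binary search over a sorted list gives the same O(log n) behaviour and illustrates the idea: every comparison discards half of the remaining words.

```python
import bisect

# Sorted list of (word, definition) pairs, kept sorted by word.
sorted_words = [
    ("bingo", "a game of chance"),
    ("bye", "a farewell"),
    ("hello", "a greeting"),
]

def lookup(word):
    # Each comparison halves the search space, so this is O(log n) --
    # the same bound a balanced BST gives for its deepest node.
    i = bisect.bisect_left(sorted_words, (word,))
    if i < len(sorted_words) and sorted_words[i][0] == word:
        return sorted_words[i][1]
    return None

print(lookup("hello"))  # a greeting
```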

Approach 3:

What if we could access a key by index, like in an array? Then the time taken would be O(1). Enter hash maps: they use an array, with a hash code as the index, to store and retrieve elements quickly.

Say we wanted to store “Hello” and “Bye” in our map. We could use their lengths as indices, storing each word at the corresponding index and its definition in another array at the same index.

However, if we then tried to store a word like “Bingo” (also five letters), we would be in a pickle, since that slot is already occupied by “Hello”. The same would happen for any new word whose length is already in use. This is referred to as a collision.
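Here is a toy sketch of that length-as-index scheme, just to make the collision concrete:

```python
# A toy version of the length-as-index scheme described above.
size = 10
keys = [None] * size
definitions = [None] * size

def put(word, definition):
    index = len(word)                 # our "hash code" is just the word length
    if keys[index] is not None and keys[index] != word:
        raise ValueError(f"collision: slot {index} already holds {keys[index]!r}")
    keys[index] = word
    definitions[index] = definition

put("Hello", "a greeting")            # lands in slot 5
put("Bye", "a farewell")              # lands in slot 3
try:
    put("Bingo", "a game of chance")  # also wants slot 5 -> collision
except ValueError as err:
    print(err)                        # collision: slot 5 already holds 'Hello'
```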

In order to handle collisions, we can instead store linked lists in the array, so that all the entries sharing a hash code end up in the same list.
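A rough sketch of that idea, usually called separate chaining (Python lists stand in for the linked lists here):

```python
# Separate chaining: each array slot holds a bucket of (word, definition) pairs.
# Python lists stand in here for the linked lists described above.
size = 10
buckets = [[] for _ in range(size)]

def put(word, definition):
    bucket = buckets[len(word)]        # still hashing by length, for now
    for i, (w, _) in enumerate(bucket):
        if w == word:                  # word already present: update its definition
            bucket[i] = (word, definition)
            return
    bucket.append((word, definition))  # colliding words simply share the bucket

def get(word):
    for w, definition in buckets[len(word)]:
        if w == word:
            return definition
    return None

put("Hello", "a greeting")
put("Bingo", "a game of chance")   # same slot as "Hello", but no longer a problem
print(get("Bingo"))                # a game of chance
```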

But what about space wastage?

Sure, if we used the length as the hash code, a really long word would be stored at a very distant index, and all the unused slots in between would be wasted. So we have to figure out how to generate better hash codes.

But what about huge memory requirement?

What if we kept generating unique hash codes? Won’t our array have to be really long then? The answer is no. We can pick a fixed size for our array and take the generated hash code modulo that size, which always gives an index within the array.
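A tiny sketch of the trick, assuming we pick a capacity of 16:

```python
capacity = 16  # a fixed array size we pick up front

def index_for(hash_code):
    # The modulus folds any hash code, however large, into the range 0..capacity-1.
    return hash_code % capacity

print(index_for(5))          # 5
print(index_for(123456789))  # 5 -- a huge code still maps back inside the array
```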

But why is the word length inadequate as our hash function?

The entries with the same hash code are stored in the same linked list. These lists are called buckets, since they group together entries that share an index. For common English word lengths such as 4, 5, or 6 letters, those buckets would be really large, so finding a definition would no longer be a constant-time operation. We therefore need a better hash function than the word length.
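You can see the skew with a quick count over a small, made-up sample of common words: most of them pile into the same one or two buckets.

```python
from collections import Counter

common_words = ["time", "year", "people", "way", "day", "thing", "world",
                "life", "hand", "part", "child", "place", "week", "case"]

# Bucket index = word length; count how many words land in each bucket.
bucket_sizes = Counter(len(w) for w in common_words)
print(bucket_sizes)  # Counter({4: 7, 5: 4, 3: 2, 6: 1})
# Half of the words crowd into a single bucket, so lookups there drift towards O(n).
```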

So what constitutes a good hash function?

A hash function is considered good if it spreads the keys evenly across a large range of values, so that buckets stay small and each element can be found in only a few comparisons.
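For example, a polynomial string hash (the idea behind Java’s String.hashCode, sketched loosely here) mixes every character into the result instead of using only the length, so same-length words usually end up in different buckets:

```python
def string_hash(word):
    # Polynomial string hash: every character shifts and mixes into the
    # result, so words no longer collide just because they share a length.
    h = 0
    for ch in word:
        h = h * 31 + ord(ch)
    return h

capacity = 16
for w in ["Hello", "Bingo", "Bye", "Pickle"]:
    print(w, string_hash(w) % capacity)   # Hello 2, Bingo 15, Bye 14, Pickle 10
```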

I will write another post about hashing in the future, so if you enjoyed this one, do check that out along with my other content. Also, don’t forget to follow me so that you don’t miss any posts. Happy coding!!
