Hash tables work by mapping keys (typically strings) to array indices, which can then be used to access the underlying array. A key is converted to an array index by a hashing algorithm: the hash function deterministically converts the key to a number, which is used to index the array.
So, under the hood, hash tables (or hashmaps) are really just an array with a hashing function.
Because hash tables use arrays internally, their read time is constant (O(1) in Big O notation). Their write time is also constant, since the hash function takes care of indexing.
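As a minimal sketch of this idea — using Python's built-in hash as a stand-in for the hashing algorithm, with index_for as a hypothetical helper name:

```python
# Sketch: a hash table is an array plus a hash function.
# The modulo maps the (possibly huge) hash value into the array's bounds.

def index_for(key: str, array_size: int) -> int:
    """Deterministically map a string key to an index in [0, array_size)."""
    return hash(key) % array_size

slots = [None] * 8                       # pre-allocated internal array
i = index_for("apple", len(slots))
slots[i] = ("apple", 42)                 # write: hash the key, store at that index
value = slots[index_for("apple", len(slots))]  # read: same key, same index
```

Because the hash function always returns the same index for the same key, a later read lands on exactly the slot the write used, with no searching.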
However, because hash tables must pre-allocate their internal arrays, they reserve memory that may go unused. They can also become full, in which case the internal array needs to be resized.
Mitigating hash collisions with linked lists
Because hashing algorithms aren’t perfectly unique (every hash function has collisions), hash tables often use linked lists at array indices where collisions occur. However, this slows down reads. Ideally, a hash table has the same read time as an array: O(1), or constant time. As linked lists are introduced and grow, the overall read time veers toward O(n) (see Arrays vs. linked lists).
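A rough sketch of this collision strategy, known as separate chaining — here Python lists stand in for linked lists, and the class and method names are illustrative:

```python
# Separate chaining: each slot holds a chain of (key, value) pairs,
# so colliding keys share a slot instead of overwriting each other.

class ChainedHashTable:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]  # one chain per slot

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key):
        # Walking the chain costs O(chain length) -- this is the slowdown.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable(size=2)   # tiny array to force collisions
table.set("a", 1)
table.set("b", 2)
table.set("c", 3)
```

With only two slots, at least two of the three keys must collide, yet all reads still succeed by walking the chain. If every key landed in one chain, lookups would degrade to O(n).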
When the array underlying a hash table becomes full, that array must be resized. This is commonly done by allocating a new array at double the previous size.
The steps are:
- allocate a new array of size prev_length x 2
- use the hash function to re-insert items into the new array
- deallocate the old array
Even with occasional resizing, hash table operations are still constant time on average (amortized O(1)), because each doubling pays for many cheap inserts before the next resize.
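The resize steps above can be sketched as follows, assuming the chained-bucket layout (lists of (key, value) pairs per slot); the function names are illustrative:

```python
# Resize sketch: allocate a doubled array, re-insert every item via the
# hash function, and let the old array be reclaimed.

def resize(buckets):
    new_buckets = [[] for _ in range(len(buckets) * 2)]  # step 1: prev_length x 2
    for bucket in buckets:
        for key, value in bucket:                        # step 2: re-insert; indices
            i = hash(key) % len(new_buckets)             # change because the modulus did
            new_buckets[i].append((key, value))
    return new_buckets                                   # step 3: old array is dropped

def lookup(buckets, key):
    for k, v in buckets[hash(key) % len(buckets)]:
        if k == key:
            return v

buckets = [[] for _ in range(2)]
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    buckets[hash(key) % len(buckets)].append((key, value))

buckets = resize(buckets)                                # 2 slots -> 4 slots
```

Note that items must be re-inserted with the hash function rather than copied slot-for-slot: the modulus changed, so each key's correct index may have changed too.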