Hash tables - jaredgorski.org

Hash tables

data structures

Hash tables work by mapping array indices to strings which can be used to index the array. Strings are converted to an array index via a Hashing algorithm. This hash function reliably converts the string to a number which can be used to index the array.

So, under the hood, hash tables (or hashmaps) are really just an array with a hashing function.

Efficiency

Because hash tables use arrays internally, their read time is constant (O(1) in Big O notation). Their write time is also constant, since the hash function takes care of indexing.

However, because hash tables must pre-allocate their internal arrays, they take up memory registers which may potentially go unused. They also may be overflown, in which case they need to be resized.

Mitigating hash collisions with linked lists

Because hashing algorithms aren’t perfectly reliable (there are collisions in any algorithm), hash tables often use linked lists at array indices which have collisions. This slows the read time of the hash table down however. Ideally, hash tables have the same read time as arrays: O(1), or constant time. However, as linked lists are introduced, the overall read time of the hash table veers toward O(n) (see Arrays vs. linked lists).

Resizing

When the array underlying a hash table becomes full, that array must be resized. This is commonly done by initiating a new array doubling the previous allocation size.

The steps are:

  1. allocate a new array with prev_length x 2
  2. use the hash function to re-insert items into the new array
  3. deallocate the old array

Even with resizing, hash tables are still constant time on average.