Rehashing (Variable Hashing)

Steven J. Zeil

Last modified: Oct 26, 2023

Hash tables offer exceptional performance when not overly full.

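This is the traditional dilemma of all array-based data structures: make the array too small and it fills up (and, for a hash table, performance degrades); make it too large and memory is wasted.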

Rehashing, or variable hashing, attempts to circumvent this dilemma by expanding the hash table whenever it gets too full.

Conceptually, it’s similar to what we do with a vector that has filled up.

1 Expanding the Hash Table

For example, using open addressing (linear probing) on a table of integers with hash(k)=k (assume the table does an internal % hSize):

We know that performance degrades when the load factor $\lambda$ (the fraction of slots that are occupied) exceeds 0.5.

Solution: rehash when the table becomes more than half full.
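The "more than half full" test can be sketched as a small predicate. This is only an illustration: the name needsRehash and its parameters are hypothetical, and the comparison is done in integer arithmetic so that no floating-point load factor is needed.

```cpp
#include <cstddef>

// Hypothetical helper (not from the notes): returns true when adding one
// more element would push the load factor, (occupied + 1) / hSize, above 0.5.
bool needsRehash(std::size_t occupied, std::size_t hSize)
{
    // (occupied + 1) / hSize > 1/2  is equivalent to  2 * (occupied + 1) > hSize
    return 2 * (occupied + 1) > hSize;
}
```

With 8 slots, adding a fourth element leaves the table exactly half full, so no rehash is triggered; adding a fifth would.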

 

So if we have this table, everything is fine.

 

But if we try to add another element (24), then more than half the slots are occupied …

 

So we expand the table, and use the hash function to relocate the elements within the larger table.

The actual expansion mechanism is similar to what we do for vectors. In fact, if we stored the table in a vector, we could use the vector resize() function to force the expansion of the table.

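However, it’s important to remember that the value of hash(x) % hSize changes if hSize changes. For instance, with hash(k)=k, the key 24 maps to slot 24 % 8 = 0 in a table of 8 slots but to slot 24 % 16 = 8 in a table of 16. So the elements need to be repositioned within the new, larger hash table.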
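One way the expansion and repositioning might look is sketched below, assuming linear probing, hash(k)=k, and a rehash whenever an insertion would leave the table more than half full. The class IntHashSet and its member names are hypothetical. Rather than calling resize() on the existing vector, this sketch builds a second, larger vector and reinserts every element, which keeps the old and new positions from getting mixed together.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical set of integers using open addressing with linear probing.
class IntHashSet {
public:
    // Tables smaller than 2 slots are bumped up to 2 (an assumption of this sketch).
    explicit IntHashSet(std::size_t hSize = 8)
        : table(hSize < 2 ? 2 : hSize), count(0) {}

    void insert(int k)
    {
        if (contains(k)) return;
        // Expand first if this insertion would leave the table more than half full.
        if (2 * (count + 1) > table.size())
            rehash(2 * table.size());
        table[findSlot(table, k)] = k;
        ++count;
    }

    bool contains(int k) const { return table[findSlot(table, k)].has_value(); }

    // Slot currently holding k, or the empty slot where k would be placed.
    std::size_t slotOf(int k) const { return findSlot(table, k); }

    std::size_t capacity() const { return table.size(); }

private:
    using Table = std::vector<std::optional<int>>;
    Table table;
    std::size_t count;

    static std::size_t hash(int k) { return static_cast<std::size_t>(k); }

    // Linear probing: start at hash(k) % hSize and step forward until we find
    // k or an empty slot. The table is never more than half full, so an empty
    // slot always exists and the loop terminates.
    static std::size_t findSlot(const Table& t, int k)
    {
        std::size_t pos = hash(k) % t.size();
        while (t[pos].has_value() && *t[pos] != k)
            pos = (pos + 1) % t.size();
        return pos;
    }

    // Every element is reinserted, because hash(k) % hSize depends on hSize.
    void rehash(std::size_t newSize)
    {
        Table bigger(newSize);
        for (const auto& slot : table)
            if (slot.has_value())
                bigger[findSlot(bigger, *slot)] = *slot;
        table.swap(bigger);
    }
};
```

As a usage illustration (the keys other than 24 are made up here): after inserting 8, 17, 3, and 12 into an 8-slot table, the key 12 sits in slot 4. Inserting 24 triggers the expansion to 16 slots, after which 12 sits in slot 12.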

In this case, I’ve shown the hash table size doubling, because that’s easy to do, despite the fact that it doesn’t lead to prime-number sized tables.

If we were going to use quadratic probing, we would probably keep a table of prime numbers on hand for expansion sizes, and we would probably choose a set of primes such that each successive prime number was about twice the prior one.
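A table of expansion sizes along these lines might look like the sketch below. The particular primes, the name kPrimeSizes, and the helper nextTableSize are illustrative choices, not taken from any library; each listed prime is roughly twice its predecessor, and the list is kept short for readability.

```cpp
#include <array>
#include <cstddef>

// Illustrative list of primes, each about double the one before it.
constexpr std::array<std::size_t, 8> kPrimeSizes =
    {11, 23, 47, 97, 197, 397, 797, 1597};

// Hypothetical helper: the smallest listed prime larger than the current size.
std::size_t nextTableSize(std::size_t current)
{
    for (std::size_t p : kPrimeSizes)
        if (p > current)
            return p;
    // The list is exhausted. As a fallback for this sketch only, return an
    // odd number near double the size; it is NOT guaranteed to be prime.
    return 2 * current + 1;
}
```

A table that starts at 11 slots would then grow to 23, 47, 97, and so on, staying prime-sized while still approximately doubling at each expansion.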

2 Saving the Hash Values

 

The rehashing operation can be quite lengthy. Luckily, it doesn’t need to be done very often.

We can speed things up somewhat by storing the hash values in the table elements along with the data so that we don’t need to recompute the hash values. Also, if we structure the table as a vector of pointers to the hash elements, then during the rehashing we will only be copying pointers, not the entire (potentially large) data elements.
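Both ideas can be combined in a rough sketch, assuming linear probing and string data. The names Entry, Table, place, and rehash are hypothetical, and std::unique_ptr is just one possible choice of pointer type. Each entry carries its hash value, so rehashing only takes that stored value modulo the new size and moves a pointer; the string itself is never copied or hashed again.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table element: the data plus its saved hash value.
struct Entry {
    std::string data;
    std::size_t hashValue;  // computed once, when the entry is first created
};

// The table holds pointers, so moving a slot never copies the data itself.
using Table = std::vector<std::unique_ptr<Entry>>;

// Place an entry by linear probing, starting from its saved hash value.
// Assumes the table has at least one empty slot.
void place(Table& t, std::unique_ptr<Entry> e)
{
    std::size_t pos = e->hashValue % t.size();
    while (t[pos])
        pos = (pos + 1) % t.size();
    t[pos] = std::move(e);
}

// Build a larger table and move every pointer into its new position.
// No hash function is called and no Entry is copied.
Table rehash(Table old, std::size_t newSize)
{
    Table bigger(newSize);
    for (auto& slot : old)
        if (slot)
            place(bigger, std::move(slot));
    return bigger;
}
```

A caller would write something like t = rehash(std::move(t), newSize). The cost of this design is one extra size_t per element plus a pointer indirection on each lookup, which is the price of the faster rehash.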