
Rehashing (Variable Hashing)

Steven J. Zeil

Last modified: May 21, 2025

Hash tables offer exceptional performance when not overly full.

This is the traditional dilemma of all array-based data structures: make the array too small and we run out of room (or, in the case of hash tables, performance degrades as the table fills); make it too large and we waste memory.

Rehashing or variable hashing attempts to circumvent this dilemma by expanding the hash table size whenever it gets too full.

Conceptually, it’s similar to what we do with an ArrayList that has filled up.

1 Expanding the Hash Table

For example, using open addressing (linear probing) on a table of integers with hash(k)=k (assume the table does an internal % hSize):

We know that performance degrades when λ > 0.5

Solution: rehash when the table is more than half full.

 

So if we have this table, everything is fine.

 

But if we try to add another element (24), then more than half the slots are occupied …

 

So we expand the table, and use the hash function to relocate the elements within the larger table.

The actual expansion mechanism is similar to what we do for ArrayLists. However, it's important to remember that the value of hash(x) % hSize changes when hSize changes, so the elements need to be repositioned within the new, larger hash table.
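The idea above can be sketched in Java as a small open-addressing table of ints with hash(k) = k, rehashed to double the size whenever an insertion would leave it more than half full. (The class and member names here are illustrative, not from the lecture, and the sentinel value assumes non-negative keys.)

```java
import java.util.Arrays;

class ProbedHashTable {
    static final int EMPTY = -1;   // sentinel: assumes all keys are non-negative
    int[] table;
    int size;

    ProbedHashTable(int hSize) {
        table = new int[hSize];
        Arrays.fill(table, EMPTY);
        size = 0;
    }

    void add(int k) {
        if (2 * (size + 1) > table.length)  // would become more than half full
            rehash();
        int i = k % table.length;           // hash(k) = k, reduced mod hSize
        while (table[i] != EMPTY)           // linear probing
            i = (i + 1) % table.length;
        table[i] = k;
        ++size;
    }

    // The key point: hash(x) % hSize changes when hSize changes,
    // so every element must be re-inserted into the new table.
    void rehash() {
        int[] old = table;
        table = new int[2 * old.length];    // doubling, as in the example
        Arrays.fill(table, EMPTY);
        size = 0;
        for (int k : old)
            if (k != EMPTY)
                add(k);
    }

    boolean contains(int k) {
        int i = k % table.length;
        for (int probes = 0; probes < table.length; ++probes) {
            if (table[i] == k) return true;
            if (table[i] == EMPTY) return false;
            i = (i + 1) % table.length;
        }
        return false;
    }
}
```

Note that `rehash()` simply re-runs `add` on every surviving element; there is no shortcut, because each element's slot depends on the new table length.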

In this case, I've shown the hash table size doubling, because that's easy to do, even though it doesn't lead to prime-number-sized tables.

If we were going to use quadratic probing, we would probably keep a table of prime numbers on hand for expansion sizes, and we would probably choose a set of primes such that each successive prime number was about twice the prior one.
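One way to realize that, sketched below under the assumption that we compute the next size on demand rather than storing a literal table: take the smallest prime that is at least double the current size, so successive sizes roughly double. (The helper names are my own, not from the lecture.)

```java
class PrimeSizing {
    // Trial division: adequate for table sizes, though not for huge n.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (int d = 3; (long) d * d <= n; d += 2)
            if (n % d == 0) return false;
        return true;
    }

    // Smallest prime >= 2 * hSize, so each expansion roughly doubles.
    static int nextSize(int hSize) {
        int n = 2 * hSize + 1;      // primes > 2 are odd, so start odd
        while (!isPrime(n))
            n += 2;
        return n;
    }
}
```

Starting from a table of size 7, repeated expansion would visit 17, 37, 79, 163, … — each about twice the one before it, and all prime, as quadratic probing prefers.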

2 Saving the Hash Values

 

The rehashing operation can be quite lengthy. Luckily, it doesn’t need to be done very often.

We can speed things up somewhat by storing the hash values in the table elements along with the data so that we don’t need to recompute the hash values.
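A minimal sketch of that idea: each table entry caches its key's hash code at construction, so a rehash only has to re-reduce the cached value modulo the new table size. (The `Entry` class and `indexFor` method are illustrative assumptions, not code from the lecture.)

```java
class Entry<K, V> {
    final K key;
    V value;
    final int hash;   // cached hash code of key, computed exactly once

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
        this.hash = key.hashCode();   // the potentially expensive call
    }

    // During a rehash, we only re-reduce the cached hash mod the new hSize;
    // key.hashCode() is never called again.
    int indexFor(int hSize) {
        return (hash & 0x7fffffff) % hSize;   // mask keeps the index non-negative
    }
}
```

This trades a little memory per entry for a faster rehash, which matters most when the keys (e.g., long strings) have expensive hash functions.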

3 What Would Java Do?

In the next lesson, we'll look at the hashing-based containers in Java, HashSet and HashMap.

From the source code I have seen, they use closed hashing with linked-list buckets. When the table contains more than (3/4) * hSize elements, they increase hSize to just more than double: hSize = 2*hSize + 1;
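That growth rule can be modeled in a few lines. This is a simplified sketch of the policy described above, not the actual java.util source; the comparison is written to avoid integer-division truncation.

```java
class GrowthPolicy {
    // True when size exceeds 3/4 of hSize, i.e., load factor above 0.75.
    // Written as 4*size > 3*hSize to stay in exact integer arithmetic.
    static boolean needsResize(int size, int hSize) {
        return 4 * size > 3 * hSize;
    }

    // Grow to just more than double the current size.
    static int grownSize(int hSize) {
        return 2 * hSize + 1;
    }
}
```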

So, assuming a hash function that uniformly distributes the keys around the table, the buckets have an average length of no more than 0.75, so the average time to add, access, or remove entries is O(1).

The worst case for all of these is O(size()), because of the possibility of a bad hash function. However, even if we have a good hash function, adding data to these structures will, on occasion, be O(size()), because of the occasional need to resize the hash array and relocate all of the data in the set/map.