A hash table is a data struture commonly used to associate keys with values
to implement dictionaries or to test if a key is part of a set of keys to
implement set operations.
The best hash table is the one that enables these operations at the lowest
cost.
Cost in computing essentially is two-dimensionsional, time and space.
Keys or key-value pairs are stored at a pseudorandom position determined
first by the hash of the key and then either storing all the entries that map
to the same position in a secondary data structure
(this method is called chaining) or by a probing strategy that finds
another free position.
The first step creats a cache miss.
A cache miss is when data, that cannot be found in any of the caches of the
CPU, is requested.
The time this takes is around 300 CPU cycles and typically determines the
performance of a hash table.
It cannot be avoided due to the pseudorandom nature of hashing.
The second step however occurs more frequently the fuller the table becomes,
it is a function of the load factor α.
The load factor is defined as the number of elements stored in the table
divided by the total number of positions available in the table. Therefore,
for a given hash function and and collision resolution scheme, the larger
table is also faster because it has to resolve the less collisions,
and therefore less cache misses.
The hash table with the best memory efficiency is simply the one with the
highest load factor, (it can even exceed 100% memory efficiency by using key
compression with compact hashing ).
A hash table like that does still provide O(1) lookups, just very slow.
I argue that the best hash table is not the fastest or the most memory
efficient, but the one with the best trade-off between space and time.
Asymptotically the best trade-off is the one that has the lowest product of
the the two.
Imagine building a big server that does nothing but searching entries in a
hash table.
Say the hash table is very large so that the offset of the price for the CPU
and the other hardware will be dwarfed by the cost of the RAM.
The server will have a finite life-time, therefore the cost of a single
lookup into the hash table will be directly proportional to the product of the
time it takes to access an element and the amount of space the hash table
needs per element.
Let's call this concept the space-time.
If the space-time seems too abstract let's take a common task we all probably
do from time to time:
Finding hash collisions by brute force.
By first computing as many hashes as fit in memory and then - using a hash
table - comparing to all of them at the same time with a single
lookup for each newly computed hash.
Say we can neglect the time to compute a hash, then the performance can be
expressed as the number of comparisons that can be done per unit of time.
That number is, not incidentally, proportional to the reciprocal of the
space-time.
I designed a small benchmark to be as fair and comprehensive as possible.
Only random keys can produce comparable memory access patterns across
different hash table implementations using different hash functions and
probing strategies, keep in mind however that this masks bad hash functions.
For the following benchmark 2²⁶ random 64 bit keys were associated with
arbitrary 64 bit values. The average memory consumption and time for
insertion, lookup and deletion per key-value pair can be found in the plots
below for the different hash tables.
The best hash-table implementations define a lower bound of the best
achievable space-time trade-off. I chose to fit c • ( 1+(α-1)¯¹ ) as this is
the asymptotic complexity of double hashing, which is one of the best probing
strategies in terms of asymptotic complexity and it seemed to fit well.
Given this curve of achievable trade-offs we want to choose the one where the
product of space and time requirement is the lowest. The solid black line
indicates the line the hash table implementations need to be below on the plot
of memory and speed if they want to beat the apparently achievable best
space-time trade-off. The hash tables that have an average memory efficiency
of around
2-√2 = 0.585786...
will achieve the lowest space-time given this trade-off curve.
Coincidentially, but not surprisingly modern hash tables tend to cluster
around this memory efficiency.
As can be seen from the plots there are some hash tables with significantly suboptimal performance characteristics, but the best space-time trade-off is heavily contested and depends on the use case.
F14ValueMap, bytell, patchmap, tsl::sparse_map, robin_hood::unordered_map and absl::flat_map all exhibit similar performance characteristics.
At that level of difference compiler versions or memory fragmentation can already make the difference. For lookups bytell achieves the lowest space-time, closely followed by the patchmap, then absl::flat_map, F14ValueMap and robin_hood::unordered_map.
Depending on your constraints you might want to use ska::flat_hash_map for speed, it actually is the fastest hash table for lookups, just like Malte Skarupke claims, while google::dense_hash_table is a very close second.
Insertions and deletions are typically much less frequent than lookups, this is why they are not part of the ranking, they should still be reasonably fast for a good hash table however.
If you want a fast hash table with higher memory efficiency you might want to take a look at tsl::sparse_map and especially patchmap, which achieves the highest performance for a memory efficiency between 0.6 and 0.87.
The fastest hash table in the very high memory efficiency regime is google::sparse_hash_map at 0.88, but it can be beat by using a hash table combining chaining, a very high load factor and pseudorandom ordering, indicated with a green dot at 0.95, more on that here.
A Judy array is good for medium to small datasets, but the asymptotic properties are catching up to it at 2²⁷ keys and it will only get worse.
The std::unordered_map is forced to perform sub-par because the C++ standard requires all iterators except to elements that are directly deleted to stay valid for all hash table operations. This necessitates chaining instead of probing and prohibits reordering schemes.
All in all this shows that pseudorandom ordering is an an interesting technique for achieving highly memory efficient hash tables, while also being competitive at medium memory efficiencies.
Below 50% memory efficiency the overheads associated with this technique make it unsuitable, as collisions become the exception rather than the norm and the reordering does not pay off.
A sorted array is a compact data structure, it takes up just as much space as absolutely necessary. It also allows for relatively fast acceses on the order of O(log(log(n))) if the keys are uniformly distributed. Insertions and deletions end up being O(n). While this might be good or good enough for some applications, especially when the number of keys is small, it is not what we want from a hash table in general. There is no general compact data structure to my knowledge that allows asymptotically constant lookups, insertions and deletions, but what happens if we allow a sorted array to be more wasteful?
If there is more memory available some positions in the sorted array can be left empty. The sorted values will form clusters or "patches" which will be sorted internally and with respect to each other, in between there will be positions that are not set. For insertions this means that the reordering required is much more localised and the cost only a function of the load factor, not the total size. For lookups, because there are more positions than entries to store, the offset from the position determined by linear interpolation can be minimised. This in turn means that an interpolation search will be able to find the values much quicker, in fact for a given load factor α<1 the number of steps that an interpolation search needs will be independent of n ( number of elements stored in the table ) and doubly logarithmic as a function of α.
But what if the keys we want to store are not uniform? This is where patchmap comes in. By sorting the keys by their hash value the order becomes pseudorandom. If the hash function is good then the values that it produces will be uniform even if the keys themselves are not. Therefore by combining a good hash function with an order given by the hash values a data structure can be created that stores keys or key-value pairs in a very memory efficient way and that interpolates between a (randomly) sorted array and a hash table.
Example: Inserting into a patch
To know which buckets in the table are set and which are not one could reserve a special value like the google implementations do for an empty key. But I have decided to use a binary mask that indicates whether an entry is set or not, because I do not want to impose any restrictions on the keys that can be stored in the hash table. If there was one value of the key that was impossible, like say the nullptr it could be used to indicate an empty position, like is being done in the hash table implementations of google.
When inserting into the patchmap it goes something like this:
- map the hash value of the key to the range of the table
- look at the mask to find the closest free bucket to this position
- insert the key-value pair there
- restore the order by sorting by the hash value with a single insertion-sort step
The patchmap before insertion
key ...__257936__...
hash ...__455678__...
mask ...--++++++--...
First step of inseting the key 1 with hash value 7
key ...__2579361__...
hash ...__4556787__...
mask ...--+++++++--...
The order has been restored in a single insertion sort step
key ...__2579136__...
hash ...__4556778__...
mask ...--+++++++--...
The patchmap achieves a good speed/space trade-off, but it cannot beat googles sparse hash map at very high memory efficiency although the comparison to it and sparsepp seemed not even very fair in the first place. sparsepp and google::sparse_hash_map minimize the peak memory consumption instead of the average memory consumption. Each time the other hash tables (including the patchmap) have to grow they reserve a new contiguous block of memory, copy the values from the old table, that has become too small, into the new table, therefore taking up at least two times more memory during each grow operation. To make the comparison fair I used a hashed array tree to generate the point for the patchmap with the highest memory efficiency. A hashed array tree allows the hash table to grow virtually in place, similar to what realloc allows, but at some additional cost, however keeping non copy-constructible data intact.
Still the patchmap performs worse at the memory efficiency that google::sparse_hash_map achieves for 64-bit keys.
This is because at high load factors large patches have a very much increased probability of growing bigger because they occupy more positions in the array where values are being mapped to randomly, leading to an asymptotically quadratic behavior. If we really want to beat google::sparse_hash_map for memory efficiency we can work around that and instead make a chaining hash table. Entries that hash to the same hash value are going to be stored in a single contiguous array which is being sorted by another hash value, again using the pseudorandom ordering trick. Chaining makes the distribution of the sorted patches more even, as there is no feedback mechanism at work. Interpolation search gives very fast access times and the memory overhead is very little if these arrays are large. At a load factor between 64 and 128 the memory efficiency of google::sparse_hash_map can be beat, while not giving up on performance completely, see the green dot in figure 1.
The patchmap seems to fare worst in comparison to the other hash
tables for deletions. This is probably because I went a different route there.
Usually when deleting an element from a hash table elements are not actually
removed from the table, they are only marked as deleted. This technique is
known as tombstoning. It is done mostly because this is the fastest way to
delete elements, but also because the other ways often are much harder to
implement. This is not the case in the patchmap. There deleting works
just like insertion, but in reverse. Deleted but not dead entries, let's
call them zombie entries, can accumulate and if we don't pay close
attention they can take over the table. Such a worst case scenario can be
triggered by repeatedly inserting and deleting elements while keeping the
table at the same size. Some implementations are immune and others have taken
specific precautions to mitigate this issue, usually by rehashing the table
as soon as the zombies accumulate too much. This usage pattern or others that
generate many zombies are unusual but not impossible, therefore a small to
medium slowdown is perfectly acceptable, a severe slowdown might however
become an unforeseen problem at some point.
hash table | slowdown | rehasing |
patchmap | no | none |
bytell | no | none |
std::unordered_map | no | none |
flat_hash_map | no | none |
F14ValueMap | no | none |
sparsepp | minor | frequent |
google::sparse_hash_map | minor | frequent |
google::dense_hash_map | severe | rare |
khash | severe | occasional |
Additional points
How to make a good hash function?
I had been thinking about good hash functions for a long time so it was not hard to combine two multiplications and a binary rotation to get a hash function that had a very even output. This is similar to what Malte Skarupke ended up doing starting with a single multiplication with the golden ratio, aka fibonacci hashing and composing it with a bitshift. Galois Field multiplications are a great building block too, but my laptop is a little old and therefore struggles with carry-less multiplications.
Recently I wanted to compute a large number of large histograms.
The fastest way of computing histograms involves hash tables.
As c++ is my favorite programming language I had chosen
std::unordered_map as the backend.
Everything was going smoothly just as I had imagined but then when I was
running the code on my laptop instead of the servers at my institute
everything froze.
The memory required was around 8 times of what I had anticipated.
It turns out I was triggering close to the worst case for the
std::unordered_map as I was mapping 16-bit values to 16-bit values
on a 64-bit machine.
The std::unordered_map is a chaining hash table which means that for
each pair of 16-bit values it needs to store a 64-bit pointer,
this creates an especially large overhead in this case.
I ended up sorting the values first and then computing the histograms from
the sorted arrays, which takes much longer, but uses the minimum amount of
memory.
But I still had more memory available, I was just using 6 GB out of the 16 GB
I had in my laptop.
Would it not be great if there was a data structure that could interpolate
between the two solutions? Or at least if there was a hash table that was more
memory efficient?
And before I even found out that there were indeed hash tables that are way
more efficient I had an idea that I needed to try.
The idea is as has been hinted at to interpolate between a sorted array and
a hash table.
That is mapping the keys linearly to the hash table and resolving collisions
based on the numerical value of the key should they occur.
This works well if the keys are uniformly distributed and in this case has
the added effect of essentially sorting the keys in O(1) time and O(1) memory,
not in place however as for this to work efficiently there is some fractional
extra space needed, at least 5% extra would be good.
This approach might sound very similar to the one suggested in
"Ordered Hash Tables" (01 January 1974) O. Amble D. E. Knuth
but there is a mayor difference in that the hash tables there are ordered
first by the hash of the keys and then by the value of the key.
If the entries are however only ordered by the hash value there exists a
global order, which all of a sudden allows advanced search strategies like
interpolation search and binary search.
In fact the approach is much more similar to robin hood hashing with linear
probing because if two keys collide at the same position the key whose hash
value is lower will also have been displaced more and will therefore displace
the key with the larger hash value, therefore keeping up the same order.
There has however been no hash table using robin hood hashing with linear
probing that takes advantage of this fact.
https://github.com/1ykos/ordered_patch_map
[ C++17, open source, beta stadium, let me know if you want to use it, I will try to make it work for you! ]
For a more detailed description of the inner workings of the patchmap you can have a look at (open-access):
INFOCOMP article 581
Bijective hash functions
Hash functions with minimal output size that do not introduce additional
collisions are necessarily bijective.
Bijective functions
are just the way to go when creating a hash function.
They can be composed arbitrarily and they can also be inverted.
They can be made to be arbitrarily hard to invert.
Exponentiation by squaring on a prime field for example is used in
cryptography, it is computationally very expensive to invert,
yet it is bijective.
It's a common misconception to think that the security of a cryptographic hash
function depends on hash collisions.
For a hash table however good mixing is all that matters, being easily
invertible can even be an advantage.
Store permutated data
Lookup is the most frequent operation in a hash table, therefore it imperative
that it is fast.
In the patchmap, because the data is ordered by the hash of the key,
for each lookup the hash of the key needs to be compared to several other
hashes.
To speed up this comparison the hashes could be stored alongside the
data, like in some other hash tables.
Given a bijective hash function and its reciprocal however this space can be
saved and instead of storing hash and data we can just store the hash.
The data can be computed from the hash.
Resizing allocations
Some allocators are able to resize the allocations, similar to
realloc, but no C++ other than dlib::allocator expose that
feature. I experimented using resizing and the time to grow and shrink the
table can be reduced, but there are lots of corner cases which need to be
handeled correctly, and that's why this feature is not in the current version
of the patchmap.
lazy rehashing
Resizing without relocating is especially beneficial when there is a way to
craft the hash reduction so that only a fraction of the data actually needs
to move to a new location. The only hash table that I know of that manages
to make use of this ingenious idea is khash. When the table size is
a factor of 2, the hash reduction can be as simple as take n bits of the hash.
When growing the table this means that only half of the keys in the table
need to be moved, making the grow operation even cheaper. If only there was a
way to generalise this trick to arbitrary hash table sizes!
compact hashing
The insertion scheme of the patchmap minimises the offset of the hash of the
key to the position, mapped to the range of available buckets.
This means that roughly floor(log2(n)) bits of the key can be guessed
from the where the key is stored.
Only storing the differences between the hash and the hash that would be
expected at a given position could save floor(log2(n)) per key.
At least in theory this is how compact hashing could be adapted to the
patchmap. The devil is in the details, mainly in the alignment.