CIS 22 - Data Structures: Hash Tables and Hash Functions
Hash Table, Hash Function, Collisions.
Hash Function is a function which, when applied to the key, produces an integer which can be used as an address in a
hash table. The intent is that elements will be relatively randomly and uniformly distributed. In the example above the code for the hash function would
look like this (assuming TABLE_SIZE was defined 100):
Hash Table is a data structure in which keys are mapped to array positions by a hash function. This table can be searched for
an item in O(1) time using a hash function to form an address from the key. The easiest way to conceptualize a hash table is to think of it as an array.
When a program stores an element in the array, the elements key is transformed by a hash function that produces array indexes for that array.
Then, for example, to find an element in the table, a program applies hash function to the element’s key, producing the array index at which the element
is stored.
int Hashtable::hash_function(int id_num)
{
return (id_num % TABLE_SIZE);
}
Type* Hashtable::find (char* key)
{
int index;
index = hash_function(key);
//finding the element
}
Unfortunately, this is not quite simple. For example, we have already stored several employees’ records, and our table looks something like this:

Suppose, our next employee’s key is 57879. Then our hash function will produce array index 79. But the array element with index 79 already has a value.
As the array begins to fill, keys will inevitably produce the same array index when transformed by the hash function. When more than one element tries to
occupy the same array position, we have a collision.
Collision is a condition resulting when two or more keys produce the same hash location.
Now, another question arises: can’t we generate the hash function that never produces a collision?
The answer is that in some cases it is possible to generate such a function. This is known as a perfect hash function.
Perfect Hash Function is a function which, when applied to all the members of the set of items to be stored in a
hash table, produces a unique set of integers within some suitable range. Such function produces no collisions.
Good Hash Function minimizes collisions by spreading the elements uniformly throughout the array.
There is no magic formula for the creation of the hash function. It can be any mathematical transformation that produces a relatively random and unique
distribution of values within the address space of the storage. Although the development of a hash function is trial and error, here are some hints
that may make process easier:
However, finding a perfect hash function that works for a given data set can be extremely time consuming and very often it is just
impossible.
Therefore we must live with collisions and learn how to handle them.
Back to the main page
Back to CIS 22 home page
Luba Leyzerenok, Fall 2003