CIS 22 - Data Structures: Hash Table


1. Introduction
2. Hash Table, Hash Function, Collisions
3. Handling Collisions
4. Source Code

Introduction
     Is it possible to design a search of O(1)– that is, one that has a constant search time, no matter where the element is located in the list? In theory, this goal is not an impossible dream. Let’s look at an example. We have a list of employees of a fairly small company. Each of 100 employees has an ID number in the range 0 – 99. If we store the elements (employee records) in the array, then each employee’s ID number will be an index to the array element where this employee’s record will be stored.


     In this case once we know the ID number of the employee, we can directly access his record through the array index. There is a one-to-one correspondence between the element’s key and the array index. However, in practice, this perfect relationship is not easy to establish or maintain. For example: the same company might use employee’s five-digit ID number as the primary key. In this case, key values run from 00000 to 99999. If we want to use the same technique as above, we need to set up an array of size 100,000, of which only 100 elements will be used:


Obviously it is very impractical to waste that much storage in order to make sure that each employee’ record in a unique and predictable location.

     But what if we keep the array size down to the size that we will actually be using (100 elements) and use just the last two digits of key to identify each employee? For example, the employee with the key number 54876 will be stored in the element of the array with index 76. And the employee with the key 98759 will be stored in the element of the array with index 59. Note that the elements are not stored according to the value of the key as they were in the previous example. In fact, the record of the employee with the key number 54876 precedes the record of the employee with key number 98759, even though the value of its key is larger. Moreover, we need a way to convert a five-digit key number to two-digit array index. We need some function that will do the transformation. Using this technique we call the array a Hash Table and a function is called a Hash Function.


The following materials were used in creating the website:
  • Data Structures using C and C++ by Y. Langsam, M. Augenstein, and A. Tenenbaum
  • Object-Oriented C++ Data Structures by Harrington, Jan L
  • C++ plus Data Structures by Dale, Nell B.

Back to CIS 22 home page
Luba Leyzerenok, Fall 2003