Pong Eksombatchai | March 10, 2017

The Heap data structure is a data structure that maintains a set of items in a semi-ordered manner where the smallest (or largest) item can be retrieved quickly. The beauty of it comes from its ability to efficiently support updates, such as deletions or insertions of items. The heap can either be a min-heap or a max-heap. The min-heap allows fast retrieval of the smallest item, while the max-heap allows fast retrieval of the largest item.



In this post, we will only talk about the min-heap because similar principles apply to the max-heap. A min-heap is essentially a complete binary tree that satisfies the heap property – if node A is the parent of node B then the value of A is less than the value of B. The heap supports the following two operations:

  1. Min extraction - This is done in O(1). The minimum element in the set of items is the root of the binary tree because otherwise the heap property would not be satisfied.
  2. Update - This is done in O(log n). The update step is supported by the heapify operation. The main idea of heapify is to rearrange the complete binary tree such that the heap property is satisfied again after it is broken by an update. An item insertion is done by inserting the node at the lowest level and using the heapify operation on that node. An item deletion is done by replacing the deleted node with the last leaf node and using the heapify operation on that node.

To go into further detail, the heapify operation on node A works as follow:

  1. If A is less than the parent, swap A with the parent. Recursively “surface” A until the heap property is satisfied.
  2. If A is greater than one of its children, swap A with the smaller children. Recursively “plunge” A until the heap property is satisfied.
  3. If heap property is already satisfied. Do nothing.

It is easy to see that the operations can be executed at most the depth of the tree times. Since the tree is a complete binary tree, the depth is O(log n) and thus, heapify is done in O(log n).

Since the heap is also a complete binary tree, it is much simpler to implement the data structure by using an array. Index 1 represents the root of the tree while indices 2i to 2(i + 1) - 1 represents the nodes on the i-th level of the tree.

Real World Application

The heap can be used to implement a priority queue, where a stream of tasks with different priorities arrives at different times. The task with the highest priority can be extracted quickly in O(1), while each update (tasks arriving or ending) takes O(log n).

The heap is also a crucial component in the Dijkstra’s shortest path algorithm, where at each iteration the node with the minimum value has to be retrieved and updated.