Writing a priority queue in C# (part 3)

published: Tue, 17-Feb-2004   |   updated: Tue, 20-Apr-2004

In the previous installments of this occasional series ( here's part 1 and here's part 2), we'd looked at how to write a priority queue in C#, such that it followed the standards set by the System.Collections namespace.

At the end of the last installment we ended up with a PriorityQueue class that used the heap algorithm, that stored object instances that, used integers as the priority, and that implemented ICollection (so that it can be viewed as a .NET collection).

Before we move on, the code I published last time has a subtle bug. Not that the priority queue doesn't enqueue and dequeue items properly, you understand, but that it likes to keep a hold of objects well beyond their possible deaths.

To understand what the bug is, we need to quickly review some basic .NET functionality. In .NET you don't have to free any of the objects you create since the .NET Common Language Runtime includes a garbage collector that doe sit for you. This garbage collector periodically marks all objects that are reachable and sweeps up the remainder (the unreachable ones) to reclaim their heap memory. (There's a little more to it than that, of course, but this will suffice for now.)

So, what are unreachable objects? Well, when the garbage collector (let's call it GC from now on) runs, it works out which objects have active references. It looks in the method that was currently executing (so all its local variables will be marked as reachable), as well as the chain of methods that called the current one. For some objects, there will be references to other objects within them, so those also get marked as reachable. The GC, in essence, creates a graph of objects (a graph is a data structure consisting of vertices — the reachable objects, in our case — and edges — the links between the objects). After this analysis is complete, the GC knows which objects are reachable from the currently executing code, and therefore which objects are not. It then sweeps up (or compacts) the unreachable memory.

So, let's go back to our priority queue and especially the Dequeue() method.

public object Dequeue() {
    if (count == 0) 
        throw new InvalidOperationException();
    
    object result = heap[0].Item;
    count--;
    trickleDown(0, heap[count]);
    version++;
    return result;
}

What's happening here? Well, we save the object at the top of the heap (the zeroth item), decrement the count (there's now one less item in the heap), and we trickle down the item at the last index of the heap.

Actually, the English language is masking the problem. Suppose we had 10 items in the heap to begin with, numbered 0 to 9. We make a copy of object 0, decrement the count to 9 (so that the items are numbered from 0 to 8), and then trickle down the item at position 9. This does not cause an out-of-bounds exception, by the way; the array onto which we're mapping the heap has many more items available.

Anyway, after the trickle down operation, the old items from 1 to 8, together with the old item 9, will have been rearranged in some way to form a new ordering of the nine items. Any reference to the old item 0 will have disappeared in the process (luckily we have a copy). However — and here's the bug — there will still be a reference to the old item 9 at index 9 in the array.

In other words, if we were to enqueue ten items, dequeue them all, and abandon these dequeued references, the ten objects will not die, at least until the priority queue is allowed to die.

PriorityQueue pq = new PriorityQueue();
for (int i = 0; i < 10; i++) {
  pq.Enqueue(new object(), i);
}
while (pq.Count != 0) {
  // dequeue and throw away the object
  pq.Dequeue();
}
// but, all 10 objects are still alive!

This is bad news indeed. The Dequeue() method should in fact be coded like this:

public object Dequeue() {
    if (count == 0) 
        throw new InvalidOperationException();
    
    object result = heap[0].Item;
    count--;
    trickleDown(0, heap[count]);
    heap[count].Clear();
    version++;
    return result;
}

Here we make use of a new method Clear() of the HeapEntry struct that resets its item reference and priority.

Now that we've cleared up this anomaly, let's move on to some extra functionality.

We've so far agreed to make the priority an integer value: higher values have higher priority. I can imagine other types of priority as well, perhaps integers where lower values have higher priority, perhaps floating point values, perhaps a full-blown priority class. We should change our priority queue so that the priorities it uses are not assumed to be simple ascending integers.

The easiest way to do this is make the priority an IComparable instance instead (that is, an instance of a class that implements IComparable.CompareTo()). As it turns out, apart from the various method signatures that accept a priority, we only have to change two lines of code, where we compared the integer values.

First, let's write the test:

[Test] public void TestPriorityType() {
    pq.Enqueue("item one", new ReversePriority(12));
    pq.Enqueue("item two", new ReversePriority(5));
    string s = (string) pq.Dequeue();
    Assertion.AssertEquals(
        "Dequeuing item should set count to 1",
        1, pq.Count);
    Assertion.AssertEquals(
        "Dequeuing should retrieve highest priority item",
        "item two", s);
    s = (string) pq.Dequeue();
    Assertion.AssertEquals(
        "Dequeuing final item should set count to 0",
        0, pq.Count);
    Assertion.AssertEquals(
        "Dequeuing should retrieve next item",
        "item one", s);
}

This fails, of course. My intent here is to have the ReversePriority class organize priorities in reverse order, so to make the test pass we first have to write this class:

public class ReversePriority : IComparable {
    private int priority;

    public ReversePriority(int priority) {
        this.priority = priority;
    }

    public int CompareTo(object obj) {
        ReversePriority castObj = obj as ReversePriority;
        if (castObj == null) 
            throw new InvalidCastException();
        if (priority < castObj.priority) 
            return 1;
        if (priority == castObj.priority)
            return 0;
        return -1;
    }
}

Now we can write the code in HeapEntry and PriorityQueue that will make the test pass:

public struct HeapEntry {
    private object item;
    private IComparable priority;
    public HeapEntry(object item, IComparable priority) {
        this.item = item;
        this.priority = priority;
    }
    public object Item {
        get {return item;}
    }
    public IComparable Priority {
        get {return priority;}
    }
    public void Clear() {
        item = null;
        priority = null;
    }
}

public class PriorityQueue : ICollection {
    ...

    public void Enqueue(object item, IComparable priority) {
        if (count == capacity)  
            growHeap();
        count++;
        bubbleUp(count - 1, new HeapEntry(item, priority));
        version++;
    }

    private void bubbleUp(int index, HeapEntry he) {
        int parent = getParent(index);
        // note: (index > 0) means there is a parent
        while ((index > 0) &&
              (heap[parent].Priority.CompareTo(he.Priority) < 0)) {
            heap[index] = heap[parent];
            index = parent;
            parent = getParent(index);
        }
        heap[index] = he;
    }

    ...

    private void trickleDown(int index, HeapEntry he) {
        int child = getLeftChild(index);
        while (child < count) {
            if (((child + 1) < count) &&
                (heap[child].Priority.CompareTo(heap[child + 1].Priority) < 0)) {
                child++;
            }
            heap[index] = heap[child];
            index = child;
            child = getLeftChild(index);
        }
        bubbleUp(index, he);
    }

    ...
}

Nothing too surprising here, as I've already said. In essence the priority is an IComparable instance, and the comparisons between two priorities are done with CompareTo().

The next thing we should consider for our PriorityQueue class is the serialization of an instance. On the one hand this is really simple (just add the Serializable attribute to the PriorityQueue and the HeapEntry classes), but on the other hand the simple answer is very wasteful, at least in terms of space. Let's see why.

The PriorityQueue class contains an array of HeapEntry items. This array is deliberately created too large so that we have room to add new items to the priority queue without constantly growing the array. If we just add the Serializable attribute to the PriorityQueue class, the serialization code will save all of our members (good), but will, in serializing the array, serialize a whole set of null entries (not so good). For a little extra effort, we can serialize only the non-empty elements in the array.

First thing we do is to write the serialization test.

[Test] public void TestSerialization() {
    pq.Enqueue("item one", new ReversePriority(12));
    pq.Enqueue("item two", new ReversePriority(5));

    Stream s = new MemoryStream();
    IFormatter f = new BinaryFormatter();

    f.Serialize(s, pq);
    s.Position = 0;
    PriorityQueue pq2 = (PriorityQueue) f.Deserialize(s);
    s.Close();

    string str = (string) pq2.Dequeue();
    Assertion.AssertEquals(
        "Dequeuing item should set count to 1",
        1, pq2.Count);
    Assertion.AssertEquals(
        "Dequeuing should retrieve highest priority item",
        "item two", str);
    str = (string) pq2.Dequeue();
    Assertion.AssertEquals(
        "Dequeuing final item should set count to 0",
        0, pq2.Count);
    Assertion.AssertEquals(
        "Dequeuing should retrieve next item",
        "item one", str);
}

This test fails, of course, so the first thing we should do is to mark the classes with the Serializable attribute, and re-run the test.

[Serializable()]
public struct HeapEntry {
    ...
}

[Serializable()]
public class PriorityQueue : ICollection {
    ...
}

The test passes; however, we've already discussed that the size of the serialized priority queue will generally be larger than it should be.

Let's refactor. Mark the PriorityQueue class as implementing ISerializable. By doing so, we are telling the serialization engine that we'll be doing the serializing. In adding the ISerializable interface, we must implement the GetObjectData() method:

void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context) {
    info.AddValue("capacity", capacity);
    info.AddValue("count", count);
    HeapEntry[] heapCopy = new HeapEntry[count];
    Array.Copy(heap, 0, heapCopy, 0, count);
    info.AddValue("heapentries", heapCopy, typeof(HeapEntry[]));
}

What's happening here? Well, first of all, we shall serialize the capacity and the count values. We need these later when we deserialize the object, so that we can reconstitute the queue. Now we need to serialize the heap array, but as we've stated, it could have a lot of empty entries. So we create a temporary array of exactly the right size, copy all the entries into it and then serialize this temporary array.

What about deserialization? Well, here we need to write a protected constructor for the deserialization engine.

protected PriorityQueue(SerializationInfo info, StreamingContext context) {
    capacity = info.GetInt32("capacity");
    count = info.GetInt32("count");
    HeapEntry[] heapCopy = 
        (HeapEntry[]) info.GetValue("heapentries", typeof(HeapEntry[]));
    heap = new HeapEntry[capacity];
    Array.Copy(heapCopy, 0, heap, 0, count);
    version = 0;
}

First we read the capacity and the count. Then we read the array of entries, count items in all. At this point, we can create out internal array (of size capacity, not count), and then copy the entries from the temporary array into the internal array. Finally we set version to 0 (we didn't serialize it, though we could have done).

Finally, after all this refactoring, we run the tests again to show that they still pass.

(By the way, notice that we're repeating the names of the serialized values in both the constructor and GetObjectData(). A definite code smell that we should remove by using some constants. I won't show that code here.)

This time around in our series on implementing a priority queue in C#, we've discussed a subtle bug in the previous version, one that wasn't (and couldn't really be) brought out by the unit tests, and then solved it. Then we changed the definition of a priority from an integer to an instance of IComparable, so that we could do some more interesting things with priorities. Finally, we showed how to implement efficient serialization for the priority queue class.

The latest code (as of the end of this article) can be found here.