Red-black trees (part 1)

published: Fri, 29-Feb-2008   |   updated: Sat, 3-Nov-2018

After my few recent rants, I need to cleanse myself with some hardcore algorithms and data structures. That's what you're here for, right?

But what to do? Actually, the answer came pretty quickly: in the last rant I was discussing red-black trees. Unlike the author of the book I was ripping apart, I did some actual research to validate my position. I took books off my shelf, opened them up, and cross-checked them against each other. In doing so I came to the realization — again — that red-black trees are quite amazing data structures. They're seriously cool, and ultra-handy to boot. Yet not many people know how they actually work. In the past, I've written some rough red-black tree code and some good red-black tree code, and I've written about them a couple of times, but this time I think I can do a much better job of it all.

Before launching into this most colorful of data structures, I want to lay some groundwork and set some ground rules. We're going to be writing code in C#, and VB. (Yes, my last rant prodded me into not being afraid of writing in VB.) I warn you, we'll be using the latest versions of .NET for the C# and VB code, no apologies and no excuses. The code will be available for download from this web site. You can respond to these posts by email (the RSS feed has an email me link at the end), and in some very short but unspecified time in the future I'll have a blogging engine with real comments up and running.

On the groundwork front, we're going to start off by talking about ordinary binary search trees, seeing what issues they have, and then building up into full and complete red-black-ness. At the end of the posts, you'll not only know how and why a red-black tree works, but also have some code ready to run in your applications.

Another thing I want to use is figures, figures, figures. Red-black trees are (a) colorful, duh!, and (b) full of bloody difficult edge cases. You just got to have lots of diagrams. On paper, in books like mine, we're left with monochrome printing and so have to say something silly like "the black nodes are, er, white, and the red nodes are shaded grey". Pah! No more! We're going to use nodes properly colored, using these new-fangled color monitor thingies. OK, I may, to stave off sarky comments from Mark Miller, shade these colors somewhat so that they're not glossy black and primary red, but you will always be sure which nodes are red and which are black.

So this brings me to the subject of this first chapter: how to draw binary trees. I admit it: I hate drawing binary trees, and the only way this series is going to work is if I have some automated drawing tools. Adobe Illustrator is a wonderful program, but I've got better things than to link up circles with lines so that the join doesn't show. And, remember, we need figures, figures, figures.

First let's define a couple of things. A binary tree consists of a node (the root) with links to two subtrees, also binary trees. These are normally called the left child and the right child. Each of the children can either be present or not. As you can see, the definition is recursive. In a binary search tree, the nodes in the left child are smaller than the root, and the nodes in the right child are greater than the root.

The level of a node in a binary tree is the number of links you have to travel down to reach it from the root. The root is therefore at level 0 (no travel involved), and its children nodes are at level 1. Their children are at level 2, and so on.

There are several methods, normally called traversals, to visit every node in a binary tree. By "visit", I mean that the traversal code does something with the node, like draw it. The first three traversals are recursively defined, just like the tree. A pre-order traversal visits the root, and then traverses the left child, and then the right child. An in-order traversal traverses the left child, visits the root, and then traverses the right child. Finally, a post-order traversal traverses the left child, then the right child, and then visits the root.

The fourth standard traversal is the level-order traversal. Here we traverse the tree in level order, starting with level 0. With each level we visit the nodes from left to right. So, in a level-order traversal we'll visit the root, its left child node, its right child node, the left child's left child node, the left child's right child node, and so on, so forth.

An example binary tree
An example drawing of a binary tree

When we draw a tree, we usually draw the root at the top. Its children are drawn to the left and to the right and underneath the root (so that we see a direct representation of the terms left child and right child). The children of the children of the root are drawn in the same manner underneath their parents. Determining where to place each node on the surface we're drawing on is a very hard problem: sometimes it's better to lose a little aesthetics just to get something that draws quickly.

In discussing a drawing algorithm (we'll be showing the Simple Level Drawing Algorithm, by the way), we don't really care about the node as a real entity. All we care about is some basic properties: the left child and the right child. We'll create an interface for this and then use it to define "real" nodes.

C#:
  public interface IBinaryNode {
    IBinaryNode Left { get; set; }
    IBinaryNode Right { get; set; }
  }
VB:
Public Interface IBinaryNode
    Property Left() As IBinaryNode
    Property Right() As IBinaryNode
End Interface

Since a node doesn't know anything about painting itself (and this is good: we don't want to have our eventual red-black tree nodes burdened with paint code), we'll need a class that does know about it. And in fact this class had better be able to draw the links as well. Enter the node painter. We'll create another interface for this, and eventually when we have a concrete node, we'll have to write a concrete node painter to go with it. That way an ordinary binary search tree node will need its node painter to paint it and its key, whereas the node painter for a red-black tree node will now to paint certain nodes in red and others in black.

C#:
  public interface IBinaryNodePainter {
    void DrawEdge(int childNumber, int childLevel, int parentNumber);
    void DrawNode(IBinaryNode node, int number, int level);
    void DrawEdgedNode(IBinaryNode node, int childNumber, int childLevel, int parentNumber);
  }
VB:
Public Interface IBinaryNodePainter
    Sub DrawEdge(ByVal childNumber As Integer, _
                 ByVal childLevel As Integer, _
                 ByVal parentNumber As Integer)
    Sub DrawNode(ByVal node As IBinaryNode, _
                 ByVal number As Integer, _
                 ByVal level As Integer)
    Sub DrawEdgedNode(ByVal node As IBinaryNode, _
                      ByVal childNumber As Integer, _
                      ByVal childLevel As Integer, _
                      ByVal parentNumber As Integer)
End Interface

Imagine that we have a grid overlaying our drawing surface (you can see it in the image above). Each intersection of the lines of the grid has an integer (x, y) coordinate and, for convenience, we'll assume that the x-axis increases from left to right and the y-axis increases from top to bottom (the opposite way we usually draw mathematical graphs). Nodes can only appear at these gridline intersections, which we'll call grid points.

The tree-drawing algorithm, as it runs, will assign grid points to each node in our binary tree: "this is where you should draw this node." The node painter will be responsible for painting the node and also the link to the node's parent. This means that, if you stepped through the tree painter code in the debugger, you'd see the tree being painted from the bottom up.

The y coordinate is essentially going to be the level number of the node, so that the root node will have y coordinate 0, both its children will have y coordinate 1 and so on. The x coordinate is going to be the number of the node as we visit them in an in-order traversal. So the first node we'd visit in an in-order traversal will have an x-coordinate of 0, the second one we visit will be number 1, and so on. Of course, in an in-order traversal, the first node we visit will be the leftmost node.

If you look again at the image, you'll see that x coordinates are not shared: there's only one node for each x coordinate.

The algorithm starts off by being given the root node of the tree. Before we can draw the root node, we have to know its x coordinate and to know that we have to draw its left subtree.

And so we kick off the recursive nature of this algorithm. In effect, we are going to do an post order traversal: draw the left subtree, draw the right subtree, draw the root. Each time we step down into the tree, it'll be the same thing: draw the left, draw the right, draw the node.

An binary tree drawn badly
Drawing edges after nodes looks bad

The main reason for doing it this way is to make it much easier to draw the links. What we shall do when we draw a node is first to draw the line from the gridpoint of the node to the gridpoint of the parent, and then draw the node at its gridpoint. That way the node overlaps the line nicely and we don't have to do some weird trigonometry to try and stop the line at the border of the node's circle. Of course that means that the parent node can only be drawn once both its children have been painted, so that it can be drawn overlapping the lines from its children. The image here shows the links and nodes being drawn the wrong way round.

When we finally draw a node and its link to its parent, we have to assign a node number to it. The node number is essentially going to be our x coordinate. The easiest way to do this is to have a field in the tree painter class that we increment every time we're about to draw a node.

Here's the start of the recursive implementation:

C#:
  public class TreePainter {
    private int lastNodeNumber;
    private IBinaryNodePainter nodePainter;

    public TreePainter(IBinaryNodePainter nodePainter) {
      this.nodePainter = nodePainter;
      Reset();
    }

    private void Reset() {
      lastNodeNumber = -1;
    }

    // ...

    protected int DrawChildren(IBinaryNode parent, int childLevel) {
      DrawLeftSubtree(parent.Left, childLevel);
      int parentNodeNumber = ++lastNodeNumber;
      DrawRightSubtree(parent.Right, childLevel);
      return parentNodeNumber;
    }
    private void DrawRoot(IBinaryNode root) {
      if (root != null) {
        int thisNodeNumber = DrawChildren(root, 1);
        nodePainter.DrawNode(root, thisNodeNumber, 0);
      }
    }
    public void DrawTree(IBinaryNode root) {
      Reset();
      DrawRoot(root);
    }
  }
VB:
Public Class TreePainter
    Private _lastNodeNumber As Integer
    Private _nodePainter As IBinaryNodePainter

    Public Sub New(ByVal nodePainter As IBinaryNodePainter)
        _nodePainter = nodePainter
        Reset()
    End Sub

    Public Sub Reset()
        _lastNodeNumber = -1
    End Sub

    '...

    Public Function DrawChildren(ByVal parent As IBinaryNode, ByVal childLevel As Integer) As Integer
        DrawLeftSubtree(parent.Left, childLevel)
        _lastNodeNumber += 1
        Dim parentNodeNumber As Integer = _lastNodeNumber
        DrawRightSubtree(parent.Right, childLevel)
        Return parentNodeNumber
    End Function

    Public Sub DrawRoot(ByRef root As IBinaryNode)
        If root IsNot Nothing Then
            Dim thisNodeNumber As Integer = DrawChildren(root, 1)
            _nodePainter.DrawNode(root, thisNodeNumber, 0)
        End If
    End Sub

    Public Sub DrawTree(ByVal root As IBinaryNode)
        Reset()
        DrawRoot(root)
    End Sub
End Class

As you can see to draw a tree, we need to draw the root. To draw the root, we need to draw its children (and this operation will return the node number of the root itself), and then we daw the node for the root. To draw the children we first draw the left subtree, we calculate the node number of the parent (it's one more than the last node number we used), and then we draw the right subtree. Finally when all's done, we return the node number of the parent.

Now the real recursion kicks in. Here's how to draw the left subtree.

C#:
    private void DrawLeftSubtree(IBinaryNode node, int level) {
      if (node != null) {
        int thisNodeNumber = DrawChildren(node, level + 1);
        nodePainter.DrawEdgedNode(node, thisNodeNumber, level, lastNodeNumber + 1);
      }
    }
VB:
    Public Sub DrawLeftSubtree(ByVal node As IBinaryNode, ByVal level As Integer)
        If node IsNot Nothing Then
            Dim thisNodeNumber As Integer = DrawChildren(node, level + 1)
            _nodePainter.DrawEdgedNode(node, thisNodeNumber, level, _lastNodeNumber + 1)
        End If
    End Sub

As you can see we draw the children of this node (which gives us the node's number) and then we draw the node itself and its link. Not too difficult. Notice that before the children of this node are drawn, we don't know the node number of this node's parent. It's only afterwards that we can calculate it — it's one more than the last node number used, essentially — and can then draw the link for this node to the parent.

Finally here's how to draw the right subtree.

C#:
    private void DrawRightSubtree(IBinaryNode node, int level) {
      if (node != null) {
        int parentNodeNumber = lastNodeNumber;
        int thisNodeNumber = DrawChildren(node, level + 1);
        nodePainter.DrawEdgedNode(node, thisNodeNumber, level, parentNodeNumber);
      }
    }
VB:
    Public Sub DrawRightSubtree(ByVal node As IBinaryNode, ByVal level As Integer)
        If node IsNot Nothing Then
            Dim parentNodeNumber As Integer = _lastNodeNumber
            Dim thisNodeNumber As Integer = DrawChildren(node, level + 1)
            _nodePainter.DrawEdgedNode(node, thisNodeNumber, level, parentNodeNumber)
        End If
    End Sub
An unbalanced binary tree
A drawing of an unbalanced binary tree

When we enter this method, the parent's gridpoint is fully known, so we save the parent's node number (we can easily calculate the level of the parent). After that we draw the children of this right-hand child, and then draw the link back to the parent and the node itself.

And that's it for the tree-drawing algorithm. The rest of the code is the actual drawing of the links and nodes on the graphics surface, and doesn't really have anything to do with the algorithm. This image shows a less balanced binary tree. and the digit shown in each node is the node number. As you can see from this image, the algorithm aesthetics suffer a little if the tree is not perfectly balanced, but the relationships between the nodes are still very obvious.

C# Source code. VB Source code. Note that this is a work in progress. These links merely point to the state of the code after part 1. There is a lot more to come.

[This is part 1 of a series of posts on red-black trees. Part 2. Part 3. Part 4. Part 5. Enjoy.]