Red-black trees (part 2)

published: Mon, 3-Mar-2008   |   updated: Sat, 3-Nov-2018

Now that I have the technology available to draw trees quickly and accurately (and you have seen the algorithm and its implementation at first hand), it's time to go back to the main subject: red-black trees. Except that I have some more foundation to lay first. So in this episode we'll look at binary search trees, and how to insert new nodes into them. After all, visually, a red-block tree is a binary search tree with colorful nodes.

Last time we saw the definition of a binary tree. It's a recursive definition, and to put it in a slightly different way: a binary tree is either a single node or is a node with one or two binary trees as children, known as the left and right child.

In a binary search tree we impose a little more structure to this bald definition by adding an ordering relation. By "ordering relation" I mean that, given two nodes, I can say that one is smaller than the other. (We'll ignore equality for the moment.) Of course I'm using shorthand here: by saying that one node is smaller than another, I'm really saying that each node contains a "key" (a value of some type), and it is those key values that get compared.

Let us imagine that our nodes are containers for single character keys (they're are easier to draw inside a circle representing a node). We can look at two distinct nodes and say that the character key in this node is smaller than the character key in that one. Or if the keys were integers, that the integer in this node is smaller than the integer in that node. Mind you it gets a little tedious to continue saying "the key in this node is smaller than the key in that one", so we'll continue to take the shortcut and say "this node is smaller than that one", understanding that the ordering relation is on the keys not on the nodes themselves.

A complete balanced binary search tree
A complete, balanced binary search tree

Having been utterly pedantic on that point, we can now give the definition of a binary search tree. A binary search tree is a binary tree where, for every node, the left subtree, if present, contains nodes that are all smaller than that node, and the right subtree, if present, contains nodes that are greater than that node. In fact, if you were to list all the nodes using an in-order traversal, you'd find that all the nodes were... in order. (BTW, a quick note to Dustin is in order here: check out the gradient fill, dude.)

To find a given key in a binary search tree, we can use a recursive or iterative process, much as we do with binary search. Although many search routines are written as iterations, as will mine, it'd be instructive for you to try and write it as a recursive method. To find s in the tree shown, we'd start at the root (it's the only place we know of, anyway). Does the key in the root equal s? No, so in which child subtree will it be found? Since s is greater than p, it'll be in the right subtree. Move down to the right child, and repeat the same process. Is it equal? No? Then in which subtree will it be found? Every time we reach a node and apply our comparison check, we eliminate "half" the nodes remaining in the tree. I put "half" in quotes because it depends on the balance of the tree, which we'll get to in a moment.

If we ever get to the point where we can't move down the tree any further and haven't yet found the search key, then we can say that the value we seek is not there.

The Find operation for a binary search tree is very efficient, provided that the tree is balanced. By "balanced" I mean that the tree looks bushy — every internal node generally has two children — and that there are no long "twigs" where there's a chain of nodes with only one child, one after the other. In other words, every path from root to leaf (those are the nodes at the bottom) is roughly the same length. I've cheated in my image above: I created a perfectly balanced tree. Not only that, but it is complete — the last level has all its possible nodes.

Using this tree, we can make some calculations. If there were only one level in the tree, we'd find our key (or not, as the case may be) with only one comparison. If there were two levels, there would be three nodes, but we'd only need to make at most two comparisons (I'm assuming that a comparison returns one of three values, less than, equal to, or greater than, rather than say just is greater than or not, requiring another comparison for equality). For three levels, there are at most seven nodes, but only three comparisons are required. For four levels, 15 nodes and four comparisons. Now — watch carefully here — for n levels, there would be at most 2n-1 nodes, and we'd only need a maximum of n comparisons in order to find a key or determine if it wasn't there at all.

In general, with some hand-waving, we can say that for a reasonably balanced tree (rather than the perfectly balanced ones I just used in our thought experiment), for n nodes we'd need at most log(n) comparisons to determine whether a key was present or not. And because comparisons tend to take the most amount of time when searching (following a link is simple stuff indeed), we say that the Find operation in a reasonably balanced binary search tree is O(log(n)). Or, to put it another way, if you square the number of items in a binary search tree (say from 100 to 10,000), and maintain its balance, you only double the time it takes to find a value (or not).

Compare that result with a linked list, say. Since the number of comparisons needed to determine that a value is not present in a linked list is equal to the number of items in the linked list (we assume that the items are not ordered), merely doubling the number of items (say from 100 to 200) doubles the time taken.

Let's look at inserting nodes into a binary search tree, since repeated applications of this operation is the means by which we get a binary search tree to search in the first place.

Inserting the first node

The easy case: if we're inserting a node into an empty tree, the node becomes the root of the tree. Here the p key is the first node in the tree.

Otherwise, we search for the new key to be inserted. If we find the key already in the binary search tree, we throw an exception: equal keys are not allowed.

(Aside: what to do about equal keys in a binary search tree is a thorny question. Over the years of using keyed containers like the binary search tree, I've never really hit a scenario where I absolutely, positively, definitely had to have two or more items with the same key in the container. I've always had more than enough information to differentiate the items, maybe with the help of other fields of the item. At a last resort, I can see using bucketing as a solution: the keyed container just contains a bucket of items for each key. Hence, in this series, I'm going to insist on no key duplicates.)

Eventually we reach a leaf node without finding the key. The new item (key and value pair) will be a child of this node, which side depending on whether the key we're inserting is greater than or less than this node. And that's it.

Inserting the second node

Adding the g key. Starting at the root we see that g is less than p, and so we add it as the left child of p since it hasn't one yet.

Inserting the third node

Adding the w key. Starting at the root we see that w is greater than p, and so we add it as the right child of p since it hasn't one yet.

Inserting the fourth node

Adding the s key. Starting at the root we see that s is greater than p, and so we move down to the w node. s is less than w, so we add it as the left child of w since it hasn't one yet.

Inserting the fifth node

Adding the c key. Starting at the root we see that c is less than p, and so we move down to the g node. c is less than g, so we add it as the left child of g since it hasn't one yet.

Inserting the sixth node

Adding the u key. Starting at the root we see that u is greater than p, and so we move down to the w node. u is less than w, so we move down to the s node. Finally u is greater than s, so we add it as the right child of s.

And so on, so forth (the code for this article shows adding nodes one by one, until the complete, balanced tree above is created).

Before we write the code, we need to address the question of how to test it. After all, it's a real cop-out if I just say: print off the tree after each insertion and ascertain visually that the binary search tree is valid. Maybe elsewhere authors can get away with that, but not here.

I can see two possible verifications for our tests: first: can we find the key we just inserted? Second, looking at every node in the tree, is the left child, if present, smaller than this node, and the right child, greater? (This test could be implemented using a pre-order traversal.) A third possibility — the one I'll use — is to check that an in-order traversal lists the keys in ascending order (this is equivalent to the second option, but miles easier to implement).

To thoroughly test it I'll just add a couple of thousand random numbers as keys to a new tree. For each one added, I'll check the structure to make sure we're not creating an invalid tree.

As for the actual code, it's a great example of removing code duplication. I certainly didn't originally write it like this, but it involved a set of refactorings using Refactor! Pro, trying to remove the duplication. Consider this: everything we do with a binary search tree involves a search for a key. Want to see if a key is there? Search for it. Want to get the value for a key? Search for it. Want to insert a new key and value? Search for the key, fail to find it and then add a new node where you stopped searching.

First up: the node class for a binary search tree:

C#:
  public class BsTreeNode<K, V> : SimpleNode where K : IComparable<K> {

    public BsTreeNode(K key, V value) {
      this.key = key;
      this.value = value;
    }

    public IBinaryNode Advance(K key) {
      int compareResult = key.CompareTo(Key);
      if (compareResult == 0) return this;
      if (compareResult < 0) return Left;
      return Right;
    }

    private K key;
    public K Key {
      get { return key; }
      set { key = value; }
    }

    private V value;
    public V Value {
      get { return value; }
      set { this.value = value; }
    }
  }
VB:
Public Class BSTreeNode(Of K As IComparable(Of K), V)
    Inherits SimpleNode

    Private _key As K
    Public Property Key() As K
        Get
            Return _key
        End Get
        Set(ByVal Value As K)
            _key = Value
        End Set
    End Property

    Private _value As V
    Public Property Value() As V
        Get
            Return _value
        End Get
        Set(ByVal Value As V)
            _value = Value
        End Set
    End Property

    Public Sub New(ByVal key As K, ByVal value As V)
        _key = key
        _value = value
    End Sub

    Public Function Advance(ByVal key As K) As BSTreeNode(Of K, V)
        Dim compareResult As Integer = key.CompareTo(Me.Key)
        If compareResult = 0 Then Return Me
        If compareResult < 0 Then Return Left
        Return Right
    End Function

End Class

As you can see it's generic and is implemented for a key of type K (which obviously must be IComparable) and a value of type V. There's a peculiar method called Advance() though. It's called on a node, takes a key, and returns either the node itself if the node has that key, or the left or right child of the node depending on where the key should be found if it is there (so if the key is less than the node, the left child is returned, if not, the right child). A nice little utility routine that we're about to use in our search methods.

I then wrote a simple method of the BinarySearchTree class find the node for a key. It uses this utility method from the node class.

C#:
  public class BinarySearchTree<K,V> where K : IComparable<K> {

    private BsTreeNode<K,V> root;
    public BsTreeNode<K,V> Root {
      get { return root; }
      set { root = value; }
    }

    private BsTreeNode<K, V> GetNode(K key) {
      BsTreeNode<K, V> walker = root;
      while (walker != null) {
        BsTreeNode<K, V> nextNode = walker.Advance(key) as BsTreeNode<K, V>;
        if (nextNode == walker) {
          return nextNode;
        }
        walker = nextNode;
      }
      return null;
    }
  }
VB:
Public Class BinarySearchTree(Of K As IComparable(Of K), V)

    Private _root As BSTreeNode(Of K, V)
    Public Property Root() As BSTreeNode(Of K, V)
        Get
            Return _root
        End Get
        Set(ByVal Value As BSTreeNode(Of K, V))
            _root = Value
        End Set
    End Property

    Public Function GetNode(ByVal key As K) As BSTreeNode(Of K, V)
        Dim walker As BSTreeNode(Of K, V) = _root

        While walker IsNot Nothing
            Dim nextNode As BSTreeNode(Of K, V) = _
                DirectCast(walker.Advance(key), BSTreeNode(Of K, V))
            If Object.ReferenceEquals(nextNode, walker) Then
                Return nextNode
            End If
            walker = nextNode
        End While
        Return Nothing
    End Function
End Class

As you can see, it starts at the root and continues advancing until either the correct node is found or we hit a null reference. Having written this new utility method, writing a Contains() method is simplicity itself.

C#:
    public bool Contains(K key) {
      return GetNode(key) != null;
    }
VB:
    Public Function Contains(ByVal key As K) As Boolean
        Return GetNode(key) IsNot Nothing
    End Function

Ditto, in fact, for the TryGetValue() method, which, when given a key, either returns true, the key was found, and the value for that key, or false and the default value for the V type.

C#:
    public bool TryGetValue(K key, out V value) {
      BsTreeNode<K, V> node = GetNode(key);
      if (node == null) {
        value = default(V);
        return false;
      }
      value = node.Value;
      return true;
    }
VB:
    Public Function TryGetValue(ByVal key As K, ByRef value As V) As Boolean
        Dim node As BSTreeNode(Of K, V) = GetNode(key)
        If node IsNot Nothing Then
            value = Nothing
            Return False
        End If
        value = node.Value
        Return True
    End Function

Next up is the Insert() method. The wrinkle to appreciate here is that as we move down the tree, following links, we have to maintain a reference to the parent of the current node. That way, when we finally hit the null reference at the bottom of the tree, we have a node to which we can attach the new node, either as the left child or the right one.

C#:
    public void Insert(K key, V value) {
      BsTreeNode<K, V> parent = null;

      BsTreeNode<K, V> walker = root;
      while (walker != null) {
        BsTreeNode<K, V> nextNode = walker.Advance(key) as BsTreeNode<K, V>;
        if (nextNode == walker)
          throw new Exception("Binary Search Tree: inserting duplicate key");
        parent = walker;
        walker = nextNode;
      }

      walker = new BsTreeNode<K, V>(key, value);
      if (parent == null)
        root = walker;
      else if (key.CompareTo(parent.Key) < 0)
        parent.Left = walker;
      else
        parent.Right = walker;
    }
VB:
    Public Sub Insert(ByVal key As K, ByVal value As V)
        Dim parent As BSTreeNode(Of K, V) = Nothing

        Dim walker As BSTreeNode(Of K, V) = _root

        While walker IsNot Nothing
            Dim nextNode As BSTreeNode(Of K, V) = _
                DirectCast(walker.Advance(key), BSTreeNode(Of K, V))
            If Object.ReferenceEquals(nextNode, walker) Then
                Throw New Exception("Binary Search Tree: inserting duplicate key")
            End If
            parent = walker
            walker = nextNode
        End While

        walker = New BSTreeNode(Of K, V)(key, value)
        If parent Is Nothing Then
            _root = walker
        ElseIf key.CompareTo(parent.Key) < 0 Then
            parent.Left = walker
        Else
            parent.Right = walker
        End If
    End Sub

This actually is the most complex method of the lot with a Cyclomatic Complexity of 5, right under my red flag.

Now with all that code written, we can write a quick indexer for the binary search tree class, just like the real CLR.

C#:
    public V this[K key] {
      get {
        BsTreeNode<K,V> node = GetNode(key);
        if (node == null)
          throw new Exception("Binary Search Tree: Key not found using indexer");
        return node.Value;
      }
      set
      {
        BsTreeNode<K,V> node = GetNode(key);
        if (node == null)
          Insert(key, value);
        else
          node.Value = value;
      }
    }
VB:
    Default Public Property ValueByKey(ByVal key As K) As V
        Get
            Dim node As BSTreeNode(Of K, V) = GetNode(key)
            If node Is Nothing Then
                Throw New Exception("Binary Search Tree: Key not found using indexer")
            End If
            Return node.Value
        End Get
        Set(ByVal value As V)
            Dim node As BSTreeNode(Of K, V) = GetNode(key)
            If node Is Nothing Then
                Insert(key, value)
            Else
                node.Value = value
            End If
        End Set
    End Property

I won't show the test code here, but suffice it to say, the code above works perfectly.

Next time we have a bit more work to do with the basic binary search tree (mostly talking about how balanced a tree is), and then we'll be ready to discuss a red-black tree.

C# Source code. VB Source code. Note that this is a work in progress. These links merely point to the state of the code after part 2. There is a lot more to come.

[This is part 2 of a series of posts on red-black trees. Part 1. Part 3. Part 4. Part 5. Enjoy.]