julian m bucknall >> Smackdown! The worst computer book

Smackdown! The worst computer book

published: Thu, 28-Feb-2008 | updated: Thu, 28-Feb-2008

So, a couple of days ago, when I gravely pronounced Data Structures and Algorithms Using C# by Michael McMillan as the worst computer book I'd ever read (or, more accurately, partially read, for I no longer have the spare neurons to destroy in reading it all), a friend swiftly came back with, are you sure? Well, duh, nitwit, yes I am, didn't you read the three previous posts? He then sent me a PDF of another book, smacked me across the face with a virtual gauntlet, and challenged me. SMACKDOWN!

So in the red corner we have our old friend Data Structures and Algorithms Using C#. And in the blue corner? Data Structures and Algorithms Using Visual Basic.NET by the very same Michael McMillan. Game on!

First challenge. How up to date is it? The red corner is from 2007 and does mention generics every now and then, so is in theory aware of .NET 2.0, but the one in the blue corner is from 2005 and doesn't mention them at all (maybe the beta wasn't available to Mr. McMillan at the time). First round goes to the VB book: it's not updated yet and so doesn't teach you anything that a new VB developer might need now. Mind you the C# book was pretty anemic in this round too (no List<T>, Dictionary<K,V>, remember) but the VB book wins easily. Ding!

Next round. Availability of source code and errata. Bonus points deducted for access to the author. Wow, look at this: both books knocked each other out. There is none! No publicly available source code. No website with errata. No email access to the author. It's like each book was released into the pre-Internet era and we went to the bookstore on our horse and cart to buy them. It's a draw!

(Note: for this particular reader, the VB book, being a PDF, does allow copy and paste, so nominally the C# book should win the round, but the judges are having none of it, they want solution files that can be compiled, for a kick off.)

Next round: accuracy about the .NET Framework. Oooh, tough one, given what I've already said about the matter. Let's look at the VB book in the blue corner for possibly the same quotes that I found in the C# book. Ready?

p.44: "To instantiate a Random object, you have to pass a seed to the class constructor. This seed can be seen as an upper bound for the range of numbers the random number generator can create." Nice try, but no, squared.

p.103: "The [Stack] class is implemented in the .NET Framework as a circular buffer" Wrong!

p.207: "the SortedList class is but a specialization of the DictionaryBase class." Bzzzt!

p.208: The description of how the Hashtable class is implemented is exactly the same, even down to the "when the load factor reaches 1.0, it's doubled" tripe. Score!

And? I know you're all waiting for this...

p.239: "In VB.NET, the ArrayList data structure is implemented using a circularly linked list." AWESOME!

But, come on, there must be more, right? All right, stop twisting my arm, here's another I found:

p.8: "Every object stored on the heap has a special method called a finalizer. The finalizer method is executed as the last step before deleting the object." Gosh, this is so wrong it has an inner beauty all of its own. But, wait a moment, is it there in the C# book? Yes, by golly, it is! The exact same text. (It seems pedantic of me to quote this bit as well: "you can’t even be sure an object’s finalizer method will run at all, but we know that before we can be certain an object is deleted, its finalizer method must execute." but I'm nothing if not pedantic.)

So, I'm sorry, but round three ia another draw. So far, the VB book is edging ahead in the worst computer book category, but there's still everything to play for.

Round four: accuracy and depth of knowledge of the algorithms and data structures. This will be a tough one, and I feel as if I should explore something new rather than go over old, and by now well-trodden, ground. OK, red-black trees, here we come. Buckle up, people, there might be some rough terrain ahead.

A red–black tree is one in which the nodes of the tree are designated as either red or black, depending on a set of rules. By properly coloring the nodes in the tree, the tree stays nearly perfectly balanced. [...]

The following rules are used when working with red–black trees:

1. Every node in the tree is colored either red or black.

2. The root node is colored black.

3. If a node is red, the children of that node must be black.

4. Each path from a node to a Nothing reference must contain the same number of black nodes.

First sentence is good. The second sentence seems to imply that if we just color the nodes, we'll be fine. We could call this the Crayola rule for red-black trees if it wasn't just silly, there is a lot more to it than this. I think what he meant to say is this: by properly maintaining the rules on colorization as we manipulate the nodes in the tree (inserting and deleting them, performing rotations), the tree stays roughly in balance.

The "nearly perfectly" though is rubbish. In fact, Robert Sedgewick (co-inventor of the red-black tree, whose book is cited in the references in both our contenders) derived a nice mathematical rule for how balanced a red-black tree would be. It's expressed as the length of a search for a node: a search for a node in a red-black tree with N nodes needs less than 2*log(N)+2 comparisons. (The value for a perfectly balanced tree is log(N)+1.) If you like, the path with the longest length will not be more than twice the length of the shortest — which is somewhat obvious if you think of the longest path as alternating red and black nodes and the shortest as having just black ones.

And now we get to the rules themselves. Rule 1 is absolutely 100% correct, every node in a red-black tree is either red or black. (It is interesting though that Sedgewick prefers to label the links between the nodes as either red or black, rather than the nodes themselves. However, it's a short step to labelling the node at the end of a link as being the same color as its link.)

Rule 2 is unequivocably false. The root node can be red quite easily. What rule 2 should say involves something called external nodes. Every node in a red-black tree has two children. Those nodes at the periphery of the tree (what you might call the leaf nodes) still have two children, but the links to these children are both null or Nothing. Imagine that these null links point to some external nodes whose contents we're not aware of or that have no effect on balancing the tree. These external nodes are colored black. (See Introduction to Algorithms by Cormen et al.)

In the Sedgewick equivalent but parallel universe, nodes have no color, but links do. The links to the external nodes in the Sedgewick universe are black.

Rule 3 is fine, and Rule 4, although correct, seems to assume that you know about external nodes being black without actually being told that.

The next bit really has to be read to be believed, but I'll try to give a flavor of it here.

There are two paragraphs that describe how to insert a new node into a red-black tree. The first paragraph quite correctly points out that trying to add a new black node at a leaf (which, if you remember, is always where you insert nodes in a binary search tree) violates rule 4 since it's always guaranteed to increase the number of black nodes along that path from the root. (The only exception is if it's the root itself: the tree was empty and now it contains a single node.) So the first rule of insertion is that you should make the node that's being inserted red.

The second paragraph then takes the tack of using a concrete example and tries to describe what happens when you add that new red node to a node that is itself red. You are invited to look at the before picture of a red-black tree that shows the situation. OK, you recognize that it's a violation of rule 3 (a red node cannot have a red child), and then you're left in the dust. The paragraph veers off and says, but, if these nodes' colors here are swapped, this node is rotated to that node there "and then [you] perform other rotations", you get the after picture. There is no explanation why this bizarre tack was taken, why these nodes are recolored but those are not, there's no real analysis of that particular case, or even if there are other cases to consider (there are). "Do this, that, the unspecified hand-waving other, and we're obeying the red-black rules again." To top it off, the after picture doesn't even have the node that's just been added, the one that started all these shenanigans.

And that's all the explanation you get for inserting a new node into a red-black tree. The very next thing that you see is two pages of code with no explanation of what's going on, which admittedly would be a bit hard to understand since you've not been given any foundation of such questions as, although rotations preserve the ordering of a binary search tree, how do they affect the rules of a red-black tree? What nodes need to be recolored after a rotation and why?

I'm sorry but red-black tree insertion code is pretty hard to write and understand anyway even with some nice and accurate diagrams, so I'm not even going to try to scrutinize Mr. Mcmillan's. I dare say, since he doesn't describe all the cases for node insertion anyway, there's going to be bugs. I don't have his test code obviously so I can't easily do experiments. (Actually, just skimming the Insert method shows that he has no comprehension of the different cases for insertion. Here be bugs.)

And, of course the C# book just repeats the same text with the same figures. At least the word Nothing was replaced with the word null. So, again we have a tie.

(Late news: the text names the red-black tree class as RedBlack, twice, but the actual code shown uses RBTree for the class name.)

OK, the final stretch now. The code itself. Well, sorry and all that but the C# book has a slam dunk here. no contest. It's so bad, my guess is that it was converted in situ inside the book's document. To put it another way, no Visual Studio instance was harmed in the preparation of the C# code in the book.

It is also clear that Michael McMillan has a greater facility with Visual Basic than C#. However that's not saying much: I don't know VB very well yet these code snippets stand out as being wrong:

p.28: Mr. McMillan seems to have a great difficulty with indexing into an array whose first item is index 0. Here's an off-by-one bug again:

Private Function IsFull() As Boolean
  If (pArr(pCapacity) <> Nothing) Then
    Return True
  Else
    Return False
  End If
End Function

(Here pCapacity is the number of slots in the array for items. Reading the item at index pCapacity will result in an out-of-bounds exception.) Note also the rookie mistake of evaluating a boolean expression and returning true if it was true and false otherwise. You can just return the value of the boolean expression. (Note to self, ahhhh, that's why true is spelt True in the text of the C# book and false, False: it's how it's written in VB.)

p.69: there are times when Mr. McMillan writes some absolute howlers. Here's a method that shifts all elements in an array down by 1 from a given position, in order that a new item can be inserted at that position.

Sub ShiftDown(ByVal arr() As Integer, ByVal pos _
                                      As Integer)
  Dim index As Integer
  For index = pos To arr.Length - 2
    arr(pos + 1) = arr(pos)
  Next
End Sub

(Hint: think about you end up with in this array from pos onwards.)

p.231: creating a new object on the heap and then immediately throwing it away (a pattern repeated elsewhere several times).

Dim current As New Node()
current = header

p.231: a Find method for a singly linked list (called a "singularly linked list" on p.234, which seems to imply links of some hertofore unseen structure) that will crash if the item being searched for isn't there (hint: at the end of this list is a Nothing link).

Private Function Find(ByVal item As Object) As Node
  Dim current As New Node()
  current = header
  While (current.Element <> item)
    current = current.Link
  End While
  Return current
End Function

p.309: comparison of a boolean value with an integer, using two underscores to continue the line.

If ((item.CompareTo(grandParent.element) < 0) __
     <> (item.CompareTo(parent.element))) Then

All right, suffice it to say that Michael McMillan is a haphazard and sloppy programmer, but indubitably his C# skills are the lesser. A convincing win for the red corner, methinks, despite some strong possibilities from the blue corner.

Smackdown! Data Structures and Algorithms Using C# is still the worst computer book I've read, but nevertheless Data Structures and Algorithms Using Visual Basic.NET is definitely the second-worst computer book.

[This is part 4 of a series of posts on this book. Part 1. Part 2. Part 3. Enjoy.]