Design and Code Reviews

published: Thu, 26-Feb-2004   |   updated: Sat, 6-Aug-2016

One thing that has been on my mind recently is the subject of design and code reviews. (Warning: this is a long one, folks.)

This was triggered by several independent things happening at once, almost as if they were related.

The first was that MSDN Magazine published an article that I reviewed whilst still at Microsoft. The subject was a discussion of using the standard I/O streams in console apps in C#, and how you can use them with pipes from the command prompt. The author, Michael Brook, went in a lot of satisfying detail about how to do all this, and the number of issues I could find were small, but not insignificant. Some of my comments he took to heart, and some he ignored (to the article's detriment, in my view).

The second was that MSDN (the online version, not the magazine this time) published another article in the series on data structures in C# and .NET. I've already commented on the second in the series, showing that some minor but important theoretical problems had crept past both the author and the reviewer. Well the third article in the series has some design and code problems to which I need to devote another article. (And now the fourth article has been posted, with more issues. Will I be able to keep up?)

The third was a contract I've been working on for a while. The work needed is some enhancements to an older Delphi 3 (scream!) application, which was recently purchased by my client.

Man, oh, man, this code is dreadful. Let me count the ways, and in doing so draw some conclusions about design and code reviews, as well as design and code patterns.

Global variables are evil

Now, I'm sure I don't need to discuss why global variables are bad, but just in case: having a global variable means that you are giving up the ability to ascertain when and where such a variable is being initialized, is being modified, is being read. You may know all about it right now as you write the program, but will you remember in six months time? Will the guy who has to maintain it (Hello! That's me!) understand the semantics of using this global variable?

An example from my early programming days. In those far off times I used to write apps in RPG II for the IBM System/34 and RPG III for the IBM System/38. Every variable in an RPG program is global (it may be different now, I haven't kept up). There were also a set of 99 boolean values called flags and having the identifiers 01 to 99. You got into the habit of commenting every routine's effect on variables and flags. Anyway, there was one program I worked on for a small bank that would fail in peculiar ways in certain conditions. It turned out to be a flag being reused in a routine being called from another.

Anyway, here's a global variable from this source code I'm modifying:

var
  ok : boolean;

Yessss! I kid you not. No indication with the name that this is a global variable, no subtle warning that you may screw something up if you use or change it. Since the same name is used for local boolean variables throughout the source code (I kid you not, again), it becomes somewhere between hard and difficult to identify where it's used and when it's changed.

Mixing data access code and presentation code is bad

More of an architectural issue this time, I suppose. Throughout this application's source code are SQL statements being constructed and used. It is nigh on impossible to create any unit tests for this kind of code. This source code goes even further: it deliberately adds methods to form classes that just do database work. Let's see what this means: in order to use this method to get at the data you want, you have to create an instance of the form. Thank you very much, but doubleplus ungood.

Actually this points to one of the more difficult design problems we face as database application developers: how to divorce the data access code from the presentation code (by presentation code I mean code that presents the data to the user on the screen, in a browser, on a piece of paper, on a mobile phone's display). Achieving this separation provides a whole host of benefits: better testability and quality (you can test the data access code in an automated fashion), greater ease in supporting other database engines (say, you provide an Access version for single users, but a higher-priced SQL Server version for corporate customers), more ability to optimize database access (the SQL statements will all be in one place in the code not spread throughout the application), improved code reuse (avoiding those "I need this data in this form, so I'll just code another SQL statement to get it, without looking elsewhere in other forms' code to see if it exists already), and so on.

try..finally is your friend

We all know that try..finally helps you write more robust programs that don't suffer from memory and resource leaks. Looking at this program I'm maintaining I see (some identifiers changed to protect the guilty):

procedure TMainForm.ShowBlahFormClick(Sender: TObject);
begin
  with TBlahForm.create(self) do
  try
    showmodal;
  finally
    free;
  end;
end;

which is pretty good, nice and safe, and:

procedure TMainForm.ShowOtherBlahClick(Sender: TObject);
begin
  OtherBlahForm := TOtherBlahForm.create(self);
  OtherBlahForm.showmodal;
  OtherBlahForm.free;
end;

which is frankly appalling. (And both of these snippets are from the same source file: let's all scream in terror together!) Let's see why. First, it doesn't use a try..finally block. Second, it uses a global variable called OtherBlahForm. Admittedly, that's partly due to the annoying habit of Delphi to auto-create a global variable of the right type when you create a new form. My recommendation is to delete this auto-created variable as soon as you can.

The final problem with this second snippet of code is the usual one about global variables. Suppose you're in a different source file and you reference OtherBlahForm. Is is initialized so you can use it as is? will your app crash if you do so? can you create the reference again? will there be a resource and memory leak if you do so?

A corollary of this point is that the code reviewer (if not the developer) must be on the alert to resource and memory allocations, and track where they're freed. Is an instance being created and used locally? Then there must be a try..finally block to ensure that it's freed. (This, in essence, is why the first code snippet in this section is safe code, and the second is not.) If the resource is freed far away from where it's allocated, there's little the code reviewer can do, we will have to rely on some kind of heap and resource monitoring program to check for allocations without corresponding frees. Here's an example that should raise red flags in the reviewer's mind:

      assignfile(sourcefile,source1);
      reset(sourcefile);
      ... lot's of work that could raise exceptions ...
      closefile(sourcefile);

Functions and procedures should have at most four parameters, maybe five at a pinch

Consider that, in 32-bit Delphi, the first three parameters are passed in registers (ignoring any obvious exceptions to the rule) so ideally you should have three parameters or less. One routine in this program has 26 parameters, most of them strings. Give me a break.

Functions and procedures should be small

Sigh. A difficult one this. How small is small? I was always told a page of printout in the days of fanfold, green-and-white paper. Nowadays the rule of thumb tends to be a screen's worth; in other words, of a size such that the reader doesn't have to scroll back and forth. For the past 18 months, I tend to think of it in terms of complexity: if the routine's cyclomatic complexity (CC, a way of counting the number of paths through the routine) is below 8, I'd feel OK about the size of the routine. (In my last contract, the most complex routine had a CC value of 4.) However, that's only feasible if you have a program that can analyze your source code and calculate the CC for you, so stick to a screen's worth.

Another measure (that's directly related to CC, if you think about it) is the depth of indentation. Don't indent too deep (maybe four levels, max). Consider splitting up a routine is you find that its code is resolutely marching over to the right.

The worst I've seen so far in this particular set of source files is a routine that's 23 pages long and that indents about 8 levels at the worst. That's ridiculously long and is totally incomprehensible and unmaintainable. (But I shall have to have a go. Sigh.)

Premature optimization is a waste of time

How can I put this? Don't optimize until you have a program to optimize, otherwise you could be optimizing something that makes little to no difference to the running of your program. Always code simply and directly. Don't say to yourself "I'll inline this so that it's faster", say to yourself, "I'll extract this code into another method, to make the current routine easier to understand."

Let's put it this way. At some point in your development cycle, you will have some approximation to the required application. Users could use it if they wished. It doesn't have all the bells and whistles, certainly, but it does do something. At this point, you should profile the application for one or more real-life scenarios. For example, if your users will only use 1Kb XML files, don't profile parsing a 1Mb XML file. There may come a time when 1Mb XML files are de rigueur, but you can do the work to make it faster for that situation then. Doing it now is a waste of your time, since 1Mb XML files may never happen for your application, or someone may write an XML parser in hardware, or whatever.

Set yourself some goals at this point, based on your profiling results: "Initial program load to the first screen displayed should be less than 3 seconds", "processing 1000 customer records should take less than 20 seconds", "clicking a button to showing the next window should be less than a second", and so on. Solving each of these scenarios involves different trade-offs, different ways to achieve the goal. It is pointless to say that each and every possible routine in the code must run as fast as possible and no slower. That's equivalent to saying, we will fix each and every bug, no matter how small, avoidable, or rare it is, in our product. You can't: your company would die before you shipped.

Every week (at least) you should be profiling to check that you don't fail the important scenarios you've identified. If you do, investigate. Locate the new code that causing the failure, fix it, alter it, move it so that it runs elsewhere, and so on.

I'm lucky with this application: the author made no attempt at any optimization whatsoever. Phew! For example, all dates are passed as strings. Obviously, the TDateTime type is for wimps. But I'm afraid it does lead to some bizarre behavior. For example, there's a routine to calculate today's date next year. If you trace through the code that gets executed, it does this: calls Date to get today's date; converts it into a string representation; extracts the year substring; converts it into an integer; adds one; converts the new value back to a string; concatenates the day and the month from the original string to this new year string to give the answer. (Incidentally, this routine has a bug: it fails for the 29th February in a leap year. The date a year away is either 28th February or the 1st March.)

Code duplication is a hanging offense

You should be ruthless in removing code duplication. It's one of the basic code smells and one of the easiest ways to drive the maintenance developer down the road into a murderous rage. Consider this question: how much is spent in original development of an application versus in maintaining and enhancing that application? Robert Glass in Facts and Fallacies in Software Engineering states that "maintenance typically consumes 40 to 80 percent of software costs" and that "enhancement is responsible for 60 percent of all maintenance costs." He further points out that 30% of maintenance is spent in understanding the code. So be nice to your maintainer, he might know where you live.

One of the source code files has five routines that do exactly the same thing. I can see how it happened: they're all event handlers, and it's easier to fill in the blanks in the newly created method body than to think and make a call to another routine that does the required work. Imagine that something else had to be done inside this duplicated code. It's extremely likely that the maintenance guy, unless he'd spent a long time understanding the code, would miss one or more of these additional routines to change.

Delphi may be case-insensitive, but humans are not

Take a look at those preceding code snippets again. Notice that, apart from the changed names TMainForm, ShowBlahFormClick, etc, the identifiers are all lower-case? Hello? It's Create and it's ShowModal and it's Free, for heaven's sake. We can argue about whether procedure, begin, end, and self should have initial capitals, but not method and routine names that are declared in source code we can look at.

The code I'm modifying is pretty much all lower-case. It's as if the shift key hadn't been invented when the app was written. It's ridiculous: although the compiler doesn't give a flying, we human readers of the code do. Is it easier to read otherblahform or OtherBlahForm? Case closed. (And no, that doesn't mean we should use the underscore to separate words.)

Delphi doesn't care about whitespace, humans do. This is an exhortation to decide on a set of coding standards and stick to them. If you want an if statement's begin to be on the same line as the if, then do it, and stick to it. If you want it on the next line, at the same indentation level, do it. At another indentation level? Fine. Do it and stick to it. Just don't change your style half way through a source file, like this code does.

Indentation and nicely spaced out code makes a difference to us, the human readers, not to the compiler. Well, OK, it may make your compilation a couple of hundredths of a second slower, but bite the bullet. If you stumble over your own code in six months time because the indentation and spacing is alien to you, you'll lose a lot more than the couple of hundredths of a second in compiling it.

Delphi doesn't care what identifiers are called, but humans do

OK, here's an example, from my current bête noire. There's a function called empty and there's another called empty2. To use texting parlance, WTF? Here's another example: showmonth and showmonth2. How about decmonth and newdecmonth? Super clear, right? Glad you agree.

Let's think about a function called empty for a moment. First of all, according to my previous rule, this should be Empty. Now, not knowing anything else about the function at all, let's try and guess some facts about it. To me the name implies a verb or a command: hey, empty this! If I now tell you that the 'this' is a string passed by value, what can we infer? It sets the string to an empty string? But that's silly, since MyString := ''; does it a lot simpler. Another hint: the function returns a boolean. Ah ha, maybe the function returns whether the string is empty or not. But (MyString = '') does that so much more succinctly, or even (length(MyString) = 0). No, my dear readers, this function actually returns whether the string consists only of whitespace characters, where whitespace not only means the space character, but also null, and some control characters. So why isn't the function just called IsStringWhitespace? It's a lot clearer to us human readers. (To the compiler, it's a unique identifier that happens to be a few characters longer.)

Please name your identifiers so that the code can be read as if it is English. If you find that you're adding the characters And, or ExceptWhen, or something similar, take it as a hint that perhaps your routine is doing too much (it should only be doing one thing). Split up the routine, and name its parts something significant. For example, for functions returning a boolean, consider using Is, Was, Check, Verify verbs to indicate that the function is returning a boolean value. Consider using command type verbs for procedures that modify something. Consider the verbs Get or Retrieve for functions that return something important. If you do this you'll be able to gain something from the semantic meanings of the identifiers you use. If you do so, you'll have a pretty good idea of what these routines do: IsStringWhitespace (a function returning a whether a string is blank), GetUserName (return the user name), ClearAddress (set the address structure or object to zeros, or nulls), ConvertDateToString (sometimes abbreviated to DateToString, to convert a date to its string representation).

If you pass back a valid/invalid indicator, check it

There's a function in this source code that converts a string value to a longint value and returns not only the value converted but also an indication of whether the conversion succeeded (a bit like later Delphi's TryStrToInt). The function result is the longint value, the success indicator is a var parameter (which is called ok: see above about a global variable called ok). Here's a cleaned up and renamed version of the function header:

function StrToInt(aValue : string; var aIsValid : boolean) : longint;

Now, we can argue about whether it's better to have the function result as the boolean, but it behooves the user of this function to actually check whether the conversion succeeded. Needless to say, it's hardly ever checked. Why? Well I think the main reason is because the function returns the converted integer value. It's just too damned easy to blow off the "did it succeed or not" indicator when you can just use the function in some expression. If it were defined as TryStrToInt is defined, we'd have to take notice of the success flag, since it would be impossible to use in an expression.

You must refactor to simplify the code

A lot of the above points can be boiled down to this important point. If we took care of code smells and refactoring so that they disappear, we'd have better code. So, let's take an unaltered routine from this program and refactor it to death. I'll take the routine I just mentioned (it is obviously redundant since Delphi's SysUtils provides a better one, and it certainly doesn't contain any intellectual property). Here it is, unaltered.

function str_to_int(passtr : string;var ok : boolean):longint;
 var
   spare : longint;
   code  : integer;
   begin
     passtr := remove(' ',passtr);
     if passtr = '' then passtr := '0';
     val(passtr,spare,code);
     if code = 0 then ok := true else ok := false;
     if code = 0 then result := spare else result := 0;
   end;

Sorry, I should have warned you first. Let's clean up the indentation and spacing first, together with some capitalization, to the style I prefer.

function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint;
var
  Spare : longint;
  Code  : integer;
begin
  Passtr := Remove(' ', Passtr);
  if (Passtr = '') then 
    Passtr := '0';
  Val(Passtr, Spare, Code);
  if (Code = 0) then 
    Ok := true 
  else 
    Ok := false;
  if (Code = 0) then 
    Result := Spare 
  else 
    Result := 0;
end;

The first executable line calls a routine to remove a certain character (here, the space character) from the string. Although I agree we should remove leading and trailing spaces, I don't agree that we should be removing spaces in between the non-space characters. In other words, " 1 " is a valid number to me, but "1 2" is not. Let's assume that there's a Trim function available to us that trims leading and trailing blanks from a string.

Now take a look at the next couple of lines. Here we're checking to see if the passed in string is now completely empty, and, if so, setting it to '0'. And then we convert that '0' to an integer. Why bother? Surely we can just say, if the string is empty, the routine successfully converts it to a result of zero.

function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint;
var
  Spare : longint;
  Code  : integer;
begin
  Passtr := Trim(Passtr);
  if (Passtr = '') then begin
    Result := 0;
    Ok := true;
  end
  else begin
    Val(Passtr, Spare, Code);
    if (Code = 0) then 
      Ok := true 
    else 
      Ok := false;
    if (Code = 0) then 
      Result := Spare 
    else 
      Result := 0;
  end;
end;

Next, look at the usage of the variable Spare. It's just standing in for the pre-defined variable called Result, so why don't we just use Result instead, and get rid of Spare. Next we have two if statements with exactly the same condition. Big slap upside the head here. Let's clean that up straightaway.

function ConvertStrToInt(Passtr : string; var Ok : boolean) : longint;
var
  Code  : integer;
begin
  Passtr := Trim(Passtr);
  if (Passtr = '') then begin
    Result := 0;
    Ok := true;
  end
  else begin
    Val(Passtr, Result, Code);
    if (Code = 0) then 
      Ok := true
    else begin
      Ok := false;
      Result := 0;
    end;
  end;
end;

Is that last statement that sets Result to 0 really necessary? Looking at the help for Val, we can't say (the help gives no hint as to whether the value is zeroed or whether it contains the number parsed until the error was found). So, we'll let it stay.

The final refactorings I'll do will be some renaming and reorganizing:

function ConvertStrToInt(aValue : string; var aIsValid : boolean) : longint;
var
  ErrorCode  : integer;
begin
  aIsValid := true; // assume string is valid
  aValue := Trim(aValue);
  if (aValue = '') then 
    Result := 0;
  else begin
    Val(aValue, Result, ErrorCode);
    if (ErrorCode <> 0) then begin
      aIsValid := false;
      Result := 0;
    end;
  end;
end;

Of course, we should have some tests set up with DUnit to verify the correct working of the code. I would suggest testing '' to succeed with result 0; ' ' to succeed with result 0; '1 2' to fail with result 0; 'a' to fail with result 0; and '123' to succeed with result 123.

Conclusion

Well, after all that, I hope you found this discussion helpful. Although most of the article was geared to developing in Delphi, there are lots of nuggets that apply in other languages, and on other platforms. One thing you can do now is to be your own code reviewer. Take a look at your code critically. Code duplication? Remove it. Nasty identifier name? Change it. A routine with pages of code? Split it up. Go and make your code reviewer happy.