Understanding single precision MBF

Man, that was fun. A friend IMed me recently with a problem. It seems that he has some data from a legacy system (I know, yuk, and I can see you wiping your hands on your shirt from here), that was a floating point value in single precision MBF (Microsoft Binary Format). In case you didn't know — I didn't — this is the format used by CVS and MKS$ in Microsoft QuickBASIC and GW-BASIC, those fabby-dabby BASIC interpreters from MS-DOS days. Hey, I wrote my first BASIC program with GW-BASIC; no sniggering at the back. Yes, Virginia, we are talking l-e-g-a-c-y.

So the crux of the matter was that he could get a four-byte array containing one of these single precision values from a file but was totally stuck on how to interpret it. If you go here on MSDN, you can download a "magic" dll to do the work but it's for VB4 and not for .NET. Yep, this legacy is the real thing.

The code you get with the download is laughable since it's not all there: you need Microsoft C's run-time libraries and headers. There is also no description of what the single-precision format actually is in terms of bit patterns, etc. So I started surfing with Google. Eventually I landed at wotsit.com, and if you look down the list you'll find an entry for MBF with some code by Inprise to download. (For those not steeped in Borland history, they were momentarily called Inprise at the end of the 90s.) The downloaded code was for Borland C, and I was starting to feel like Indiana Jones, brushing the cobwebs and dust from my arms, looking around for the giant ball to start rolling.

At least the comments in the code had the format of the data. My friend gave me some array values and what they meant, gleaned from the legacy system. I used my hex calculator (actually MathU Pro) to do some back-of-the-envelope calculations to make sure I had the right conversion. Everything panned out, after a small glitch when I forgot that there was an assumed bit to take care of. Since single precision MBF and the IEEE 754 single type are the same size, I decided to do a conversion from one to the other (rather than, say, converting to a double value).

Here's the format of the two data types.

IEEE 754 Single format
32 bits long (4 bytes)

sign (1 bit) | exponent (8 bits) | fraction (23 bits)

The exponent is biased by 127. 
There is an assumed 1 bit before the radix point 
 (so the assumed mantissa is 1.ffff... where f's are the fraction bits)

Microsoft Binary Format (single precision)
32 bits long (4 bytes)

exponent (8 bits) | sign (1 bit) | fraction (23 bits)

The exponent is biased by 128. 
There is an assumed 1 bit after the radix point 
  (so the assumed mantissa is 0.1ffff... where f's are the fraction bits)

(For a discussion on exponent biases, assumed bits, etc, see the Wikipedia article.)

Looking at the definitions of the mantissae, I saw that the IEEE mantissa is twice the MBF mantissa. So, wrapping this observation up into the definitions, to convert from MBF to IEEE single, all we need to do is subtract 2 from the exponent (one for the bias change, one for the mantissa factor), and then rearrange the sign and exponent bits. The fraction does not change. To convert from IEEE single to MBF, all we need to do is add 2 to the exponent (one for the bias change, one for the mantissa factor), and then rearrange the sign and exponent bits. The fraction does not change.
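
If you want to convince yourself of the subtract-2 rule, here is a minimal sketch that decodes each format straight from its field definitions and lets you compare the results. It is not part of the conversion code: the class and method names are made up for illustration, both methods take the raw 32-bit pattern (most significant bit first, as in the layouts above), and the IEEE decoder deliberately handles normal numbers only.

```csharp
using System;

public static class FieldDecoder {
  // MBF single: exponent (8 bits) | sign (1 bit) | fraction (23 bits)
  public static double DecodeMbf(uint bits) {
    int exponent = (int)(bits >> 24);              // biased by 128
    if (exponent == 0) return 0.0;                 // exponent 0 means zero
    bool negative = (bits & 0x00800000u) != 0;
    uint fraction = bits & 0x007FFFFFu;
    // Mantissa 0.1ffff...: the assumed bit is worth 1/2 and the
    // 23 fraction bits sit directly below it.
    double mantissa = 0.5 + fraction / 16777216.0; // 16777216 = 2^24
    double value = mantissa * Math.Pow(2.0, exponent - 128);
    return negative ? -value : value;
  }

  // IEEE 754 single: sign (1 bit) | exponent (8 bits) | fraction (23 bits)
  // Normal numbers only: zero, denormals, infinities and NaNs are rejected.
  public static double DecodeIeeeNormal(uint bits) {
    int exponent = (int)((bits >> 23) & 0xFFu);    // biased by 127
    if ((exponent == 0) || (exponent == 255))
      throw new ArgumentException("Not a normal IEEE single");
    bool negative = (bits & 0x80000000u) != 0;
    uint fraction = bits & 0x007FFFFFu;
    // Mantissa 1.ffff...: the assumed bit is worth 1 and the
    // 23 fraction bits sit directly below it.
    double mantissa = 1.0 + fraction / 8388608.0;  // 8388608 = 2^23
    double value = mantissa * Math.Pow(2.0, exponent - 127);
    return negative ? -value : value;
  }
}
```

Take 10.0 as an example. In MBF it has exponent 132 and the bit pattern 0x84200000; in IEEE it has exponent 130 and the bit pattern 0x41200000. The fraction bits (0x200000) are identical, the exponents differ by 2, and both decoders return 10.0.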

Since we use little endian CPUs, the bytes making up these bit patterns would in fact be reversed, and the most significant byte would be the last byte of the byte array, and the least significant byte the first byte of the array.
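
To make the byte order concrete, here is a small sketch that assembles the 32-bit pattern from a four-byte array by hand. The helper name is hypothetical; on a little-endian machine BitConverter.ToUInt32 gives the same answer, but spelling out the shifts shows which byte ends up where and does not depend on the CPU.

```csharp
using System;

public static class ByteOrderDemo {
  // Builds the 32-bit pattern from 4 bytes stored least significant
  // byte first, so bytes[3] lands in the top 8 bits.
  public static uint ToBitsLittleEndian(byte[] bytes) {
    if ((bytes == null) || (bytes.Length != 4))
      throw new ArgumentException("Expected exactly 4 bytes");
    return (uint)bytes[0] |
           ((uint)bytes[1] << 8) |
           ((uint)bytes[2] << 16) |
           ((uint)bytes[3] << 24);
  }
}
```

So an MBF value stored in the file as the bytes 00 00 20 84 becomes the pattern 0x84200000: the exponent byte is the last element of the array, and the sign bit is the top bit of the element before it.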

This was my first cut of the conversion, which was pretty much a direct copy of the old Inprise code but recast in C# and .NET:

    public static float ConvertMbf4ToFloat(byte[] mbf) {
      if ((mbf == null) || (mbf.Length != 4))
        throw new ArgumentException("Invalid MBF array");

      // An MBF exponent of zero means the whole value is zero.
      if (mbf[3] == 0) return 0.0f;

      byte[] single = new byte[4];

      // The sign moves from the top bit of byte 2 to the top bit of byte 3.
      byte signBit = (byte)((int)mbf[2] & 0x80);
      single[3] |= signBit;

      // Subtract 2 from the exponent, then split it across bytes 3 and 2.
      byte exponent = (byte)((int)mbf[3] - 2);
      single[3] |= (byte)((int)exponent >> 1);
      single[2] |= (byte)((int)exponent << 7);

      // The 23 fraction bits are copied unchanged.
      single[2] |= (byte)((int)mbf[2] & 0x7F);
      single[1] = mbf[1];
      single[0] = mbf[0];

      return System.BitConverter.ToSingle(single, 0);
    }

Not too bad, but it really looks awkward with all those required casts. But it wasn't all bad, since it did get me to revisit the BitConverter class, something I'd known about a while back (we're talking .NET 1.0 timeframes here) but had forgotten completely about in the interim.

That's the version I sent my friend. Once I had done so, and thereby made sure his immediate problem was covered, I got to thinking about it from more of a distance. I could, if I were clever, just use bit-twiddling on the full 32-bit value and not on the array of bytes. Here's that version:

    public static float ConvertMbf4ToFloat(byte[] mbf) {
      if ((mbf == null) || (mbf.Length != 4))
        throw new ArgumentException("Invalid MBF array");

      // Exponent 0 is zero; 1 and 2 are too small for a normal IEEE single.
      if (mbf[3] == 0) return 0.0f;
      if (mbf[3] <= 2)
        throw new ArgumentException(
          "Underflow when converting from MBF to single");

      UInt32 temp = BitConverter.ToUInt32(mbf, 0);
      temp = (((temp - 0x02000000) & 0xFF000000) >> 1) |  // exponent - 2
        ((temp & 0x00800000) << 8) |                      // sign bit
        (temp & 0x007FFFFF);                              // fraction
      byte[] single = BitConverter.GetBytes(temp);
      return BitConverter.ToSingle(single, 0);
    }

I then wrote the reverse conversion in a similar manner:

    public static byte[] ConvertFloatToMbf4(float s) {
      // MBF represents zero as an exponent of zero; all-zero bytes will do.
      if (s == 0.0f)
        return new byte[4];

      if (Single.IsNaN(s))
        throw new ArgumentException(
          "Cannot convert a NaN to MBF format");
      if (Single.IsInfinity(s))
        throw new ArgumentException(
          "Cannot convert an infinity to MBF format");

      byte[] single = BitConverter.GetBytes(s);
      UInt32 temp = BitConverter.ToUInt32(single, 0);
      temp = (((temp & 0x7F800000) << 1) + 0x02000000) |  // exponent + 2
        ((temp & 0x80000000) >> 8) |                      // sign bit
        (temp & 0x007FFFFF);                              // fraction

      return BitConverter.GetBytes(temp);
    }
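
A quick way to gain some confidence in the pair is a round-trip check. The sketch below is an illustration rather than part of the code above: the class and method names are invented, and the two helpers repeat the same shifts and masks on the raw 32-bit pattern so that the block stands on its own. It only makes sense for ordinary (normal, finite, non-zero) values.

```csharp
using System;

public static class RoundTripCheck {
  // Same bit manipulation as the MBF-to-IEEE conversion above.
  public static uint MbfBitsToIeeeBits(uint mbf) {
    return (((mbf - 0x02000000u) & 0xFF000000u) >> 1) |
           ((mbf & 0x00800000u) << 8) |
           (mbf & 0x007FFFFFu);
  }

  // Same bit manipulation as the IEEE-to-MBF conversion above.
  public static uint IeeeBitsToMbfBits(uint ieee) {
    return (((ieee & 0x7F800000u) << 1) + 0x02000000u) |
           ((ieee & 0x80000000u) >> 8) |
           (ieee & 0x007FFFFFu);
  }

  // True when the bits of a float survive float -> MBF -> float unchanged.
  public static bool SurvivesRoundTrip(float value) {
    uint ieee = BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
    uint back = MbfBitsToIeeeBits(IeeeBitsToMbfBits(ieee));
    return back == ieee;
  }
}
```

For instance, -10.0f has the IEEE pattern 0xC1200000, which maps to the MBF pattern 0x84A00000 and back again.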

I'm quite sure that those people who know IEEE formats are already raising their hands, as well as those who are worried about edge cases. Let's think edge cases first. First off: what if the biased MBF exponent were 2, say? The conversion merely subtracts 2, making the IEEE exponent 0. But that means either zero or a denormal. Worse still is an MBF exponent of 1, for then the IEEE exponent would wrap around to 255, the value reserved for infinities and NaNs. (An MBF exponent of 0 means the value is zero, which both versions return before doing any arithmetic.) In my changed code, it was easy enough to check for these cases, and so I just throw an exception when I find one. (You can easily change this to return 0.0, if you want.)
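
If you would rather have the return-zero behavior, a minimal sketch might look like the following. The method name is hypothetical, and it assumes, like the code above, that BitConverter is running on a little-endian machine.

```csharp
using System;

public static class MbfLenient {
  // Variant of the bit-twiddling conversion that flushes values too
  // small for a normal IEEE single to zero instead of throwing.
  public static float ConvertMbf4ToFloatOrZero(byte[] mbf) {
    if ((mbf == null) || (mbf.Length != 4))
      throw new ArgumentException("Invalid MBF array");

    // Exponent 0 is MBF zero. Exponents 1 and 2 would give an IEEE
    // exponent of 255 (by wraparound) or 0, so treat them as zero too.
    if (mbf[3] <= 2) return 0.0f;

    uint temp = BitConverter.ToUInt32(mbf, 0);
    temp = (((temp - 0x02000000u) & 0xFF000000u) >> 1) |  // exponent - 2
      ((temp & 0x00800000u) << 8) |                       // sign bit
      (temp & 0x007FFFFFu);                               // fraction
    return BitConverter.ToSingle(BitConverter.GetBytes(temp), 0);
  }
}
```

Whether silently returning zero is acceptable depends on the data: the values being flushed are smaller than about 1.2e-38 in magnitude, so for most legacy business data it probably is, but that is a judgment call rather than a rule.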

Going the other way we have more edge cases, but this time in the realm of the very large. If the IEEE exponent were 254, adding 2 would wrap the MBF exponent around to zero, which indicates that the entire number is zero. Of course, there are the denormals to take care of as well (the realm of the very small). Although some denormals could be represented in MBF format, many more cannot, and those would need yet another exception. I ignored these special cases, since they are unlikely to occur (and wouldn't for my friend).
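
For completeness, here is one way those extra checks could be added. Treat it as a sketch under stated assumptions: the method name is hypothetical, it assumes a little-endian machine, and it takes the simple route of rejecting every denormal, even though, as noted, a few of them do have an MBF representation.

```csharp
using System;

public static class MbfStrict {
  public static byte[] ConvertFloatToMbf4Checked(float s) {
    if (s == 0.0f) return new byte[4];
    if (Single.IsNaN(s) || Single.IsInfinity(s))
      throw new ArgumentException(
        "Cannot convert a NaN or an infinity to MBF format");

    uint temp = BitConverter.ToUInt32(BitConverter.GetBytes(s), 0);
    uint exponent = (temp >> 23) & 0xFFu;

    // Simplification: all denormals are rejected, including the few
    // that MBF could actually hold.
    if (exponent == 0)
      throw new ArgumentException(
        "Denormal values are not handled when converting to MBF");

    // An IEEE exponent of 254 would wrap to an MBF exponent of 0.
    if (exponent >= 254)
      throw new ArgumentException(
        "Overflow when converting from single to MBF");

    temp = (((temp & 0x7F800000u) << 1) + 0x02000000u) |  // exponent + 2
      ((temp & 0x80000000u) >> 8) |                       // sign bit
      (temp & 0x007FFFFFu);                               // fraction
    return BitConverter.GetBytes(temp);
  }
}
```

With these checks, Single.MaxValue (IEEE exponent 254) and Single.Epsilon (a denormal) both throw, while ordinary values convert exactly as before.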

Furthermore, IEEE singles can contain infinities (exponent == 255 and fraction == 0), or NaNs (exponent == 255 and fraction != 0), which have no representation in MBF format. Coming across these would require throwing the relevant exception, and this I did since it was easy enough to do.

And, yes, I could tidy it up, make the MBF values a class, say, but that would be too much like real work, and, as I say, this was just a bit of fun to help someone out.