Saturday, February 16, 2008

A word on portability

When I first switched to Linux (it was 1993 or 1994 so, and the music was way better back then), I wrote a vi-like editor program for MS-DOS. With the help of a bunch of ifdefs, the thing would also compile and run under Linux. Some argued that this was pretty useless, because Linux already came with editors that were much more powerful than my poor clone. But the main reason I was so thrilled with it, was because the same code worked for both platforms (1). It was portable.

Linguists might say ‘of course’, because the C programming language is portable. In practice, there are many tiny differences to be taken into account when programming cross-platform. On top of that, the differences between MS-DOS and Linux are huge (you can’t argue with that).

The UNIX operating system is a wonderful piece of machinery and it runs on all kinds of hardware. All variants of the UNIX operating system look more or less the same, at least from a distance. When you start programming under UNIX you will learn the true meaning of the term “portability” (2).

Portability does not mean that your code will build on run everywhere by default. You will find out that UNIX A is not the same as UNIX B, and your Linux code may not run on BSD, AIX, Solaris, or whatever. It’s the little differences that will make a big difference. Your code may misbehave, dump core, or not build at all.
To counter these problems there is POSIX compliancy for operating systems. POSIX is a set of rules that dictates what system calls are available in the operating system, and how they behave. POSIX is what makes cross-platform development possible today, although it is by no means a perfect world yet.

A great tool aiding portability is autoconf. You should clearly understand though, that autoconf is not a magic tool that makes your code portable; it is a tool that can help you rephrase your code so that it works cross-platform. As with many tools, you still have to do the majority of the work yourself (3).
autoconf takes some time to learn, but it is worth the investment, if your project is large/important enough. It took me a week or two to make a good configure.in for my bbs100 project, but afterwards it built and ran correctly on every machine I could get my hands on—that includes PC, Sun, IBM, SGI, and CRAY hardware. With little more effort, it was also ported to Apple Mac OS X.

autoconf revolves around checking whether a function is available, and if it is, it #defines a HAVE_FUNCTION for you that you can use. A good configure.in script makes very specific checks on functionality that you actually need for your program to work. A lot of software packages come with some kind of default configure script that checks everything, which is totally useless if the code doesn’t make use of it.

In general, a check for ifdef HAVE_FUNCTION is much better than operating system-specific checks like ifdef __linux__, or ifdef __IRIX__.
An ifdef BIG_ENDIAN works much better than checking every existing architecture with ifdef __ppc__, ifdef __mips__, etcetera. I happen to know a Linux code that completely broke because of this. Linux is probably the most ported operating system there is, but lots of people seem to believe it is a PC-only, RedHat-only thing (4).

Actually, the best-practice trick is to stay away from using autoconf’s ifdefs as much as possible, and to stick with what works everywhere. Once you learn what works and what is funky, you are often able to get away with not having to use autoconf at all. A well-written program is not kept together with the duct tape that ifdef is. This is somewhat of a bold statement, especially since so many software packages run configure before build. But a truly valid question is, do they depend on autoconf that badly and is autoconf’s functionality actually being used? It is a joy to see (some of) my Linux software build everywhere with a simple make.

The funny thing is, it is still hard to write truly portable code today. Last week I wrote some 2D SDL/OpenGL code on my Linux machine. When I moved it over to Mac OS X, I got a blank screen. I found up to three problems with the code:

Apparently there is a slight difference in the SDL library when it comes to blitting bitmaps that have an alpha channel. The man page mentions that the outcome may be unexpected (when you blit a pixel surface over an empty surface with an alpha channel the outcome is zero; hence the blank screen) but then why does it work alright under Linux? I resorted to writing a custom TGA image loader, and staying away from SDL’s blitting functions.
Resetting the polygon mode in conjunction with enabling/disabling texturing multiple times in one frame seems to confuse OpenGL on Mac OS. It messes up badly.
After resizing the screen, OpenGL has lost its state and texture data and must be reinitialized. This is actually in the OpenGL standard and a bug on my side. But it does raise the question why this never surfaced on my Linux box. Apparently the (NVidia) video card has enough memory and does not get into an undefined state after a screen resize.

Lessons learned: Test your code across multiple platforms, test, test, test..!

I have yet to see my favorite DOS editor(s) run under Linux natively. Switching platforms usually means leaving your familiar apps and tools, and replacing them with a substitute.
In fact, I have a feeling that the ifdef preprocessor was invented for the sake of portability. It has other uses, but it kinda smells of a ‘fix’ for the problem of supporting different architectures.
Having a hammer does not make a great carpenter.
Supporting all kinds of distributions is not easy either.