Tuesday, October 21, 2008

Parallel programming with pthreads

Today, we had a little conversation about pthreads programming in the office. Every now and then the topic of pthreads seems to come up. Surprisingly, I’m one of very few who actually have some hands-on experience with it. I remember that learning pthreads was difficult because there were no easy tutorials around, and I didn’t really know where to start. There is a certain learning curve to it, especially when you don’t really know what you’re doing or what it is you’d want to be doing with this pthread library.

A thread is a “light-weight process”, and that doesn’t really explain what it is, especially when they say that the Linux kernel’s threads are just as “fat” as the processes are.
You should consider a thread a single (serialized) flow of instructions that is executed by the CPU. A process is a logical context in the operating system that says what memory is allocated for a running program, what files are open, and what threads are running on behalf of this process. Now, not only can the operating system run multiple processes simultaneously (1), a single process can also be multi-threaded.

The cool thing about this multi-threadedness is that the threads within a process share the same memory, as the memory is allocated to the process that the threads belong to. Having shared memory between threads means that you can have a global variable and access and manipulate that variable from multiple threads simultaneously. Or, for example, you can allocate an array and have multiple threads operate on segments of the array.
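To make the array idea concrete, here is a minimal sketch in C. The names `parallel_sum`, `sum_segment`, and `struct segment`, as well as the choice of four threads, are made up for this illustration. Each thread only reads the shared array and writes to its own result slot, so no locking is needed in this particular case.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* arbitrary choice for this sketch */

/* Work description for one thread: a segment of the shared array. */
struct segment {
    const int *data;   /* shared array, only read by the threads */
    size_t begin;      /* first index of this thread's segment */
    size_t end;        /* one past the last index of the segment */
    long sum;          /* result slot, written by exactly one thread */
};

/* Thread entry point: add up one segment of the array. */
static void *sum_segment(void *arg)
{
    struct segment *seg = arg;
    long total = 0;

    for (size_t i = seg->begin; i < seg->end; i++)
        total += seg->data[i];
    seg->sum = total;
    return NULL;
}

/* Split data[0..n) into NUM_THREADS segments and sum them in parallel.
 * Returns 0 on success, or -1 if a thread could not be created. */
int parallel_sum(const int *data, size_t n, long *result)
{
    pthread_t tid[NUM_THREADS];
    struct segment seg[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int started = 0;
    int status = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        seg[t].data = data;
        seg[t].begin = (size_t)t * chunk;
        /* the last thread also takes whatever remains */
        seg[t].end = (t == NUM_THREADS - 1) ? n : seg[t].begin + chunk;
        seg[t].sum = 0;
        if (pthread_create(&tid[t], NULL, sum_segment, &seg[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Once pthread_join returns, that thread's result is safe to read. */
    long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += seg[t].sum;
    }
    if (status == 0)
        *result = total;
    return status;
}
```

Call `parallel_sum(data, n, &result)` from your `main` and compile with `cc -pthread`. Note that the threads do not communicate at all; they just take their share of the data and go.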

Programming with pthreads can be hard. The pthread library is a fairly low-level library, and it usually requires some technical insight to be able to use it effectively. After having used pthreads for a while, you are likely to sense an urge to write wrapper-functions to make the API somewhat more high-level.
While there are some synchronization primitives like mutexes and condition variables available in the library, it is really up to the programmer to take proper advantage of these—as is often the case with powerful low-level APIs, the API itself doesn’t do anything for you; it is you who has to make it all work.
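As an illustration of the kind of wrapper you might end up writing, here is a sketch of a one-slot “mailbox” built from a mutex and a condition variable. The `mailbox_*` names, the `producer` thread, and the value 42 are invented for this example and are not part of the pthread library. It only handles a single value at a time: a second put before the first get simply overwrites the slot.

```c
#include <pthread.h>
#include <stdbool.h>

/* One-slot mailbox: one thread deposits a value, another waits for it. */
struct mailbox {
    pthread_mutex_t lock;    /* protects has_value and value */
    pthread_cond_t  filled;  /* signalled when a value has been put */
    bool has_value;
    int value;
};

void mailbox_init(struct mailbox *mb)
{
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->filled, NULL);
    mb->has_value = false;
    mb->value = 0;
}

void mailbox_destroy(struct mailbox *mb)
{
    pthread_cond_destroy(&mb->filled);
    pthread_mutex_destroy(&mb->lock);
}

/* Deposit a value and wake up a thread waiting in mailbox_get(). */
void mailbox_put(struct mailbox *mb, int value)
{
    pthread_mutex_lock(&mb->lock);
    mb->value = value;
    mb->has_value = true;
    pthread_cond_signal(&mb->filled);
    pthread_mutex_unlock(&mb->lock);
}

/* Block until a value is available, then take it out of the mailbox. */
int mailbox_get(struct mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    /* Always re-check the condition in a loop: pthread_cond_wait()
     * is allowed to wake up spuriously. */
    while (!mb->has_value)
        pthread_cond_wait(&mb->filled, &mb->lock);
    int value = mb->value;
    mb->has_value = false;
    pthread_mutex_unlock(&mb->lock);
    return value;
}

/* Example producer thread: deposits 42 into the mailbox it is given. */
static void *producer(void *arg)
{
    mailbox_put(arg, 42);
    return NULL;
}

/* Start the producer and wait for its value.
 * Returns the received value, or -1 if the thread could not be created. */
int mailbox_demo(void)
{
    struct mailbox mb;
    pthread_t tid;

    mailbox_init(&mb);
    if (pthread_create(&tid, NULL, producer, &mb) != 0) {
        mailbox_destroy(&mb);
        return -1;
    }
    int value = mailbox_get(&mb);   /* blocks until the producer has put */
    pthread_join(tid, NULL);
    mailbox_destroy(&mb);
    return value;
}
```

The library gives you the mutex and the condition variable, but the protocol around them (the `has_value` flag, the loop around the wait) is entirely yours to get right.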

Programming pthreads is often also hard for another reason; the library enables you to write programs that preempt themselves all the time, drawing you into a trap of writing operating system-like code. This kind of parallelism in code is incredibly hard to follow, and therefore also incredibly hard to debug and develop. The programmer, or developer, if you will, should definitely put some effort into making a decent design beforehand, preferably including a schematic of the communication flow that should occur between the threads. The easiest parallel programs do not communicate at all; they simply divide the data to be processed among the threads, and take off.
It should be clear that the pthread library is powerful, and that the code’s complexity is really all the programmer’s fault.

While the shared memory ability of pthreads is powerful, it does have the drawback that when the programmer cocks up and a single thread generates a dreadful SIGSEGV, the whole process bombs (2).
Also, as already described above, pthreads has the tendency of luring you into a trap of creating parallelism that is not based on explicit communication between the threads, which results in overly complex code flows.
The automatic shared memory has the drawback that you may not always want to share data among all threads, and that the code is not thread-safe unless you put mutex locks in crucial spots. It is entirely up to the programmer to correctly identify these spots and make sure that the routines are fully thread-safe.
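A classic example of such a crucial spot is a shared counter. The sketch below uses made-up names (`counter`, `increment_many`, `run_counter_demo`) and arbitrary thread and loop counts. Without the lock and unlock calls around `counter++`, the threads can overwrite each other’s updates and the final count may come out too low.

```c
#include <pthread.h>

#define NUM_THREADS 4        /* arbitrary choices for this sketch */
#define INCREMENTS  100000

static long counter = 0;     /* shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread entry point: bump the shared counter INCREMENTS times. */
static void *increment_many(void *arg)
{
    (void)arg;   /* unused */
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;   /* the critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run NUM_THREADS incrementing threads and return the final count,
 * or -1 if one of the threads could not be created. */
long run_counter_demo(void)
{
    pthread_t tid[NUM_THREADS];
    int started = 0;

    counter = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        if (pthread_create(&tid[t], NULL, increment_many, NULL) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return (started == NUM_THREADS) ? counter : -1;
}
```

Locking on every single increment is slow, of course. In real code you would rather let each thread count privately and add up the results at the end, but that is exactly the kind of design decision the library leaves to you.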
It is for these reasons that communication libraries like MPI, PVM, and even fork()+socket based code are still immensely popular. The latest well-known example of such a forking, multi-process application is the Google Chrome browser, in which a forked-off “thread” (really a separate process) may crash, but will not take the entire application down.
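The crash isolation you get from fork() can be sketched in a few lines of C. The function name `crash_in_child` is made up, and the child simply raises SIGSEGV on itself to simulate a bug. This is only an illustration of the principle, not a description of how Chrome actually does it.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that crashes on purpose, and survive it.
 * Returns the number of the signal that killed the child,
 * 0 if the child exited normally, or -1 on error. */
int crash_in_child(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child process: simulate the crash. */
        signal(SIGSEGV, SIG_DFL);   /* make sure the default action applies */
        raise(SIGSEGV);
        _exit(1);   /* not reached when the signal terminates the child */
    }

    /* Parent process: still alive, and able to inspect what happened. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

Here the parent gets `SIGSEGV` back as a return value and carries on. Had the same fault happened in a pthread, the whole process would have been gone.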

This blog entry has gotten too long to also fit a complete set of code examples. Therefore I will provide you with a useful link:

pthreads programming tutorial

Skip right to the examples, unless of course you wish to learn some more on the theory of threads… Note how this tutorial covers all you need to know, and cleverly stays away from the more advanced features of the pthread library, which you are not likely to need anyway.

If the C programming language is not your thing, try using the Python threading module. Although Python’s global interpreter lock keeps its threads from executing Python code truly in parallel, they appear similar in use and behavior to pthreads.

Another interesting topic related to parallel programming is that of deadlock and starvation, and the “dining philosophers” problem. See this online computer science class material for more.

(1) Multiple processes or threads of execution can run concurrently on multi-processor or multi-core machines. On uniprocessor machines, they are scheduled one after another in a time-shared fashion.
(2) There are (or have been, in the past?) implementations of UNIX in which it’s unclear which thread receives a UNIX signal. POSIX says that a signal caused by a thread itself, such as SIGSEGV, is delivered to that thread, while a signal sent to the process as a whole may be delivered to any one thread that doesn’t block it. I believe Linux usually picks the main thread for the latter, i.e. the first thread that was started when the process was created. For this reason, it’s wise to keep the main thread around during the lifetime of the process.