Putting the fun back in C with threads
On this blog we’ve been (re)implementing some basic data structures to make
programming in C fun again. If there is anything a modern language or library
needs these days, it is good multi-threading support. By ‘good’ I mean,
of course, easy and fun to use. The C11 standard has given us threads,
but support has been lacking, and when you google for it, even today
you’ll be hard-pressed to find useful information on the subject. In the
meantime the world has moved on to C++, which offers std::thread. In
reality these threads are based upon POSIX threads, which we have full
access to via the C API functions of libpthread.
Bare pthread programming is a chore so to put the fun back in C we will wrap
the usual suspects with C++ classes: thread, mutex, and condition variable.
The thread, mutex, and condition_variable classes are given an interface
that is a lot like Python’s. In pseudo-code:
class thread {
    void start(func, void *arg);
    void join();
    void exit();
};

class mutex {
    void lock();
    bool trylock();
    void unlock();
};

class condition_variable {
    bool wait(mutex&, ulong timeout=0UL);
    void notify();
    void notify_all();
};
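To make this concrete, here is a minimal sketch of how the mutex class could wrap libpthread. Error handling is stripped down for brevity, and the exact constructor attributes are left at their defaults:

```cpp
#include <pthread.h>

// sketch: the mutex interface above, backed by libpthread
class mutex {
    pthread_mutex_t mtx_;

public:
    mutex() { pthread_mutex_init(&mtx_, nullptr); }
    ~mutex() { pthread_mutex_destroy(&mtx_); }

    mutex(const mutex&) = delete;
    mutex& operator=(const mutex&) = delete;

    void lock() { pthread_mutex_lock(&mtx_); }
    // pthread_mutex_trylock() returns 0 when the lock was acquired
    bool trylock() { return pthread_mutex_trylock(&mtx_) == 0; }
    void unlock() { pthread_mutex_unlock(&mtx_); }
};
```

The thread and condition_variable wrappers follow the same pattern around pthread_create()/pthread_join() and pthread_cond_wait()/pthread_cond_signal().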
Because it makes no sense to copy any of these, the copy constructor and assignment operator are explicitly deleted. Like so:
// cannot copy-construct
thread(const thread&) = delete;
// cannot copy-assign
thread& operator=(const thread&) = delete;
Because C++ is a RAII language, we can make a lockguard object that
automatically acquires and releases mutex locks in a critical region.
// critical region
do {
    lockguard guard(global_lock);
    if (bad) {
        // mutex will be auto-released
        return;
    }
    do_good();
} while(0);
// implementation of the lockguard
class lockguard {
    mutex& m_;

public:
    lockguard(mutex& m) : m_(m) {
        m_.lock();
    }

    ~lockguard() {
        m_.unlock();
    }
};
The lockguard makes use of the fact that objects are destructed whenever
they go out of scope.
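To see the RAII behaviour in isolation, here is the same guard written against std::mutex as a stand-in for our own mutex class, so it compiles on its own:

```cpp
#include <mutex>

// same idea as above, using std::mutex as a stand-in
class lockguard {
    std::mutex& m_;

public:
    explicit lockguard(std::mutex& m) : m_(m) { m_.lock(); }
    ~lockguard() { m_.unlock(); }

    // a guard that could be copied would unlock twice
    lockguard(const lockguard&) = delete;
    lockguard& operator=(const lockguard&) = delete;
};
```

When the guard goes out of scope its destructor runs and the mutex is released, no matter which exit path the critical region took.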
Next up: channels. There is a language that totally blew me away when it
comes to parallel programming, and it’s Go. In Golang you simply write
go(func) to launch a function as a separate thread. Moreover, threads
communicate over channels. A channel acts much like a pipe; one end listens
for messages, while at the other end messages are put onto the channel.
It’s producer-consumer made easy. Go has a confusing syntax for this
(the arrow notation) that we won’t mimic; I guess we could use operator<<
and operator>> for this, but I prefer to stick to the methods chan.put()
and chan.get().
I wrote about this way back in 2012, but a channel is basically a work queue. There is a condition variable on which the producer signals that an item has been put onto the channel, and consumers wake up so that they may consume the item.
template <typename T>
class chan {
    mutex mtx;
    condition_variable not_full, not_empty;
    array<T> items;
    uint max_items;

    // ... constructors and what not ...

    void put(const T& item) {
        mtx.lock();
        if (max_items > 0) {
            while (items.len() >= max_items) {
                // guard against spurious wakeups
                not_full.wait(mtx);
            }
        }
        items.append(item);
        not_empty.notify();
        mtx.unlock();
    }

    T get() {
        mtx.lock();
        while (!items.len()) {
            // guard against spurious wakeups
            not_empty.wait(mtx);
        }
        T item = items.pop(0);
        not_full.notify();
        mtx.unlock();
        return item;
    }
};
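For the curious, the same channel can be written against the standard library’s mutex and condition_variable. This is only a stand-in sketch (std::deque plays the part of our array class), not the pthread-backed version, but it runs as-is:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// runnable sketch of the channel, on top of the standard library
template <typename T>
class chan {
    std::mutex mtx;
    std::condition_variable not_full, not_empty;
    std::deque<T> items;
    unsigned max_items;

public:
    // max == 0 means an unbounded channel
    explicit chan(unsigned max = 0) : max_items(max) {}

    void put(const T& item) {
        std::unique_lock<std::mutex> lk(mtx);
        if (max_items > 0) {
            // the predicate guards against spurious wakeups
            not_full.wait(lk, [this] { return items.size() < max_items; });
        }
        items.push_back(item);
        not_empty.notify_one();
    }

    T get() {
        std::unique_lock<std::mutex> lk(mtx);
        not_empty.wait(lk, [this] { return !items.empty(); });
        T item = items.front();
        items.pop_front();
        not_full.notify_one();
        return item;
    }
};
```

A producer thread can put() items while the consumer blocks in get() until something arrives; with a bounded channel the producer in turn blocks when the channel is full.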
As you can see it’s not too hard to implement a channel once you have a simple API for mutexes and condition variables. Channels make parallel programming tons easier.
The go() function launches a detached thread, and guarantees that the thread
is running by the time go() returns to the caller. A detached thread is
never joined by its parent, but if you do need such a mechanism, it’s
easily implemented using a channel:
void test_go(void *arg) {
    debug("#%u go-routine started", thread_id());
    chan<bool> *done = (chan<bool> *)arg;
    do_long_calculation();
    debug("#%u go-routine ending", thread_id());
    done->put(true);
}

chan<bool> done_channel;

for (int i = 0; i < nthreads; i++) {
    go(test_go, &done_channel);
}

// wait for threads to exit
for (int i = 0; i < nthreads; i++) {
    done_channel.get();
    debug("a go-routine is done");
}
Lastly, I’d like to show how threads are assigned a unique id. You can print
the value from pthread_self(), but it won’t look pretty as it’s probably
just the address of some internal structure. Instead, I like to give threads
a more intuitive rank number. This rank is assigned when the thread launches.
// main thread always has thread_id #0
#if 1
thread_local static uint thread_local_rank = 0;
#else
// older Apple clang does not support 'thread_local'
// but it does know about __thread
__thread static uint thread_local_rank = 0;
#endif
static std::atomic<uint> eternal_rank(0);
Because the rank is local to the thread and eternal_rank is atomic,
assigning a rank is just:
thread_local_rank = ++eternal_rank;
The C programming language has found itself catching up with the times. Multi-core CPUs are no longer the future; they are here now, and parallel code is the way to get at that extra power. Do we need that extra power? Well yes, after all we are programming in C for a reason. And let’s keep it fun while we are doing it.