Sunday, September 3, 2023

A generic Vec type in plain C

Generics in programming is about parameterized types; code and data structures written to work for any type T. The compiler will substitute type T and ensure that the resulting code is type-safe. Generics are extremely useful for implementing container classes. You can go a long way in programming by using only two indispensible container classes: a growable array and a dictionary. Plain C lacks classes and templates, however in this post we will explore a type-safe generic dynamic array in plain C.

C++ has std::vector<T>, in Rust the similar type is called Vec<T>. For plain C, we will base the generic Vec type on this structure:

struct Vec<T> {
    T *data;
    size_t cap;
    size_t len;
};

The only problem is, Vec<T> is a syntax error in C. We can rewrite it by using macros:

#define Vec(T)  Vec_##T

#define decl_Vec(T)         \
    typedef struct {        \
        T *data;            \
        size_t cap, len;    \
    } Vec(T)

The next logical step is to define new and delete functions for this type. These functions will be along these lines:

Vec(T) *new_Vec(T)(void) {
    Vec(T) *v = calloc(1, sizeof(Vec(T)));

    /* have a default capacity */
    const size_t default_cap = 8;
    v->data = calloc(default_cap, sizeof(T));
    v->cap = default_cap;
    return v;
}

void delete_Vec(T)(Vec(T) *v) {
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

And we can use it like this:

void my_func(void) {
    Vec(int) *v = new_Vec(int)();

    /* do useful stuff here ... */
    push_Vec(int)(v, 1234);
    printf("length: %zu\n", v->len);

    delete_Vec(int)(v);
}

Note that this makes our Vec(T) a heap-allocated type and we always and only use it through a pointer. I don’t have a problem with that, but consider having a local Vec(T) allocated on the stack:

void my_func(void) {
    Vec(int) v;

    /* do useful stuff here ... */
    push_Vec(int)(&v, 1234);
    printf("length: %zu\n", v.len);
}

This is a problem because plain C lacks a constructor and destructor; a good solution (IMO) is to shun stack-allocated Vec(T) objects altogether. The other solution is to diligently insert the constructor/destructor calls by hand. It then makes sense to rewrite new as “alloc and init”:

void init_Vec(T)(Vec(T) *self) {
    memset(self, 0, sizeof(Vec(T)));
    const size_t default_cap = 8;
    self->data = calloc(default_cap, sizeof(T));
    self->cap = default_cap;
}

Vec(T) *new_Vec(T)(void) {
    Vec(T) *v = malloc(sizeof(Vec(T)));
    init_Vec(T)(v);
    return v;
}

void deinit_Vec(T)(Vec(T) *self) {
    if (self != NULL) {
        free(self->data);
    }
}

void delete_Vec(T)(Vec(T) *v) {
    if (v != NULL) {
        deinit_Vec(T)(v);
        free(v);
    }
}

Now, using a stack-allocated Vec(T):

void my_func(void) {
    Vec(int) v;
    init_Vec(int)(&v);

    /* do useful stuff here ... */
    push_Vec(int)(&v, 1234);
    printf("length: %zu\n", v.len);

    deinit_Vec(int)(&v);
}

I have already teased a push() function. Sloppy implementations of push() and pop() are:

void push_Vec(T)(Vec(T) *self, T item) {
    if (self->len >= self->cap) {
        self->cap *= 2;
        self->data = realloc(self->cap * sizeof(T));
    }
    self->data[self->len++] = item;
}

T pop_Vec(T)(Vec(T) *self) {
    assert(self->len > 0);
    return self->data[--self->len];
}

If you are a seasoned C programmer, you may have noticed that the above functions do not compile and thus we don’t reach our goal of having a generic Vec in C. The secret ingredient that I left out until now is that we need to write these functions as templates—or rather, as macros.

Declarations go into header files, and implementations go into modules. Therefore the meta programming is split in two: decl_Vec(T) and impl_Vec(T).

#define Vec(T)  Vec_##T

#define init_Vec(T)     init_Vec_##T
#define deinit_Vec(T)   deinit_Vec_##T
#define new_Vec(T)      new_Vec_##T
#define delete_Vec(T)   delete_Vec_##T
#define push_Vec(T)     push_Vec_##T
#define pop_Vec(T)      pop_Vec_##T

#define decl_Vec(T)                 \
    typedef struct {                \
        T* data;                    \
        size_t cap;                 \
        size_t len;                 \
    } Vec(T);                       \
    void init_Vec(T)(Vec(T)*);      \
    Vec(T)* new_Vec(T)(void);       \
    void deinit_Vec(T)(Vec(T)*);    \
    void delete_Vec(T)(Vec(T)*);    \
    void push_Vec(T)(Vec(T)*, T);   \
    T pop_Vec(T)(Vec(T)*)

#define impl_Vec(T)                         \
    void init_Vec(T)(Vec(T) *self) {                    \
        memset(self, 0, sizeof(Vec(T)));                \
        const size_t default_cap = 8;                   \
        self->data = calloc(default_cap, sizeof(T));    \
        self->cap = default_cap;                        \
    }                                    \
    Vec(T) *new_Vec(T)(void) {           \
        Vec(T) *v = malloc(sizeof(Vec(T))); \
        init_Vec(T)(v);                  \
        return v;                        \
    }                                    \
    ... etcetera ...

Armed with this, we can now finally write a working demo program:

#include "vec.h"

#include <stdio.h>
#include <inttypes.h>

decl_Vec(int32_t);
decl_Vec(int64_t);

impl_Vec(int32_t);
impl_Vec(int64_t);

int main(void) {
    Vec(int32_t) *v = new_Vec(int32_t)();
    for (int i = 0; i < 100; i++) {
        push_Vec(int32_t)(v, (int32_t)i);
    }
    printf("Vec(int32_t) cap: %zu   len: %zu\n", v->cap, v->len);
    delete_Vec(int32_t)(v);

    Vec(int64_t) *w = new_Vec(int64_t)();
    for (int i = 0; i < 100; i++) {
        push_Vec(int64_t)(w, (int64_t)i);
    }
    printf("Vec(int64_t) cap: %zu   len: %zu\n", w->cap, w->len);
    delete_Vec(int64_t)(w);
    return 0;
}

It works beautifully. One pitfall is that plain C lacks destructors; the Vec(T) container works nicely with POD types, but take care with more complex structures. Adding a generic function that takes a Vec(T) argument is a bit of a hassle, but it can be done. And if you make a mistake, you get some horrible error messages just like when writing C++ templates. In that sense, we have made a perfect emulation of C++ templates in plain C.

Rather than using macros, I can see how this would work better with a meta-compiler that translates Vec(T) templates to legit C code. That would be an opportunity to change the syntax to Vec<T> (angular brackets).

Other than that I’m quite impressed with how well it handles in practice. Once upon a long ago I had a large C code that did something similar with linked lists, but I did not take it quite this far. I’m sure that code would have benefited from using generics in this way. It’s a shame that Vec(T) comes late to the party—I have well moved on to C++ and Rust by now—but if I had a time machine, I would go back and teach this trick to my younger self.