A generic Vec type in plain C
Generics in programming is about parameterized types; code and data structures
written to work for any type T
. The compiler will substitute type T
and
ensure that the resulting code is type-safe.
Generics are extremely useful for implementing container classes.
You can go a long way in programming by using only two indispensible
container classes: a growable array and a dictionary. Plain C lacks
classes and templates, however in this post we will explore a type-safe
generic dynamic array in plain C.
C++ has std::vector<T>
, in Rust the similar type is called Vec<T>
.
For plain C, we will base the generic Vec
type on this structure:
struct Vec<T> {
T *data;
size_t cap;
size_t len;
};
The only problem is, Vec<T>
is a syntax error in C.
We can rewrite it by using macros:
#define Vec(T) Vec_##T
#define decl_Vec(T) \
typedef struct { \
T *data; \
size_t cap, len; \
} Vec(T)
The next logical step is to define new
and delete
functions for this
type. These functions will be along these lines:
Vec(T) *new_Vec(T)(void) {
Vec(T) *v = calloc(1, sizeof(Vec(T)));
/* have a default capacity */
const size_t default_cap = 8;
v->data = calloc(default_cap, sizeof(T));
v->cap = default_cap;
return v;
}
void delete_Vec(T)(Vec(T) *v) {
if (v != NULL) {
free(v->data);
free(v);
}
}
And we can use it like this:
void my_func(void) {
Vec(int) *v = new_Vec(int)();
/* do useful stuff here ... */
push_Vec(int)(v, 1234);
printf("length: %zu\n", v->len);
delete_Vec(int)(v);
}
Note that this makes our Vec(T)
a heap-allocated type and we always
and only use it through a pointer. I don’t have a problem with that,
but consider having a local Vec(T)
allocated on the stack:
void my_func(void) {
Vec(int) v;
/* do useful stuff here ... */
push_Vec(int)(&v, 1234);
printf("length: %zu\n", v.len);
}
This is a problem because plain C lacks a constructor and destructor;
a good solution (IMO) is to shun stack-allocated Vec(T)
objects altogether.
The other solution is to diligently insert the constructor/destructor calls
by hand. It then makes sense to rewrite new
as “alloc and init”:
void init_Vec(T)(Vec(T) *self) {
memset(self, 0, sizeof(Vec(T)));
const size_t default_cap = 8;
self->data = calloc(default_cap, sizeof(T));
self->cap = default_cap;
}
Vec(T) *new_Vec(T)(void) {
Vec(T) *v = malloc(sizeof(Vec(T)));
init_Vec(T)(v);
return v;
}
void deinit_Vec(T)(Vec(T) *self) {
if (self != NULL) {
free(self->data);
}
}
void delete_Vec(T)(Vec(T) *v) {
if (v != NULL) {
deinit_Vec(T)(v);
free(v);
}
}
Now, using a stack-allocated Vec(T)
:
void my_func(void) {
Vec(int) v;
init_Vec(int)(&v);
/* do useful stuff here ... */
push_Vec(int)(&v, 1234);
printf("length: %zu\n", v.len);
deinit_Vec(int)(&v);
}
I have already teased a push()
function. Sloppy implementations of
push()
and pop()
are:
void push_Vec(T)(Vec(T) *self, T item) {
if (self->len >= self->cap) {
self->cap *= 2;
self->data = realloc(self->cap * sizeof(T));
}
self->data[self->len++] = item;
}
T pop_Vec(T)(Vec(T) *self) {
assert(self->len > 0);
return self->data[--self->len];
}
If you are a seasoned C programmer, you may have noticed that the
above functions do not compile and thus we don’t reach our goal of having
a generic Vec
in C. The secret ingredient that I left out until now
is that we need to write these functions as templates—or rather,
as macros.
Declarations go into header files, and implementations go into modules.
Therefore the meta programming is split in two: decl_Vec(T)
and
impl_Vec(T)
.
#define Vec(T) Vec_##T
#define init_Vec(T) init_Vec_##T
#define deinit_Vec(T) deinit_Vec_##T
#define new_Vec(T) new_Vec_##T
#define delete_Vec(T) delete_Vec_##T
#define push_Vec(T) push_Vec_##T
#define pop_Vec(T) pop_Vec_##T
#define decl_Vec(T) \
typedef struct { \
T* data; \
size_t cap; \
size_t len; \
} Vec(T); \
void init_Vec(T)(Vec(T)*); \
Vec(T)* new_Vec(T)(void); \
void deinit_Vec(T)(Vec(T)*); \
void delete_Vec(T)(Vec(T)*); \
void push_Vec(T)(Vec(T)*, T); \
T pop_Vec(T)(Vec(T)*)
#define impl_Vec(T) \
void init_Vec(T)(Vec(T) *self) { \
memset(self, 0, sizeof(Vec(T))); \
const size_t default_cap = 8; \
self->data = calloc(default_cap, sizeof(T)); \
self->cap = default_cap; \
} \
Vec(T) *new_Vec(T)(void) { \
Vec(T) *v = malloc(sizeof(Vec(T))); \
init_Vec(T)(v); \
return v; \
} \
... etcetera ...
Armed with this, we can now finally write a working demo program:
#include "vec.h"
#include <stdio.h>
#include <inttypes.h>
decl_Vec(int32_t);
decl_Vec(int64_t);
impl_Vec(int32_t);
impl_Vec(int64_t);
int main(void) {
Vec(int32_t) *v = new_Vec(int32_t)();
for (int i = 0; i < 100; i++) {
push_Vec(int32_t)(v, (int32_t)i);
}
printf("Vec(int32_t) cap: %zu len: %zu\n", v->cap, v->len);
delete_Vec(int32_t)(v);
Vec(int64_t) *w = new_Vec(int64_t)();
for (int i = 0; i < 100; i++) {
push_Vec(int64_t)(w, (int64_t)i);
}
printf("Vec(int64_t) cap: %zu len: %zu\n", w->cap, w->len);
delete_Vec(int64_t)(w);
return 0;
}
It works beautifully. One pitfall is that plain C lacks destructors;
the Vec(T)
container works nicely with POD types, but take care with
more complex structures.
Adding a generic function that takes a Vec(T)
argument is a bit of a hassle,
but it can be done. And if you make a mistake, you get some horrible
error messages just like when writing C++ templates. In that sense, we have
made a perfect emulation of C++ templates in plain C.
Rather than using macros, I can see how this would work better with a
meta-compiler that translates Vec(T)
templates to legit C code.
That would be an opportunity to change the syntax to Vec<T>
(angular
brackets).
Other than that I’m quite impressed with how well it handles in practice.
Once upon a long ago I had a large C code that did something similar with
linked lists, but I did not take it quite this far. I’m sure that code
would have benefited from using generics in this way.
It’s a shame that Vec(T)
comes late to the party—I have well
moved on to C++ and Rust by now—but if I had a time machine,
I would go back and teach this trick to my younger self.