Tracking memory allocations
A classic problem in C programming is managing memory.
How often have we seen leaky programs that rake up gigabytes of memory
after prolonged runtime? It is a human problem, and it can be very hard
to debug, too. This problem has led to various solutions in other languages;
Java and Python feature a garbage collector,
Objective-C had manual reference counting (and it was equally a nightmare),
C++ has constructor/destructor and stack unwinding (but may still leak
when doing dynamic memory allocations), Go has a garbage collector,
Rust manages ownership and can therefore automatically insert deallocation
calls. Rather than mimicking any of these complicated schemes in plain C,
we are going to do something relatively simple: track malloc()
calls
so that we know what is allocated, and when. By knowing what is allocated,
we gain a very powerful feature: deferred deallocations.
In principle, every bit of memory allocated should be deallocated at some time.
This is called pairing; with every malloc()
comes a free()
.
This is a somewhat naive view of things, as in practice it is easy to have
an if-then-else branching code soup where you forget a free()
call, or
you might have a situation in which an object gets allocated in one part
of the program, but deallocated in a totally different part of the program.
It is these intermediately long-lived objects that can be particularly
hard to track down when they are leaking. Or we might have objects that are
shared, and the code gets really confusing.
The tmalloc
tracking-malloc does a couple of things:
- register the point in the code where the allocation was made
- put a limit on total memory usage, so that it can’t leak indefinately
- detect double frees (note, glibc also already does that)
- group allocations together that belong together. It acts like a pool, without it being an explicit pool where you allocate from
- the pool may be released as if it were a single entity
How do we do this? In short, we add metadata in front of the allocated block. This metadata remains hidden to the calling code. Additionally, we link these metadata structures together in a list. That way it becomes possible to traverse any allocations made.
Keeping record of where
We use a macro to record the source filename, line number, and function
where the tmalloc()
call takes place:
#define tmalloc(pool, size) \
tmalloc_alloc((pool), (size), __FILE__, __LINE__, __func__)
This information gets stored in a metadata structure:
typedef struct TMallocMeta_tag TMallocMeta;
struct TMallocMeta_tag {
TMallocMeta* prev, *next;
char filename[32], funcname[32];
int lineno;
size_t size;
};
The actual allocation routine does some things behind the curtain;
it allocates extra space, fills in this metadata structure, and returns
the pointer as normal malloc()
would do.
void* tmalloc_alloc(pool, size_t size,
const char* filename, int lineno, const char* funcname) {
size += sizeof(TMallocMeta);
/* alignment */
size = (size + 7) & ~7;
TMallocMeta* meta = malloc(size);
meta->size = size;
/* fill in meta->filename, lineno, funcname */
...
/* link metadata block into pool */
...
/* skip the metadata block */
meta++;
/* return address of allocated space */
return meta;
}
The pool
In tmalloc
allocations are grouped together in a pool. It is up to
the programmer to choose this grouping, but it makes sense to put temporary
allocations together, separating them from long-lived allocations
in the program.
It is tempting to code the pool as an arena, where you reserve a big block
of space and carving out the allocations by yourself. A good option is
a SLUB allocator, that groups small allocations together in pages.
But the thing is, you don’t have to—regular malloc
suffices
[assuming that you’re on a platform where it is available].
The pool in tmalloc()
is not much more than a list-head and a usage counter;
rather than allocating from a pool, we add allocations to a pool.
The type of pool is dubbed TMalloc
:
typedef struct {
TMallocMeta* root, *tail;
size_t limit;
size_t in_use;
} TMalloc;
The pool is a good place to keep a count on memory usage; if it spins out of control, we abort the program. Aborting the program is, in my opinion, infinitely better than letting it leak and consume all system resources.
Since the pool links all allocated blocks together, we can also release
a pool in one go, without having to remember to issue free
on every
individual alloc
. In practice this acts like a pseudo-garbage collector.
Usage
As an example, here is a subroutine that does a number of temporary allocations. At the end of the function, it lets go of that memory.
void func1(void) {
TMalloc pool;
tmalloc_init(&pool, /* limit = */ MB);
for (int i = 0; i < 10; i++) {
MyObject* obj = tmalloc(&pool, sizeof(MyObject));
init_object(obj);
do_something(obj);
char* tmp = tmalloc(&pool, 128);
something_something(tmp);
/* tfree(obj) ... don't mind */
}
tmalloc_release(&pool);
}
Note that we never explicitly deallocated the individual temporary objects; we simply drop the entire pool by the end of the function.
We also could have chosen to put MyObject
allocations in their own pool,
and/or we can have a separate pool for long-lived objects in the program.
I hope you can see the flexibility and the added convenience of using
pools like this.
Winding down
The final implementation improves on this by adding redzones for detecting corruption; overwriting the edge of an allocated block.
Tracking malloc is a simple method for improving quality of life when it
comes to doing memory management in a C program.
Grouping allocations in pools organizes memory usage of the program, and
thereby helps you organize your mind. Moreover, when allocations spiral
out of control, we can inspect the metadata and see where all those
allocations came from.
tmalloc
is not complicated, yet makes life in C so much better.