File packing
Every now and then I like to watch people code on their Twitch channels. One of them is a gamedev guru Zen-master type, another one is a super talented nerdy kid. One day the guru showed how to pack files together into a single large file. You don’t want to ship a game with many separate sound and texture files dumped in a directory, so pack those files together, and then you can load them all at once. Soon thereafter, the kid did the same thing on his Twitch channel.
Making your own custom pack file format is easy and fun. As is often the case, it’s all in the details. I have coded custom pack formats a fair number of times now, and I still find it an interesting topic.
The WAD file
In December 1993 the world was shocked by the game DOOM. It’s famous for its brutality, addictive gameplay, and its technical prowess. The shareware version of DOOM (first episode: Knee Deep In The Dead) fit on a single 1.44 MB floppy disk. A large portion of that was taken up by DOOM.WAD, a single file of over 1 megabyte in size (!) which was absolutely astronomically large at the time. What was in that file? Simply put, game assets: everything but the game engine.
The file extension .WAD stands for “where is all the data?”. The format of the file is basically this: a concatenation of the data files, plus a directory that serves as an index to the data.
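To make that concrete, here is a minimal sketch of reading a WAD directory. It assumes the classic layout: a 12-byte header (a 4-byte IWAD or PWAD tag, the number of entries, and the offset of the directory), followed somewhere later by 16-byte directory entries (data offset, size, and an 8-byte NUL-padded name), all little-endian. The function name read_wad_directory is my own, not something from the DOOM source.

```python
import struct

def read_wad_directory(data):
    """Return a list of (name, offset, size) tuples from WAD bytes."""
    # Header: 4-byte tag, entry count, directory offset (little-endian).
    magic, count, dir_offset = struct.unpack_from("<4sii", data, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    entries = []
    for i in range(count):
        # Each directory entry is 16 bytes: offset, size, 8-byte name.
        offset, size, raw_name = struct.unpack_from(
            "<ii8s", data, dir_offset + 16 * i)
        # Names shorter than 8 bytes are padded with NUL bytes.
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        entries.append((name, offset, size))
    return entries
```

Note that the loader has to read the header first and then jump to wherever the directory lives before it knows anything about the members.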
The exact WAD format is too simplistic for practical uses nowadays, but it does teach us one thing: Keep It Simple, Stupid.
Our custom PACK file
Personally I like to put the directory first, rather than at the end as WAD does. It’s a personal preference that just looks much nicer to me.
To construct the pack file, first collect all metadata information and write out the index entries. Next append the data members, one by one. When loading the pack file, you can immediately load all index entries without having to seek to where the directory is.
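Here is a minimal sketch of that write order and the matching load. Everything about the layout is an assumption made for illustration: a PACK magic tag, a member count, and fixed 32-byte NUL-padded names to keep this first version short. The names build_pack and load_index are hypothetical.

```python
import struct

HEADER = struct.Struct("<4sI")    # magic tag, member count
ENTRY = struct.Struct("<32sII")   # NUL-padded name, data offset, data size

def build_pack(members):
    """Return pack bytes: header, then all index entries, then the data."""
    # The index size is known up front, so the first data offset is too.
    offset = HEADER.size + ENTRY.size * len(members)
    index = [HEADER.pack(b"PACK", len(members))]
    for name, data in members.items():
        raw = name.encode("utf-8")
        if len(raw) > 32:
            raise ValueError(f"name too long: {name!r}")
        index.append(ENTRY.pack(raw, offset, len(data)))
        offset += len(data)
    # Data members are appended one by one, in index order.
    return b"".join(index) + b"".join(members.values())

def load_index(pack):
    """Read the directory from the start of the pack; no seeking needed."""
    magic, count = HEADER.unpack_from(pack, 0)
    if magic != b"PACK":
        raise ValueError("not a pack file")
    index = {}
    for i in range(count):
        raw, offset, size = ENTRY.unpack_from(
            pack, HEADER.size + ENTRY.size * i)
        index[raw.rstrip(b"\0").decode("utf-8")] = (offset, size)
    return index
```

Because the index comes first, every offset must be computed before any data is written, which is why the metadata has to be collected up front.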
For bonus points, let’s use variable-length filenames in index entries. It makes sense to store them as (length, string) pairs. The length can be a single byte if you accept (and check!) that all member names are less than 256 bytes long. The problem is now that index entries are no longer of a fixed size. When loading the pack directory, it is really only a matter of doing buffered I/O right: read into a larger buffer, and grab only what you need.
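As a rough sketch of (length, string) entries, the code below writes a one-byte length, the name bytes, and then a 32-bit offset and size, and reads them back with small reads from a stream. The field order and the helper names are assumptions for illustration. In Python, a file opened with open(path, "rb") is already buffered, so these small reads are served from a larger in-memory buffer.

```python
import io
import struct

def write_entry(out, name, offset, size):
    """Write one index entry: length byte, name, offset, size."""
    raw = name.encode("utf-8")
    if len(raw) > 255:
        # A single length byte cannot describe longer names: check it!
        raise ValueError("member name too long for a 1-byte length")
    out.write(bytes([len(raw)]))
    out.write(raw)
    out.write(struct.pack("<II", offset, size))

def read_exact(stream, n):
    """Read exactly n bytes or fail loudly on a truncated index."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated pack index")
    return data

def read_entry(stream):
    """Read one variable-length index entry from a binary stream."""
    length = read_exact(stream, 1)[0]
    name = read_exact(stream, length).decode("utf-8")
    offset, size = struct.unpack("<II", read_exact(stream, 8))
    return name, offset, size
```

For a quick round trip, write a few entries into an io.BytesIO, rewind it with seek(0), and call read_entry once per entry.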
The filenames are padded to align on 4 bytes. This is done so that an index entry is always aligned; otherwise, unaligned reads of the numeric fields can crash the program on certain CPU architectures. Moreover, be aware of endianness when loading raw binary numbers (such as offsets and sizes, like we are doing here).
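A small sketch of both points follows. It assumes the length byte counts as part of the name field, so the length byte plus the name is padded with NUL bytes up to a multiple of 4, and it fixes the numbers as little-endian with the "<" prefix of Python's struct module. The helper names are made up for this example.

```python
import struct

def padded_name(name):
    """Return the length byte plus name, NUL-padded to a multiple of 4."""
    raw = name.encode("utf-8")
    if len(raw) > 255:
        raise ValueError("member name too long for a 1-byte length")
    field = bytes([len(raw)]) + raw
    return field + b"\0" * (-len(field) % 4)

def pack_entry(name, offset, size):
    """Build one aligned index entry with little-endian offset and size."""
    return padded_name(name) + struct.pack("<II", offset, size)

def unpack_entry(buf, pos):
    """Parse the entry at pos; return (name, offset, size, next_pos)."""
    length = buf[pos]
    name = buf[pos + 1:pos + 1 + length].decode("utf-8")
    pos += (1 + length + 3) & ~3          # skip name field and padding
    offset, size = struct.unpack_from("<II", buf, pos)
    return name, offset, size, pos + 8
```

Spelling out the byte order in the format string means the file reads the same on every machine, whereas native byte order would tie the pack file to the CPU that wrote it.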
An easy way of having sub folders is simply allowing names to contain a slash. This alone is enough to create the illusion of having folders; there is no real need to create any additional structures in the pack file format. Again, keep it simple.
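To illustrate how far plain names with slashes get you, here is a sketch of a folder listing computed purely from the member names. The function list_dir and the trailing-slash convention for marking sub folders are my own choices for this example.

```python
def list_dir(names, folder=""):
    """List the direct children of folder, given flat member names.

    Sub folders are reported with a trailing slash.
    """
    prefix = folder.rstrip("/") + "/" if folder else ""
    children = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue
        # Keep only the first path component below the prefix.
        head, sep, _ = rest.partition("/")
        children.add(head + sep)
    return sorted(children)
```

Calling list_dir(names, "sfx") on names like "sfx/shot.wav" and "sfx/ui/click.wav" reports a file and a sub folder, even though the pack file itself stores nothing but a flat list.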
The standard TAR file
On a different note, a well-known packing format is tar (especially popular on UNIX/Linux). Tar is a tape streaming format; the data is structured as a stream of blocks. Consequently tar does not have an explicit directory; it literally is a sequential stream of:
- member metadata (including filename and size)
- member data
- zero padding; align to block size
Tar simply appends the next member right after. The archive usually ends with two blocks of zeroes, essentially indicating empty metadata structures.
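The stream structure above can be sketched as a simple walk over 512-byte blocks. This assumes the basic header layout, with the name in the first 100 bytes and the size as an octal string at byte offset 124, and it deliberately ignores everything else (long-name extensions, the ustar prefix field, checksums, very large files). The name walk_tar is hypothetical.

```python
BLOCK = 512

def walk_tar(data):
    """Yield (name, data_offset, size) for each member in tar bytes."""
    pos = 0
    while pos + BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == bytes(BLOCK):
            break  # an all-zero block marks the end of the archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        # The size is stored as octal text, NUL or space terminated.
        size_text = header[124:136].split(b"\0", 1)[0].strip()
        size = int(size_text or b"0", 8)
        data_start = pos + BLOCK
        yield name, data_start, size
        # Member data is zero-padded up to the next block boundary.
        pos = data_start + (size + BLOCK - 1) // BLOCK * BLOCK
```

Notice that finding the third member means stepping over the first two; there is nowhere to look it up.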
When reading back a tar archive, there is no easy way of knowing how many members there are, and what to find where, because it lacks an explicit directory. A streaming format does not seek-to-offset; you can only read from beginning to end, and see what you find.
That said, you can still code your own tar loader that constructs an index in memory. Writing a tar loader is a nice exercise, but it’s not very productive when you are developing a game. The basic tar format is old-fashioned, and modern tar is full of peculiar details.
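If you only want the in-memory index and not the exercise, one option is to let an existing tar reader deal with the peculiar details. The sketch below uses Python's standard tarfile module and the offset_data and size attributes of each TarInfo to build a name-to-location table; build_tar_index is a made-up name, and the choice to index regular files only is an assumption.

```python
import io
import tarfile

def build_tar_index(fileobj):
    """Scan an uncompressed tar once and map names to (offset, size)."""
    index = {}
    # Mode "r:" accepts uncompressed archives only, so the recorded
    # offsets point straight into the underlying file.
    with tarfile.open(fileobj=fileobj, mode="r:") as archive:
        for member in archive:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index
```

After the scan, a member can be fetched with a plain seek and read on the original file, which gives roughly what the custom pack format provides for free.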
While it’s certainly possible (I do have a working implementation somewhere), I still can’t really recommend using the tar format in this case. Besides, it’s much more fun going the DIY route and making a custom format as outlined above.