The Developer’s Cry

a blog about computer programming

The uType Type System

Despite having said I would never go back to C++, guess what I did for my latest toy project. Rust is a very good language, but I haven’t gotten up to speed with it yet. The cause of this is not even the borrow checker; it’s that Rust’s error handling with Ok/Err and Some/None can get tedious. But you know, we take with us what we have learned. I will surely pick up Rust again later, but for this particular project I chose C++.

The problem with C++ is not that it’s a bad language (well, it’s not all bad); it’s the standard library that makes it a drag. You can mold C++ to make it more pleasant to work with, but that usually means stacking another layer on top. Now, before you start screaming that adding layer upon layer makes all software slow, consider that C++ is not a scripting language, and that we have immense CPU power at our disposal. A pleasant framework helps you write good code faster, and there is a lot of value in that.

Primitive types

We start out by borrowing some of Rust’s primitive types. Note that bool is already in C++. A number of new languages have adopted these type names. I’m not sure there is a name for this convention, so I have dubbed my version “the uType type system.”

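#include <cstddef>
#include <cstdint>

using i8 = std::int8_t;
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;

using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;

// signed pointer-sized integer for indices, like Rust's isize
// (used by Vec::pop(idx) and String::remove(idx) below)
using isize = std::ptrdiff_t;

using f32 = float;
using f64 = double;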

This simple trick alone already improves our quality of life. While we are at it, I’m just going to bring over str as well:

using str = const char*;

This obfuscates the fact that it’s a pointer, but for now I’m OK with it.

Go has a convenient type for Unicode characters (code points) that will come in handy at some point:

using rune = std::uint32_t;

Printing

We can define some print functions:

void println(str fmt, ...);
void eprintln(str fmt, ...);
void print(str fmt, ...);
void eprint(str fmt, ...);

These are simple wrappers around vprintf() (or vfprintf() on stderr for the eprint variants); alternatively, you could use libfmt and have "{}"-style format strings.
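A minimal sketch of the four print functions follows. It assumes, going by Rust’s naming, that print()/println() write to stdout and eprint()/eprintln() to stderr, and that the ln variants append a newline; the shared helper vprint_to() is a hypothetical name.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

using str = const char*;

// Shared helper: format to `out`, optionally followed by a newline.
// Taking a va_list lets all four variadic wrappers reuse it.
static void vprint_to(std::FILE* out, bool newline, str fmt, std::va_list ap) {
    std::vfprintf(out, fmt, ap);
    if (newline) {
        std::fputc('\n', out);
    }
}

// print/println write to stdout
void print(str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vprint_to(stdout, false, fmt, ap);
    va_end(ap);
}

void println(str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vprint_to(stdout, true, fmt, ap);
    va_end(ap);
}

// eprint/eprintln write to stderr
void eprint(str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vprint_to(stderr, false, fmt, ap);
    va_end(ap);
}

void eprintln(str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vprint_to(stderr, true, fmt, ap);
    va_end(ap);
}
```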

The print functions are not so exciting by themselves, but they pave the way for the following:

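#define panic(...)  panic_println(__FILE__, __LINE__, \
                            __func__, __VA_ARGS__)

#define debug(...)  debug_println(__FILE__, __LINE__, \
                            __func__, __VA_ARGS__)

// replaces the standard assert() from <cassert>;
// do { } while (0) turns the macro into a single statement,
// so it is safe inside an unbraced if/else
#undef assert
#define assert(x)                           \
    do {                                    \
        if (! (x)) {                        \
            panic("assert fail: %s", #x);   \
        }                                   \
    } while (0)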
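The panic() and debug() macros forward to panic_println() and debug_println(), which are not shown. Here is a possible sketch, assuming both print a “file:line: function():” prefix to stderr and that a panic ends the program with std::abort(); vreport() is a hypothetical helper. The macros use the standard __func__ rather than the compiler extension __FUNCTION__.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

using str = const char*;

// Write "file:line: func(): tag: message" plus a newline to stderr.
static void vreport(str tag, str file, int line, str func,
                    str fmt, std::va_list ap) {
    std::fprintf(stderr, "%s:%d: %s(): %s: ", file, line, func, tag);
    std::vfprintf(stderr, fmt, ap);
    std::fputc('\n', stderr);
}

// Report where things went wrong, then terminate. abort() is used
// rather than exit() so that a debugger can stop at the failure.
[[noreturn]] void panic_println(str file, int line, str func, str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vreport("panic", file, line, func, fmt, ap);
    va_end(ap);
    std::abort();
}

// Same report, but execution continues.
void debug_println(str file, int line, str func, str fmt, ...) {
    std::va_list ap;
    va_start(ap, fmt);
    vreport("debug", file, line, func, fmt, ap);
    va_end(ap);
}

#define panic(...) panic_println(__FILE__, __LINE__, __func__, __VA_ARGS__)
#define debug(...) debug_println(__FILE__, __LINE__, __func__, __VA_ARGS__)
```

Marking panic_println() as [[noreturn]] tells the compiler that code after a panic is unreachable, which also silences “missing return” warnings in functions that end in a panic.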

Growable array

A convenient type to always have around is a dynamically growing array. You can implement it all by yourself, or you can take the lazy, quick route and simply wrap std::vector.

template <typename T>
class Vec {
    std::vector<T> v;

public:
...

    size_type len(void) const { return v.size(); }

    size_type cap(void) const { return v.capacity(); }

    T& operator[](size_type idx) {
        // do bounds checking
        assert(idx >= 0 && idx < len());
        return v.operator[](idx);
    }

    const T& operator[](size_type idx) const {
        // do bounds checking
        assert(idx >= 0 && idx < len());
        return v.operator[](idx);
    }

Our Vec operator[] does bounds checking. Note that our version of assert() is “always-on”: asserts are enabled even in release builds. Also, we do not throw any exceptions; the program simply panics in case of an error deep in the library. More on error handling later.

Not all code for the implementation of Vec is shown here, but you get the idea. We take what we like from other languages. Personally I like Python’s API a lot, so let’s add .pop(idx) and .extend():

void push(const T& item) { v.push_back(item); }

// pop the last item
T pop(void) {
    // panics when empty
    assert(len() >= 1);
    T item = std::move(v.back());
    v.pop_back();
    return item;
}

// Python-style pop: negative indices count from the end.
// No default argument here: with idx=-1, a call to pop()
// would be ambiguous with the overload above.
T pop(isize idx) {
    isize n = static_cast<isize>(len());
    // panics when empty
    assert(n >= 1);

    if (idx < 0) {
        idx += n;
    }
    assert(idx >= 0);
    assert(idx < n);
    return remove(idx);
}

void extend(const Vec<T>& o) {
    v.reserve(len() + o.len());
    v.insert(v.end(), o.v.begin(), o.v.end());
}
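The pop(idx) above delegates to a remove() method that isn’t shown. The standalone sketch below applies the same Python-style index handling to a plain std::vector, assuming remove() erases the element and returns it; pop_at() is a hypothetical free function, and std::abort() stands in for the always-on assert().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>

using isize = std::ptrdiff_t;

// Hypothetical free-function version of Vec::pop(idx):
// idx -1 is the last element, -2 the one before it, and so on.
template <typename T>
T pop_at(std::vector<T>& v, isize idx) {
    isize n = static_cast<isize>(v.size());
    if (idx < 0) {
        idx += n;  // wrap a negative index around, like Python
    }
    if (idx < 0 || idx >= n) {
        // stands in for the article's always-on assert()/panic()
        std::fprintf(stderr, "pop_at: index out of range\n");
        std::abort();
    }
    T item = std::move(v[static_cast<std::size_t>(idx)]);
    v.erase(v.begin() + idx);  // shifts the later elements down: O(n - idx)
    return item;
}
```

Popping from the front is O(n) because every remaining element moves one slot; pop() without an index stays O(1).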

For sorting arrays I like using a compare function that returns minus one, zero, or one:

// using std::function also enables the use of lambdas
// as compare functions
using Fn = std::function<int(const T&, const T&)>;

void sort(const Fn& cmp) {
    if (!len()) {
        return;
    }

    // std::stable_sort wants a "less than" predicate; testing for any
    // negative value (not just -1) also accepts strcmp()-style functions
    auto lambda = [&cmp](const T& a, const T& b) -> bool {
        return cmp(a, b) < 0;
    };
    std::stable_sort(v.begin(), v.end(), lambda);
}
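A standalone sketch of the same adapter on a plain std::vector; sort_by() is a hypothetical name. It takes the compare function as a template parameter rather than std::function, because a free function template cannot deduce T from a lambda passed where a std::function involving T is expected. Inside Vec<T> that problem doesn’t arise, since T is already fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical free-function version of Vec::sort(cmp).
// cmp(a, b) is a three-way compare: negative if a sorts before b,
// zero if they are equal, positive if a sorts after b.
template <typename T, typename Cmp>
void sort_by(std::vector<T>& v, Cmp cmp) {
    // std::stable_sort needs a strict "less than" predicate
    std::stable_sort(v.begin(), v.end(),
                     [&cmp](const T& a, const T& b) { return cmp(a, b) < 0; });
}
```

std::stable_sort keeps equal elements in their original order, which matters when you sort records by one field at a time.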

For enabling iterators we can simply do:

auto begin(void) { return v.begin(); }
auto end(void) { return v.end(); }

auto begin(void) const { return v.begin(); }
auto end(void) const { return v.end(); }

There is one good use case for iterators in C++: the range-based for loop:

for (auto& item : my_vec) {
    ...
}

The next thing to add would be array slices, but since I have no need for them right now, slices are left for later.

String type

The C++ std::string works, but it’s not great. We can make a String class and give it largely the same treatment as Vec. Again, I find Python’s API quite nice. There are too many methods to (re)implement them all up front, so put off writing each one until you actually need that functionality.

Implementing a Unicode-aware string class can be a bit tricky, as you have to do the Unicode encode/decode dance yourself. A common way of dealing with Unicode is:

For example, some methods dealing with adding/removing characters:

void String::push(rune r);
rune String::pop(void);
rune String::remove(isize idx);
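One way to do the encode/decode dance, sketched under the assumption that String stores UTF-8 bytes in a std::string; utf8_push() and utf8_pop() are hypothetical helpers that String::push() and String::pop() could wrap. Decoding in utf8_pop() assumes the bytes are valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

using rune = std::uint32_t;

// Append the UTF-8 encoding of r (1 to 4 bytes) to s.
// Surrogates and values above U+10FFFF are not valid code points;
// this sketch replaces them with U+FFFD REPLACEMENT CHARACTER.
void utf8_push(std::string& s, rune r) {
    if ((r >= 0xD800 && r <= 0xDFFF) || r > 0x10FFFF) {
        r = 0xFFFD;
    }
    if (r < 0x80) {
        s += static_cast<char>(r);
    } else if (r < 0x800) {
        s += static_cast<char>(0xC0 | (r >> 6));
        s += static_cast<char>(0x80 | (r & 0x3F));
    } else if (r < 0x10000) {
        s += static_cast<char>(0xE0 | (r >> 12));
        s += static_cast<char>(0x80 | ((r >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (r & 0x3F));
    } else {
        s += static_cast<char>(0xF0 | (r >> 18));
        s += static_cast<char>(0x80 | ((r >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((r >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (r & 0x3F));
    }
}

// Remove the last code point from s and return it.
// Aborts (panics) when s is empty.
rune utf8_pop(std::string& s) {
    if (s.empty()) {
        std::abort();
    }
    // Continuation bytes look like 10xxxxxx: step back to the lead byte.
    std::size_t start = s.size() - 1;
    while (start > 0 && (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80) {
        --start;
    }
    unsigned char lead = static_cast<unsigned char>(s[start]);
    rune r;
    std::size_t extra;  // number of continuation bytes after the lead byte
    if (lead < 0x80) {
        r = lead;
        extra = 0;
    } else if (lead >= 0xF0) {
        r = lead & 0x07;
        extra = 3;
    } else if (lead >= 0xE0) {
        r = lead & 0x0F;
        extra = 2;
    } else {
        r = lead & 0x1F;
        extra = 1;
    }
    for (std::size_t i = 1; i <= extra && start + i < s.size(); i++) {
        r = (r << 6) | (static_cast<unsigned char>(s[start + i]) & 0x3F);
    }
    s.erase(start);  // drop the lead byte and everything after it
    return r;
}
```

Because a code point takes one to four bytes, indexing by rune (as in remove(idx)) means scanning from the start of the string; that is the main cost of storing UTF-8.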

One thing not to forget with strings is the binary plus operator, so that you can write additions involving (literal) strings.

friend String operator+(const std::string& s1, const String& s2);

friend String operator+(str s1, const String& s2);

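Strings are often implemented as immutable objects with a functional-style API, where methods return a new string instead of modifying the existing one. This is especially nice when the string is a key in an associative array (dictionary): a key that cannot be modified cannot silently end up in the wrong place in the table.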

The backing store of String does not necessarily have to end with a null terminator. But since we are in C/C++ land, interoperability requires a .cstr() method that returns a const char pointer and guarantees a terminating null byte.
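A minimal sketch of both ideas, assuming a String that keeps its bytes in a std::vector<char> without a terminator. Instead of separate operator+ overloads for std::string and str, it relies on non-explicit converting constructors, so one hidden-friend operator+ covers all cases; that is a swapped-in technique, not the original design. cstr() copies into a mutable std::string scratch buffer, which always carries the trailing null byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

using str = const char*;

// Hypothetical minimal String whose bytes are NOT null-terminated.
class String {
    std::vector<char> buf;       // UTF-8 bytes, no terminator
    mutable std::string scratch; // null-terminated copy handed out by cstr()

public:
    String() = default;
    String(str s) : buf(s, s + std::strlen(s)) {}
    String(const std::string& s) : buf(s.begin(), s.end()) {}

    std::size_t len() const { return buf.size(); }

    // Either operand may be converted implicitly from str or std::string,
    // so "literal" + s, s + "literal" and s + std::string all work.
    // `a` is a by-value copy, so appending b to it never aliases.
    friend String operator+(String a, const String& b) {
        a.buf.insert(a.buf.end(), b.buf.begin(), b.buf.end());
        return a;
    }

    // std::string guarantees a trailing '\0' after its contents.
    // The pointer stays valid until the next cstr() call or until
    // this String is destroyed.
    str cstr() const {
        scratch.assign(buf.begin(), buf.end());
        return scratch.c_str();
    }
};
```

Copying on every cstr() call is the price of not storing the terminator; a real implementation might instead keep a spare byte at the end of the buffer.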

Error handling

So far it’s been easy. Now we get to error handling, and it’s an important topic. Error handling is a defining component in how a programming language feels.

In plain C, errors are just integers that are returned: often minus one, or sometimes the errno value itself. This system is very limited, and leads to problems where you can’t distinguish a good value from an error (for example, atoi() returns 0 both for the input “0” and for input that is not a number at all). In such cases you can use an “out” parameter, but it’s not great.

In Python, error handling is entirely based on exceptions. You can try doing the same thing in C++ and throw exceptions, but there it does not work unless you are extremely careful and employ RAII everywhere. In theory it’s great, but in practice you will fail to get it right (believe me, people have tried), simply because the entire framework must be built from the ground up using RAII and exceptions. This works well in Python, but not in C/C++ land.

In Rust, error handling is based on the Result type, which holds either an “Ok” value or an “Err” value. Users can match the result against Ok or Err and extract the respective value. On top of that, Rust has an Option type, which holds either “Some” value or “None”. We can mimic both idioms in C++ by writing clever template classes. While that is pretty sophisticated, personally I find dealing with these wrapper types tiresome enough that I want something else here.

In Go, functions can have multiple return values. A function can therefore return “value, ok” or “value, error”. The built-in error type provides the error message. It’s simple and effective, but requires the language to support multiple return values. We can actually do this in C++ by returning a tuple:

std::tuple<bool, Error> path_isdir(str path) { ... }

auto [isdir, err] = path_isdir(path);
if (err) {
    eprintln("error: %s", err.msg());
    return -1;
}
if (isdir) {
    ...
}

The auto [isdir, err] syntax (structured bindings) requires C++17. Note that in this example I print the error message and then return minus one. You can mix styles as you see fit; the alternative is to bubble up the error by returning the Error object to the caller. The Error class contains just a string. It has an operator bool() so that it can be used as the condition of an if-statement.
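A sketch of the Error class as described, plus one possible body for path_isdir(), assuming std::filesystem (C++17) does the actual work. Treating an empty message as “no error” is an assumption of this sketch.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <tuple>
#include <utility>

using str = const char*;

// Hypothetical Error: just a message; an empty message means "no error".
class Error {
    std::string message;

public:
    Error() = default;
    explicit Error(std::string m) : message(std::move(m)) {}

    str msg() const { return message.c_str(); }

    // Lets `if (err)` test whether an error is present.
    explicit operator bool() const { return !message.empty(); }
};

// Go-style "value, error" return via std::tuple.
// The error_code overload of is_directory() does not throw; a path that
// does not exist is reported through `ec`, so it becomes an Error here.
std::tuple<bool, Error> path_isdir(str path) {
    std::error_code ec;
    bool isdir = std::filesystem::is_directory(path, ec);
    if (ec) {
        return {false, Error(std::string(path) + ": " + ec.message())};
    }
    return {isdir, Error()};
}
```

Making operator bool() explicit still allows `if (err)` and `!err`, but prevents an Error from being silently converted to an integer, as in `err + 1`.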

Smart pointers

I can be brief about smart pointers:

template <typename T>
using Box = std::unique_ptr<T>;

template <typename T>
using Rc = std::shared_ptr<T>;

Just use type aliases.
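Type aliases don’t bring constructors along, so objects are still created with std::make_unique() and std::make_shared(). A small sketch of the ownership rules; Node and the two helper functions are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

template <typename T>
using Box = std::unique_ptr<T>;

template <typename T>
using Rc = std::shared_ptr<T>;

// Hypothetical payload type.
struct Node {
    std::string name;
};

// A Box has exactly one owner: taking it by value moves ownership
// into the function, and the Node is freed when `node` goes out of scope.
// Usage:
//   Box<Node> b = std::make_unique<Node>(Node{"box"});
//   consume(std::move(b));  // b is empty afterwards
std::string consume(Box<Node> node) {
    return node->name;
}

// An Rc can be shared: every copy increments the reference count.
long share_count(const Rc<Node>& a) {
    Rc<Node> b = a;        // one more owner
    return b.use_count();  // 2 if `a` was the only other owner
}
```

One difference worth knowing: Rust’s Rc is single-threaded, while std::shared_ptr updates its reference count atomically, so it behaves more like Rust’s Arc.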

Quality of life

By employing a number of simple tricks, we have greatly improved our quality of life. The perfect programming language does not exist. The fact that we are doing this at all tells us that we are not 100% happy with C++. At the same time, C++ does let us fly. The language is customizable enough that we can mimic some handy types from other languages.

Making your own framework like this has its pros and cons. Arguably it’s a layer of code that doesn’t do anything substantial at the machine level. And yet it is a joy to work with, allowing you to write useful code a lot more smoothly than with the vanilla standard library. A framework like this tends to be very personal; the fun part is that you get to decide what goes in there, but don’t expect anyone else to ever use it, even if you open-source it.

You should be careful not to overdo it. Implement only what you actually use, right now; do not write code that sits there just for the sake of offering functionality but ultimately goes unused. That is a huge waste of time. Instead, write the other (useful) program that you want to make, and add to the personal framework whenever you run into a situation where you need something new.

Finally, test and debug like your life depends on it. You don’t want to have bugs in a framework that forms the basis of other code.