Sunday, November 12, 2017

C++ STL-compatible iterators

Last time we had a look at the deque, a container class that basically is a linked list of arrays. We made our own implementation, and one thing missing there was an accompanying iterator class. Iterators have become so ingrained in C++ coding styles, that it’s only logical that custom containers must support STL-compatible iterators. As you might know I’m not the greatest fan of the STL (C++ Standard Template Library), but even so I wouldn’t want to burden anyone with having to deal with non STL-compatible iterators. Therefore we will embrace it for once (a little) and write iterators, done the STL way.

An iterator is an object that allows you to easily loop over each item in a container. There are many kinds of container classes, and each has its own way of organizing its contents internally. The iterator abstracts away the details of accessing items, allowing to access containers in a generic way. No matter whether the container at hand internally is an array, linked list, binary tree or what not, the C++ idiom to loop over all items is:

MyContainer<int> c = ...;

for (auto it = c.begin(); it != c.end(); ++it) {
    int item = *it;
    ...
}

// or even, in C++11:

for (int item : c) {
    ...
}

Also note the use of the generic auto type to avoid having to declare the exact, complicated iterator type.

This generic programming trick opens up the way to STL algorithms: a large collection of convenience functions for containers; sorting, searching, copying, transforming, and more. Just by adding an iterator to class MyContainer, we can now sort and search MyContainers. How cool is that? Although iterators do add another level of abstraction (and thus another level of indirection), the benefits outweigh the cost.

How to implement an iterator? We will start out simple and build it from there. The iterator class is basically a pointer into the container. It implements certain operators, most importantly operator++() (increment position) and operator*() (get item at position). Since the iterator works in tandem with the container, its class declaration will be nested in the container class:

template <typename T>
class MyContainer {

    class iterator {
        T *p;

    public:
        // constructors
        iterator() : p(nullptr) { }
        iterator(T *start) : p(start) { }

        ~iterator() { }

        // equality
        bool operator==(const iterator& o) const {
            return p == o.p;
        }

        // inequality
        bool operator!=(const iterator& o) const {
            return !operator==(o);
        }

        // pre-increment position
        iterator& operator++() {
            if (p == nullptr) {
                // ignored; no-op
                return *this;
            }
            // FIXME we never detect the end
            p++;
            return *this;
        }

        // get item reference at
        T& operator*() const {
            assert(p != nullptr);
            return *p;
        }
    };

    T *data;

    ...
};

Class MyContainer has a pointer to some allocated memory for storing items. The iterator is declared as a nested class, which has a raw pointer directly into MyContainer’s space. Note that in this case I use a pointer, but you might as well use an integer as an index here—it’s easy to think of iterators as abstracted pointers, so I think it works better as an example. We check for nullptr, and I use assert to bail out on an invalid operation. You may also choose to throw an exception, but there seems to be no standard way of doing this, probably use std::runtime_error or so.

There are various other comments to make about this code, but for simplicity’s sake, this is the bare skeleton for beginning an iterator class. Indeed, iterator classes may become several pages long depending on what functionality needs to be included.

To create an iterator instance we must call the begin() or end() method:

template <typename T>
class MyContainer {

    class iterator {
        ...
    };

    T *data;

public:
    ...

    iterator begin(void) {
        return iterator(data);
    }

    iterator end(void) {
        return iterator();
    }
};

So, begin() creates an iterator to the beginning of data, while end() creates a “null” iterator. Thus we can write:

// loop over all items
for (auto it = c.begin(); it != c.end(); ++it) {
    item = *it;
}

We should improve our implementation of the iterator, and add some more operators:

    // copy constructor
    iterator(const iterator& o) : p(o.p) { }

    // move constructor
    iterator(iterator&& o) : p(o.p) {
        o.p = nullptr;
    }

    // copy assignment
    iterator& operator=(const iterator& o) {
        if (this == &o) {
            return *this;
        }
        p = o.p;
        return *this;
    }

    // move assignment
    iterator& operator=(iterator&& o) {
        p = o.p;
        o.p = nullptr;
        return *this;
    }

    // post-increment
    iterator operator++(int) {
        iterator old(*this);
        this->operator++();
        return old;
    }

    // dereference
    T* operator->() const {
        assert(p != nullptr);
        return p;
    }

Pay extra attention to the return type (and value) of the post-increment operator: it returns a copy of the iterator as it was before the increment was done.

With all of this in place, we now have a so-called “forward iterator”. It can be made bidirectional by adding pre- and post-decrement operators.

In its current state it’s a pretty decent iterator class, but still isn’t STL-compatible. Making it STL-compatible is important for getting it to work with C++ <algorithm>. To make it work, just copy-paste from the boilerplate:

#include <iterator>

class iterator : public std::iterator<std::forward_iterator_tag, T> {
    T *p;

public:
    typedef T value_type;
    typedef T& reference;
    typedef T* pointer;
    typedef int difference_type;
    typedef std::forward_iterator_tag iterator_category;

    ...
};

Finally, we should be able to do something useful, like this:

#include <algorithm>

void example1(MyContainer<int>& c) {    // or: const&
    // find element
    auto it = std::find(c.begin(), c.end(), 10);
    if (it != c.end()) {
        // found value: *it
    } else {
        // not found
    }
}

void example2(MyContainer<int>& c) {
    // sort all items
    std::sort(c.begin(), c.end());
}

The first example works, but the second doesn’t. Alas, std::sort() requires a random-access iterator. A random-access iterator must implement all of the following:

copy-construct and copy-assign;
equality and inequality;
pre- and post-increment;
pre- and post-decrement;
arithmetic add and subtract;
compound assignment for add and subtract;
all comparison operators;
dereferencing;
indexing

It should be clear that writing a custom random-access iterator is, simply put, a lot of work. Right about here is where it ends being fun, but let me ramble on a little more for sake of completeness.

We haven’t yet touched upon constness. C++ is very picky about objects being const. It’s not just being pedantic, it’s important for code correctness as it allows the compiler to do certain optimizations if it knows that an object is not going to change state along the way. Therefore we should implement const_iterator, which is the ‘const’ version of iterator. A technique to avoid code duplication is to first write const_iterator, and then have iterator inherit from const_iterator:

class const_iterator : public std::iterator<std::forward_iterator_tag, T> {
    T *p;

public:
    typedef T value_type;
    typedef T& reference;
    typedef const T& const_reference;
    typedef T* pointer;
    typedef const T* const_pointer;
    typedef int difference_type;
    typedef std::forward_iterator_tag iterator_category;

    const_iterator() : p(nullptr) { }

    ...

    const_iterator& operator++() { ... }
    const_iterator operator++(int) { ... }

    const T& operator*() const { ... }
    const T* operator->() const { ... }
};

// the non-const version
class iterator : public const_iterator {
public:
    typedef T value_type;
    typedef T& reference;
    typedef T* pointer;
    typedef int difference_type;
    typedef std::forward_iterator_tag iterator_category;

    // can not use member initializers here :(
    iterator() {
        p = nullptr;
    }

    ...

    // reuse code from const_iterator whenever you can

    using const_iterator::operator==;
    using const_iterator::operator!=;

    iterator& operator++() {
        const_iterator::operator++();
        return *this;
    }

    iterator operator++(int) {
        iterator old(*this);
        this->operator++();
        return old;
    }

    T& operator*() const { ... }
    T* operator->() const { ... }
};

...

const_iterator cbegin(void) const {
    return const_iterator(data);
}

const_iterator cend(void) const {
    return const_iterator();
}

Note how pointer p in const_iterator is not ‘const’ by itself. This trick allows us to be liberal and easily call const_iterator code from the non-const iterator code.

Lastly, bidirectional iterators allow for traversal in reverse. You may create a special reverse iterator for this. The reverse iterator can be generated from a standard template, all you need to do is add one magic line to the container class:

typedef std::reverse_iterator<iterator> reverse_iterator;

In theory you can also create a const_reverse_iterator, but unfortunately it only gave me compiler errors. I suspect an issue with the STL on my dev platform, or maybe it happens because we took some liberties with const_iterator. There isn’t a lot of information to be found the web about const_reverse_iterator, which makes me even more suspicious. Personally, I find reverse iterators to be confusing. Why not simply steer the bidirectional iterator in opposite direction, and avoid reverse iterators altogether.

All in all, iterators are a pretty complicated subject. Easy on the outside, but unexpectedly difficult on the inside. When done right, it gets quite tedious and full of C++ gagh. Just for your amusement, I have left in a glaring bug (see the FIXME remark) in order to show that writing iterators requires careful planning and minute testing.