Sunday, December 4, 2011

libobjects++ : A Python-esque library for C++

Nearly exactly a year ago, I wrote about C+Objects: A Python-esque library for C/C++. The goal of this library was to ease programming in C/C++ in such a way that the code would resemble Python (to a certain extent) and that it would make programming in C/C++ as easy as it is in Python. I implemented the library in C++, which wasn’t very successful, and last summer I implemented a version of it in standard C. The standard C version handles very much like Objective-C (rather than like Python); all objects are pointers. All objects are manually reference counted using retain() and release() calls. There is a print() function that prints any Object. It has runtime typing information and even features inheritance (in standard C!).

After a while I did get a bit tired of the interface though. Standard C has no this pointer for objects so you have to pass a pointer along as first argument to the method functions. Secondly, there is no operator overloading. For example, for the Array class you would call arrayItemAt(arr, idx) rather than just arr[idx]. This was particularly bothersome because in C++ you can overload the subscript operator and make it accept this syntax. Furthermore, standard C has no references like C++ has, so you can’t address an array item and modify it on the spot.

So even though C++ is not my favorite language (C++ is too complicated to get any work done quickly), it does have some neat features that I needed for libobjects to work the way I wanted. A recap of the would-like-to-haves:

everything is an Object
objects can be stored into containers: arrays, lists, dictionaries
containers are also Objects (see first bullet)
container classes like arrays should only store pointers
objects are automatically reference counted
strings should be copy-on-write
number objects should be copied, not referenced
adding new user-defined classes should be made easy

For libobjects++ (the all new and shiny C++ rewrite of libobjects) I made a couple of design decisions that changed the game for the better.

One of the problems I was wrestling with was this: in C++ you can have an instance go out of scope, and it will self-destruct. If you wish to retain this object, you would have to make a copy using new and keep that pointer around somehow. For this reason, in the old C++ implementation, every Object was a proxy that had a pointer to a reference counted backend. It didn’t work very nicely because it required you to add two classes for every newly defined class: one for the proxy and one for the backend.

The solution to this problem has multiple aspects. Firstly, the new implementation has the convention that the backend is only a struct. From this follows that all the method code that fiddles with the structure fields is present in the proxy class. Secondly, the Object proxy class itself has no real data fields; it only has a pointer to the data structure. Thirdly, the data structure has been encapsulated in a reference counting back end object. This encapsulation has been realized by means of a C++ template. Templates can be hard to write but they ensure correct typing, which is a good thing when you are building robust code. The Object class acts much like a smart pointer to the original data structure. The user of the Object class sees only the data structure without ever knowing that it is automatically being reference counted. I said it acts much like a smart pointer because it isn’t really one, the user of the class doesn’t bother with pointers at all. The user mostly uses the Integer number class, the String class, the Array class, which are derived from Object.

A note about reference counting; in my opinion it is best implemented using intrusive reference counting. Intrusive means the counter itself is part of the data structure; the object that is being reference counted. The most non-intrusive way of adding intrusive reference counting is to wrap the whole data structure by a new class that includes the reference count. Note how this is different from having a reference counted base class that the object must be derived from (see NSObject in Objective-C—which doesn’t have templates, and thus solved it this way). Another note, I started out with a non-intrusive reference counting design, in which the reference count lives in a smart pointer that points at the referenced object. This costs another “hop” and therefore (a little) performance.

Although the internals of libobjects++ were kind of hard to write, it is a joy to use and see in action. I’d post a snippet of demo code, but libobjects++ is not fully finished at this point and not available for download yet either. The first things to be added next are the Array and Dictionary classes. After that I would like to add a File, Buffer, Thread, and maybe Socket. These would be simply re-implementations of older code I have lying around. Adding multi-threading to a reference counting library is another problematic topic I won’t go into deeper right now.

Best way to think of libobjects++ is as a useful replacement for STL. All this basic functionality is available in STL and boost, but somehow those never quite did it for me. Their interfaces simply do not handle as easily as Python’s or libobjects++’s.

I’d like to add that although libobjects++ is really nice (and fully type safe), Python does handle things differently quite fundamentally. In Python, a variable is a pointer to a backend of any type. When you for example address the pointer as if it were an array, Python will try to treat it as an array. If you then address the variable like it’s a string, Python will treat it like a string. A runtime type check is made whether the object really supports a given method or operator, and if it doesn’t, it will throw an exception. Think for a moment about how different that approach is compared to C++’s compile time type checking and my (now seemingly straightforward) implementation of libobjects++.
I generally consider the loose typing nature of Python one of its great strengths, but sometimes all this runtime checking can be a major nuisance. Other than a basic syntax check there appears to be hardly any compile time checking performed at all.

I could go on ranting about this topic for days, but to conclude this blog entry, I would like to give you something to ponder: C++ is becoming a language of the past decades, while it seems Python stays overshadowed by Java and JavaScript.
(Not that it matters much, every language has its uses).

Update: I ran into a problem with the templated backend. C++ has strict typing, so it became near impossible to create a generic container class that can hold any kind of item. There are ways around this … I actually started over yet again with a design that is a compromise between the old and the new way: it has two classes, a front end and a back end. By convention, the back end does not implement many methods. The back end is friends with the front end, which can manipulate the back end directly. This works pretty good so far …