The Developer’s Cry

Yet another blog by a hobbyist programmer

Rust In Practice - Ownership & Safety

The Rust programming language is known for its safety; Rust programs are robust and in principle do not contain any memory corruption bugs. This safety emanates from the borrow checker and its strict rules. Underlying the borrow checker is the ownership model. Ownership exists in many other languages too, but appears more prevalent in Rust. Notice how you allocate new objects all the time, but never explicitly free them? Rust’s automatic memory management builds on its ownership model. We typically pass references to functions, rather than passing the objects themselves. Why is Rust like that? How does it work? To answer, we must take a deep dive.

Computer programming is about telling a computer what to do; in its most primitive form, programming is just putting the right bytes in the right memory locations. As an exercise, take a piece of graph paper, and jot down a short grocery list:

The first item, “bananas”, takes up seven characters. The second, “apples”, is six characters. “Milk” is only four; we have variable length strings. What if we wish to store these strings in a dynamic array? Do we copy the entire strings? Do we only store pointers?

Suppose we only store pointers, what happens when we free the dynamically allocated array? Do we only free the array, or do we first free each individual pointer to the strings? In other words, who owns what memory?

The ownership problem has been solved in other languages by employing garbage collectors and/or reference counting frameworks. But it is and still remains a difficult problem in systems programming. Rust solves it in its own way, and it forms one of the pillars of safe programming.

At the lowest level, the computer doesn’t know and doesn’t care what you do. The high-level programming language is there to help you maintain variables, which is basically organizing computer memory in data structures. For example, a string in Rust looks a lot like this:

struct String {
    ptr: *u8,
    length: usize,
    capacity: usize,
}

The computer memory may look something akin to this:

@00001000 { @00002134, 7, 16 }

@00002134 'b' 'a' 'n' 'a' 'n' 'a' 's'

A single owned String consists of two parts; an instance containing metadata, and a backing store containing the value.

Now let’s push this string into a dynamic array and see what happens.

The dynamic array in Rust is a Vec of generic (any) type T:

struct Vec<T> {
    items: *T,
    length: usize,
    capacity: usize,
}

The Vec instance contains a pointer to its backing store holding the items. Each stored item is a complete instance, not just a pointer. In our case, each item has size size_of(String), which is typically 24 bytes.

Variables in Rust can have only one owner at a time. When you assign to a different variable, the ownership will be moved. This means that when we push the string into the vector, then the string value will move into the vector. The vector will be the new owner of the string.

Computer memory will now look something like this:

@00001000 00 00 00 00 00 00 00 00 (invalid)

@00001200 { @00001640, 1, 4 }

@00001640 { { @00002134, 7, 16 }, {}, {}, {} }

@00002134 'b' 'a' 'n' 'a' 'n' 'a' 's'

After moving, the string variable at address 01000 is no longer valid. I have depicted it here as memory being neatly zeroed out, but it may as well contain left-overs or random garbage. The compiler tracks the ownership and lifetime of the variable, and it knows the variable is now invalid, after it was moved. The string instance moved into the first slot of the dynamic array, which in this example has a capacity of four. The string value, “bananas”, was completely untouched in the process. The backing store (of the source) is not involved when moving a variable. Also, note how the addresses don’t have to neatly line up; computer memory may appear a fragmented, jumbled mess.

If you call a function in Rust, by default the arguments are pass-by-value. The same thing happens; the instance gets moved to the call stack. Now the function is the owner of the variable. When the function ends, the variable goes out of scope, and the variable gets dropped. Tada, we have a solid mechanism for automatic memory management.

Often when you call a subroutine, you actually don’t wish to give away ownership. What we can do instead is pass a borrowed reference, which is like a pointer, with an added annotation of temporary ownership. The subroutine can have the variable for now, but after it is done, we want it back.

So far, we can emulate this behavior in C++ by using move semantics, and we even can emulate it in plain C using raw pointers. But the Rust compiler takes things further to ensure safety.

In order to prevent data races, the Rust compiler checks mutability. There can be only one mutable reference to the data at once, ensuring that memory isn’t modified behind your back. Even in single threaded code it’s possible to create situations in which data accidentally gets clobbered. One example is iterating over an array (or vector), whilst removing items from that same array (vector) — it’s bound to give problems. The Rust compiler will rightfully stop you from doing that.

An easy way of visualizing how data races happen is again to take a piece of graph paper, and coloring all the cells that are given out as mutable. If you are touching an already colored cell in the process, then that’s the point where data corruption would occur in other languages. Rust’s set of rules will not allow that to happen; it will simply not compile. Rather than checking every byte, Rust knows that the variable already passed as mutable. This method is sometimes too coarse-grained, resulting in structural headaches. Take for example, large composite structs. Another infamous example is double linked lists. Nonetheless, Rust does succeed in staying safe.

How does the compiler know that we’re going to modify this data? Ironically, we told it the variable was mutable ourselves. Rust is safe because the programmer did the hard work of marking the variables as writeable.