The Developer’s Cry

a blog about computer programming

Rust In Practice - Being lazy and static

Applications often come with configuration settings that influence the exact behavior of the program; they can be command-line switches, or settings in a config file. These settings are initialized early at program startup, but typically don’t change (much) afterwards. Maybe it’s just my lazy C coding style, but more often than not I simply put such data in a global variable at the top of the main source file. Here comes the crowd that claims it’s bad to use global variables. Well, they’re not exactly “variable”; they are static and immutable for 99 percent of the lifetime of the program. What often happens is that you need some flags six subroutines down, and the function signatures of everything above that level are much cleaner without passing the options as parameters each and every time. But what about side effects, function reproducibility and reusable code? The reality is that sometimes it’s easier to be lazy and static.
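A minimal sketch of that pattern in Rust, assuming the settings are parsed once in main() and never change afterwards. It uses std::sync::OnceLock from the standard library (stable since Rust 1.70) rather than a crate; the Config struct, its fields, and the config() and should_list() helpers are hypothetical names for illustration.

```rust
use std::sync::OnceLock;

// Hypothetical program settings; a real program would fill these in
// from command-line switches or a config file.
#[derive(Debug)]
struct Config {
    verbose: bool,
    show_hidden: bool,
}

// Global, set once at startup, read-only afterwards.
static CONFIG: OnceLock<Config> = OnceLock::new();

// Convenience accessor; panics if called before main() has set the config.
fn config() -> &'static Config {
    CONFIG.get().expect("config not initialized")
}

// Somewhere deep in the call stack: no Config parameter threaded through.
fn should_list(name: &str) -> bool {
    !name.starts_with('.') || config().show_hidden
}

fn main() {
    // Initialize exactly once; a second set() would return Err.
    CONFIG
        .set(Config { verbose: false, show_hidden: false })
        .expect("config already set");

    for name in [".git", "src", "Cargo.toml"] {
        if should_list(name) {
            println!("{}", name);
        }
    }
    if config().verbose {
        println!("done");
    }
}
```

The trade-off is the one described above: should_list() reads a global instead of taking a parameter, but the value can only ever be written once.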

The time is now

As a first example, I was writing a very simple directory listing program. Depending on how old a file is, I want to format its last modification time differently: if the file is recent, add the time in HH:MM format; otherwise, show only the year. So each entry needs to be compared to the current time. We can issue a library call to get the current time for each and every entry, or we can stick it in a variable and pass that down every subroutine call. Or … in C I would just use a static variable. Rust is not C, but hear me out:
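The age comparison itself needs only the standard library. The sketch below takes the current time once and reuses it for every entry. It assumes a cutoff of roughly six months, loosely modelled on ls. TimeFormat and choose_format are hypothetical names, and treating future timestamps as recent is an arbitrary choice made here. Rendering the actual HH:MM or year string is left to a date library such as chrono.

```rust
use std::time::{Duration, SystemTime};

// Which timestamp layout to use for a directory entry.
#[derive(Debug, PartialEq)]
enum TimeFormat {
    HourMinute, // recent file: include the time of day
    YearOnly,   // old file: show only the year
}

// Assumed cutoff: roughly six months.
const RECENT: Duration = Duration::from_secs(182 * 24 * 60 * 60);

// Decide the format by comparing the file's mtime to a `now` taken once.
fn choose_format(mtime: SystemTime, now: SystemTime) -> TimeFormat {
    match now.duration_since(mtime) {
        Ok(age) if age < RECENT => TimeFormat::HourMinute,
        Ok(_) => TimeFormat::YearOnly,
        // mtime lies in the future (e.g. clock skew): treat it as recent
        Err(_) => TimeFormat::HourMinute,
    }
}

fn main() {
    let now = SystemTime::now(); // taken once, shared by all entries
    let entries = [
        ("fresh.txt", now - Duration::from_secs(3600)),
        ("ancient.txt", now - Duration::from_secs(400 * 24 * 60 * 60)),
    ];
    for (name, mtime) in entries {
        println!("{:<12} {:?}", name, choose_format(mtime, now));
    }
}
```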

use chrono::{DateTime, Local};
use lazy_static::lazy_static;

lazy_static! {
    static ref NOW: DateTime<Local> = chrono::Local::now();
}

The lazy_static! macro is a funny animal. Unlike C, this code does not run before main(), despite sitting at the top of the source file. Instead, it runs when NOW is used for the first time, and it runs only once. In other words, the static is lazily evaluated.

As proof, here is a silly playground program that demonstrates the behavior:

use chrono::{DateTime, Local};
use lazy_static::lazy_static;
use std::{thread, time};

lazy_static! {
    static ref NOW: DateTime<Local> = chrono::Local::now();
}

fn main() {
    let current_time = chrono::Local::now();
    println!("current time is: {}", current_time);

    // sleep 3 seconds
    thread::sleep(time::Duration::new(3, 0));

    // NOW is later than current_time
    println!("NOW is: {}", *NOW);

    // sleep 3 seconds
    thread::sleep(time::Duration::new(3, 0));

    // NOW did not change again
    println!("NOW is: {}", *NOW);
}
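The same first-use, run-once behavior can also be checked without any external crates. This is a stdlib-only stand-in, not how lazy_static! is implemented. std::sync::OnceLock holds the value, and an AtomicUsize counts how often the initializer actually runs. INIT_CALLS and now_once are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::time::Instant;

// Counts how many times the initializer closure has run.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static NOW: OnceLock<Instant> = OnceLock::new();

// Returns the stored Instant, computing it on the first call only.
fn now_once() -> &'static Instant {
    NOW.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Instant::now()
    })
}

fn main() {
    // Nothing has been initialized yet.
    println!("before first use: {} init calls", INIT_CALLS.load(Ordering::SeqCst));

    let first = *now_once();
    let second = *now_once();

    // The initializer ran once, and both reads see the same value.
    println!("after two uses: {} init calls", INIT_CALLS.load(Ordering::SeqCst));
    println!("same value: {}", first == second);
}
```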

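Note that the macro requires the static ref syntax: NOW is not itself a DateTime<Local>, but a value of a hidden type that dereferences to one. That is why the example reads it as *NOW. Because you only ever get a shared reference, the value can not be moved out.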
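Roughly speaking, the macro generates a zero-sized type that implements Deref and initializes on first access. The hand-written approximation below is not the macro's literal expansion. It uses OnceLock in place of the crate's internals and std's Instant in place of chrono's DateTime, and LazyNow is a hypothetical type name.

```rust
use std::ops::Deref;
use std::sync::OnceLock;
use std::time::Instant;

// Zero-sized stand-in for the hidden type lazy_static! generates.
struct LazyNow;

impl Deref for LazyNow {
    type Target = Instant;

    // Every access goes through deref(), which initializes on first use
    // and hands out a shared reference; the value itself never moves.
    fn deref(&self) -> &Instant {
        static CELL: OnceLock<Instant> = OnceLock::new();
        CELL.get_or_init(Instant::now)
    }
}

static NOW: LazyNow = LazyNow;

fn main() {
    // Explicit deref, as in the lazy_static! example (Instant is Copy).
    let a: Instant = *NOW;
    // Auto-deref also works for method calls.
    let elapsed = NOW.elapsed();
    println!("same instant: {}", a == *NOW);
    println!("elapsed since first use: {:?}", elapsed);
}
```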

Local and mutable

The lazy static doesn’t have to be global per se; it is useful as a function’s local cache, too. For example, I wanted the directory listing to show the UNIX permission bits. The file mode is an integer, conventionally written in octal, that is decoded to a string that looks like -rwxr-xr-x (e.g., for executable files). Many files share the same mode, and even though decoding it is not computationally expensive, we can squeeze out some more performance by caching the mapping from mode to permission string. For this we need a mutable hashmap.
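The post does not show the decoding step itself. One way to write it is the sketch below, where format_mode is a hypothetical name. It assumes the traditional UNIX file-type values used on Linux and macOS. It handles only the file-type character for regular files, directories, and symlinks, plus the nine rwx bits. It ignores the setuid, setgid, and sticky bits.

```rust
// File-type bits of st_mode (traditional UNIX values).
const S_IFMT: u32 = 0o170000;
const S_IFDIR: u32 = 0o040000;
const S_IFLNK: u32 = 0o120000;

// Decode a raw st_mode into an ls-style string such as "-rwxr-xr-x".
fn format_mode(mode: u32) -> String {
    let mut s = String::with_capacity(10);

    // First character: the file type.
    s.push(match mode & S_IFMT {
        S_IFDIR => 'd',
        S_IFLNK => 'l',
        _ => '-',
    });

    // Then three rwx triplets: owner, group, other.
    for shift in [6u32, 3, 0] {
        let bits = (mode >> shift) & 0o7;
        s.push(if bits & 0o4 != 0 { 'r' } else { '-' });
        s.push(if bits & 0o2 != 0 { 'w' } else { '-' });
        s.push(if bits & 0o1 != 0 { 'x' } else { '-' });
    }
    s
}

fn main() {
    for mode in [0o100755, 0o100644, 0o040750, 0o120777] {
        println!("{:o} -> {}", mode, format_mode(mode));
    }
}
```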

#[cfg(unix)]
fn format_permissions(perms: &Permissions) -> String {
    use std::os::unix::fs::PermissionsExt;

    let mode = perms.mode() as u32;

    lazy_static! {
        static ref mut CACHE: HashMap<u32, String> = HashMap::new();
                       ^^^^^ no rules expected this token in macro call
    }

The UNIX permission bits are a platform-specific thing, so we use a #[cfg(unix)] attribute and the std::os::unix::fs::PermissionsExt trait, which provides mode(). But other than that, our mutable lazy static hashmap does not compile! The lazy_static! macro does not accept static ref mut.

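There is a way around this: wrap the hashmap in a mutex. Yes, even though this is only a single-threaded program, we are going to lock a mutex whenever we access the hashmap. lazy_static! only ever hands out a shared reference, so changing the hashmap needs interior mutability. Because a static must be Sync (the compiler can not know there is only one thread), the thread-safe Mutex fits where a RefCell would be rejected. Memory safety above all, I guess.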

lazy_static! {
    static ref CACHE: Mutex<HashMap<u32, String>> = Mutex::new(HashMap::new());
}

let mut cache = CACHE.lock()
    .expect("failed to lock mutex on internal cache");

if let Some(perms_string) = cache.get(&mode) {
    // cache hit
    return perms_string.clone();
}

// cache miss; make new permissions string
let perms_string = format_new_permissions_string(mode);

// update cache with (clone of) new permissions string
cache.insert(mode, perms_string.clone());

// return the permissions string
perms_string
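A self-contained version of the whole function can swap in std::sync::OnceLock plus Mutex for lazy_static!; the standard library has had OnceLock since Rust 1.70. This sketch takes the raw mode as a u32 instead of a Permissions value so it runs on any platform. decode_mode is a hypothetical, deliberately minimal stand-in for the post's format_new_permissions_string.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical minimal decoder: the nine rwx bits only.
fn decode_mode(mode: u32) -> String {
    let chars = ['r', 'w', 'x'];
    (0..9)
        .map(|i| {
            let bit: u32 = 1 << (8 - i);
            if mode & bit != 0 { chars[i % 3] } else { '-' }
        })
        .collect()
}

// Cached lookup: each distinct mode is decoded only once.
fn format_permissions(mode: u32) -> String {
    // Function-local static; the Mutex provides thread-safe interior
    // mutability, so the map can be changed through a shared reference.
    static CACHE: OnceLock<Mutex<HashMap<u32, String>>> = OnceLock::new();

    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .expect("failed to lock mutex on internal cache");

    // entry() handles both the hit and the miss case in a single lookup;
    // the returned String is a clone, so nothing borrows from the guard.
    cache
        .entry(mode)
        .or_insert_with(|| decode_mode(mode))
        .clone()
}

fn main() {
    // The second 0o755 lookup is served from the cache.
    for mode in [0o755, 0o644, 0o755] {
        println!("{:o} -> {}", mode, format_permissions(mode));
    }
}
```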

So, we lock the mutex to gain access to the hashmap. We get mutable access even though we never declared the hashmap mutable; the write access is safe because we hold an exclusive lock. The lock() call can fail, but only if the mutex is poisoned, meaning another thread panicked while holding the lock; in this single-threaded program that effectively can not happen. This is one of the few places where it’s okay to .unwrap(), i.e. panic on failure. I rather like using .expect() instead, which panics with a message of your choosing [in my mind it should have been called .unexpected(), really]. The lock is released automatically when the guard (cache) goes out of scope upon function return.

You might think it should be possible to code this so that we return a reference to the cached string inside the hashmap. Rust won’t let us do that, however. Any reference into the hashmap borrows from the mutex guard, and the guard is dropped (unlocking the mutex) when the function returns, so the reference can not outlive it. Making CACHE a global lazy static instead does not change that, and it gave me lifetimes and borrow-checker hell. Returning references in Rust often leads nowhere, so I’m fine with cloning the permissions string.
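If the clones ever did become a concern, one option (not what the post does) is to store Arc<str> in the cache. The function then returns an owned handle, so nothing borrows from the guard, and a cache hit costs a reference-count increment instead of a string copy. A minimal std-only sketch, with the hypothetical names decode and perms_shared:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical decoder for the nine rwx bits.
fn decode(mode: u32) -> String {
    let mut s = String::with_capacity(9);
    for shift in [6u32, 3, 0] {
        let bits = (mode >> shift) & 0o7;
        s.push(if bits & 0o4 != 0 { 'r' } else { '-' });
        s.push(if bits & 0o2 != 0 { 'w' } else { '-' });
        s.push(if bits & 0o1 != 0 { 'x' } else { '-' });
    }
    s
}

// Returns a shared, immutable string; the lock is released on return.
fn perms_shared(mode: u32) -> Arc<str> {
    static CACHE: OnceLock<Mutex<HashMap<u32, Arc<str>>>> = OnceLock::new();
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .expect("failed to lock mutex on internal cache");
    // Arc::clone only bumps the reference count; the text is not copied.
    Arc::clone(cache.entry(mode).or_insert_with(|| Arc::from(decode(mode))))
}

fn main() {
    let a = perms_shared(0o755);
    let b = perms_shared(0o755);
    println!("{} {}", a, b);
    // Both handles point at the same cached allocation.
    println!("shared: {}", Arc::ptr_eq(&a, &b));
}
```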

Luxury

In the first example I used lazy_static! to initialize a global variable only once, and in the second I used it for a local cache. If you’re only going to set a value once, there is a more luxurious way to do it: a OnceCell from the once_cell crate.

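use chrono::{DateTime, Local};
use once_cell::sync::OnceCell;

fn main() {
    // function-local static; filled in on the first get_or_init() call
    static NOW: OnceCell<DateTime<Local>> = OnceCell::new();
    let now = NOW.get_or_init(|| chrono::Local::now());
    println!("NOW is: {}", now);
}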

Here, the .get_or_init() call ensures that the variable gets set only once. The static can be local or global. What’s particularly nice about OnceCell is that it is explicit about being set only once. Because it is an ordinary type rather than a macro, there is less magic going on than with lazy_static.
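Since Rust 1.70 the standard library has std::sync::OnceLock, which provides the same get_or_init() pattern without an external crate. The sketch below uses std::time::SystemTime in place of chrono's DateTime<Local>, so it prints a debug timestamp rather than local time. It also shows the explicitness mentioned above: get() does not initialize, and a second set() is refused.

```rust
use std::sync::OnceLock;
use std::time::SystemTime;

static NOW: OnceLock<SystemTime> = OnceLock::new();

fn main() {
    // Not set yet: get() only reports emptiness, it does not initialize.
    assert!(NOW.get().is_none());

    // First call runs the initializer; later calls return the stored value.
    let now = *NOW.get_or_init(SystemTime::now);
    assert_eq!(*NOW.get_or_init(SystemTime::now), now);

    // Setting it again is refused; the rejected value is handed back in Err.
    assert!(NOW.set(SystemTime::now()).is_err());
    println!("NOW is: {:?}", now);
}
```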

This, and some other codes, running through my mind, lazin’ on a sunny afternoon, in the summertime.