Rust In Practice - Being lazy and static
Applications often come with configuration settings that influence the exact behavior of the program; they can be command-line switches, or settings in a config file. These settings are initialized early at program startup, but typically don’t change (much) afterwards. Maybe it’s just my lazy C coding style, but more often than not I’m just going to put such data in a global variable at the top of the main source file. Here comes the crowd that claims it’s bad to use global variables. Well, they’re not exactly “variable”; they are static and immutable for 99 percent of the lifetime of the program. What often happens is that you need some flags six subroutines down, and the function signatures of everything above that level are much cleaner without passing the options as parameters each and every time. But what about side effects, function reproducibility and reusable code? The reality is that sometimes it’s easier to be lazy and static.
The time is now
As a first example, I was writing a very simple directory listing program.
Depending on how old the files are, I want to format the last modification
time of the files differently: if a file is recent, add the time in HH:MM
format; otherwise show only the year. So, each entry needs
to be compared to the current time. We can choose to issue a library call
to get the current time for each and every entry, or we can stick it
in a variable and pass that down every subroutine call. Or …
in C I would just use a static variable. Rust is not C, but hear me out:
use chrono::{DateTime, Local};
use lazy_static::lazy_static;
lazy_static! {
    static ref NOW: DateTime<Local> = chrono::Local::now();
}
The lazy_static! macro is a funny animal. Unlike C, this code does not run
before main(), despite being at the top of the source. Instead, this code runs
when NOW is used for the first time, and it runs only once.
In other words, the static is lazily evaluated.
As proof, here is a silly playground program that demonstrates the behavior:
use chrono::{DateTime, Local};
use lazy_static::lazy_static;
use std::{thread, time};

lazy_static! {
    static ref NOW: DateTime<Local> = chrono::Local::now();
}

fn main() {
    let current_time = chrono::Local::now();
    println!("current time is: {}", current_time);
    // sleep 3 seconds
    thread::sleep(time::Duration::new(3, 0));
    // NOW is later than current_time
    println!("NOW is: {}", *NOW);
    // sleep 3 seconds
    thread::sleep(time::Duration::new(3, 0));
    // NOW did not change again
    println!("NOW is: {}", *NOW);
}
Note that the static must be declared as a ref: NOW is reached through a
reference (hence the *NOW dereference above), which ensures that the value
cannot be moved.
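To make the directory listing use case a bit more concrete, here is a minimal sketch of how such a one-time "now" could drive the formatting decision. It deliberately sticks to the standard library so that it runs without extra crates: std::sync::OnceLock (which assumes Rust 1.70 or newer) stands in for lazy_static!, and SystemTime stands in for chrono. The six-month threshold and the names now() and is_recent() are assumptions made up for this sketch, not taken from the actual program.

```rust
use std::sync::OnceLock;
use std::time::{Duration, SystemTime};

// Assumption: "recent" means modified within roughly the last six months.
const RECENT: Duration = Duration::from_secs(182 * 24 * 60 * 60);

// Function-local static: initialized on the first call, then reused.
fn now() -> SystemTime {
    static NOW: OnceLock<SystemTime> = OnceLock::new();
    *NOW.get_or_init(SystemTime::now)
}

// Hypothetical helper: true if the entry should get the HH:MM format,
// false if only the year should be shown.
fn is_recent(mtime: SystemTime) -> bool {
    match now().duration_since(mtime) {
        Ok(age) => age < RECENT,
        // mtime lies in the future relative to NOW; treat it as recent
        Err(_) => true,
    }
}

fn main() {
    let yesterday = now() - Duration::from_secs(24 * 60 * 60);
    let two_years_ago = now() - Duration::from_secs(2 * 365 * 24 * 60 * 60);
    println!("yesterday is recent: {}", is_recent(yesterday));
    println!("two years ago is recent: {}", is_recent(two_years_ago));
}
```

Every entry is compared against the same instant, and nothing further up the call chain has to carry the current time around.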
Local and mutable
The lazy static doesn’t have to be global per se. It is useful as a
function’s local cache, too. For example, I wanted the directory listing
to show the UNIX permission bits. The file mode is an integer, usually written
in octal, that is decoded to a string that looks like -rwxr-xr-x
(e.g., for executable files). The modes are identical for many files, and
even though decoding them is not computationally expensive, we can squeeze
out some more performance by caching the permission string mappings.
For this we need a mutable hashmap.
#[cfg(unix)]
fn format_permissions(perms: &Permissions) -> String {
    use std::os::unix::fs::PermissionsExt;
    let mode = perms.mode() as u32;
    lazy_static! {
        static ref mut CACHE: HashMap<u32, String> = HashMap::new();
                   ^^^^^ no rules expected this token in macro call
    }
The UNIX permission bits are a platform-specific thing, so we use a Rust
cfg attribute and an os::unix permissions extension. But other than
that, our mutable lazy static hashmap does not compile!
The lazy_static! macro does not allow the syntax.
There is a way around this, and it is wrapping the hashmap in a mutex. Yes, despite this being only a single-threaded program, we are going to lock a mutex whenever we access the hashmap. Memory safety above all, I guess.
    lazy_static! {
        static ref CACHE: Mutex<HashMap<u32, String>> = Mutex::new(HashMap::new());
    }
    let mut cache = CACHE.lock()
        .expect("failed to lock mutex on internal cache");
    if let Some(perms_string) = cache.get(&mode) {
        // cache hit
        return perms_string.clone();
    }
    // cache miss; make new permissions string
    let perms_string = format_new_permissions_string(mode);
    // update cache with (clone of) new permissions string
    cache.insert(mode, perms_string.clone());
    // return the permissions string
    perms_string
So, we lock the mutex to gain access to the hashmap. We can get mutable access
even though we never explicitly declared the static as mutable; the write access
is guaranteed to be safe because we hold an exclusive lock. The lock()
call may actually fail, but only when the mutex is poisoned, meaning another
thread panicked while holding the lock. In a single-threaded program that
should never happen, so this is one of the few places
where it’s okay to .unwrap(), i.e. panic on failure. I rather like using
.expect() [which in my mind should have been called .unexpected(), really].
The mutex unlocks automatically upon function return, when the lock guard goes
out of scope.
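For completeness, here is a self-contained sketch of the whole thing that runs without any external crates. It swaps lazy_static! for std::sync::OnceLock (which assumes Rust 1.70 or newer) and takes the raw mode rather than a Permissions value. The name format_mode() is made up for this sketch, and the body of format_new_permissions_string() is a hypothetical stand-in, since the real helper isn't shown here. It also uses the hashmap's entry API in place of the separate get() and insert() calls.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for format_new_permissions_string().
// Simplification: only directories and symlinks get a type character,
// and the setuid/setgid/sticky bits are ignored.
fn format_new_permissions_string(mode: u32) -> String {
    let mut s = String::with_capacity(10);
    s.push(match mode & 0o170000 {
        0o040000 => 'd',
        0o120000 => 'l',
        _ => '-',
    });
    // walk the nine permission bits from owner-read down to other-execute
    for (i, c) in "rwxrwxrwx".chars().enumerate() {
        let bit: u32 = 0o400 >> i;
        s.push(if mode & bit != 0 { c } else { '-' });
    }
    s
}

fn format_mode(mode: u32) -> String {
    // function-local cache, created on first use
    static CACHE: OnceLock<Mutex<HashMap<u32, String>>> = OnceLock::new();
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .expect("failed to lock mutex on internal cache");
    // look up the mode, decoding it only on a cache miss,
    // and hand back a clone of the cached string
    cache
        .entry(mode)
        .or_insert_with(|| format_new_permissions_string(mode))
        .clone()
}

fn main() {
    println!("{}", format_mode(0o100755));
    println!("{}", format_mode(0o040755));
    // second lookup of the same mode is served from the cache
    println!("{}", format_mode(0o100755));
}
```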
You might think it should be possible to code this so that we return
a reference to the cached string inside the hashmap. Rust won’t let us do
that, however. Even though CACHE is a static, the reference would borrow
from the lock guard, which is a local variable, and you can’t return
a reference to a local.
Then we might try making CACHE a global lazy static, which gave me
lifetimes and borrow-check hell. Returning references in Rust usually leads
nowhere, so I’m fine with cloning the permissions string.
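If the clone ever did become a concern, one option (not something the program described here does) would be to cache reference-counted strings. The sketch below stores Arc<str> values, so handing out a copy only bumps a reference count. It uses std::sync::OnceLock (assuming Rust 1.70 or newer) in place of lazy_static!; the name cached_label() and the placeholder octal formatting are assumptions for illustration only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical variant of the cache that hands out shared handles.
fn cached_label(mode: u32) -> Arc<str> {
    static CACHE: OnceLock<Mutex<HashMap<u32, Arc<str>>>> = OnceLock::new();
    let mut cache = CACHE
        .get_or_init(|| Mutex::new(HashMap::new()))
        .lock()
        .expect("failed to lock mutex on internal cache");
    let entry: &Arc<str> = cache
        .entry(mode)
        // placeholder formatting; the real decoding is out of scope here
        .or_insert_with(|| Arc::from(format!("{:o}", mode)));
    // cloning an Arc bumps a reference count; the text is not copied
    Arc::clone(entry)
}

fn main() {
    let a = cached_label(0o755);
    let b = cached_label(0o755);
    println!("{} {} shared: {}", a, b, Arc::ptr_eq(&a, &b));
}
```

The returned handle owns its share of the string, so nothing borrows from the lock guard and the borrow checker has nothing to complain about.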
Luxury
In the first example I used lazy_static! for initializing a global variable
only once, and in the second I used it for a local cache.
If you’re only going to set it once, you can do that in a more luxurious way,
using a OnceCell.
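use chrono::{DateTime, Local};
use once_cell::sync::OnceCell;

static NOW: OnceCell<DateTime<Local>> = OnceCell::new();

fn main() {
    // the closure runs only on the first call; later calls reuse the value
    let now = NOW.get_or_init(|| chrono::Local::now());
    println!("NOW is: {}", now);
}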
Here, the .get_or_init() call ensures that the variable gets set only once.
The static can be local, but what’s particularly nice about OnceCell is
that it’s more clear, I mean explicit, about being set only once.
There is less magic going on than with lazy_static.
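The "set only once" behavior is easy to check with a counter. The sketch below uses the standard library's std::sync::OnceLock, which offers the same get_or_init() call as once_cell's OnceCell (assuming Rust 1.70 or newer); the names VALUE, INIT_CALLS and value() are made up for the demonstration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// counts how often the initializer closure actually runs
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static VALUE: OnceLock<u64> = OnceLock::new();

fn value() -> u64 {
    *VALUE.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        42
    })
}

fn main() {
    println!("first call:  {}", value());
    println!("second call: {}", value());
    println!("initializer ran {} time(s)", INIT_CALLS.load(Ordering::SeqCst));
    // an explicit set() after initialization is rejected
    println!("late set accepted: {}", VALUE.set(7).is_ok());
}
```

No matter how often value() is called, the closure runs a single time, and a later set() cannot overwrite what is already stored.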
This, and some other codes, running through my mind, lazin’ on a sunny afternoon, in the summertime.