Rust Variant: Tagged Union
Many scripting languages appear to be typeless, by which I mean they allow
you to assign to a variable whatever type of value you like; a variable might
contain an integer value, and then there is no problem re-assigning that
same variable with a string value. The lack of strict typing is a source
of bugs in scripts … which we then sometimes patch up with type annotations.
We can simulate the typeless behavior of interpreted scripting languages
by using a variant type.
I was never gonna touch Rust again, but never say never again, I guess.
The variant type in Rust is enum (really!). Each variant of an enum can carry
a payload: an integer, a float, a string, whatever you want it to be. The
syntax is as such:
    #[derive(Debug, Clone)]
    pub enum Value {
        Integer(i64),
        Float(f64),
        String(String),
    }
WUT.
“Enum” screams integer to me, so what is going on here? In order to explain, we must take a look at how this works in C; I can’t explain it otherwise.
If you know your onions …
In the C programming language, a union is a struct-like type that overlays
each member in the same memory space. This allows you to address the same
bytes of memory as a different type.
Besides using a union for simulating CPU registers (an exercise for the avid reader), a practical application of unions is making a variant type for scripting variables that may hold an integer, float, or string value.
The union needs to be paired with a “tag” that says what type of value it is; hence the term “tagged union”.
    typedef enum {
        T_INT,
        T_FLOAT,
        T_STRING
    } Tag;

    typedef struct {
        union {
            long i;
            double f;
            char* s;
        } value;
        Tag tag;
    } Value;
Now, whenever dealing with a Value, we can select based on the tag, and
use the value in the correct way:
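    /* needs <stdio.h> and <stdlib.h> */
    switch (v.tag) {
        case T_INT:
            /* the union member is named "value", so go through it */
            printf("%ld", v.value.i);
            break;
        case T_FLOAT:
            printf("%f", v.value.f);
            break;
        case T_STRING:
            printf("%s", v.value.s);
            break;
        default:
            /* the tag holds something we never put there */
            fprintf(stderr, "invalid tag!\n");
            abort();
    }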
Easy nuff, so what’s the fuss?
Apples and onions
Coming from C, “enum” to me is synonymous with “integer”. The Rust enum type
is semantically not an integer at all, but as a C programmer first, it’s hard
to see it any other way.
The key to grasping the Rust enum is to look at it this way: if a thing
can be one of many, then it must be an enumerated type.
It makes perfect sense: the tagged union in C is a hand-rolled, runtime solution to the problem that C has no built-in variant type. Rust, on the other hand, does have one; the compiler knows every variant at compile time. The “tag” is a compiler detail; there is no explicit tag member in the source code.
There is (most certainly) a generated tag under the hood, however; Rust calls it the discriminant.
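To peek at that hidden tag, here is a minimal sketch using `std::mem::discriminant` from the standard library. The `same_variant` helper is a made-up name for illustration; it compares only the tags and ignores the payloads.

```rust
use std::mem;

#[derive(Debug, Clone)]
pub enum Value {
    Integer(i64),
    Float(f64),
    String(String),
}

/// Hypothetical helper: true when both values hold the same variant,
/// regardless of the payload inside.
pub fn same_variant(a: &Value, b: &Value) -> bool {
    // discriminant() returns an opaque handle to the compiler-generated tag
    mem::discriminant(a) == mem::discriminant(b)
}

fn main() {
    let a = Value::Integer(1);
    let b = Value::Integer(99);
    let c = Value::Float(1.0);
    let d = Value::String("one".to_string());

    println!("{}", same_variant(&a, &b)); // true: both are Integer
    println!("{}", same_variant(&a, &c)); // false: Integer vs Float
    println!("{}", same_variant(&c, &d)); // false: Float vs String
}
```

The tag is never spelled out in the enum definition, yet it can be compared just like the `Tag` member of the C struct.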
To address the correct form, we need to match on the variant, much like the switch on the tag in C.
For example, when implementing the Display trait:
    use std::fmt;

    impl fmt::Display for Value {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            // each arm binds the payload of one variant to x
            match self {
                Value::Integer(x) => write!(f, "{}", x),
                Value::Float(x) => write!(f, "{}", x),
                Value::String(x) => write!(f, "{}", x),
            }
        }
    }
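Putting it together, a small sketch of the "typeless" variable from the introduction: one variable is re-assigned with an integer, a float, and a string, and prints correctly each time. The enum and the Display impl are repeated so the example stands on its own; the variable name `var` is just for illustration.

```rust
use std::fmt;

#[derive(Debug, Clone)]
pub enum Value {
    Integer(i64),
    Float(f64),
    String(String),
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Integer(x) => write!(f, "{}", x),
            Value::Float(x) => write!(f, "{}", x),
            Value::String(x) => write!(f, "{}", x),
        }
    }
}

fn main() {
    // One variable, three different kinds of value over its lifetime
    let mut var = Value::Integer(42);
    println!("{}", var); // 42

    var = Value::Float(1.5);
    println!("{}", var); // 1.5

    var = Value::String("hello".to_string());
    println!("{}", var); // hello
}
```

The variable's type never changes (it is always `Value`); only the variant it holds does.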
The Rust enum starts looking more magical when the contained type is
a struct. This allows you to abstract types in a way that looks like OOP
inheritance or dynamic binding, but really isn’t any of that.
A good example of this in the Rust standard library is std::net::IpAddr:
    pub enum IpAddr {
        V4(Ipv4Addr),
        V6(Ipv6Addr),
    }
The Ipv4Addr and Ipv6Addr types are completely different structs, but
the IpAddr can be either one of them.
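As a rough illustration, the sketch below parses two addresses into `IpAddr` and matches on the variant to get at the underlying struct. The `describe` function is a hypothetical helper; `octets()` and `segments()` are the standard library methods on `Ipv4Addr` and `Ipv6Addr`.

```rust
use std::net::IpAddr;

/// Hypothetical helper: report which kind of address we got,
/// using a method that only exists on the concrete struct.
pub fn describe(addr: &IpAddr) -> String {
    match addr {
        // v4 is an &Ipv4Addr here; octets() yields four u8 values
        IpAddr::V4(v4) => format!("IPv4 {:?}", v4.octets()),
        // v6 is an &Ipv6Addr here; segments() yields eight u16 values
        IpAddr::V6(v6) => format!("IPv6 {:?}", v6.segments()),
    }
}

fn main() {
    // The same parse call can produce either variant
    let a: IpAddr = "127.0.0.1".parse().expect("valid IPv4 literal");
    let b: IpAddr = "::1".parse().expect("valid IPv6 literal");

    println!("{}", describe(&a)); // IPv4 [127, 0, 0, 1]
    println!("{}", describe(&b)); // IPv6 [0, 0, 0, 0, 0, 0, 0, 1]
}
```

No trait objects or virtual calls are involved; the match simply picks the right struct based on the tag.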
TL;DR
The key to grasping the Rust enum is to look at it this way: if a thing
can be one of many, then it must be an enumerated type.