How do Trait Objects work?

By Clemens Tiedt

2021-08-27

As all good deep dives do, this article starts with a slightly contrived code example. Let's say I have a file containing a number and want to read it. Here's a (very) naive Rust implementation:

fn read_number_from_file(path: String) -> u32 {
    // Panics if the file can't be read or its contents aren't a valid u32
    std::fs::read_to_string(path).unwrap().parse().unwrap()
}

And this works, as long as my path is correct and the number is well-behaved (e.g. no trailing whitespace). If that's not the case, it'll panic and take the program down with it. So, let's fix the function to return a Result:

fn read_number_from_file(path: String) -> Result<u32, std::io::Error> {
    // `?` handles the io::Error, but a parse failure still panics
    Ok(std::fs::read_to_string(path)?.parse().unwrap())
}

And this looks better, but we only covered one possible error. The reason for that is rather simple: read_to_string and parsing a u32 return different error types (std::io::Error and std::num::ParseIntError), but our Result can only hold one... Or can it? This is where trait objects make their entrance:

fn read_number_from_file(path: String) -> Result<u32, Box<dyn std::error::Error>> {
    // `?` converts both io::Error and ParseIntError into Box<dyn Error>
    Ok(std::fs::read_to_string(path)?.parse()?)
}

By returning a Box<dyn std::error::Error>, we can use the ? operator on both of our Results, because ? converts any error type implementing Error into a Box<dyn std::error::Error> via From.

This is Rust's way of implementing polymorphism: both kinds of error we're dealing with here implement the Error trait. If we want to treat them both the same way, we can treat them as if they only implemented this trait. Since we only know that we're going to return some type implementing Error, but not how big it is, we need to add a pointer as a layer of indirection (for some background, see my article on pointers).
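To see both error types travel through the same Box<dyn std::error::Error>, here is a small runnable sketch. It deviates from the version above in two deliberate ways: it takes a &str, and it trims whitespace before parsing so a trailing newline isn't treated as an error. The error_kind helper is hypothetical; it uses downcast_ref to show that the concrete error is still there behind the trait object.

```rust
use std::error::Error;
use std::fs;
use std::io;
use std::num::ParseIntError;

// Variation of the article's function (assumption): takes &str and trims
// whitespace before parsing, so a file containing "42\n" parses fine.
fn read_number_from_file(path: &str) -> Result<u32, Box<dyn Error>> {
    // `?` turns an io::Error into a Box<dyn Error> via From...
    let contents = fs::read_to_string(path)?;
    // ...and does the same for the ParseIntError returned by parse.
    let number = contents.trim().parse::<u32>()?;
    Ok(number)
}

// Hypothetical helper: looks behind the trait object to find the concrete error.
fn error_kind(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<io::Error>().is_some() {
        "io"
    } else if err.downcast_ref::<ParseIntError>().is_some() {
        "parse"
    } else {
        "other"
    }
}
```

Callers that only want to report the error can simply print it, since Box<dyn Error> implements Display; downcasting is only needed to react to one specific kind of failure.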

How to trait your objects

But how does all of this work under the hood? Let's build a toy example:

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

pub fn do_thing_with_value(v: &dyn MyTrait) {
    v.value();
}

pub fn do_other_thing_with_value(v: impl MyTrait) {
    v.value();
}

pub fn main() {
    do_thing_with_value(&0u32);
    do_other_thing_with_value(0u32);
    do_thing_with_value(&0i32);
    do_other_thing_with_value(0i32);
}

We have a trait MyTrait that is implemented for u32 and i32, and two functions that take an argument implementing this trait, using two different syntax variants. You may also notice that I switched from Box<dyn Trait> to &dyn Trait. Since a reference is also a pointer, it works just as well here, and it has the advantage of making the assembly code generated on Compiler Explorer¹ much more readable. Did I mention we were going to look at assembly code? I probably should have.

core::ptr::drop_in_place<i32>:
        ret

core::ptr::drop_in_place<u32>:
        ret

<u32 as example::MyTrait>::value:
        mov     eax, dword ptr [rdi]
        ret

<i32 as example::MyTrait>::value:
        mov     eax, dword ptr [rdi]
        ret

example::do_thing_with_value:
        push    rax
        call    qword ptr [rsi + 24]
        pop     rax
        ret

example::do_other_thing_with_value:
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     rax, qword ptr [rip + <i32 as example::MyTrait>::value@GOTPCREL]
        lea     rdi, [rsp + 4]
        call    rax
        jmp     .LBB5_1
.LBB5_1:
        jmp     .LBB5_5
.LBB5_2:
        jmp     .LBB5_4
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rsp + 8], rcx
        mov     dword ptr [rsp + 16], eax
        jmp     .LBB5_2
.LBB5_4:
        mov     rdi, qword ptr [rsp + 8]
        call    _Unwind_Resume@PLT
        ud2
.LBB5_5:
        add     rsp, 24
        ret

example::do_other_thing_with_value:
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     rax, qword ptr [rip + <u32 as example::MyTrait>::value@GOTPCREL]
        lea     rdi, [rsp + 4]
        call    rax
        jmp     .LBB6_1
.LBB6_1:
        jmp     .LBB6_5
.LBB6_2:
        jmp     .LBB6_4
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rsp + 8], rcx
        mov     dword ptr [rsp + 16], eax
        jmp     .LBB6_2
.LBB6_4:
        mov     rdi, qword ptr [rsp + 8]
        call    _Unwind_Resume@PLT
        ud2
.LBB6_5:
        add     rsp, 24
        ret

example::main:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rsi, [rip + .L__unnamed_2]
        call    qword ptr [rip + example::do_thing_with_value@GOTPCREL]
        xor     edi, edi
        call    qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
        lea     rdi, [rip + .L__unnamed_1]
        lea     rsi, [rip + .L__unnamed_3]
        call    qword ptr [rip + example::do_thing_with_value@GOTPCREL]
        xor     edi, edi
        call    qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
        pop     rax
        ret

.L__unnamed_1:
        .zero   4

.L__unnamed_2:
        .quad   core::ptr::drop_in_place<u32>
        .quad   4
        .quad   4
        .quad   <u32 as example::MyTrait>::value

.L__unnamed_3:
        .quad   core::ptr::drop_in_place<i32>
        .quad   4
        .quad   4
        .quad   <i32 as example::MyTrait>::value

That's quite a bit of assembly and might look a bit intimidating, so let's go through it bit by bit. First, we have this:

core::ptr::drop_in_place<i32>:
        ret

core::ptr::drop_in_place<u32>:
        ret

A quick look at the Rust docs shows us that core::ptr::drop_in_place is a function that runs the destructor (if any) of the value a pointer points to. The Rust compiler adds these functions to serve as destructors for our trait objects. As we have two implementations of our trait, we need two concrete drop_in_place implementations, one for u32 and one for i32. Since neither integer type has a destructor, both consist of nothing but a ret. This is the first time we see Rust deal with generics using monomorphization: we use the generic function with two concrete types, so Rust generates two concrete implementations. After that, we see our trait implementations:

<u32 as example::MyTrait>::value:
        mov     eax, dword ptr [rdi]
        ret

<i32 as example::MyTrait>::value:
        mov     eax, dword ptr [rdi]
        ret

These functions actually seem to do something! Even if you're unfamiliar with x86 assembly, you can probably guess that mov stands for "move". In Intel syntax (one of the ways to write x86 assembly), the destination is the first operand and the source the second. So we're moving something from dword ptr [rdi] to eax. A quick Google search tells us that the square brackets mean "the memory at the address stored in rdi", and that dword ptr is a "size directive" saying we read 32 bits from there. Aha, so rdi is a register, and it holds an address! Another trip to your search engine of choice leads us to a description of all the registers and confirms the suspicion that eax is one, too. Now we can piece together what the trait implementations do exactly: they read the 32-bit value that rdi points to and move it into eax. The u32 and i32 versions are identical because casting an i32 to a u32 with as doesn't change any bits, it just reinterprets them.

But why those specific registers? Because of calling conventions! A calling convention tells us which registers a function's arguments go into and where its return value ends up. Rust's own calling convention isn't formally specified, but here it matches the System V convention used on x86-64 Linux: the first argument is passed in rdi, and the return value comes back in rax (or its lower half, eax, for 32-bit values). So rdi must contain the reference to self that our function takes, and the method simply dereferences it. Mystery solved! Our next bit of code is this:

example::do_thing_with_value:
        push    rax
        call    qword ptr [rsi + 24]
        pop     rax
        ret

This is the function that takes a &dyn MyTrait. It starts with push rax, which stores the current value of the rax register on the stack. (The eax register from before is really just the lower 32 bits of rax.) Then it calls the function whose address is stored in memory 24 bytes past the address in the rsi register (don't worry, we'll find out what that is later). Finally, pop rax takes the last value off the stack and writes it back into rax. The push/pop pair isn't really about preserving rax, though: the calling convention requires the stack to be 16-byte aligned at every call, and pushing one 8-byte register is a compact way to get there. Since do_thing_with_value discards the result of value(), it doesn't matter that pop overwrites it. Next, we get this (slightly longer) assembly for our function using the impl Trait argument syntax:

example::do_other_thing_with_value:
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     rax, qword ptr [rip + <i32 as example::MyTrait>::value@GOTPCREL]
        lea     rdi, [rsp + 4]
        call    rax
        jmp     .LBB5_1
.LBB5_1:
        jmp     .LBB5_5
.LBB5_2:
        jmp     .LBB5_4
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rsp + 8], rcx
        mov     dword ptr [rsp + 16], eax
        jmp     .LBB5_2
.LBB5_4:
        mov     rdi, qword ptr [rsp + 8]
        call    _Unwind_Resume@PLT
        ud2
.LBB5_5:
        add     rsp, 24
        ret

First, it decreases the stack pointer by 24 bytes to reserve space for local variables (the stack grows downward). Then it moves the value in edi somewhere. This register contains the argument to do_other_thing_with_value, which takes its argument by value, so the 32-bit number itself arrives in edi. It is stored at rsp + 4, four bytes "up" from the stack pointer. Next, the address of the MyTrait::value function is loaded (via the global offset table) into the rax register. Since value expects a reference, a pointer to the argument we just placed on the stack is loaded into rdi (lea computes an address without reading memory), and finally the function in rax is called. Now, we see multiple labels. If everything goes right, the program jumps to .LBB5_1, which then jumps to .LBB5_5, where the stack pointer is reset and the function returns. The other labels are only used if value panics: they handle unwinding, which ends in the call to _Unwind_Resume.
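The unwinding path only comes into play if value panics. The following sketch, assuming the default panic=unwind setting, uses a hypothetical Fragile type whose value method panics and whose destructor records that it ran. std::panic::catch_unwind stops the unwinding, and we can check that the by-value argument was still dropped on the way out, which is the kind of cleanup such blocks exist for (for u32 there is nothing to drop, which would explain why they do so little here). The default panic hook still prints the panic message to stderr.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

pub trait MyTrait {
    fn value(&self) -> u32;
}

// Set by Fragile's destructor so we can observe that it ran.
static DROPPED: AtomicBool = AtomicBool::new(false);

// Hypothetical type: `value` always panics, and dropping it is observable.
struct Fragile;

impl MyTrait for Fragile {
    fn value(&self) -> u32 {
        panic!("value() failed")
    }
}

impl Drop for Fragile {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

// Same shape as the article's function: takes the argument by value.
pub fn do_other_thing_with_value(v: impl MyTrait) {
    v.value();
}

// Returns true if the panic propagated out of do_other_thing_with_value.
fn panicked() -> bool {
    panic::catch_unwind(|| do_other_thing_with_value(Fragile)).is_err()
}
```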

In the full listing above, right below this function, you can see another almost identical function, just for u32 instead of i32. This is again Rust's monomorphization at work: impl Trait in argument position behaves like an anonymous generic type parameter, so the compiler generates one copy per concrete type. You may remember that the &dyn Trait version only generated one function in assembly. This is because &dyn Trait is not a placeholder like impl Trait. The compiler doesn't have to figure out how to deal with different concrete types implementing the trait, because the function always receives the same type, &dyn Trait. After this, only the main function and some data follow:

example::main:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rsi, [rip + .L__unnamed_2]
        call    qword ptr [rip + example::do_thing_with_value@GOTPCREL]
        xor     edi, edi
        call    qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
        lea     rdi, [rip + .L__unnamed_1]
        lea     rsi, [rip + .L__unnamed_3]
        call    qword ptr [rip + example::do_thing_with_value@GOTPCREL]
        xor     edi, edi
        call    qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
        pop     rax
        ret

.L__unnamed_1:
        .zero   4

.L__unnamed_2:
        .quad   core::ptr::drop_in_place<u32>
        .quad   4
        .quad   4
        .quad   <u32 as example::MyTrait>::value

.L__unnamed_3:
        .quad   core::ptr::drop_in_place<i32>
        .quad   4
        .quad   4
        .quad   <i32 as example::MyTrait>::value

By now, you should be able to tell what most of the main function does: it loads some arguments and calls some functions. Something we haven't seen before is xor edi, edi. This is a clever way of zeroing a register, since any value xor-ed with itself is zero; here, it produces the 0 that we pass by value to do_other_thing_with_value.

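The last things here are the .L__unnamed_* sections. .L__unnamed_1 holds the value we take a reference to (four zero bytes, which serve as both 0u32 and 0i32), while .L__unnamed_2 and .L__unnamed_3 are the vtables for the do_thing_with_value(&0u32) and do_thing_with_value(&0i32) calls. The vtables are the reason Rust only needs to generate one implementation of do_thing_with_value. Each one starts with the type's drop_in_place function, followed by its size and alignment (4 bytes each for u32 and i32), and then the locations of the functions required by our trait. With three 8-byte entries in front of it, value ends up at an offset of 24 bytes. (This layout is an implementation detail of rustc, not something the language guarantees.) When we call do_thing_with_value, this is what happens: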

The address of .L__unnamed_1 (the actual value, 32 bits of zeroes) is loaded into rdi
The address of .L__unnamed_2 (the vtable) is loaded into rsi
The do_thing_with_value function is called
The function whose address is stored 24 bytes into the vtable (in this case <u32 as example::MyTrait>::value) is called
<u32 as example::MyTrait>::value reads the value through rdi and writes its return value into eax
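The steps above can be mirrored in plain Rust by building the vtable and the two-pointer trait object by hand. This is only an illustration: the entry order [drop_in_place, size, align, value] copies what we saw in the assembly, but rustc's real vtable layout is unspecified, and MyTraitVTable, MyTraitRef and do_thing_with_value_manual are hypothetical names. The unsafe function pointers play the role of <u32 as example::MyTrait>::value and friends.

```rust
use std::marker::PhantomData;

// Mirrors the four entries in .L__unnamed_2 / .L__unnamed_3.
#[allow(dead_code)] // drop_in_place and align are only here to match the layout
struct MyTraitVTable {
    drop_in_place: unsafe fn(*mut ()),
    size: usize,
    align: usize,
    value: unsafe fn(*const ()) -> u32,
}

// Neither integer type has a destructor, just like the lone `ret`.
unsafe fn drop_noop(_: *mut ()) {}

unsafe fn value_u32(data: *const ()) -> u32 {
    // Equivalent of `mov eax, dword ptr [rdi]`.
    unsafe { *(data as *const u32) }
}

unsafe fn value_i32(data: *const ()) -> u32 {
    unsafe { *(data as *const i32) as u32 }
}

static U32_VTABLE: MyTraitVTable = MyTraitVTable {
    drop_in_place: drop_noop,
    size: 4,
    align: 4,
    value: value_u32,
};

static I32_VTABLE: MyTraitVTable = MyTraitVTable {
    drop_in_place: drop_noop,
    size: 4,
    align: 4,
    value: value_i32,
};

// Hand-made "fat pointer": data pointer (rdi) plus vtable pointer (rsi).
#[derive(Clone, Copy)]
struct MyTraitRef<'a> {
    data: *const (),
    vtable: &'static MyTraitVTable,
    _borrow: PhantomData<&'a ()>,
}

impl<'a> MyTraitRef<'a> {
    fn from_u32(v: &'a u32) -> Self {
        MyTraitRef { data: v as *const u32 as *const (), vtable: &U32_VTABLE, _borrow: PhantomData }
    }

    fn from_i32(v: &'a i32) -> Self {
        MyTraitRef { data: v as *const i32 as *const (), vtable: &I32_VTABLE, _borrow: PhantomData }
    }

    // Like size_of_val: the size comes from the vtable, not the static type.
    fn size(self) -> usize {
        self.vtable.size
    }
}

// One function for every type, like example::do_thing_with_value.
fn do_thing_with_value_manual(v: MyTraitRef<'_>) -> u32 {
    // Like `call qword ptr [rsi + 24]`: look up the slot and call it.
    // Sound only because the constructors pair each pointer with its matching vtable.
    unsafe { (v.vtable.value)(v.data) }
}
```

The compiler does all of this bookkeeping for us when it coerces &u32 into &dyn MyTrait; the hand-written version just makes the two pointers and the table lookup visible.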

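So a &dyn MyTrait is really a "fat pointer" made of two pointers: one to the data (passed in rdi) and one to the vtable (passed in rsi). Any function dealing with a trait object only needs to know the layout of the vtable, not the specifics of the concrete type.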
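We don't have to take the assembly's word for this. The following sketch uses only the standard library plus a hypothetical Tracked type with a real destructor, and checks three things: a &dyn MyTrait is twice the size of a plain reference, std::mem::size_of_val and align_of_val can report the concrete type's layout through the trait object alone (that information has to come from the vtable), and dropping a Box<dyn MyTrait> runs the concrete type's destructor.

```rust
use std::mem::{align_of_val, size_of_val};
use std::sync::atomic::{AtomicUsize, Ordering};

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

// Counts how often a Tracked value has been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type with a real destructor, so its drop_in_place does some work.
struct Tracked(u64);

impl MyTrait for Tracked {
    fn value(&self) -> u32 {
        self.0 as u32
    }
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// Only sees `dyn MyTrait`; size and alignment are read via the vtable.
fn layout_via_vtable(v: &dyn MyTrait) -> (usize, usize) {
    (size_of_val(v), align_of_val(v))
}
```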

What makes a trait object-safe?

If you read the chapter on trait objects in the Rust book, you'll find that a trait needs to be object-safe if you want to make it into a trait object. According to the book, a trait is object-safe if all of its methods follow two rules:

The return type isn’t Self.
There are no generic type parameters.

With our knowledge about how trait objects work internally, we can figure out the reasons behind these. Let's start with the first one: Self isn't a concrete type, it's a stand-in for whatever type implements the trait. When a function using a trait object calls the implementation of a trait method, it needs to know what that method is going to return, at the very least how much space the return value takes up. Behind a trait object, the concrete type has been erased, so there is no way to know. Why does the same not apply to the &self reference all methods take as an argument (also called the "receiver")? First, a reference always has the same size, no matter what it points to. Second, since the implementation that gets called is picked from the vtable belonging to the value behind &self, we can guarantee that it can handle a reference of type &Self.
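This is why, for example, Clone can't be turned into a trait object: clone returns Self. A common workaround, sketched here with a hypothetical box_clone method, is to return a Box<dyn MyTrait> instead. A Box is a pointer of fixed size, so a caller holding only a trait object knows exactly what it gets back, while each implementation still decides what to put inside.

```rust
pub trait MyTrait {
    fn value(&self) -> u32;

    // `fn duplicate(&self) -> Self` would make MyTrait non-object-safe. A Box
    // is a fixed-size pointer, so this signature is fine for trait objects.
    fn box_clone(&self) -> Box<dyn MyTrait>;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }

    fn box_clone(&self) -> Box<dyn MyTrait> {
        Box::new(*self)
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }

    fn box_clone(&self) -> Box<dyn MyTrait> {
        Box::new(*self)
    }
}

// Each element picks its own box_clone through its vtable.
fn duplicate_all(items: &[Box<dyn MyTrait>]) -> Vec<Box<dyn MyTrait>> {
    items.iter().map(|item| item.box_clone()).collect()
}
```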

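The second rule is explained by Rust's use of monomorphization: a generic method doesn't exist as a single function, but as one copy per set of type arguments it is used with. A vtable, however, has a fixed number of slots that are filled in wherever a concrete type is turned into a trait object, and at that point the compiler cannot know which type arguments the method will later be called with (there could be arbitrarily many). Even if you tried to generate monomorphized variants for every type that could possibly be used, that would require a huge number of concrete implementations, driving the size of your program way up. You can of course circumvent this by taking a trait object instead of a generic parameter in that method if your context allows it.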
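Here is a sketch of that workaround. A method like fn render<F: NumberFormat>(&self, f: &F) -> String would need a separate vtable entry for every F, which would make MyTrait unusable as a trait object; taking &dyn NumberFormat instead keeps a single entry. NumberFormat, Hex, Dec, render and render_all are hypothetical names made up for this example.

```rust
// Hypothetical helper trait used as a method parameter.
pub trait NumberFormat {
    fn format(&self, n: u32) -> String;
}

pub struct Hex;
pub struct Dec;

impl NumberFormat for Hex {
    fn format(&self, n: u32) -> String {
        format!("{:#x}", n)
    }
}

impl NumberFormat for Dec {
    fn format(&self, n: u32) -> String {
        n.to_string()
    }
}

pub trait MyTrait {
    fn value(&self) -> u32;

    // Object-safe: one vtable slot, whatever formatter is passed in.
    fn render(&self, f: &dyn NumberFormat) -> String {
        f.format(self.value())
    }
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

// Both arguments are trait objects, so one copy of this function serves all types.
fn render_all(items: &[&dyn MyTrait], f: &dyn NumberFormat) -> Vec<String> {
    items.iter().map(|item| item.render(f)).collect()
}
```

The price is a second dynamic call (through the NumberFormat vtable) inside render, in exchange for keeping MyTrait usable as dyn MyTrait.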

The last rule is actually not listed in the Rust book, but it's one I ran across on an actual project: all functions in an object-safe trait need to have a receiver. I sort of took this for granted in an earlier paragraph, but why is a receiver required? I was actually stumped on this one, so I tried building an example and letting rustc tell me what I was doing wrong:

trait Trait {
    fn doesnt_work();
}

fn do_something(_: Box<dyn Trait>) {   
}

This minimal example gave me the following error:

error[E0038]: the trait `Trait` cannot be made into an object
 --> src/main.rs:5:20
  |
5 | fn do_something(_: Box<dyn Trait>) {
  |                    ^^^^^^^^^^^^^^ `Trait` cannot be made into an object
  |
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
 --> src/main.rs:2:8
  |
1 | trait Trait {
  |       ----- this trait cannot be made into an object...
2 |     fn doesnt_work();
  |        ^^^^^^^^^^^ ...because associated function `doesnt_work` has no `self` parameter
help: consider turning `doesnt_work` into a method by giving it a `&self` argument
  |
2 |     fn doesnt_work(&self);
  |                    ^^^^^
help: alternatively, consider constraining `doesnt_work` so it does not apply to trait objects
  |
2 |     fn doesnt_work() where Self: Sized;
  |                      ^^^^^^^^^^^^^^^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0038`.

Okay, so the compiler tells me what isn't allowed, but not why. It does, however, recommend the rustc --explain subcommand. I'm not a huge fan of reading unparsed Markdown in my terminal, so I'll instead go to Rust's compiler error index and search for E0038. And there you will see exactly what we were looking for: "Method has no receiver". As the site explains, not having a receiver could lead to a scenario where it is impossible to pick an implementation: without a self value there is no vtable to look the function up in, so a call to doesnt_work through dyn Trait has no way of knowing which type's version to run. Good to know!
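The compiler's second suggestion can be sketched like this. Adding where Self: Sized takes the associated function out of the vtable, so Box<dyn Trait> compiles again, and the function can still be called on a concrete type with fully qualified syntax. The name method and the types A and B are made up for this example.

```rust
trait Trait {
    // A method with a receiver: goes into the vtable as usual.
    fn name(&self) -> &'static str;

    // No receiver, but excluded from trait objects, so Trait stays object-safe.
    fn doesnt_work() -> &'static str
    where
        Self: Sized;
}

struct A;
struct B;

impl Trait for A {
    fn name(&self) -> &'static str {
        "A"
    }

    fn doesnt_work() -> &'static str {
        "A::doesnt_work"
    }
}

impl Trait for B {
    fn name(&self) -> &'static str {
        "B"
    }

    fn doesnt_work() -> &'static str {
        "B::doesnt_work"
    }
}

fn do_something(t: Box<dyn Trait>) -> &'static str {
    // Only methods with a receiver can be called through the trait object.
    t.name()
}
```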

Specifying Trait Bounds

The focus of this article was trait objects, but they are (as you have seen) not the only way to specify which trait(s) a type must implement. The other two main ways are the impl Trait syntax you already saw and trait bounds on generics. All of these have their advantages and disadvantages, so I want to compare what they allow you to do.

Using a trait object generates the least amount of assembly, as every function can rely on the vtables having the same layout. A trait object is also a single type, so you can have a Vec<Box<dyn Trait>> holding values of different concrete types. With impl Trait, that isn't possible: impl Trait always stands for one single concrete type, and the Rust compiler cannot find one type for all the possible items. However, the trait object approach will restrict you to just one trait. You cannot have a Box<dyn TraitOne + TraitTwo>².
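A common workaround for that restriction is to bundle the traits into a new trait that has both as supertraits, plus a blanket implementation. In this sketch the helper trait DebugValue is hypothetical, and std::fmt::Debug stands in for TraitTwo. It also shows a Vec<Box<dyn ...>> holding two different concrete types.

```rust
use std::fmt::Debug;

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

// `dyn MyTrait + Debug` is rejected, so this hypothetical trait requires both.
trait DebugValue: MyTrait + Debug {}

// Blanket impl: every type implementing both traits gets DebugValue for free.
impl<T: MyTrait + Debug> DebugValue for T {}

// Methods of both supertraits are callable through the single trait object.
fn describe_all(items: &[Box<dyn DebugValue>]) -> Vec<String> {
    items.iter().map(|item| format!("{:?} -> {}", item, item.value())).collect()
}
```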

Trait bounds on generics (e.g. fn f<T>(x: T) where T: Copy) are probably the most common option. They generate more assembly due to monomorphization, but allow more granular control (e.g. fn f<T>(a: T, b: T) where T: Trait makes sure that a and b have the same type, while fn f(a: impl Trait, b: impl Trait) does not) and allow for multiple trait bounds.
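Here's a small sketch of those differences, reusing MyTrait from the toy example; sum_same, sum_any and describe are hypothetical names. A call like sum_same(1u32, 2i32) would be rejected at compile time, while sum_any accepts it, and describe puts several bounds on one generic parameter.

```rust
use std::fmt::Debug;

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

// One type parameter: a and b must have the same concrete type.
fn sum_same<T: MyTrait>(a: T, b: T) -> u32 {
    a.value() + b.value()
}

// Each `impl MyTrait` is its own anonymous type parameter, so a and b may differ.
fn sum_any(a: impl MyTrait, b: impl MyTrait) -> u32 {
    a.value() + b.value()
}

// Multiple bounds on a single generic parameter.
fn describe<T>(x: T) -> String
where
    T: MyTrait + Debug + Copy,
{
    format!("{:?} has value {}", x, x.value())
}
```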

In most cases, the advantages and disadvantages of impl Trait are the same as those of the previous option, but as pointed out in this article, impl Trait is great for describing anonymous types like closures.
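As a quick illustration: every closure has its own anonymous type that we can't write down, so impl Fn lets us return or accept one without naming it, whereas the trait object alternative, Box<dyn Fn>, needs a heap allocation and calls through a vtable. make_adder, apply and make_adder_boxed are hypothetical names for this sketch.

```rust
// Returns a closure; its concrete type is anonymous, so we can only say "some Fn".
fn make_adder(n: u32) -> impl Fn(u32) -> u32 {
    move |x| x + n
}

// Accepts any closure; monomorphized per closure type, like other generics.
fn apply(f: impl Fn(u32) -> u32, x: u32) -> u32 {
    f(x)
}

// The trait object alternative: one type for all closures, at the cost of a Box.
fn make_adder_boxed(n: u32) -> Box<dyn Fn(u32) -> u32> {
    Box::new(move |x| x + n)
}
```

The boxed version is what you'd reach for to keep several different closures in one Vec, mirroring the Vec<Box<dyn Trait>> case from before.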

Wrapping up

Trait objects are a convenient feature in Rust and I hope that after reading this article you have a slightly better idea of how they are implemented and why they have the limitations they do. Different problems lend themselves to different solutions, so try to experiment and see which of the options we explored here leads to the most natural code. Happy coding!

¹ I was planning on using the assembly generated by rustc --emit asm, but even after demangling and making the program no_std, it was way too much to ever get a proper overview, so I settled for the less reproducible option of Compiler Explorer.
² With the exception of auto traits like Send and Sync.
