As all good deep dives do, this article starts with a slightly contrived code example. Let's say I have a file containing a number and want to read it. Here's a (very) naive Rust implementation:
fn read_number_from_file(path: String) -> u32 {
    std::fs::read_to_string(path).unwrap().parse().unwrap()
}
And this works, as long as my path is correct and the number is well-behaved (e.g. no trailing whitespace).
If that's not the case, it'll panic and take the program with it. So, let's fix the function to return a Result:
fn read_number_from_file(path: String) -> Result<u32, std::io::Error> {
    Ok(std::fs::read_to_string(path)?.parse().unwrap())
}
And this looks better, but we only covered one possible error. The reason for that is rather simple: read_to_string and str::parse return different error types (std::io::Error and std::num::ParseIntError), but our Result can only name one error type... Or can it? This is where trait objects make their entrance:
fn read_number_from_file(path: String) -> Result<u32, Box<dyn std::error::Error>> {
    Ok(std::fs::read_to_string(path)?.parse()?)
}
By returning a Result<u32, Box<dyn std::error::Error>>, we can use the ? operator on both of our Results.
This is one of Rust's ways of implementing polymorphism: both kinds of error we're dealing with here implement the Error trait.
If we want to treat them both the same way, we can treat them as if they only implemented this trait. Since we only know that we're going to return some type implementing Error, but not how big it is, we need to add a pointer as a layer of indirection (for some background, see my article on pointers).
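To make the boxing step a bit more tangible, here is a minimal sketch. The helper number_from is hypothetical (it is not part of the example above): it takes the result of a read as an argument so that it runs without touching the file system, and it trims the input before parsing, which the naive version did not do. It shows that both error types end up in the same Box<dyn Error> and that the concrete type can still be recovered at runtime.

```rust
use std::error::Error;
use std::io;
use std::num::ParseIntError;

// Hypothetical helper: accepts the outcome of a read instead of a path,
// so the example works without a real file.
fn number_from(contents: Result<String, io::Error>) -> Result<u32, Box<dyn Error>> {
    let text = contents?; // an io::Error is converted into Box<dyn Error>
    let number = text.trim().parse::<u32>()?; // a ParseIntError is converted the same way
    Ok(number)
}

fn main() {
    assert_eq!(number_from(Ok("42\n".to_string())).unwrap(), 42);

    // The caller only sees Box<dyn Error>, but can still ask for the concrete type.
    let parse_failure = number_from(Ok("not a number".to_string())).unwrap_err();
    assert!(parse_failure.is::<ParseIntError>());

    let missing = io::Error::new(io::ErrorKind::NotFound, "no such file");
    let io_failure = number_from(Err(missing)).unwrap_err();
    assert!(io_failure.is::<io::Error>());
}
```

The ? operator performs the conversion into the box for us, which is why the function body does not mention Box at all.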
How to trait your objects
But how does all of this work under the hood? Let's build a toy example:
pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

pub fn do_thing_with_value(v: &dyn MyTrait) {
    v.value();
}

pub fn do_other_thing_with_value(v: impl MyTrait) {
    v.value();
}

pub fn main() {
    do_thing_with_value(&0u32);
    do_other_thing_with_value(0u32);
    do_thing_with_value(&0i32);
    do_other_thing_with_value(0i32);
}
We have a trait MyTrait that is implemented for u32 and i32, and we have two functions that take an argument implementing this trait, using two different syntax variants. You can also see I switched from Box<dyn Trait> to &dyn Trait. As a reference is also a pointer, it will work here, and it has the advantage of making the assembly code generated on Compiler Explorer [1] much more readable. Did I mention we were going to look at assembly code? I probably should have.
core::ptr::drop_in_place<i32>:
ret
core::ptr::drop_in_place<u32>:
ret
<u32 as example::MyTrait>::value:
mov eax, dword ptr [rdi]
ret
<i32 as example::MyTrait>::value:
mov eax, dword ptr [rdi]
ret
example::do_thing_with_value:
push rax
call qword ptr [rsi + 24]
pop rax
ret
example::do_other_thing_with_value:
sub rsp, 24
mov dword ptr [rsp + 4], edi
mov rax, qword ptr [rip + <i32 as example::MyTrait>::value@GOTPCREL]
lea rdi, [rsp + 4]
call rax
jmp .LBB5_1
.LBB5_1:
jmp .LBB5_5
.LBB5_2:
jmp .LBB5_4
mov rcx, rax
mov eax, edx
mov qword ptr [rsp + 8], rcx
mov dword ptr [rsp + 16], eax
jmp .LBB5_2
.LBB5_4:
mov rdi, qword ptr [rsp + 8]
call _Unwind_Resume@PLT
ud2
.LBB5_5:
add rsp, 24
ret
example::do_other_thing_with_value:
sub rsp, 24
mov dword ptr [rsp + 4], edi
mov rax, qword ptr [rip + <u32 as example::MyTrait>::value@GOTPCREL]
lea rdi, [rsp + 4]
call rax
jmp .LBB6_1
.LBB6_1:
jmp .LBB6_5
.LBB6_2:
jmp .LBB6_4
mov rcx, rax
mov eax, edx
mov qword ptr [rsp + 8], rcx
mov dword ptr [rsp + 16], eax
jmp .LBB6_2
.LBB6_4:
mov rdi, qword ptr [rsp + 8]
call _Unwind_Resume@PLT
ud2
.LBB6_5:
add rsp, 24
ret
example::main:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rsi, [rip + .L__unnamed_2]
call qword ptr [rip + example::do_thing_with_value@GOTPCREL]
xor edi, edi
call qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
lea rdi, [rip + .L__unnamed_1]
lea rsi, [rip + .L__unnamed_3]
call qword ptr [rip + example::do_thing_with_value@GOTPCREL]
xor edi, edi
call qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
pop rax
ret
.L__unnamed_1:
.zero 4
.L__unnamed_2:
.quad core::ptr::drop_in_place<u32>
.quad 4
.quad 4
.quad <u32 as example::MyTrait>::value
.L__unnamed_3:
.quad core::ptr::drop_in_place<i32>
.quad 4
.quad 4
.quad <i32 as example::MyTrait>::value
That's quite a bit of assembly and might look a bit intimidating, so let's go through it bit by bit. First, we have this:
core::ptr::drop_in_place<i32>:
ret
core::ptr::drop_in_place<u32>:
ret
A quick look at the Rust docs shows us that core::ptr::drop_in_place is a function that runs the destructor of the value a pointer points to. The Rust compiler adds these functions to serve as destructors for our trait objects.
As we have two implementations of our trait, we need two concrete drop_in_place implementations, one for u32 and one for i32.
This is the first time we see Rust deal with generics by using monomorphization: we use the function with two concrete types in place of the generic, so Rust generates two concrete implementations. After that, we see our trait implementations:
<u32 as example::MyTrait>::value:
mov eax, dword ptr [rdi]
ret
<i32 as example::MyTrait>::value:
mov eax, dword ptr [rdi]
ret
These functions actually seem to do something! Even if you're unfamiliar with x86 assembly, you can probably guess that mov stands for "move".
In Intel syntax (one of the ways to write x86 assembly), the destination is the first argument and the source the last. So we're moving something from dword ptr [rdi] to eax. A quick search tells us that dword ptr is a "size directive" that makes the instruction operate on 32 bits, and that the square brackets mean "the memory at this address". Aha, so rdi is a register that holds an address! Another trip to your search engine of choice leads us to a description of all the registers and confirms the suspicion that eax is also one. Now we can piece together what the trait implementations do exactly: they read 32 bits from the memory location that rdi points to and move them into eax. The cast in the i32 implementation only reinterprets those bits, which is why both functions compile to the same instruction. But why those specific registers?
Because of calling conventions! Calling conventions tell us that for a function call we should put the arguments in specific registers and read the return value from a specific register. So rdi must contain the reference to self that our function takes. Mystery solved! Our next bit of code is this:
example::do_thing_with_value:
push rax
call qword ptr [rsi + 24]
pop rax
ret
You can probably guess what it is. It starts by pushing the current value of the rax register onto the stack. The eax register from before is really just the lower 32 bits of the rax register. Then it calls the function whose address is stored 24 bytes past the address in the rsi register (don't worry, we'll find out what that is later). Finally, it pops the top of the stack back into rax. Pushing and popping rax is really just bookkeeping around the call instruction: it keeps the stack pointer aligned the way the calling convention requires, and the pushed value itself is never used. Next, we get this (slightly longer) assembly for our function using the impl Trait argument syntax:
example::do_other_thing_with_value:
sub rsp, 24
mov dword ptr [rsp + 4], edi
mov rax, qword ptr [rip + <i32 as example::MyTrait>::value@GOTPCREL]
lea rdi, [rsp + 4]
call rax
jmp .LBB5_1
.LBB5_1:
jmp .LBB5_5
.LBB5_2:
jmp .LBB5_4
mov rcx, rax
mov eax, edx
mov qword ptr [rsp + 8], rcx
mov dword ptr [rsp + 16], eax
jmp .LBB5_2
.LBB5_4:
mov rdi, qword ptr [rsp + 8]
call _Unwind_Resume@PLT
ud2
.LBB5_5:
add rsp, 24
ret
First, it decreases the stack pointer, so we know this function uses local variables. Then it moves the value in edi somewhere. This register contains the argument to do_other_thing_with_value. This argument is moved to rsp + 4, which is four bytes "up" on the stack. Next, the location of the i32 implementation of MyTrait::value is loaded into the rax register. Then, a pointer to the argument we just placed on the stack is loaded into the rdi register and finally the function from rax is called. Now, we see multiple labels. If everything went right, the program should jump to .LBB5_1, which then jumps to .LBB5_5, where the stack pointer is reset and the function exits. The other labels are only reached in case of an error: if the call panics, they hand control back to the unwinding machinery.
Below this code, you will see another almost identical function, just for u32 instead of i32 - this is again Rust's monomorphization at work. You may remember that the &dyn Trait version only generated one function in assembly. This is because &dyn Trait is not a placeholder like impl Trait. The compiler doesn't have to figure out how to deal with different concrete types implementing the trait, because it gets an object of type &dyn Trait directly. After this, only the main function and some data follows:
example::main:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rsi, [rip + .L__unnamed_2]
call qword ptr [rip + example::do_thing_with_value@GOTPCREL]
xor edi, edi
call qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
lea rdi, [rip + .L__unnamed_1]
lea rsi, [rip + .L__unnamed_3]
call qword ptr [rip + example::do_thing_with_value@GOTPCREL]
xor edi, edi
call qword ptr [rip + example::do_other_thing_with_value@GOTPCREL]
pop rax
ret
.L__unnamed_1:
.zero 4
.L__unnamed_2:
.quad core::ptr::drop_in_place<u32>
.quad 4
.quad 4
.quad <u32 as example::MyTrait>::value
.L__unnamed_3:
.quad core::ptr::drop_in_place<i32>
.quad 4
.quad 4
.quad <i32 as example::MyTrait>::value
By now, you should be able to tell what most of the main function does - it loads some arguments and calls some functions.
Something that has not appeared before are the xor edi, edi parts. These are a clever way of zeroing registers, since any value xor-ed with itself will be zero. Here, the zero in edi is the argument (0u32 or 0i32) for do_other_thing_with_value.
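If you want to convince yourself of the xor trick without reading assembly, a tiny sketch in Rust does the job. The sample values are arbitrary picks of mine, nothing about them is special.

```rust
fn main() {
    // Every bit is either 0 ^ 0 or 1 ^ 1, and both are 0,
    // so a value xor-ed with itself is always zero.
    for value in [0u32, 1, 0xDEAD_BEEF, u32::MAX] {
        assert_eq!(value ^ value, 0);
    }
}
```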
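The last things here are the .L__unnamed_* labels. These hold the value and the vtables for the do_thing_with_value(&0u32) and do_thing_with_value(&0i32) calls - and they're the reason Rust only needs to generate one implementation for do_thing_with_value.
Each vtable starts with the drop_in_place function, followed by the size and the alignment of the concrete type (4 bytes each for u32 and i32). After those come the locations of the functions required by our trait. With three 8-byte entries in front of it, value ends up at an offset of 24 bytes. When we call do_thing_with_value(&0u32), this is what happens: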
- The address of .L__unnamed_1 (the actual value - 32 bits of zeroes) is loaded into rdi
- The address of .L__unnamed_2 (the vtable) is loaded into rsi
- The do_thing_with_value function is called
- The function at the location of the vtable with an offset of 24 bytes (which in this case is <u32 as example::MyTrait>::value) is called
- <u32 as example::MyTrait>::value writes its return value into eax
So any function dealing with a trait object only needs to know the layout of its vtable, not the specifics of the concrete type.
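The "value pointer plus vtable pointer" pair can also be observed from within Rust. The following is a small sketch reusing the toy trait from above. It assumes a platform where u32 is 4 bytes wide with 4-byte alignment, which matches the two .quad 4 entries in the listing.

```rust
use std::mem::{align_of_val, size_of, size_of_val};

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

fn main() {
    // A plain reference is one pointer wide ...
    assert_eq!(size_of::<&u32>(), size_of::<usize>());
    // ... while a trait object reference carries a data pointer and a vtable pointer.
    assert_eq!(size_of::<&dyn MyTrait>(), 2 * size_of::<usize>());

    let object: &dyn MyTrait = &7u32;
    // For a dyn value, size and alignment are not known at compile time,
    // so these calls look them up at runtime.
    assert_eq!(size_of_val(object), 4);
    assert_eq!(align_of_val(object), 4);
    // The method call goes through the vtable as well.
    assert_eq!(object.value(), 7);
}
```

The two pointer-sized halves of &dyn MyTrait are exactly what main placed in rdi and rsi before calling do_thing_with_value.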
What makes a trait object-safe?
If you read the chapter on trait objects in the Rust book, you'll find that a trait needs to be object-safe if you want to make it into a trait object. There are two rules for this, and they apply to every method of the trait:
- The return type isn’t Self.
- There are no generic type parameters.
With our knowledge about how trait objects work internally, we can figure out the reason behind these. Let's start with the first one: Self is not a type. It's a stand-in for a concrete type. When a function using a trait object calls the implementation of a trait method, it needs to know what that method is going to return, but with the concrete type erased, Self could be anything. Why does the same not apply to the &self reference all methods take as an argument (also called the "receiver")?
Since the concrete implementation that is called depends on the type of &self, we can guarantee that it can handle a reference of type &Self.
The second rule is explained by Rust's use of monomorphization: a generic method is really a whole family of functions, one per concrete type it is used with. For each type implementing the trait, you could theoretically generate monomorphized variants of each of the trait's methods and give every one of them a vtable entry, but that would in most cases require a huge number of concrete implementations, driving the size of your program way up. You can of course circumvent this by using a trait object instead of a generic if your context allows it.
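Swapping a generic parameter for a trait object could look roughly like the sketch below. The traits Sink and Source and the types Sum and UpTo are made up for this illustration. The point is only that feed takes a &mut dyn Sink, so it needs a single vtable entry and Source stays object-safe.

```rust
// Hypothetical traits for illustration only.
trait Sink {
    fn accept(&mut self, n: u32);
}

trait Source {
    // A generic `fn feed<S: Sink>(&self, sink: &mut S)` would rule out `dyn Source`.
    // Taking a trait object instead keeps this a single, non-generic method.
    fn feed(&self, sink: &mut dyn Sink);
}

struct Sum(u32);

impl Sink for Sum {
    fn accept(&mut self, n: u32) {
        self.0 += n;
    }
}

struct UpTo(u32);

impl Source for UpTo {
    fn feed(&self, sink: &mut dyn Sink) {
        for n in 1..=self.0 {
            sink.accept(n);
        }
    }
}

// Works with any Source through a trait object.
fn drain(source: &dyn Source) -> u32 {
    let mut sum = Sum(0);
    source.feed(&mut sum);
    sum.0
}

fn main() {
    assert_eq!(drain(&UpTo(4)), 10);
}
```

The price is a second dynamic dispatch on every accept call, so this is a trade-off rather than a free fix.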
A third rule is actually not listed in the Rust book, but it is one that I ran across on an actual project: all functions in an object-safe trait need to have a receiver. I treated this kind of as a given in an earlier paragraph - but why is a receiver required? I was stumped on this one, so I tried building an example and letting rustc tell me what I was doing wrong:
trait Trait {
    fn doesnt_work();
}

fn do_something(_: Box<dyn Trait>) {
}
This minimal example gave me the following error:
error[E0038]: the trait `Trait` cannot be made into an object
--> src/main.rs:5:20
|
5 | fn do_something(_: Box<dyn Trait>) {
| ^^^^^^^^^^^^^^ `Trait` cannot be made into an object
|
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
--> src/main.rs:2:8
|
1 | trait Trait {
| ----- this trait cannot be made into an object...
2 | fn doesnt_work();
| ^^^^^^^^^^^ ...because associated function `doesnt_work` has no `self` parameter
help: consider turning `doesnt_work` into a method by giving it a `&self` argument
|
2 | fn doesnt_work(&self);
| ^^^^^
help: alternatively, consider constraining `doesnt_work` so it does not apply to trait objects
|
2 | fn doesnt_work() where Self: Sized;
| ^^^^^^^^^^^^^^^^^
error: aborting due to previous error
For more information about this error, try `rustc --explain E0038`.
Okay, so the compiler really just tells me "This isn't allowed" without giving a clear reason. But then it recommends the rustc --explain option. I'm not a huge fan of looking at unparsed Markdown in my terminal, so I'll instead go to Rust's compiler error index and search for E0038.
And there you will see exactly what we were looking for: "Method has no receiver". As the site explains, not having a receiver could lead to a scenario where it is impossible to pick an implementation - good to know!
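For completeness, here is a sketch that applies both suggestions from the compiler output to one trait. The method names works and type_level as well as the struct Unit are placeholders of mine. With either fix in place for each function, Box<dyn Trait> is accepted.

```rust
trait Trait {
    // Suggestion 1: give the function a receiver, so it gets a vtable slot.
    fn works(&self) -> &'static str;

    // Suggestion 2: keep the function without a receiver,
    // but exclude it from trait objects.
    fn type_level() -> &'static str
    where
        Self: Sized;
}

struct Unit;

impl Trait for Unit {
    fn works(&self) -> &'static str {
        "called through the vtable"
    }

    fn type_level() -> &'static str {
        "called on the concrete type"
    }
}

fn do_something(object: Box<dyn Trait>) -> &'static str {
    object.works()
}

fn main() {
    assert_eq!(do_something(Box::new(Unit)), "called through the vtable");
    // Only callable when the concrete type is known.
    assert_eq!(Unit::type_level(), "called on the concrete type");
}
```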
Specifying Trait Bounds
The focus of this article was trait objects, but they are (as you have seen) not the only way to specify which trait(s) a type must implement.
The other two main ways are the impl Trait syntax you already saw and trait bounds on generics. All of these have their advantages and disadvantages, so I want to compare what they allow you to do.
Using a trait object generates the least amount of assembly, as it can rely on the vtables having the same layout. It also is a single type, so you can have a Vec<Box<dyn Trait>> holding values of different concrete types, whereas a Vec<impl Trait> cannot do that - impl Trait always stands for one single concrete type, and the Rust compiler cannot find one single type for all the possible items.
However, the trait object approach will restrict you to just one trait. You cannot have a Box<dyn TraitOne + TraitTwo> [2].
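A short sketch of the mixed-type vector, reusing the toy trait from earlier. The helper total is my own addition. The last lines also show the auto trait exception from the footnote.

```rust
pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

// Hypothetical helper: sums the values of a slice of boxed trait objects.
fn total(items: &[Box<dyn MyTrait>]) -> u32 {
    items.iter().map(|item| item.value()).sum()
}

fn main() {
    // One element type (Box<dyn MyTrait>), two different concrete types inside.
    let items: Vec<Box<dyn MyTrait>> = vec![Box::new(1u32), Box::new(2i32)];
    assert_eq!(total(&items), 3);

    // Auto traits such as Send may be added to the one "real" trait.
    let sendable: Box<dyn MyTrait + Send> = Box::new(5u32);
    assert_eq!(sendable.value(), 5);
}
```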
Trait bounds on generics (e.g. fn f<T>(x: T) where T: Copy) are probably the most common option. They generate more assembly due to monomorphization, but allow more granular control (e.g. fn f<T>(a: T, b: T) where T: Trait, unlike fn f(a: impl Trait, b: impl Trait), makes sure that a and b have the same type) and allow for multiple trait bounds.
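The same-type guarantee is easiest to see side by side. In this sketch the function names sum_same, sum_any and describe are invented for the comparison, and the trait is again the toy MyTrait.

```rust
use std::fmt::Debug;

pub trait MyTrait {
    fn value(&self) -> u32;
}

impl MyTrait for u32 {
    fn value(&self) -> u32 {
        *self
    }
}

impl MyTrait for i32 {
    fn value(&self) -> u32 {
        *self as u32
    }
}

// One type parameter: both arguments must have the same concrete type.
fn sum_same<T>(a: T, b: T) -> u32
where
    T: MyTrait,
{
    a.value() + b.value()
}

// Each `impl MyTrait` is its own anonymous type parameter, so the types may differ.
fn sum_any(a: impl MyTrait, b: impl MyTrait) -> u32 {
    a.value() + b.value()
}

// Several bounds on the same parameter.
fn describe<T>(v: T) -> String
where
    T: MyTrait + Debug,
{
    format!("{:?} -> {}", v, v.value())
}

fn main() {
    assert_eq!(sum_same(1u32, 2u32), 3);
    assert_eq!(sum_any(1u32, 2i32), 3);
    // sum_same(1u32, 2i32) would be rejected with a "mismatched types" error.
    assert_eq!(describe(3u32), "3 -> 3");
}
```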
In most cases, the advantages and disadvantages of impl Trait are the same as the previous option, but as is pointed out in this article, it is great for describing anonymous types like closures.
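A closure has a type that cannot be written down, which is where impl Trait earns its keep. A minimal sketch, with make_adder and apply_twice as illustrative names of my own:

```rust
// The closure's real type is anonymous; `impl Fn(u32) -> u32` lets us return it
// without boxing and without naming it.
fn make_adder(offset: u32) -> impl Fn(u32) -> u32 {
    move |n| n + offset
}

// In argument position, `impl Fn` accepts any closure with a matching signature.
fn apply_twice(f: impl Fn(u32) -> u32, start: u32) -> u32 {
    f(f(start))
}

fn main() {
    let add_three = make_adder(3);
    assert_eq!(add_three(4), 7);
    assert_eq!(apply_twice(add_three, 0), 6);
}
```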
Wrapping up
Trait objects are a convenient feature in Rust and I hope that after reading this article you have a slightly better idea of how they are implemented and why they have the limitations they do. Different problems lend themselves to different solutions, so try to experiment and see which of the options we explored here leads to the most natural code. Happy coding!
[1] I was planning on using the assembly generated by rustc --emit asm, but even after demangling and making the program no_std, it was way too much to ever get a proper overview, so I settled for the less reproducible option of Compiler Explorer.
[2] With the exception of auto traits like Send and Sync.