Building a Brainfuck Compiler using Cranelift

By Clemens Tiedt

2023-01-09
#rust
#brainfuck
#cranelift
#compiler


After more than four years of studying computer science at a university level (at the time of writing this article), I find that many concepts and tasks have become demystified for me. The idea of building a website or app seems pretty simple to me now. Even embedded development or Linux kernel modules don't seem like that big of a task. However, compilers remain a topic I have a huge amount of respect for.

To get a better idea about how compilers work, I wanted to create a simple compiler by myself. Since I already work so much with Rust, I decided to use the Cranelift framework to build it. As for the programming language I wanted to compile, I went with Brainfuck.

Why Brainfuck and Cranelift?

I want to start by explaining why I chose to build a Brainfuck compiler. Brainfuck, if you don't know it, is an esoteric programming language. It has only eight instructions (explained below), each represented by a single character. This makes the parsing step extremely easy and lets me focus on the compilation itself.

How does Brainfuck work?

Brainfuck is conceptually very similar to a Turing machine. All data is represented in a large byte array (often 30000 elements) and accessed one cell at a time. The current position in the array is called the data pointer.

Command Explanation
> Move the data pointer right by 1 element
< Move the data pointer left by 1 element
+ Increase the value at the data pointer by 1
- Decrease the value at the data pointer by 1
. Print the value at the data pointer interpreted as ASCII (equivalent to putchar)
, Read one character and save its ASCII value at the data pointer (equivalent to getchar)
[ Starts a loop
] If the value at the data pointer is nonzero, jump back to the matching bracket

Should you write any real programs this way? Probably not. But it is fun to see what you can get away with.

Overall, Brainfuck is a "real" enough language that it makes sense to write a compiler for it (e.g. there is a "Hello World" example you can find on Wikipedia that should work with this compiler). So, why use Cranelift to write it? I initially looked into LLVM, which is also what Rust, Clang and many other compilers use. However, I did not find any of the Rust bindings for LLVM particularly intuitive. If you need to use LLVM with Rust, inkwell is probably your best bet right now. But there are also alternative compiler frameworks. I decided to go with Cranelift since it is a product of the Rust/WASM space. While it does not support all the targets of LLVM yet, it is perfectly sufficient for a toy project like this one.

About the format of this writeup

In this article, I want to explain the parts of writing a Cranelift-based compiler that I find interesting or that I would have found helpful when I started this project. I will not go into depth on e.g. how I handle compiler arguments using structopt since that would make this article even longer than it already is and in my opinion result in a pretty bad information-to-noise ratio. If you are interested in these details, you can read the source code linked at the end of the article.

Step One: Parsing

The first step in any compiler is parsing the source code. Basically, you want to transform the plaintext format into a structured format that is easier to work with and that doesn't contain unnecessary data such as comments or whitespace. The result is usually referred to as an Abstract Syntax Tree (AST). As I already explained, Brainfuck only uses single-character instructions, so we can create a Token enum like this (the counts on some variants are explained below):

#[derive(Debug)]
pub enum Token {
    // The count stored in these variants lets us combine
    // runs of the same instruction into a single token.
    Right(usize),
    Left(usize),
    Increment(usize),
    Decrement(usize),
    Output,
    Input,
    LoopStart,
    LoopEnd,
}

Then, we can match over all the characters in the input and simply discard any characters that aren't instructions as comments. This is another neat thing about Brainfuck: Apart from unbalanced brackets, there are no syntax errors, so there is almost nothing to check.

However, we can do one more trick in parsing. If we get an input like >>>, we shouldn't generate three instructions that each move the data pointer right by one, but rather a single instruction that moves it right by 3. This is why the Right, Left, Increment and Decrement variants carry a count. It may improve the performance of our compiled Brainfuck code and makes the generated assembly easier to read [1]. A sketch of such a tokenizer follows below.
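
To give a concrete idea, here is a minimal sketch of such a tokenizer. It assumes the counted variants store a usize, as in the enum above; the actual parser in the repository may be structured differently.

fn parse(source: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = Vec::new();
    for c in source.chars() {
        match (c, tokens.last_mut()) {
            // Combine runs of >, <, + and - into a single counted token.
            ('>', Some(Token::Right(n))) => *n += 1,
            ('<', Some(Token::Left(n))) => *n += 1,
            ('+', Some(Token::Increment(n))) => *n += 1,
            ('-', Some(Token::Decrement(n))) => *n += 1,
            ('>', _) => tokens.push(Token::Right(1)),
            ('<', _) => tokens.push(Token::Left(1)),
            ('+', _) => tokens.push(Token::Increment(1)),
            ('-', _) => tokens.push(Token::Decrement(1)),
            ('.', _) => tokens.push(Token::Output),
            (',', _) => tokens.push(Token::Input),
            ('[', _) => tokens.push(Token::LoopStart),
            (']', _) => tokens.push(Token::LoopEnd),
            // Everything else is a comment and gets discarded.
            _ => {}
        }
    }
    tokens
}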

Step Two: General Setup

The way Cranelift is designed, you compile your AST in two steps. First, you transform it into a (mostly) target-independent Intermediate Representation (IR). This IR can then be compiled to the assembly language of your target and put in a container format - an object file in our case.

We start with some boilerplate: While Cranelift IR is mostly independent of the target Instruction Set Architecture (ISA), some parts like calling conventions are still target-dependent. To generate the target ISA representation we use the following code:

let mut shared_builder = settings::builder();
shared_builder.enable("is_pic").unwrap();
let shared_flags = settings::Flags::new(shared_builder);

let isa_builder = isa::lookup(target).unwrap();
let isa = isa_builder.finish(shared_flags).unwrap();
let call_conv = isa.default_call_conv();

The shared_flags represent ISA-independent settings we want to customize for our build. For example, we enable position-independent code (PIC) here. We then call the isa::lookup function to create an isa::Builder for our target triple. The target defaults to the host system, but can be changed using a command line parameter. With the builder, we could also set ISA-specific settings, but this is not necessary here. Calling finish gives us a Box<dyn TargetIsa>, a trait object that contains all the settings for our target architecture, including those we have modified. Finally, we save the calling convention for our ISA because we'll need it later.

What is a Calling Convention?

When you call a function, the compiler will usually insert a call assembly instruction. However, at that point you still need to think about how parameters are passed to the function and how you retrieve the return value(s). This is specified in a calling convention.

We also need to set up the module our compiled code is going to live in. Modules are the containers that code and data live in when using Cranelift. We use ObjectBuilder and ObjectModule from the cranelift_object crate, since we want to generate linkable object files.

let obj_builder =
    ObjectBuilder::new(isa, "main", cranelift_module::default_libcall_names()).unwrap();
let mut obj_module = ObjectModule::new(obj_builder);

You can see that the ObjectBuilder needs to know our ISA, a symbol name (the "main" string) and the libcall names. Some Cranelift IR instructions don't have simple mappings to assembly instructions on certain target architectures and need to be replaced with function calls; we simply use the default mapping provided by cranelift_module. For our purposes, the symbol name is arbitrary.

Next, we want to create a function. Due to the (lack of) structure in Brainfuck, we just use one function that all our code is going to live in.

let mut sig = Signature::new(call_conv);
sig.returns.push(AbiParam::new(types::I32));
let fid = obj_module
    .declare_function("main", Linkage::Export, &sig)
    .unwrap();

let mut func = Function::with_name_signature(UserFuncName::user(0, 0), sig);
let mut func_ctx = FunctionBuilderContext::new();
let mut fn_builder = FunctionBuilder::new(&mut func, &mut func_ctx);

We need to declare the function in our module and then define it in Cranelift IR. Our main function should take no arguments and return a 32-bit integer (the exit code of our program). Using the name main and making the symbol visible, we declare it in the module. Then, we create a Cranelift IR function with a name and the already created signature. For the name, we use a UserFuncName with namespace and index both set to zero. These numbers are only meant for us and are not interpreted by Cranelift, so we can use arbitrary values here. The exported name is how other programs will find our main function. Then, we create a FunctionBuilder. This is the object that will do most of the work of generating Cranelift IR for us. Before we can translate the Brainfuck tokens into instructions in this shiny new function, we still need to do some more preparation.

let mem_array = obj_module
    .declare_data("mem", Linkage::Local, true, false)
    .unwrap();
let mut data_ctx = DataContext::new();
data_ctx.define_zeroinit(30000);
obj_module.define_data(mem_array, &data_ctx).unwrap();
let local_data = obj_module.declare_data_in_func(mem_array, fn_builder.func);
let pointer = obj_module.target_config().pointer_type();

As you may remember, Brainfuck uses one large array as its memory. In our module, we declare a data object of 30000 bytes and define it to be zero-initialized. We then need to make it accessible to our function. We also save the pointer type of our platform in a variable, since we'll need it to access the memory later. At this point, we're still not done with our setup! As described in the Brainfuck instruction table, the . and , instructions are equivalent to the putchar and getchar C functions. We need to tell Cranelift that we want to import them.

let mut putchar_sig = obj_module.make_signature();
putchar_sig.params.push(AbiParam::new(types::I8));
let putchar_fn = obj_module
    .declare_function("putchar", Linkage::Import, &putchar_sig)
    .unwrap();
let local_putchar = obj_module.declare_func_in_func(putchar_fn, fn_builder.func);

We declare putchar similarly to our own main function. However, since it is only linked in and not built by us, we import it into the object module. We then need to make it accessible to our main function. The code for getchar is very similar; a sketch of it follows below.
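
Since getchar is referenced later as local_getchar, here is a sketch of what its import could look like. I'm assuming a signature with no parameters and an I8 return value to match the byte cells we store into; the actual code in the repository may differ slightly.

let mut getchar_sig = obj_module.make_signature();
// getchar takes no arguments and returns one byte in this sketch.
getchar_sig.returns.push(AbiParam::new(types::I8));
let getchar_fn = obj_module
    .declare_function("getchar", Linkage::Import, &getchar_sig)
    .unwrap();
let local_getchar = obj_module.declare_func_in_func(getchar_fn, fn_builder.func);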

Now, we can finally start writing code into our function:

let block = fn_builder.create_block();
fn_builder.switch_to_block(block);
fn_builder.seal_block(block);

let mem = fn_builder.ins().global_value(pointer, local_data);
let zero_ptr = fn_builder.ins().iconst(pointer, 0);
let offset = Variable::new(1);
fn_builder.declare_var(offset, pointer);
fn_builder.def_var(offset, zero_ptr);
let mut jumps = Vec::new();

Code in Cranelift is organized into blocks. A block is a sequence of instructions without any branches except at the beginning and end. By sealing the block, we tell Cranelift that there will be no further branches to it (which we know here, since it's the first block). We then create some variables and values we will need later. mem makes the array we created earlier accessible via a pointer. offset is the variable we will use as our data pointer, initialized to zero. We also need a stack of jump targets to manage loops later.

Step Three: Instruction Translation

With the setup inside the function done, we can now generate Cranelift IR for our Brainfuck instructions. We do this by iterating over our generated tokens (our pseudo-AST, so to speak) and matching over each token. I want to start by talking about the > and < instructions.

// The offset is loaded outside the match statement
// because most instructions need to use it
let offset_val = fn_builder.use_var(offset);
match instr {
    Token::Right(count) => {
        // iconst takes a Rust i64 immediate; the resulting value
        // has the pointer type, so it can be added to `offset_val`
        let to_add = fn_builder.ins().iconst(pointer, *count as i64);
        let new_offset = fn_builder.ins().iadd(offset_val, to_add);
        fn_builder.def_var(offset, new_offset);
    }
    // Other match arms omitted
}

As described in the parsing section, we combine repeated instructions. To update the offset variable, we need three steps:

  1. Load the current value of the offset variable (this happens right before the match).
  2. Create a constant with the amount to move and add it to the current value.
  3. Redefine the offset variable with the result of the addition.

As you can see, this is exactly what the above code does. We load or create both operands, add them together and redefine the offset variable. Calling fn_builder.ins() gives us an object that implements the InstBuilder trait and can add instructions to the current block. The Token::Left case looks very similar, but inserts an isub instruction instead [2].
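
For completeness, here is what the Token::Left arm could look like, assuming it really does mirror Token::Right with an isub:

Token::Left(count) => {
    // Subtract the (combined) count from the data pointer.
    let to_sub = fn_builder.ins().iconst(pointer, *count as i64);
    let new_offset = fn_builder.ins().isub(offset_val, to_sub);
    fn_builder.def_var(offset, new_offset);
}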

Next, let's look at the implementation of Token::Increment.

Token::Increment(count) => {
    let mem_offset = fn_builder.ins().iadd(mem, offset_val);
    let val = fn_builder.ins().load(
        types::I8,
        MemFlags::new(),
        mem_offset,
        Offset32::new(0),
    );
    let to_add = fn_builder.ins().iconst(types::I8, *count as i64);
    let val = fn_builder.ins().iadd(val, to_add);
    fn_builder
        .ins()
        .store(MemFlags::new(), val, mem_offset, Offset32::new(0));
}

This one is a bit longer, but in many ways similar to Token::Left and Token::Right. First, we calculate the location of the current data cell by adding the offset to the location of our global memory. Then, we load one byte (an I8) from the calculated location. The addition works the same as before. Finally, we need to store the new value in the same location. Once again, Token::Decrement looks the same, except it uses an isub instruction.

After these instructions, let's see how input and output work.

Token::Output => {
    let mem_offset = fn_builder.ins().iadd(mem, offset_val);
    let val = fn_builder.ins().load(
        types::I8,
        MemFlags::new(),
        mem_offset,
        Offset32::new(0),
    );
    fn_builder.ins().call(local_putchar, &[val]);
}
Token::Input => {
    let mem_offset = fn_builder.ins().iadd(mem, offset_val);
    let res = fn_builder.ins().call(local_getchar, &[]);
    let val = fn_builder.inst_results(res)[0];
    fn_builder
        .ins()
        .store(MemFlags::new(), val, mem_offset, Offset32::new(0));
}

For the output, we once again need to load the value of the current cell. We then insert a call instruction to call the putchar function we defined earlier with val as the argument. Note that Cranelift would let us call putchar with any number of arguments here! We could give it a wrong number of arguments and this would only be caught at the compilation stage.

For Token::Input, the implementation looks sort of like the reverse of Token::Output. We call getchar without any arguments and take the first (and here, the only) return value. We then store it at the current cell. We also don't do any error handling yet; this might be a project for the future.

Finally, we have the implementation of loops. These were the most difficult part for me as they require the use of jumps and branches, which in turn need additional blocks.

Token::LoopStart => {
    let next_block = fn_builder.create_block();
    fn_builder.ins().jump(next_block, &[]);
    jumps.push(next_block);
    fn_builder.switch_to_block(next_block);
}
Token::LoopEnd => {
    let mem_offset = fn_builder.ins().iadd(mem, offset_val);
    let val = fn_builder.ins().load(
        types::I8,
        MemFlags::new(),
        mem_offset,
        Offset32::new(0),
    );
    let jump_target = jumps.pop().unwrap();
    fn_builder.ins().brnz(val, jump_target, &[]);
    let next_block = fn_builder.create_block();
    fn_builder.ins().jump(next_block, &[]);
    fn_builder.switch_to_block(next_block);
}

When we enter a loop, we want to create a new block. Here, you can think of a block in a similar way to a label for goto: We need the block to remember where to jump back to if we don't want to exit the loop. Therefore, we save the new block on a stack to access it later.

Once we encounter the end of a loop, we need to check the value of the current cell. If it is zero, we move out of the loop, otherwise we return to the beginning of the loop. So, we need the matching Token::LoopStart, or rather the matching block from our jumps stack. We want to branch to this block if the current cell is non-zero, so we use a brnz instruction. However, this forces us to create a new block for the non-branching path as well since branches and jumps are only allowed at the end of a block. So, the non-branching path will jump to the new block. Finally, our function needs to return.

let val = fn_builder.ins().iconst(types::I32, 0);
fn_builder.ins().return_(&[val]);
fn_builder.seal_all_blocks();
fn_builder.finalize();

At the moment, we will always return with code 0, indicating no error. We can now seal all blocks in our function since we know there will be no further jumps to them and finalize the FunctionBuilder, finishing the generation of Cranelift IR.

Step Four: Generating an Object File

We are almost finished now: We just need to compile our Cranelift IR and put it into an object file.

let mut context = Context::for_function(func);
obj_module.define_function(fid, &mut context).unwrap();
let res = obj_module.finish();

let mut file = File::create(output).unwrap();
res.object.write_stream(&mut file).unwrap();

We can now define the main function in the object module that we declared earlier; this is where our generated Cranelift IR actually gets compiled. Calling ObjectModule::finish then gives us the finished object, which we can write into a file.

Step Five: Linking

We now have a binary object file. However, it still refers to the external putchar and getchar functions, whose implementations are not part of our object file. In order to use them, we need to link our object file. Using a linker of our choice (like ld on Linux or link.exe on Windows), we can link against a C runtime that provides the necessary startup code and functions. On Linux, this is usually /lib/crt1.o, and on Windows we can use the Universal C Runtime ucrt.lib. The repository provides scripts to invoke these linkers.

This is where using the proper calling convention matters: Since our program and the precompiled C runtime use the same calling convention, they can understand how to call each other's functions even though they're written in different programming languages.

What I learned

At a total of about 320 lines of code, my Brainfuck compiler is relatively compact. In fact, the compile function where most of the magic happens is only about 160 lines. The main difficulty in this project was the scarcity and fragmentation of information. Cranelift is still a relatively young project, so there are only few guides on how to use it. The toy JIT compiler was a very helpful resource, along with the crate documentation and source code. However, Cranelift is split across multiple crates, which can make finding the information you're looking for more difficult than it needs to be. Once I had a general idea of how the different crates worked together, it mainly became a matter of putting the right parts together, and I quite enjoyed using Cranelift for this project.

In the future, I also want to look at debugging, since most of my debugging so far consisted of reading Cranelift IR and essentially running the code by hand. This is not a viable technique for more complex compilers, so I'm very interested in seeing how I can integrate something like the DWARF debugging format into my compiler.

For now, you can find the project on my GitHub Page.

  [1] I haven't actually profiled this. Maybe a modern CPU can optimize all these identical instructions really well. However, the point about assembly readability is really important to me: A lot of my time debugging this project was spent reading Cranelift IR or x86 assembly.

  [2] This similarity indicates that it might make sense to use one instruction for both the left and right case. I chose not to do this here, because it would make parsing the input more complicated and I wanted to focus on the compilation.

