A Soil VM for the Linux Kernel

By Clemens Tiedt

2024-09-05
#soil
#c
#linux
#kernel


One of the Linux kernel features that have gained the most traction in the last few years is probably (e)BPF. Originally, the "Berkeley Packet Filter" was intended as a means of filtering network packets in kernel mode. However, BPF quickly developed into a fully-featured VM used for all kinds of purposes. The appeal of BPF is not hard to see: It allows you to load kernel mode code at system runtime (similar to kernel modules) while keeping some degree of sandboxing and fault tolerance afforded by the VM. It is much more difficult to break your kernel with a BPF program than with a regular kernel module. One of the most prominent current users of BPF is sched_ext, a framework for writing scheduler implementations in BPF. This lets you easily tinker with your scheduler and see results live and without the risk of breaking your kernel if your implementation crashes.

All of this made me curious about what it would take to put my own VM into the Linux kernel. To be entirely honest, there is no practical use to this project. I have no intentions or delusions of replacing the BPF ecosystem, I'm just doing this for fun. The first step, then, would be to develop my own VM - a non-trivial task on its own. Luckily, I can use the Soil VM developed by my friend Marcel. He developed it because he wanted a lightweight VM for his programming language Martinaise. Soil is a relatively low-level VM whose instruction set maps well to x86 assembly. Importantly for me, there also exists a C implementation that I can mostly reuse for my in-kernel VM.

The basic architecture

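Before I could think about any Soil specifics, I had to come up with a general architecture for the project. If you want to get your code running in the Linux kernel, there are two approaches: You can change the kernel itself and build a custom kernel, or you can write an out-of-tree kernel module and load it into a running kernel.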

I decided in favor of an out-of-tree module for Soil. I have some previous experience working with kernel modules, and I honestly just didn't feel like building a custom kernel. However, this means that I can't go the BPF route of using a system call to interact with the Soil VM, since a module cannot add new system calls. Instead, I opted for an IOCTL-based interface.

What is an IOCTL?

On a very abstract level, IO is simple: You can read from or write to a device. In practice, it tends to be more complicated. Besides exchanging data, you often have to deal with control interfaces. For example, if you are dealing with GPIO, you first need to configure the pins you need as input or output. The Linux GPIO driver separates this configuration procedure from regular data exchange using IOCTLs. IOCTL stands for "IO control" and represents an operation related to a device's configuration. If you have a file descriptor to a device, you can use the ioctl system call on it with the correct IOCTL number and (if required) arguments to trigger a control operation. For example, to query information about a GPIO line (i.e. a pin), you can use the following IOCTL (definition taken from the kernel's UAPI header include/uapi/linux/gpio.h):

struct gpio_v2_line_info {
	char name[GPIO_MAX_NAME_SIZE];
	char consumer[GPIO_MAX_NAME_SIZE];
	__u32 offset;
	__u32 num_attrs;
	__aligned_u64 flags;
	struct gpio_v2_line_attribute attrs[GPIO_V2_LINE_NUM_ATTRS_MAX];
	/* Space reserved for future use. */
	__u32 padding[4];
};

#define GPIO_V2_GET_LINEINFO_IOCTL _IOWR(0xB4, 0x05, struct gpio_v2_line_info)

In theory, IOCTLs are identified by arbitrary 32-bit integers. In practice, there are conventions to describe an IOCTL's behavior. The _IOWR macro indicates that data flows in both directions: userspace passes the parameter to the driver, and the driver writes results back into it. Its first parameter is a magic number associated with the specific device driver. You can think of the second as a driver-internal IOCTL number. You would run into trouble if you tried to use the IOCTL number 5 globally, but by combining it with the driver's magic number it becomes unique. Finally, the IOCTL definition names the type of the IOCTL's parameter (if it has one), and the macro encodes that type's size into the resulting number. This is relevant because the function handling the IOCTL on the kernel side only receives an opaque pointer to the parameter.

While there is no physical Soil device, I could still use IOCTLs for this project. As I mentioned earlier, kernel modules cannot define syscalls. However, by creating a virtual device, I could define IOCTLs to provide a userspace interface to the VM running in the kernel.
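To make the encoding concrete, here is a small userspace sketch that builds two IOCTL numbers with the real _IOWR macro from <linux/ioctl.h> and splits them apart again with the matching _IOC_DIR, _IOC_TYPE, _IOC_NR and _IOC_SIZE helpers. The parameter struct, the magic numbers 0xE0 and 0xE1, and the decode_ioctl helper are made up for illustration.

```c
#include <linux/ioctl.h> /* _IOWR and the _IOC_* decoding helpers */

/* Hypothetical parameter type and magic numbers, for illustration only */
struct example_param {
  unsigned int offset;
  unsigned int flags;
};

#define DRIVER_A_MAGIC 0xE0
#define DRIVER_B_MAGIC 0xE1

/* The same driver-internal number 5 under two different magic numbers */
#define DRIVER_A_CMD _IOWR(DRIVER_A_MAGIC, 0x05, struct example_param)
#define DRIVER_B_CMD _IOWR(DRIVER_B_MAGIC, 0x05, struct example_param)

/* The four fields that _IOWR packs into a single IOCTL number */
struct ioctl_fields {
  unsigned int dir;  /* combination of _IOC_WRITE and _IOC_READ */
  unsigned int type; /* the driver's magic number */
  unsigned int nr;   /* the driver-internal command number */
  unsigned int size; /* sizeof the parameter type */
};

/* Hypothetical helper: unpack an IOCTL number using the kernel's macros */
struct ioctl_fields decode_ioctl(unsigned int cmd)
{
  struct ioctl_fields f = {
    .dir = _IOC_DIR(cmd),
    .type = _IOC_TYPE(cmd),
    .nr = _IOC_NR(cmd),
    .size = _IOC_SIZE(cmd),
  };
  return f;
}
```

On x86-64, DRIVER_A_CMD works out to 0xC008E005: both direction bits set, a size of 8, type 0xE0 and number 5. DRIVER_B_CMD differs only in its type byte, which is what keeps the two numbers apart. The exact bit layout is architecture-specific (PowerPC and MIPS, for instance, split direction and size bits differently), so it's best to always go through the macros.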

An alternative might have been a sysfs-based interface. Its manual page describes sysfs as a "filesystem for exporting kernel objects". The issue here is that the Soil VM is not really a kernel object (I'll get to the VM's internals in a bit). Also, the kernel documentation recommends IOCTLs as an alternative to writing your own system calls, and there seems to be more information available online on the IOCTL-based approach than on the sysfs one. On an abstract level, the interface is simple: A userspace program opens the Soil device file and issues an IOCTL that hands the VM a program to run.
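The userspace half of such an interface boils down to a single ioctl() call on a file descriptor. As a sketch that needs no custom device at all, the helper below (its name is made up) writes into a pipe and then uses the standard FIONREAD request to ask how many bytes are waiting to be read.

```c
#include <sys/ioctl.h> /* ioctl(), FIONREAD */
#include <unistd.h>    /* pipe(), write(), close() */

/* Hypothetical helper: write len bytes into a fresh pipe, then ask the
   pipe via the FIONREAD IOCTL how many bytes are pending.
   Returns that count, or -1 on error. */
int pending_bytes_in_pipe(const char *data, size_t len)
{
  int fds[2];
  int pending = -1;

  if (pipe(fds) != 0)
    return -1;
  /* FIONREAD takes a pointer to an int that the kernel fills in */
  if (write(fds[1], data, len) == (ssize_t)len &&
      ioctl(fds[0], FIONREAD, &pending) != 0)
    pending = -1;
  close(fds[0]);
  close(fds[1]);
  return pending;
}
```

pending_bytes_in_pipe("hello", 5) should return 5. Keep the payload small: A write larger than the pipe's buffer would block, since nothing reads from the other end.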

This meant that I had to figure out how to create a device.

Module Setup and Creating a Character Device

The most basic Linux kernel module (courtesy of The Linux Kernel Module Programming Guide) looks like this:

/*
 * hello-1.c - The simplest kernel module.
 */
#include <linux/module.h> /* Needed by all modules */
#include <linux/printk.h> /* Needed for pr_info() */

int init_module(void)
{
    pr_info("Hello world 1.\n");

    /* A non 0 return means init_module failed; module can't be loaded. */
    return 0;
}

void cleanup_module(void)
{
    pr_info("Goodbye world 1.\n");
}

MODULE_LICENSE("GPL");

In order to build this, you will also need a Makefile:

obj-m += hello-1.o

PWD := $(CURDIR)

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Running make will create a file hello-1.ko, which you can load into your kernel using the command insmod hello-1.ko [1]. Once you are done, you can unload the module using rmmod hello-1. If you check your dmesg output, you will see the "Hello world 1." and "Goodbye world 1." messages. Within this basic framework, I now needed to create a new character device.

The Linux kernel distinguishes between block-based and character-based IO. For example, hard drives are block-based: The data on the device is arranged in blocks (sectors in HDD parlance), so you can only read and write in fixed-size blocks. Character devices, on the other hand, are accessed as a stream of bytes without a fixed block size. Serial adapters, for example, are usually character devices. I didn't actually want to implement any read/write operations on my virtual Soil device, so this distinction wasn't particularly important here. However, since block devices are closely tied into the mechanics of filesystems and it is generally easier to build a new character device, I opted for a character device.
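You can observe this distinction from userspace: stat() reports whether a device file is a character or block device in st_mode and stores the device's numbers in st_rdev (more on major and minor numbers in a moment). The helper below is a hypothetical sketch using glibc's major() and minor() macros. Note that the userspace dev_t encoding produced by glibc's makedev() differs from the kernel-internal MKDEV layout, so it's best not to shift the bits by hand.

```c
#include <sys/stat.h>      /* stat(), S_ISCHR(), S_ISBLK() */
#include <sys/sysmacros.h> /* major(), minor(), makedev() */

/* Hypothetical helper: returns 'c' for a character device, 'b' for a block
   device, or 0 for anything else (including errors). For device files, the
   major and minor numbers are stored through maj_out and min_out. */
char device_numbers(const char *path, unsigned int *maj_out, unsigned int *min_out)
{
  struct stat st;

  if (stat(path, &st) != 0)
    return 0;
  if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
    return 0;
  *maj_out = major(st.st_rdev); /* selects the driver */
  *min_out = minor(st.st_rdev); /* selects the device instance */
  return S_ISCHR(st.st_mode) ? 'c' : 'b';
}
```

/dev/null, for instance, is character device 1:3 on Linux. With the Soil module loaded, device_numbers("/dev/soil", ...) should report a character device with the Soil driver's major number and minor number 0.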

The Linux kernel has many interfaces to efficiently handle character devices, most of which are not relevant to this use case, so I used the very basic register_chrdev function to create a new device [2]. This function takes three parameters: A major number, a name, and a struct file_operations vtable. Let's go through these in order.

Linux identifies devices through a major and a minor number. The major number is associated with the device driver, whereas the minor number describes a specific device instance. (For Soil, I reuse the IOCTL magic number shown earlier as the major number, but in general the two are unrelated.) For example, on my system the major number 4 seems to describe the TTY driver, with the different virtual consoles using minor numbers 0 through 63. The device name is not particularly important to the kernel; it just provides a human-readable name for your driver (although this has nothing to do with the device file in /dev yet!). Finally, most of the magic happens in the file_operations vtable. A vtable is a structure that contains function pointers, a concept that appears quite commonly in the Linux kernel. Here, the vtable describes the operations the Soil device file should support. Userspace applications must be able to open and close the file, as well as send IOCTLs. In the code, it looks like this:

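struct soil_program
{
  Byte *program;
  int len;
};

static int handle_open(struct inode *inode, struct file *file)
{
  return 0;
}

static int handle_release(struct inode *inode, struct file *file)
{
  return 0;
}

static long handle_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) {
  if (cmd == SOIL_IOCTL_LOAD)
  {
    struct soil_program prog;
    char program[1024];
    // arg is a userspace pointer to a struct soil_program
    if (copy_from_user(&prog, (struct soil_program __user *)arg, sizeof(struct soil_program)) != 0)
    {
      pr_err("Failed to copy param from user\n");
      return -EFAULT;
    }
    // Reject lengths that would overflow the fixed-size buffer
    if (prog.len <= 0 || (size_t)prog.len > sizeof(program))
      return -EINVAL;
    // prog.program is another userspace pointer, so copy the bytecode as well
    if (copy_from_user(program, (const void __user *)prog.program, prog.len) != 0)
      return -EFAULT;

    // Start the VM
    return 0;
  }
  return -ENOTTY; // unknown IOCTL number
}

static const struct file_operations soil_fops = {
  .owner = THIS_MODULE,
  .open = handle_open,
  .release = handle_release,
  .unlocked_ioctl = handle_ioctl,
};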

Let's go through this, starting at the bottom. You'll see that the module defines an instance of the struct file_operations vtable. By populating its members with function pointers, I defined which operations my character device supports and how they work. As mentioned above, you should be able to open and close (here called "release") a file descriptor to /dev/soil. Since userspace applications don't need to read from or write to the device directly, I didn't implement these functions. If you tried to use them anyway, the read and write system calls would fail with errno set to EINVAL (invalid argument). The open and release implementations are pretty simple. Since the Soil device isn't an actual device, these calls should always succeed, so the functions simply return 0. The function to handle IOCTLs does the most work here, but I'll get to that later.
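If the vtable pattern is new to you, the following userspace toy may help. It mirrors the shape of struct file_operations with a table of function pointers, leaves .read unset just like the Soil device does, and dispatches roughly the way the VFS does: A missing .open counts as success, while a missing .read makes the call fail with -EINVAL. All names are made up, and the real VFS checks are more involved.

```c
#include <errno.h>     /* EINVAL */
#include <stddef.h>    /* NULL, size_t */
#include <sys/types.h> /* ssize_t */

/* A toy, userspace-only analogue of struct file_operations: a table of
   function pointers whose unset entries stay NULL. Not kernel code. */
struct toy_fops {
  int (*open)(void);
  ssize_t (*read)(char *buf, size_t len);
};

/* Always succeeds, like handle_open in the module */
static int toy_open(void)
{
  return 0;
}

/* .read is deliberately left NULL, like the Soil device's read */
const struct toy_fops soil_like_fops = {
  .open = toy_open,
};

/* Roughly what the VFS does for open: a missing .open counts as success */
int toy_do_open(const struct toy_fops *fops)
{
  return fops->open ? fops->open() : 0;
}

/* Roughly what the VFS does for read: no .read means -EINVAL */
ssize_t toy_do_read(const struct toy_fops *fops, char *buf, size_t len)
{
  if (fops->read == NULL)
    return -EINVAL;
  return fops->read(buf, len);
}
```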

Armed with this vtable, I could now register a character device and make it show up in /dev:

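static struct device *dev_file;
static struct class *cls;

static int __init
init_soil_km(void)
{
  int res = register_chrdev(IOC_MAGIC, "soil", &soil_fops);
  if (res < 0)
  {
    pr_alert("Failed to register character device %d\n", IOC_MAGIC);
    return res;
  }
  cls = class_create("soil");
  if (IS_ERR(cls))
  {
    unregister_chrdev(IOC_MAGIC, "soil");
    return PTR_ERR(cls);
  }
  dev_file = device_create(cls, NULL, MKDEV(IOC_MAGIC, 0), NULL, "soil");
  if (IS_ERR(dev_file))
  {
    // Undo the previous steps in reverse order
    class_destroy(cls);
    unregister_chrdev(IOC_MAGIC, "soil");
    return PTR_ERR(dev_file);
  }
  return 0;
}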

The call to register_chrdev registers the driver, which then appears in /proc/devices. However, only after calling device_create does the device file show up in /dev. A device needs a device class, which is set up by the class_create function (since Linux 6.4, class_create only takes the class name). For a normal device, the class would carry various kinds of functionality common to all devices of that class, but for Soil I only needed it for formal reasons. If you are curious, you can find more information in the kernel docs. The MKDEV macro combines a major and a minor number to identify a specific device, in my case minor number 0 under the Soil driver's major number. Note also that cls and dev_file are defined globally because I need to clean them up when the module exits:

static void __exit
exit_soil_km(void)
{
  device_destroy(cls, MKDEV(IOC_MAGIC, 0));
  class_destroy(cls);
  unregister_chrdev(IOC_MAGIC, "soil");
}

module_init(init_soil_km);
module_exit(exit_soil_km);
MODULE_LICENSE("GPL");

With the character device set up, I'm going to return to the function that makes it do anything useful:

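#define SOIL_IOCTL_LOAD _IOW(IOC_MAGIC, 0, struct soil_program)

static long handle_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) {
  if (cmd == SOIL_IOCTL_LOAD)
  {
    struct soil_program prog;
    char program[1024];
    // arg is a userspace pointer to a struct soil_program
    if (copy_from_user(&prog, (struct soil_program __user *)arg, sizeof(struct soil_program)) != 0)
    {
      pr_err("Failed to copy param from user\n");
      return -EFAULT;
    }
    // Reject lengths that would overflow the fixed-size buffer
    if (prog.len <= 0 || (size_t)prog.len > sizeof(program))
      return -EINVAL;
    // prog.program is another userspace pointer, so copy the bytecode as well
    if (copy_from_user(program, (const void __user *)prog.program, prog.len) != 0)
      return -EFAULT;

    // Start the VM
    return 0;
  }
  return -ENOTTY; // unknown IOCTL number
}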

The IOCTL handler receives the file the IOCTL was performed on, the IOCTL number, and an (optional) argument. Since there is only one /dev/soil at a time, it can ignore the file pointer. However, the IOCTL number and argument are very relevant. You can think of the IOCTL number as analogous to a system call number: If the Soil device supported multiple IOCTLs (and it may in the future!), the handler would use it to tell them apart. Finally, the argument can be used to pass userspace data to the kernel code handling the IOCTL. For Soil, I used this to pass the bytecode to the VM. While arg is declared as unsigned long in the implementation, it is actually a pointer into userspace memory.

An IOCTL is triggered by a system call in a userspace process. Therefore, the kernel code to handle it runs in the context of the calling process. However, you can't just dereference a userspace pointer: The process might pass an address that isn't mapped (which would crash the kernel instead of failing gracefully) or even a kernel address, and CPUs with SMAP block direct kernel access to user pages altogether. In order to safely access the IOCTL argument, you need to use the copy_from_user function to copy it into kernel memory; it checks the address, handles faults, and returns the number of bytes it could not copy. As you can see, struct soil_program contains another pointer to the actual bytecode along with the bytecode's length. Since this Byte* (Byte being a typedef to uint8_t) is created by the calling userspace program, it is another userspace pointer which needs to be copied to kernelspace. Before I describe how the module sets up the VM, I want to take a look at the userspace side of things.

Welcome from the User Side

As described earlier, the Soil interface from userspace should at this point be pretty simple. For now, there is only one IOCTL which both loads and runs a program. I set up a basic C program that reads a file from disk and sends the contents to my kernel module. The result looks something like this:

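#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include "soil_common.h"

int main(int argc, char **argv)
{
  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <bytecode file>\n", argv[0]);
    return 1;
  }

  uint8_t buf[1024];
  FILE* file = fopen(argv[1], "rb");
  if (file == NULL)
  {
    perror("fopen");
    return 1;
  }
  // Determine the file size by seeking to the end
  fseek(file, 0L, SEEK_END);
  long size = ftell(file);
  rewind(file);
  if (size < 0 || (size_t)size > sizeof(buf))
  {
    fprintf(stderr, "%s is unreadable or larger than %zu bytes\n", argv[1], sizeof(buf));
    fclose(file);
    return 1;
  }
  size_t len = fread(buf, 1, (size_t)size, file);
  fclose(file);
  if (len != (size_t)size)
  {
    fprintf(stderr, "Failed to read %s\n", argv[1]);
    return 1;
  }
  printf("Soil binary `%s` is %zu bytes long.\n", argv[1], len);

  int fd = open("/dev/soil", O_RDONLY);
  if (fd < 0)
  {
    perror("open");
    return 1;
  }
  struct soil_program prog = {
    .program = buf,
    .len = (int)len,
  };
  int res = ioctl(fd, SOIL_IOCTL_LOAD, &prog);
  if (res < 0)
  {
    perror("ioctl");
  }
  close(fd);
  return res < 0 ? 1 : 0;
}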

The program first checks that it received a file path as a command line argument, then opens that file. Using a combination of fseek and related functions, it determines the file's length [3]. After reading the bytecode into a buffer, the program opens /dev/soil and constructs a struct soil_program as the IOCTL's parameter. With all of this, it can finally make the ioctl system call. You may be wondering where this program gets the struct soil_program type and SOIL_IOCTL_LOAD from. In an earlier example, I simplified things a bit: The definitions shared between the user and kernel side live in a shared header, soil_common.h.
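For reference, a header along these lines could serve both sides. It is a hypothetical sketch rather than the actual soil_common.h: The value of IOC_MAGIC is a placeholder, and the __KERNEL__ check is one common way to pick the right header for uint8_t in kernel and userspace builds.

```c
/* soil_common.h (hypothetical sketch): definitions shared by the kernel
   module and the userspace loader */
#ifndef SOIL_COMMON_H
#define SOIL_COMMON_H

#ifdef __KERNEL__
#include <linux/types.h> /* uint8_t inside the kernel */
#else
#include <stdint.h>      /* uint8_t in userspace */
#endif
#include <linux/ioctl.h> /* _IOW, available to both sides */

#define IOC_MAGIC 100    /* placeholder value, also used as the major number */

typedef uint8_t Byte;

struct soil_program
{
  Byte *program; /* userspace address of the bytecode */
  int len;       /* bytecode length in bytes */
};

/* Userspace writes a struct soil_program to the kernel */
#define SOIL_IOCTL_LOAD _IOW(IOC_MAGIC, 0, struct soil_program)

#endif /* SOIL_COMMON_H */
```

Because struct soil_program embeds a raw pointer and an int, its size and layout differ between 32-bit and 64-bit processes. When that matters, the kernel's ioctl guidelines recommend passing user addresses as __u64 fields instead of pointers.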

Aside: Who should load the bytecode?
Initially, I was unsure whether the userspace code should only transmit the bytecode path or the bytecode itself to the kernel. In terms of efficiency, I thought it might be better to have the kernel do the file IO rather than transmitting possibly large bytecode over the syscall boundary. However, the internet seems to agree that doing file IO in the kernel is considered bad practice, so I decided in favor of the current solution.

My kingdom for a VM!

I now had all the plumbing required to load Soil bytecode into a kernel module. All that was missing now was the VM to run it. As I mentioned earlier, a C implementation of the VM already exists, and thankfully it doesn't rely on the standard library too much. Since the C standard library is built atop the interfaces exposed by the Linux kernel, you can't use it in a kernel module. You may have noticed earlier that instead of printf, the module uses functions like pr_alert. This meant that I had to make some changes to the VM implementation.

The Soil VM I am using is not intended to be embedded into other applications. It's a single C file that compiles down to an executable. However, that was not going to stop me. Most of the VM file's main function deals with loading the bytecode and finding out its size (using the method described above). In order to prepare a VM, it calls a function init_vm to reset the VM's registers and load the bytecode into memory. Then, a call to run actually starts the execution.

Looking at init_vm, I quickly found some library calls I had to change: malloc has to become kmalloc [4], and the Soil-specific eprintf and panic functions (which are used for debug printing and abnormal exits, respectively) use a printf variant that is unavailable in kernel mode. I want to highlight kmalloc here because it looks slightly different from userspace malloc. Unlike in userspace, you have to pass flags that describe how the memory may be allocated, for example whether the allocator is allowed to sleep. The Soil module can get away with simply using GFP_KERNEL, but if you are curious about other options, check the kernel documentation.

There's one more hurdle, and thankfully I already knew about it going in. As it turns out, Soil supports floating point math. In general, floating point is a sensible trade-off in terms of accuracy and necessary for many applications, e.g. in computer graphics. However, for historic reasons, floating point instructions tend to be complex, with their own set of registers and state that has to be kept intact. The kernel does not save and restore this state on every entry from userspace, so kernel code may not use floating point arithmetic unless it explicitly brackets it (on x86, with kernel_fpu_begin() and kernel_fpu_end()). I could have worked around this, but floating point support is tedious to implement and mostly unnecessary for kernel applications, so I skipped it for now.

This left one more part of the VM I had to deal with: Syscalls. While these share a name with Linux's (and other operating systems') system calls, they serve a slightly different purpose. In an operating system, syscalls are a means to let userspace programs perform privileged operations (e.g. IO) safely. Soil, on the other hand, uses syscalls to manage all interactions with the outside world. For example, if you want to print something to stdout or open/read/write a file, you use a syscall. In order to avoid confusion, I'll use the term "VM call" to distinguish them from Linux's system calls.

Most VM calls are not part of the minimum viable Soil kernel module, so for now only three VM calls are implemented. The first of these is exit. In the userspace implementation, it kills the process by calling the exit function from the C library. However, I don't want to kill the entire kernel when a Soil program exits. Since all the VM state is global, the VM call just sets a flag that tells the fetch-decode-execute loop to stop executing. I also implemented the print and log calls which just call eprintf for now.
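To illustrate why a flag is enough, here is a deliberately tiny fetch-decode-execute loop in userspace C. The three opcodes, their encoding and the four registers are made up and do not match Soil's instruction set; only the control flow follows the description above: The exit VM call records the status and clears a global flag, and the loop stops at its next check instead of tearing down the process.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT Soil's real instruction encodings */
enum { TOY_MOVI = 0x01, TOY_ADD = 0x02, TOY_EXIT = 0x03 };

/* Global VM state, as in the VM described above */
static int64_t regs[4];
static bool vm_running;
static int64_t vm_exit_code;

/* The exit VM call: record the status and clear the flag instead of
   calling exit(), which would take down the whole process */
static void vmcall_exit(int64_t status)
{
  vm_exit_code = status;
  vm_running = false;
}

/* Fetch-decode-execute until the exit VM call clears vm_running, an
   unknown opcode appears, or the bytecode ends. Made-up encoding:
   MOVI reg imm8 | ADD dst src | EXIT reg ("& 3" keeps register
   indices within the four toy registers) */
int64_t toy_run(const uint8_t *code, size_t len)
{
  size_t ip = 0;
  vm_running = true;
  vm_exit_code = 0;
  while (vm_running && ip < len) {
    switch (code[ip]) { /* fetch + decode */
    case TOY_MOVI:
      if (ip + 2 >= len) { vm_running = false; break; }
      regs[code[ip + 1] & 3] = code[ip + 2];
      ip += 3;
      break;
    case TOY_ADD:
      if (ip + 2 >= len) { vm_running = false; break; }
      regs[code[ip + 1] & 3] += regs[code[ip + 2] & 3];
      ip += 3;
      break;
    case TOY_EXIT:
      if (ip + 1 >= len) { vm_running = false; break; }
      vmcall_exit(regs[code[ip + 1] & 3]);
      ip += 2;
      break;
    default: /* unknown opcode: stop instead of crashing */
      vm_running = false;
      break;
    }
  }
  return vm_exit_code;
}
```

Running the bytes { TOY_MOVI, 0, 20, TOY_MOVI, 1, 35, TOY_ADD, 0, 1, TOY_EXIT, 0, TOY_EXIT, 1 } through toy_run returns 55: The second exit is never reached because the first one already stopped the loop.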

With all of this done, the Soil VM should run. Let's try out an example!

Kernelspace Fibonacci

My current Soil VM is pretty bare-bones. Apart from printing to the kernel log and exiting, there is not much it can do in terms of interacting with the outside world. Since I don't have any experience with Martinaise, Marcel provided me with a simple program for testing the VM. So, I can proudly tell you that it is now possible to calculate Fibonacci numbers in the Soil VM in a Linux kernel module. The terminal session below shows the full set of commands to load the module and run the example file.

[clemens@archlinux soil-km]$ sudo insmod soil.ko
[sudo] password for clemens:
[clemens@archlinux soil-km]$ sudo ./usoil fib.soil
Soil binary `fib.soil` is 127 bytes long.
[clemens@archlinux soil-km]$ sudo dmesg
[    0.000000] Linux version 6.10.6-arch1-1 (linux@archlinux) (gcc (GCC) 14.2.1 20240805, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Mon, 19 Aug 2024 17:02:39 +0000
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=5bbe9fba-58d9-4ef3-9dcb-01b9c3fc4ba0 rw rootflags=subvol=@ zswap.enabled=0 rootfstype=btrfs loglevel=3 quiet
#
# Lots of output not related to soil...
#
[ 1236.449775] soil: loading out-of-tree module taints kernel.
[ 1236.449781] soil: module verification failed: signature and/or required key missing - tainting kernel
[ 1466.105055] Hello, soil!
[ 1474.259478] cmd = 1074291712, arg = 00000000d6e61c6a
[ 1474.259484] 127
[ 1474.259485] MEM SIZE 1000000

This first part is simple enough: You've seen the commands for loading a kernel module earlier already, and we've looked at the userspace counterpart as well. As a piece of debug information, it prints out the length of the loaded bytecode. Looking at dmesg, you can see that the module was loaded correctly, printing "Hello, soil!". Since I'm not signing my module, the kernel complains, but loads the module anyway. You can also see some debug output that gets printed when we handle the IOCTL. cmd is the actual IOCTL number generated by the macros we used, and arg is the (userspace) pointer to the struct soil_program. After copying the bytecode to kernel memory and setting up the VM, it starts executing instructions. If you are curious, the full tracing output is listed below, although it is not particularly spectacular.

The VM's full tracing output
[ 1474.259528] ran d2 ->
[ 1474.259529] ip = 3, sp = f4240, st = 0, a = 1, b = 0, c = 0, d = 0, e = 0, f = 0
[ 1474.259531] ran d2 ->
[ 1474.259532] ip = 6, sp = f4240, st = 0, a = 1, b = 1, c = 0, d = 0, e = 0, f = 0
[ 1474.259533] ran d2 ->
[ 1474.259534] ip = 9, sp = f4240, st = 0, a = 1, b = 1, c = a, d = 0, e = 0, f = 0
[ 1474.259535] ran d2 ->
[ 1474.259536] ip = c, sp = f4240, st = 0, a = 1, b = 1, c = a, d = 1, e = 0, f = 0
[ 1474.259537] ran a1 ->
[ 1474.259538] ip = e, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 1, e = 0, f = 0
[ 1474.259539] ran d2 ->
[ 1474.259539] ip = 11, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 0, e = 0, f = 0
[ 1474.259541] ran c0 ->
[ 1474.259542] ip = 13, sp = f4240, st = 9, a = 1, b = 1, c = 9, d = 0, e = 0, f = 0
[ 1474.259543] ran c1 ->
[ 1474.259543] ip = 14, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 0, e = 0, f = 0
[ 1474.259545] ran f1 ->
[ 1474.259545] ip = 1d, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 0, e = 0, f = 0
[ 1474.259546] ran d0 ->
[ 1474.259547] ip = 1f, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 1, e = 0, f = 0
[ 1474.259548] ran a0 ->
[ 1474.259549] ip = 21, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 2, e = 0, f = 0
[ 1474.259550] ran d0 ->
[ 1474.259551] ip = 23, sp = f4240, st = 0, a = 1, b = 1, c = 9, d = 2, e = 0, f = 0
[ 1474.259552] ran d0 ->
[ 1474.259552] ip = 25, sp = f4240, st = 0, a = 1, b = 2, c = 9, d = 2, e = 0, f = 0
[ 1474.259554] ran f0 ->
[ 1474.259554] ip = 9, sp = f4240, st = 0, a = 1, b = 2, c = 9, d = 2, e = 0, f = 0
[ 1474.259555] ran d2 ->
[ 1474.259556] ip = c, sp = f4240, st = 0, a = 1, b = 2, c = 9, d = 1, e = 0, f = 0
[ 1474.259557] ran a1 ->
[ 1474.259558] ip = e, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 1, e = 0, f = 0
[ 1474.259559] ran d2 ->
[ 1474.259559] ip = 11, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 0, e = 0, f = 0
[ 1474.259561] ran c0 ->
[ 1474.259561] ip = 13, sp = f4240, st = 8, a = 1, b = 2, c = 8, d = 0, e = 0, f = 0
[ 1474.259563] ran c1 ->
[ 1474.259563] ip = 14, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 0, e = 0, f = 0
[ 1474.259564] ran f1 ->
[ 1474.259565] ip = 1d, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 0, e = 0, f = 0
[ 1474.259566] ran d0 ->
[ 1474.259567] ip = 1f, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 1, e = 0, f = 0
[ 1474.259568] ran a0 ->
[ 1474.259568] ip = 21, sp = f4240, st = 0, a = 1, b = 2, c = 8, d = 3, e = 0, f = 0
[ 1474.259570] ran d0 ->
[ 1474.259570] ip = 23, sp = f4240, st = 0, a = 2, b = 2, c = 8, d = 3, e = 0, f = 0
[ 1474.259571] ran d0 ->
[ 1474.259572] ip = 25, sp = f4240, st = 0, a = 2, b = 3, c = 8, d = 3, e = 0, f = 0
[ 1474.259573] ran f0 ->
[ 1474.259574] ip = 9, sp = f4240, st = 0, a = 2, b = 3, c = 8, d = 3, e = 0, f = 0
[ 1474.259575] ran d2 ->
[ 1474.259576] ip = c, sp = f4240, st = 0, a = 2, b = 3, c = 8, d = 1, e = 0, f = 0
[ 1474.259577] ran a1 ->
[ 1474.259577] ip = e, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 1, e = 0, f = 0
[ 1474.259579] ran d2 ->
[ 1474.259579] ip = 11, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 0, e = 0, f = 0
[ 1474.259580] ran c0 ->
[ 1474.259581] ip = 13, sp = f4240, st = 7, a = 2, b = 3, c = 7, d = 0, e = 0, f = 0
[ 1474.259582] ran c1 ->
[ 1474.259583] ip = 14, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 0, e = 0, f = 0
[ 1474.259584] ran f1 ->
[ 1474.259585] ip = 1d, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 0, e = 0, f = 0
[ 1474.259586] ran d0 ->
[ 1474.259586] ip = 1f, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 2, e = 0, f = 0
[ 1474.259588] ran a0 ->
[ 1474.259588] ip = 21, sp = f4240, st = 0, a = 2, b = 3, c = 7, d = 5, e = 0, f = 0
[ 1474.259589] ran d0 ->
[ 1474.259590] ip = 23, sp = f4240, st = 0, a = 3, b = 3, c = 7, d = 5, e = 0, f = 0
[ 1474.259591] ran d0 ->
[ 1474.259592] ip = 25, sp = f4240, st = 0, a = 3, b = 5, c = 7, d = 5, e = 0, f = 0
[ 1474.259593] ran f0 ->
[ 1474.259593] ip = 9, sp = f4240, st = 0, a = 3, b = 5, c = 7, d = 5, e = 0, f = 0
[ 1474.259595] ran d2 ->
[ 1474.259595] ip = c, sp = f4240, st = 0, a = 3, b = 5, c = 7, d = 1, e = 0, f = 0
[ 1474.259597] ran a1 ->
[ 1474.259597] ip = e, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 1, e = 0, f = 0
[ 1474.259598] ran d2 ->
[ 1474.259599] ip = 11, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 0, e = 0, f = 0
[ 1474.259600] ran c0 ->
[ 1474.259601] ip = 13, sp = f4240, st = 6, a = 3, b = 5, c = 6, d = 0, e = 0, f = 0
[ 1474.259602] ran c1 ->
[ 1474.259602] ip = 14, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 0, e = 0, f = 0
[ 1474.259604] ran f1 ->
[ 1474.259604] ip = 1d, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 0, e = 0, f = 0
[ 1474.259605] ran d0 ->
[ 1474.259606] ip = 1f, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 3, e = 0, f = 0
[ 1474.259607] ran a0 ->
[ 1474.259608] ip = 21, sp = f4240, st = 0, a = 3, b = 5, c = 6, d = 8, e = 0, f = 0
[ 1474.259609] ran d0 ->
[ 1474.259610] ip = 23, sp = f4240, st = 0, a = 5, b = 5, c = 6, d = 8, e = 0, f = 0
[ 1474.259611] ran d0 ->
[ 1474.259611] ip = 25, sp = f4240, st = 0, a = 5, b = 8, c = 6, d = 8, e = 0, f = 0
[ 1474.259613] ran f0 ->
[ 1474.259613] ip = 9, sp = f4240, st = 0, a = 5, b = 8, c = 6, d = 8, e = 0, f = 0
[ 1474.259614] ran d2 ->
[ 1474.259615] ip = c, sp = f4240, st = 0, a = 5, b = 8, c = 6, d = 1, e = 0, f = 0
[ 1474.259616] ran a1 ->
[ 1474.259617] ip = e, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = 1, e = 0, f = 0
[ 1474.259618] ran d2 ->
[ 1474.259619] ip = 11, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = 0, e = 0, f = 0
[ 1474.259620] ran c0 ->
[ 1474.259620] ip = 13, sp = f4240, st = 5, a = 5, b = 8, c = 5, d = 0, e = 0, f = 0
[ 1474.259622] ran c1 ->
[ 1474.259622] ip = 14, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = 0, e = 0, f = 0
[ 1474.259623] ran f1 ->
[ 1474.259624] ip = 1d, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = 0, e = 0, f = 0
[ 1474.259625] ran d0 ->
[ 1474.259626] ip = 1f, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = 5, e = 0, f = 0
[ 1474.259627] ran a0 ->
[ 1474.259628] ip = 21, sp = f4240, st = 0, a = 5, b = 8, c = 5, d = d, e = 0, f = 0
[ 1474.259629] ran d0 ->
[ 1474.259629] ip = 23, sp = f4240, st = 0, a = 8, b = 8, c = 5, d = d, e = 0, f = 0
[ 1474.259631] ran d0 ->
[ 1474.259631] ip = 25, sp = f4240, st = 0, a = 8, b = d, c = 5, d = d, e = 0, f = 0
[ 1474.259632] ran f0 ->
[ 1474.259633] ip = 9, sp = f4240, st = 0, a = 8, b = d, c = 5, d = d, e = 0, f = 0
[ 1474.259634] ran d2 ->
[ 1474.259635] ip = c, sp = f4240, st = 0, a = 8, b = d, c = 5, d = 1, e = 0, f = 0
[ 1474.259636] ran a1 ->
[ 1474.259636] ip = e, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 1, e = 0, f = 0
[ 1474.259638] ran d2 ->
[ 1474.259638] ip = 11, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 0, e = 0, f = 0
[ 1474.259640] ran c0 ->
[ 1474.259640] ip = 13, sp = f4240, st = 4, a = 8, b = d, c = 4, d = 0, e = 0, f = 0
[ 1474.259641] ran c1 ->
[ 1474.259642] ip = 14, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 0, e = 0, f = 0
[ 1474.259643] ran f1 ->
[ 1474.259644] ip = 1d, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 0, e = 0, f = 0
[ 1474.259645] ran d0 ->
[ 1474.259645] ip = 1f, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 8, e = 0, f = 0
[ 1474.259647] ran a0 ->
[ 1474.259647] ip = 21, sp = f4240, st = 0, a = 8, b = d, c = 4, d = 15, e = 0, f = 0
[ 1474.259649] ran d0 ->
[ 1474.259649] ip = 23, sp = f4240, st = 0, a = d, b = d, c = 4, d = 15, e = 0, f = 0
[ 1474.259650] ran d0 ->
[ 1474.259651] ip = 25, sp = f4240, st = 0, a = d, b = 15, c = 4, d = 15, e = 0, f = 0
[ 1474.259652] ran f0 ->
[ 1474.259653] ip = 9, sp = f4240, st = 0, a = d, b = 15, c = 4, d = 15, e = 0, f = 0
[ 1474.259654] ran d2 ->
[ 1474.259654] ip = c, sp = f4240, st = 0, a = d, b = 15, c = 4, d = 1, e = 0, f = 0
[ 1474.259656] ran a1 ->
[ 1474.259656] ip = e, sp = f4240, st = 0, a = d, b = 15, c = 3, d = 1, e = 0, f = 0
[ 1474.259658] ran d2 ->
[ 1474.259658] ip = 11, sp = f4240, st = 0, a = d, b = 15, c = 3, d = 0, e = 0, f = 0
[ 1474.259659] ran c0 ->
[ 1474.259660] ip = 13, sp = f4240, st = 3, a = d, b = 15, c = 3, d = 0, e = 0, f = 0
[ 1474.259661] ran c1 ->
[ 1474.259662] ip = 14, sp = f4240, st = 0, a = d, b = 15, c = 3, d = 0, e = 0, f = 0
[ 1474.259663] ran f1 ->
[ 1474.259663] ip = 1d, sp = f4240, st = 0, a = d, b = 15, c = 3, d = 0, e = 0, f = 0
[ 1474.259665] ran d0 ->
[ 1474.259665] ip = 1f, sp = f4240, st = 0, a = d, b = 15, c = 3, d = d, e = 0, f = 0
[ 1474.259667] ran a0 ->
[ 1474.259667] ip = 21, sp = f4240, st = 0, a = d, b = 15, c = 3, d = 22, e = 0, f = 0
[ 1474.259668] ran d0 ->
[ 1474.259669] ip = 23, sp = f4240, st = 0, a = 15, b = 15, c = 3, d = 22, e = 0, f = 0
[ 1474.259670] ran d0 ->
[ 1474.259671] ip = 25, sp = f4240, st = 0, a = 15, b = 22, c = 3, d = 22, e = 0, f = 0
[ 1474.259672] ran f0 ->
[ 1474.259672] ip = 9, sp = f4240, st = 0, a = 15, b = 22, c = 3, d = 22, e = 0, f = 0
[ 1474.259674] ran d2 ->
[ 1474.259674] ip = c, sp = f4240, st = 0, a = 15, b = 22, c = 3, d = 1, e = 0, f = 0
[ 1474.259676] ran a1 ->
[ 1474.259676] ip = e, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 1, e = 0, f = 0
[ 1474.259677] ran d2 ->
[ 1474.259678] ip = 11, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 0, e = 0, f = 0
[ 1474.259679] ran c0 ->
[ 1474.259680] ip = 13, sp = f4240, st = 2, a = 15, b = 22, c = 2, d = 0, e = 0, f = 0
[ 1474.259681] ran c1 ->
[ 1474.259682] ip = 14, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 0, e = 0, f = 0
[ 1474.259683] ran f1 ->
[ 1474.259683] ip = 1d, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 0, e = 0, f = 0
[ 1474.259685] ran d0 ->
[ 1474.259685] ip = 1f, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 15, e = 0, f = 0
[ 1474.259686] ran a0 ->
[ 1474.259687] ip = 21, sp = f4240, st = 0, a = 15, b = 22, c = 2, d = 37, e = 0, f = 0
[ 1474.259688] ran d0 ->
[ 1474.259689] ip = 23, sp = f4240, st = 0, a = 22, b = 22, c = 2, d = 37, e = 0, f = 0
[ 1474.259690] ran d0 ->
[ 1474.259690] ip = 25, sp = f4240, st = 0, a = 22, b = 37, c = 2, d = 37, e = 0, f = 0
[ 1474.259692] ran f0 ->
[ 1474.259692] ip = 9, sp = f4240, st = 0, a = 22, b = 37, c = 2, d = 37, e = 0, f = 0
[ 1474.259694] ran d2 ->
[ 1474.259694] ip = c, sp = f4240, st = 0, a = 22, b = 37, c = 2, d = 1, e = 0, f = 0
[ 1474.259695] ran a1 ->
[ 1474.259696] ip = e, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 1, e = 0, f = 0
[ 1474.259697] ran d2 ->
[ 1474.259698] ip = 11, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 0, e = 0, f = 0
[ 1474.259699] ran c0 ->
[ 1474.259699] ip = 13, sp = f4240, st = 1, a = 22, b = 37, c = 1, d = 0, e = 0, f = 0
[ 1474.259701] ran c1 ->
[ 1474.259701] ip = 14, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 0, e = 0, f = 0
[ 1474.259703] ran f1 ->
[ 1474.259703] ip = 1d, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 0, e = 0, f = 0
[ 1474.259704] ran d0 ->
[ 1474.259705] ip = 1f, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 22, e = 0, f = 0
[ 1474.259706] ran a0 ->
[ 1474.259707] ip = 21, sp = f4240, st = 0, a = 22, b = 37, c = 1, d = 59, e = 0, f = 0
[ 1474.259708] ran d0 ->
[ 1474.259708] ip = 23, sp = f4240, st = 0, a = 37, b = 37, c = 1, d = 59, e = 0, f = 0
[ 1474.259710] ran d0 ->
[ 1474.259710] ip = 25, sp = f4240, st = 0, a = 37, b = 59, c = 1, d = 59, e = 0, f = 0
[ 1474.259711] ran f0 ->
[ 1474.259712] ip = 9, sp = f4240, st = 0, a = 37, b = 59, c = 1, d = 59, e = 0, f = 0
[ 1474.259713] ran d2 ->
[ 1474.259714] ip = c, sp = f4240, st = 0, a = 37, b = 59, c = 1, d = 1, e = 0, f = 0
[ 1474.259715] ran a1 ->
[ 1474.259716] ip = e, sp = f4240, st = 0, a = 37, b = 59, c = 0, d = 1, e = 0, f = 0
[ 1474.259717] ran d2 ->
[ 1474.259717] ip = 11, sp = f4240, st = 0, a = 37, b = 59, c = 0, d = 0, e = 0, f = 0
[ 1474.259719] ran c0 ->
[ 1474.259719] ip = 13, sp = f4240, st = 0, a = 37, b = 59, c = 0, d = 0, e = 0, f = 0
[ 1474.259720] ran c1 ->
[ 1474.259721] ip = 14, sp = f4240, st = 1, a = 37, b = 59, c = 0, d = 0, e = 0, f = 0
[ 1474.259722] ran f1 ->
[ 1474.259723] ip = 2e, sp = f4240, st = 1, a = 37, b = 59, c = 0, d = 0, e = 0, f = 0

The final instruction which makes an exit VM call is probably the most immediately interesting:

[ 1474.259724] syscall exit(55)
[ 1474.259725] exited with 55
[ 1474.259726] ran f4 -> 
[ 1474.259726] ip = 30, sp = f4240, st = 1, a = 37, b = 59, c = 0, d = 0, e = 0, f = 0
[clemens@archlinux soil-km]$ sudo rmmod soil

Since instruction tracing is enabled, the Soil VM prints the opcode and the resulting register state for every instruction. The registers are shown in hexadecimal, so the final a = 37 is 55 in decimal. As you can see, the program exits with a value of 55, the 10th Fibonacci number (if you start at 1). In other words, everything ran successfully!
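As a cross-check of the result, here is a C rendering of the loop suggested by the register values in the trace: a and b both start at 1, c starts at 10 (0xa) and counts down, and each iteration advances the pair by one step. This is a reconstruction for illustration, not the source of fib.soil.

```c
#include <stdint.h>

/* Reconstructed from the register values in the trace (a and b start at 1,
   c counts the remaining iterations); not the actual source of fib.soil */
uint64_t fib(uint64_t n)
{
  if (n == 0)
    return 0;
  uint64_t a = 1, b = 1;
  for (uint64_t c = n - 1; c > 0; c--) {
    uint64_t d = a + b; /* the next Fibonacci number */
    a = b;
    b = d;
  }
  return a; /* the value handed to the exit VM call */
}
```

fib(10) returns 55, the exit value above, and fib(11) returns 89, which matches the final b = 59 (0x59) in the trace.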

What's still missing

At this point, most of the Soil VM works. The issue is all the machinery surrounding it, where there are still a number of features I would like to implement.

Wrapping up and future plans

Achieving my initial goal of running Soil bytecode in the Linux kernel proved surprisingly easy. I learned quite a bit about how character devices work internally and seeing /dev/soil pop up for the first time was a nice moment of success. Soil itself is also relatively easy to understand, especially given that I started with a working implementation. A couple of design decisions both in the VM in general (e.g. VM calls for file IO) and the implementation specifically (e.g. the approach to memory management) are clearly tailored to userspace applications, but the simple, RISC-like design made it easy enough to adapt.

As noted above, I have some ideas for where to take this project next, so expect further updates in the future.

  1. In order to build kernel modules you will need a number of packages, usually including a C compiler, make, and kernel headers. If you want to compile your own kernel modules, consult your distribution's documentation to find out how to set up these requirements. Also keep in mind that commands like insmod and rmmod will require superuser privileges.

  2. The Linux Kernel Labs website was another very helpful source in figuring out how to create my own character device.

  3. Credit for this idea goes to Marcel. You can't simply read the file and use strlen to get its length because Soil bytecode may contain null bytes.

  4. As it turns out, this Soil implementation tries to allocate 1 GB of memory when starting a VM. Marcel tells me this is necessary for the Martinaise compiler, which also runs on Soil. For the kernel version, I reduced the initial memory to 1 MB.

