Hello World in Rust for m68k with #[no_core] and compiler patches

Rust is great for so many practical purposes in modern software development. But who needs any of that? What are the oldest platforms we can target with standard Rustc?

ARM is a pretty well-supported architecture, right? ARM is older than you might expect; it was first used in the Acorn Archimedes in 1987. Hey, I see LLVM’s code mentions ARMv2, the architecture the Archimedes used! Except… ugh, it looks like that was recently removed (because it never worked). Apparently ARMv2 uses some sort of 26-bit addressing that was dropped in later ARM versions, so it isn’t trivial to support. Thus ARMv4 may be as old as you can go (you can give that a try with Lokathor’s gba crate).

What about 16-bit x86? Sounds like there may be limited support for that in LLVM, but just for targeting 16-bit modes on modern processors for bootstrapping. Could give that a try, but x86 is boring and conventional.

All the cool people are moving from x86_64 to aarch64 these days, so how about the Motorola 68k instead? It’s a 32-bit instruction set, with a 32-bit linear address space. Seems a lot nicer than that weird 8086. A new LLVM backend was recently added for the 68k, along with initial support in Rust, so it’s something we can target, though things may be a bit rough. Can’t be too bad, though, right?

A complication, and making things intentionally difficult

Rust has a “tier 3” m68k-unknown-linux-gnu target. Tier 3 means Rustup doesn’t distribute standard library binaries for it. Oh, and it isn’t automatically tested and may not work. But anyway, we can use the build-std feature of Cargo.

Except… there’s no std support for m68k-unknown-linux-gnu currently. And even core fails to build. We can’t use Rust without at least core, right? Well… not in stable Rust certainly. But there’s actually a no_core feature we can use.

We can then use FFI to call into the C standard library to interact with the OS. But that’s a bit boring, so what if we just make calls into the OS directly? If we can’t just use std, we might as well do things the hard way.

H. World’s Prelude in C

But how do we make Linux system calls, anyway? Normally the C standard library handles this, so we can look at a libc implementation like Musl. We can try writing our m68k Hello World program in C first, copying the definitions from Musl.

Then we can use the write system call to write text to stdout, and exit to terminate our process (something the C runtime normally does for us when main returns):

#define __NR_exit 1
#define __NR_write 4

#define STDOUT_FILENO 1

static inline long __syscall1(long n, long a)
{
        register unsigned long d0 __asm__("d0") = n;
        register unsigned long d1 __asm__("d1") = a;
        __asm__ __volatile__ ("trap #0" : "+r"(d0)
                : "r"(d1)
                : "memory");
        return d0;
}

static inline long __syscall3(long n, long a, long b, long c)
{
        register unsigned long d0 __asm__("d0") = n;
        register unsigned long d1 __asm__("d1") = a;
        register unsigned long d2 __asm__("d2") = b;
        register unsigned long d3 __asm__("d3") = c;
        __asm__ __volatile__ ("trap #0" : "+r"(d0)
                : "r"(d1), "r"(d2), "r"(d3)
                : "memory");
        return d0;
}

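void _start(void) {
        /* write(STDOUT_FILENO, "Hello World!\n", 13) */
        __syscall3(__NR_write, STDOUT_FILENO, (long)"Hello World!\n", 13);
        /* exit(0) never returns; there is no caller for _start to return to */
        __syscall1(__NR_exit, 0);
}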

What is even going on here?

Gcc, Clang, and Rustc offer “inline assembly” to let us use native assembly instructions directly in our code. To communicate with the Linux kernel, we use “system calls”. These are like function calls, but they use a special instruction to enter the kernel, which then handles the call. For Linux on m68k, we load the number identifying the system call into register d0 and the arguments into d1, d2, and so on. Then we execute trap #0 to enter the kernel, which leaves the return value in d0.
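The register convention can be written down as a small table. This is a host-side sketch, not code that runs on m68k. `syscall_registers` is a hypothetical helper. The use of d4, d5 and a0 for the fourth to sixth arguments comes from the Linux syscall(2) man page, not from the C code above.

```rust
/// m68k Linux system call convention (per the syscall(2) man page):
/// the syscall number goes in d0 (which also receives the return value),
/// and up to six arguments go in d1, d2, d3, d4, d5, then a0.
const SYSCALL_NR_REG: &str = "d0";
const SYSCALL_ARG_REGS: [&str; 6] = ["d1", "d2", "d3", "d4", "d5", "a0"];

/// Hypothetical helper: pair each value with the register it would be loaded
/// into before `trap #0`. Returns None for more than six arguments.
fn syscall_registers(nr: u32, args: &[u32]) -> Option<Vec<(&'static str, u32)>> {
    if args.len() > SYSCALL_ARG_REGS.len() {
        return None;
    }
    let mut regs = vec![(SYSCALL_NR_REG, nr)];
    regs.extend(SYSCALL_ARG_REGS.iter().copied().zip(args.iter().copied()));
    Some(regs)
}

fn main() {
    // write(1, buf, 13); the buffer address is a made-up placeholder value.
    let regs = syscall_registers(4, &[1, 0x8000_0000, 13]).unwrap();
    for (reg, value) in &regs {
        println!("{reg} = {value:#x}");
    }
}
```

The `__syscall1` and `__syscall3` helpers in the C version are just the one- and three-argument rows of this table.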

When our program is executed, the kernel starts running it at the _start symbol, which is the linker’s default entry point. Normally the C standard library implements _start: it calls main(), then passes main’s return value as the exit code to the exit syscall.
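The control flow of a C runtime’s _start can be modeled on the host. This is a simplified, hypothetical sketch: `model_start` and `program_main` are illustrative names. `exit` is a closure here so the result can be observed. A real _start would issue the exit syscall and never return, and a real C runtime also does extra work such as flushing stdio buffers before exiting.

```rust
/// Hypothetical model of what a C runtime's `_start` does: call the user's
/// `main`, then hand its return value to `exit`. In a real program `exit`
/// would be the exit system call (number 1 on m68k Linux) and never return.
fn model_start(user_main: fn() -> i32, exit: &mut dyn FnMut(i32)) {
    let status = user_main();
    exit(status);
}

fn program_main() -> i32 {
    println!("Hello World!");
    0
}

fn main() {
    let mut exit_status = None;
    // Record the status instead of terminating the process.
    model_start(program_main, &mut |status: i32| exit_status = Some(status));
    println!("exit would be called with {:?}", exit_status);
}
```

Our hand-written _start skips the main step entirely and makes both syscalls itself.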

Trying the C version

Anyway, this works as expected with m68k-linux-gnu-gcc. But what about Clang? We can build Clang from git with experimental m68k support with something like this:

mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \
               -DCMAKE_C_COMPILER=clang \
               -DCMAKE_CXX_COMPILER=clang++ \
               -DLLVM_TARGETS_TO_BUILD=X86 \
               -DBUILD_SHARED_LIBS=ON \
               -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=m68k \
               -DLLVM_ENABLE_PROJECTS="clang;compiler-rt" \
               ../llvm
ninja

And with clang -target m68k-unknown-linux hello-m68k.c we get…

hello-m68k.c:20:24: error: unexpected token parsing operands
        __asm__ __volatile__ ("trap #0" : "+r"(d0)
                              ^
<inline asm>:1:7: note: instantiated into assembly here
        trap #0

Oh. The error message could be better, but the problem is that the m68k backend doesn’t know about the trap instruction. Presumably the people who have worked on the backend so far have been too sensible to make their system calls directly. No matter, it’s not too hard to add a simple instruction like this, once one figures out LLVM’s arcane TableGen language. And figures out how to use Phabricator to submit a patch, which is somehow more arcane than mailing lists. But anyway, LLVM now supports this and a couple of other related instructions.
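It helps to see how simple the instruction the assembler needs to emit actually is. The sketch below is my own illustration of the encoding of TRAP, not LLVM’s TableGen definition; `encode_trap` is a hypothetical name. TRAP is a single 16-bit opcode word, 0x4E40, with the 4-bit vector number in its low bits.

```rust
/// Encode the m68k `trap #vector` instruction as its 16-bit opcode word.
/// The encoding is 0100 1110 0100 vvvv, i.e. 0x4E40 with the vector number
/// in the low four bits, so only vectors 0 through 15 exist.
fn encode_trap(vector: u8) -> Option<u16> {
    if vector > 15 {
        return None;
    }
    Some(0x4E40 | u16::from(vector))
}

fn main() {
    let word = encode_trap(0).unwrap();
    // m68k is big-endian, so this is the byte order the word has in memory.
    let bytes = word.to_be_bytes();
    println!("trap #0 = {word:#06x}, bytes {bytes:02x?}");
}
```

The vector operand is limited to 4 bits, so a `trap #16` should be rejected rather than silently truncated.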

And then it works. Not that we care about C; we really want to use Rust. But now we know LLVM can handle what is needed, and we have an idea of what our code needs to do.

And now, Rust®

We’ll need to compile Rustc with our new version of LLVM. We can clone it from git, and use a config.toml like this:

changelog-seen = 2

[llvm]
download-ci-llvm = false
link-shared = true

[build]
tools = ["cargo"]

[rust]
debug = true

[target.x86_64-unknown-linux-gnu]
llvm-config = "/home/ian/src/llvm-project/build/bin/llvm-config"

Using a shared build of LLVM makes linking faster and uses less RAM, though it seems we need to set LD_LIBRARY_PATH=/home/ian/src/llvm-project/build/lib when building and using the compiler (maybe there’s a better way?). Of course, you can use static libraries if you’d rather.

We can then build with ./x.py build, and use something like rustup toolchain link git $PWD/build/x86_64-unknown-linux-gnu/stage1 to make this compiler available through rustup as the “git” toolchain.

The actual program, with actual Rust code

Now we can move on to creating our program with cargo new --bin hello-m68k.

We can specify the target and compiler options in .cargo/config.toml:

[build]
target = "m68k-unknown-linux-gnu"
rustflags = ["-C", "target-feature=+crt-static"]

[target.m68k-unknown-linux-gnu]
linker = "m68k-linux-gnu-ld"

We’re finally ready to write some Rust code with #![no_core]. This is basically Extra Super Unstable Mode. We need to use unstable features that are only really meant to be used by the standard library itself, and are unlikely to ever be stabilized in their current form. But we can still do it.

So normally core provides an asm! macro, but here we have to define it ourselves. The definition is simple, though, since it’s just Magic™ that is actually implemented by the compiler. Using the macro seems to require the Sized and Copy traits, so we’ll also add those, with the magic attributes that make the compiler obey our will.

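#![feature(lang_items, no_core, rustc_attrs, decl_macro, asm_experimental_arch)]
#![no_main]
#![no_core]

// m68k Linux system call numbers (same values as the C version)
const __NR_EXIT: u32 = 1;
const __NR_WRITE: u32 = 4;

// Copied from libcore; the real implementation lives inside the compiler
#[rustc_builtin_macro]
pub macro asm("assembly template", $(operands,)* $(options($(option),*))?) {
    /* compiler built-in */
}

// Minimal lang items the compiler needs in order to check asm! operands
#[lang = "sized"]
trait Sized {}

#[lang = "copy"]
trait Copy {}

// asm! operands must be Copy; without libcore, raw pointers need this impl
impl<T> Copy for *const T {}

#[no_mangle]
extern "C" fn _start() {
    // asm! accepts raw pointers but not references, so cast to *const u8
    let s = b"Hello World!\n" as *const u8;

    unsafe {
        // write(STDOUT_FILENO = 1, s, 13); the kernel overwrites d0 with the
        // return value, so d0 is an in/out operand (like "+r" in the C version)
        asm!("trap #0", inout("d0") __NR_WRITE => _, in("d1") 1, in("d2") s, in("d3") 13);
        // exit(0); this never returns
        asm!("trap #0", in("d0") __NR_EXIT, in("d1") 0);
    }
}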

And with this we get…

   Compiling hello-m68k v0.1.0 (/home/ian/src/hello-m68k)
error[E0472]: inline assembly is unsupported on this target
  --> src/main.rs:21:5
   |
21 |     asm!("trap #0");
   |     ^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0472`.
error: could not compile `hello-m68k` (bin "hello-m68k") due to previous error

Oh, that’s a bit unexpected. Apparently inline assembly isn’t just magically supported, and Rustc needs a bit of code for each architecture. This sounds like it could be complicated, but judging from previous pull requests that added inline assembly for other architectures, it isn’t too bad. The main part is a file per architecture in rust/compiler/rustc_target/src/asm. Some of these are more complicated than others, but none are too bad. m68k is fairly straightforward, with 8 32-bit data registers (d0 to d7) and 8 32-bit address registers (a0 to a7, where a7 is the stack pointer).
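To give a feel for the register information involved, here is a hypothetical, heavily simplified model. It is not rustc’s actual code in rustc_target/src/asm, and the names `RegClass`, `RegError` and `parse_reg` are mine. It only knows the two register classes and rejects the stack pointer. The real implementation may reserve other registers as well.

```rust
/// Simplified model of m68k registers for inline assembly:
/// d0-d7 are data registers, a0-a7 are address registers, all 32 bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegClass {
    Data,
    Address,
}

#[derive(Debug, PartialEq, Eq)]
enum RegError {
    Unknown,
    StackPointer,
}

/// Parse a register name like "d3" or "a0". a7 (also spelled "sp") is the
/// stack pointer, so this sketch refuses to hand it out as an operand.
fn parse_reg(name: &str) -> Result<(RegClass, u8), RegError> {
    if name == "sp" {
        return Err(RegError::StackPointer);
    }
    let mut chars = name.chars();
    let class = match chars.next() {
        Some('d') => RegClass::Data,
        Some('a') => RegClass::Address,
        _ => return Err(RegError::Unknown),
    };
    // Exactly one digit from 0 to 7 must follow the class letter.
    let num = match chars.as_str().as_bytes() {
        [digit @ b'0'..=b'7'] => *digit - b'0',
        _ => return Err(RegError::Unknown),
    };
    if class == RegClass::Address && num == 7 {
        return Err(RegError::StackPointer);
    }
    Ok((class, num))
}

fn main() {
    for name in ["d0", "a0", "a7", "sp", "d8"] {
        println!("{name}: {:?}", parse_reg(name));
    }
}
```

Keeping data and address registers as separate classes matters because many m68k instructions accept only one kind or the other.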

So copy some boilerplate, try to understand it and update it with information about the registers the m68k uses (while trying to understand m68k itself), open a pull request, receive some corrections from a maintainer who probably has better things to do than review PRs about supporting an architecture that hasn’t been relevant since a bit before I was born, and Rust now supports inline assembly on m68k!

> qemu-m68k target/m68k-unknown-linux-gnu/debug/hello-m68k
Hello World!

Yep, that’s Hello World! A very exciting and novel thing!

Future Possibilities

I need to set up Debian m68k in a virtual machine to properly see what m68k Linux is like. But ultimately Linux on a virtual m68k isn’t that interesting, it’s just Linux but slower and with worse support for everything.

So Hello World for AmigaOS seems like a good next step. This will require dealing with the library loading mechanism used on the Amiga, and making sure the right calling convention is used. The way the Amiga handles libraries looks interesting and relatively straightforward, so that should be doable.

Oh, and fixing anything that’s stopping libcore from compiling would definitely be helpful.

I need to give rustc_codegen_gcc a try some time. I wonder if that could work with VAX…
