Hello Hurd

TLDR: github.com/ids1024/hello-hurd has a Hello World program using Mach RPC calls.

I don’t really write blog posts much, but when I do, I take the opportunity to discuss the most relevant technologies one is likely to encounter. Like GNU Hurd.

…Okay, some (very abridged) background. In the beginning, Bell Labs created heaven and earth. Bell said, “let there be light”, and there were lasers. And transistors. And in the very dawn of time, the Unix operating system was born.

Well, close enough. The core of Unix (and later BSD and Linux) is a “monolithic kernel”. The main purpose of an operating system is to allow different programs and threads to share CPU time and other resources, to provide device drivers for hardware, to provide a filesystem, to manage permissions for different processes, etc. With a monolithic architecture, these are all part of one kernel. Which means our filesystem implementation runs with the same permissions as the graphics driver, and a bug in one could cause issues in the other or crash the whole system.

The Mach OS developed at Carnegie Mellon University took the monolithic BSD kernel and incrementally moved things out of the kernel and into separate processes that communicate through message passing.

The GNU project sought to create a free (open source) Unix-like operating system, and decided to create a re-implementation of CMU’s Mach, along with the “Hurd”, which would be an implementation of a Unix-like kernel as various “servers” running as processes on that microkernel. While the other parts of the GNU operating system made progress, this kernel was a bit slow to mature. Then some Finn named Linus Torvalds decided to create a Unix-like kernel, and everyone ended up using GNU’s C compiler, utilities, and such, but with the Linux kernel. Though Hurd is still developed to some extent, it’s somewhat limited and fairly obscure compared to Linux…

If you’re reading this you probably already have some familiarity with Linux (or are quite confused, and it’s only going to get much worse from here). All in all it’s proven quite a practical kernel! But today, we ~~herd wildebeest~~ try GNU Hurd.

Installing (Debian) Hurd

Like Linux, Hurd is combined with various userspace components in a distribution. But there aren’t so many Hurd distributions. And Debian Hurd is the main one.

The Hurd website has some good instructions for running a Hurd image in qemu, including how to use hostfwd to forward the ssh port, which can be a convenient way to run terminal commands in the VM. The image also works in VirtualBox, though you’ll have to use something like qemu-img to convert it. Or Debian has iso installers for Debian Hurd.

Hello World

So now that we’re ssh’d into a Hurd VM, let’s see how it compares to Linux, from a programming perspective.

Without further ado, let us consider the most daunting of programming tasks: Hello World!

#include <stdio.h>

int main() {
    printf("Hello Hurd!\n");
}

And then compile that:

# gcc hello.c
# ./a.out
Hello Hurd!

Okay, that’s just an ISO C program that will work on any OS. We could use more Unix specific calls like write (ignoring details like checking how many bytes were actually written):

#include <unistd.h>

int main() {
    // Write a 12 byte string to file descriptor 1 (stdout)
    write(1, "Hello Hurd!\n", 12);
}

But that also works the same on Linux, BSD, or macOS. Hurd provides standard Unix-like APIs, and normally you’d write something like this that would also work on another Unix. But what happens inside write?

Now, for Linux, write is a fairly simple wrapper over a few assembly instructions that basically just pass the system call number for write along with the arguments in registers, then use the special syscall instruction (on x86_64) to jump into the kernel:

int main() {
    char *msg = "Hello Linux!\n";
    asm("mov $1, %%rax\n" // system call 1 (write)
        "mov $1, %%rdi\n" // file descriptor 1 (stdout)
        "mov %[msg], %%rsi\n" // buffer
        "mov $13, %%rdx\n" // number of bytes
        "syscall\n" // enter the kernel
        :
        : [msg] "r" (msg)
        : "rax", "rdi", "rsi", "rdx", "rcx", "r11", "memory"); // clobbered registers (syscall itself overwrites rcx and r11)
}

But although write works similarly on Hurd, what’s going on behind the scenes is quite different from Linux.

rpctrace

The Linux kernel knows how to “write”. But on Hurd, the “microkernel” called Mach doesn’t understand things like this. Its goal is to handle message passing and task scheduling, and not things like files. So how does write actually work?

On Linux, you may be familiar with strace, which lets you run a program and see all the system calls it performs. Hurd has rpctrace to track what messages are being passed.

Running rpctrace with our program here, we see a bunch of messages, including things like dir_lookup ("lib/i386-gnu/libc.so.0.3" 1 0). So some of this is from dynamic loading, so it’s hard to see exactly which parts are truly needed just to print hello world. No matter; we can re-compile with -static to get rid of those and…

> ./a.out
task136(pid3116)->vm_statistics () = 0 {4096 322741 4974 164778 20483 858488 0 194585 44888 4665842 291976 878066 449697}
Child 3116 Killed

Huh. Okay, that’s weird. I’m not really sure why the statically linked version crashes in rpctrace?

Anyway, looking through the few terminal screens worth of output with the dynamically linked binary, this line stands out:

  84<--145(pid3215)->io_write ("Hello Hurd!\n" -1) = 0 12

It looks a bit familiar, but what do the numbers here mean? And how could we call this more directly, instead of using write?

The Mach Mig

So where does this io_write come from? If we do a quick grep -R io_write /usr/include, we get a few results, but one stands out: /usr/include/i386-gnu/hurd/io.defs. Hang on, what’s a .defs file? You won’t find those alongside your includes on Linux (but will on macOS…). Well anyway, here’s io_write in that file:

routine io_write (
        io_object: io_t;
        RPT
        data: data_t SCP;
        offset: loff_t;
        out amount: vm_size_t);

Well, if we compare to the output of rpctrace, data must be the string, and offset is the -1. amount is some kind of output, so that must be the return value of the write (the number of bytes written).

What’s the RPT part though? A bit more grepping shows /usr/include/i386-gnu/hurd/hurd_types.defs has this:

/* These macros are used in some .defs files so that every routine has a
   server reply port argument #ifdef REPLY_PORTS.  */
#ifdef REPLY_PORTS
#define RPTDECL sreplyport reply: sreply_port_t
#define RPT     RPTDECL;
#define RPTLAST ; RPTDECL
#else
#define RPTLAST
#define RPT
#endif

Hm. So it might mean nothing, or an sreply_port_t argument?

But anyway, what can we do with a .defs file? Browsing the GNU Hurd website some more, apparently there’s something called MIG, the “Mach interface generator”, that generates C code from an IDL (interface description language). That must be our .defs file. It turns out we already have a command called mig, and it isn’t particularly hard to invoke:

mig /usr/include/i386-gnu/hurd/io.defs

That generates files named ioServer.c, ioUser.c, and io.h. In io.h we have this:

kern_return_t io_write
(
        io_t io_object,
        const_data_t data,
        mach_msg_type_number_t dataCnt,
        loff_t offset,
        vm_size_t *amount
);

Meanwhile ioUser.c implements this function, by building up a data structure and calling a function called mach_msg. In contrast, ioServer.c seems to contain code that calls an io_write function, if we were actually implementing io_write. But we’re not, so let’s ignore that.

Looking at the function signature, we don’t see anything corresponding to the RPT we saw earlier, but ioUser.c does call something called mig_put_reply_port, so it seems it’s used internally. Meanwhile, data_t apparently corresponds to two C arguments (a pointer and a length). Otherwise, this neatly matches our .defs, and sending Mach messages doesn’t seem too bad if we can use mig generated bindings.

So using that C file and header, we can call this. But first, what is an io_t? The obvious answer would be that it’s a file descriptor, but that would be too easy… and isn’t quite true. /usr/include/i386-gnu/hurd/hurd_types.defs tells us it’s actually a type alias for a certain kind of “Mach port”:

type io_t = mach_port_copy_send_t

Then if we look in the C standard library, in glibc/sysdeps/mach/hurd/dl-sysdep.c we’ll see:

__ssize_t weak_function
__write (int fd, const void *buf, size_t nbytes)
{
  error_t err;
  vm_size_t nwrote;

  assert (fd < _hurd_init_dtablesize);

  err = __io_write (_hurd_init_dtable[fd], buf, nbytes, -1, &nwrote);
  if (err)
    return __hurd_fail (err);

  return nwrote;
}

So each Unix file descriptor does map to a mach_port_copy_send_t, but we’ll need to look it up in a table. (On Linux there’s also a “file-descriptor table” for each process, but it’s in the kernel.) It turns out _hurd_init_dtable is set in _hurd_startup() defined in glibc/hurd/hurdstartup.c. Which actually calls __task_get_special_port to get something called the TASK_BOOTSTRAP_PORT for our process, and then calls __exec_startup_get_info using that port. We’ll find that in /usr/include/i386-gnu/hurd/exec_startup.defs:

routine exec_startup_get_info (
        bootstrap: exec_startup_t;
        /* These describe the entry point and program header data
           of the user program loaded into the task.  */
        out user_entry: vm_address_t;
        out phdr_data: vm_address_t;
        out phdr_size: vm_size_t;
        /* These are the base address and size of the initial stack
           allocated by the exec server.   */
        out stack_base: vm_address_t;
        out stack_size: vm_size_t;
        /* The rest of the information is that passed by exec_exec.  */
        out flags: int;
        out argv: data_t, dealloc;
        out envp: data_t, dealloc;
        out dtable: portarray_t, dealloc;
        out portarray: portarray_t, dealloc;
        out intarray: intarray_t, dealloc);

That’s quite overwhelming, but except for bootstrap it’s all outputs, so we can just use the parts we need. Which is presumably the dtable.

But wait, aren’t things like argv passed as arguments to main? Well, our process starts executing with a symbol called _start that’s defined by glibc, which does things like calling this _hurd_startup() before it calls our main(). Then it is also responsible for calling exit() after main returns.

So if we really want to write our program in pure Mach messages, we’ll want to not use the _start provided by the standard library (-nostartfiles seems to work). And we’ll also need to call exit() explicitly rather than just returning. Or rather, we’ll use the underlying Mach messages. Following our theme, it turns out glibc’s __exit calls something called proc_mark_exit, then task_terminate. Apparently proc_mark_exit tells one of Hurd’s “servers” what our exit code is (while the microkernel doesn’t have a concept of exit codes), then we tell the kernel to actually stop our task. These are defined in /usr/include/i386-gnu/hurd/process.defs and /usr/include/i386-gnu/mach/mach.defs.

If we bring this all together, this ends up working:

#define ino64_t __ino64_t

#include <mach/mach_traps.h>
#include <mach/task_special_ports.h>
#include "mach.h"
#include "process.h"
#include "exec_startup.h"
#include "io.h"

void _start() {
    mach_port_t bootstrap;
    vm_address_t user_entry, phdr_data, stack_base;
    vm_size_t phdr_size, stack_size;
    int flags;
    data_t argv, envp;
    mach_msg_type_number_t argvCnt, envpCnt, dtableCnt, portarrayCnt, intarrayCnt;
    portarray_t dtable, portarray;
    intarray_t intarray;

    task_get_special_port(mach_task_self(), TASK_BOOTSTRAP_PORT, &bootstrap);

    exec_startup_get_info(bootstrap, 
        &user_entry,
        &phdr_data,
        &phdr_size,
        &stack_base,
        &stack_size,
        &flags,
        &argv,
        &argvCnt,
        &envp,
        &envpCnt,
        &dtable,
        &dtableCnt,
        &portarray,
        &portarrayCnt,
        &intarray,
        &intarrayCnt);

    vm_size_t wrote;
    io_write(dtable[1], "Hello Hurd!\n", 12, 0, &wrote);

    proc_mark_exit(portarray[INIT_PORT_PROC], 0 << 8, 0);
    task_terminate(mach_task_self());
}

We also need to use mig to generate things like io_write (well, it looks like /usr/include and /usr/lib might already have what we need, but let’s do it the hard way). And we need the right commands to compile and link all that. github.com/ids1024/hello-hurd has this along with a Makefile for that:

# make
# ./hello-hurd
Hello Hurd!

And now, rpctrace has a simple output with just the RPC calls we’ve made:

# rpctrace ./hello-hurd
task136(pid1605)->task_get_special_port (4) = 0    143<--141(pid1605)
  143<--141(pid1605)->exec_startup_get_info () = 0 134514701 134512692 160 8192 16777216 0 "./hello-hurd\0" "SHELL=/bin/bash\0LD_ORIGIN_PATH=/bin\0PWD=/root/hello-mach\0LOGNAME=root\0HOME=/root" {  89<--144(pid1605)   84<--146(pid1605)   77<--147(pid1605)} {  112<--148(pid1605)   5<--149(pid1605)   98<--150(pid1605)   145<--151(pid1605)   106<--152(pid1605) (null)} {18 0 0 0 0}
  84<--146(pid1605)->io_write ("Hello Hurd!\n" 0)
 = 0 12
  145<--151(pid1605)->proc_mark_exit_request (0 0) = 0
task136(pid1605)->task_terminate () = 0
Child 1605 exited with 0

The Messenger of the Gods (or Daemons?): `mach_msg`

I couldn’t really find anything on the GNU Hurd website or elsewhere that broke down how a very simple Unix program like this maps to RPC calls. So I decided to break out gcc and grep and figure it out.

But now that I’ve made the connection here between the normal C API and the Mach-level interface, it seems a bit easier to appreciate discussion of Mach messages in something like The GNU Mach Reference Manual. I can’t explain everything in depth because 1) I still don’t understand the subtleties around types of port rights and such 2) this blog post is already long without paraphrasing an entire reference manual.

But the basic idea is that mach_msg is a system call that sends or receives a message on a “message port”, which communicates with another process or the kernel itself. And a message consists of a header followed by typed “data items” that can be integers, bools, strings… or can transfer port rights.

Then our mig generated files like ioUser.c just have to build up such a header and data items, and invoke mach_msg.

To match the abstraction level of our Linux code we should try to implement mach_msg from scratch including a bit of inline assembly, and maybe have the MIG generated functions call into our implementation… but not today.

Conclusions

This is really only scratching the surface of what Hurd or Mach even are. But I did find it interesting to get a bit of an idea of how Hurd maps Unix concepts to Mach messages.

Coming from an OS like Linux, it’s quite different to see the “system calls” of the OS as typed messages declared in an interface description language. I’d be really interested to see what sort of Rust API could be generated from the .defs files with a Rust port of the MIG. (But I don’t think I’ll start such a project right now.) And likewise for other languages.

The other important aspect of Hurd and Mach is how moving functionality out of the kernel can improve reliability, make things easier to develop, etc.

Hurd is not the only OS that’s like this. macOS is also based on Mach… though it has a “hybrid kernel”, that ports the BSD kernel to run on top of Mach, but with both in the same kernel address space. But it also exposes Mach ports and mach_msg to userspace in some form. I wonder if an exercise like this would be possible there.

Mach isn’t the most modern thing. Over 30 years ago now, L4 was designed to improve on earlier microkernels like Mach with a smaller kernel and faster message passing (I’m not sure if GNU Mach has improved any of this relative to the original CMU Mach). And more recently seL4 is a formally verified OS kernel… but I’m not aware of a “general purpose” OS (Unix-like or otherwise) built on seL4, so Hurd is easier to mess around with.

What Can we Learn from Hurd

Pretty much all kinds of software can benefit from some ideas about modularity, with components that have some isolation from each other, can be restarted independently, and are easy to develop and test individually. The way Erlang “processes” are used is a good example of a somewhat similar idea outside of an OS kernel (or is modern software just OS kernels on top of OS kernels?).

Mach ports and Unix file descriptors can both be understood as a form of “capability”. I’d need to read about and use Mach ports more to appreciate exactly what they offer relative to Unix file descriptors. But regardless, the general idea of “capabilities” is useful whether you’re dealing with Unix or an OS built explicitly around capabilities. In much more recent software, the WebAssembly System Interface (WASI) is designed with capabilities in mind. And the related WebAssembly Component Model has an interface description language called WIT that doesn’t serve the exact same purpose as MIG, but does share some similar ideas.

Working with some protocols on Linux that send different flavors of typed messages (that can include file descriptors) over POSIX sockets (Wayland, DBus, Pipewire), as well as some interfaces to the kernel (sysfs, ioctls on dev nodes)… I wonder if it would be better if they could all be built around one system for typed messages with an IDL a bit like MIG. Even if OS developers are concerned that a microkernel won’t perform as well, a really solid message passing mechanism seems like it should be a priority, and communication between processes on Unix can seem a bit haphazard and chaotic once you get beyond simple streams of bytes.

Mach isn’t the pinnacle of software design, but hopefully we can move beyond 80s software design like Mach to something better… instead of being stuck in the 70s.
