Rust: safe and unsafe as theorems and axioms

UPDATE: I would probably talk about this different now that I know a bit more about how something like this would actually be formalized, but this post is an analogy more than an actual formal model, even if a variation can be formalized.

There is a fair amount of confusion about what unsafe means in Rust, as well as debate about how one should think about it. Recently I’ve seen several blog posts like What is Rust’s unsafe?, The Temptation of Unsafe and Unsafe as a Human-Assisted Type System.

So I’ll add yet more useless philosophizing provide another perspective.

I’m not really attempting to explain what is considered unsafe in Rust, which is explained by the reference. Nor am I going to try to answer the question of precisely when unsafe should be used and how often.

My basic suggestion: we can think of unsafe in terms of mathematical axioms and theorems. This understanding is somewhere in between actual mathematical rigour and an analogy.

Terminology

  • Unsafe block
    An unsafe block, unsafe {} allows potentially unsound actions like dereferencing a raw pointer or calling an “unsafe function”. Code attempting these actions outside an unsafe block or function fails to compile. The code may (and should) be sound, but the compiler cannot verify this.

  • Unsafe function
    Declared with unsafe fn, an unsafe function can perform all the actions that are possible in an unsafe {} block, but is also unsafe to call.

  • Soundness
    “Unsound” code is any code triggering undefined behavior. Undefined behavior is a rather complicated concept that is important to understand especially for C programmers, but also C++ programs and anyone using the unsafe keyword in Rust. This series of LLVM blog posts is helpful.

    Safe code should always be sound (barring compiler bugs and calls to incorrectly implemented functions containing unsafe blocks). Correct uses of unsafe are also sound. So “unsound” code is unsafe code that fails to correctly uphold the invariants safe code relies on.

    Other terms are used; often just “safe”. But it gets a bit consfusing to talk about things like “safe uses of unsafe”. And more importantly, it’s then ambigous to say how “doing $x is unsafe”, which could mean $x it requires unsafe and the programmer must maintainer certain invariants, or that $x always results in undefined behavior.

Safe code

One important subgoal of programming in Rust is to proof your program is sound and free of undefined behavior. (You also want to verify that behavior is actually the behavior you want, performance is acceptable, etc.).

The glory of Rust is that you can “prove” this, as long as all your code from main() down doesn’t use unsafe.

fn main() {
    let mut v = Vec::new();
    v.push(42);
}

We can think of this as going along with a proof:

Theorem: main() is sound
Proof:

  • Assume Vec::new() and Vec::push() are sound
  • If main() compiles, it follows from the compiler’s checks that main() is sound

This proof is valid as long as we have corresponding proofs for Vec::new() and Vec::push(), and assume the compilers rules for verifying the soundness of safe code are correct. The proof is rather short since it’s core argument is simply rustc said so. We’ll consider the implications of this reasoning later.

By writing fn main() without the unsafe keyword, and not using an unsafe {} block in the body, and assuming the code compiles, we have written a proof that the main() function is sound. This applies equally to other functions we might write and call (directly or indirectly) from main().

Functions with unsafe blocks

But… at some point you are almost certainly dependent on unsafe code. At least in the standard library.

Consider Vec::push, for instance:

pub fn push(&mut self, value: T) {
    // This will panic or abort if we would allocate > isize::MAX bytes
    // or if the length increment would overflow for zero-sized types.
    if self.len == self.buf.cap() {
        self.reserve(1);
    }
    unsafe {
        let end = self.as_mut_ptr().add(self.len);
        ptr::write(end, value);
        self.len += 1;
    }
}

We use unsafe {} when the compiler cannot prove the soundness of our code. So it becomes an axiom.

Axiom: Vec::push() is sound

Thus our “proof” for the soundness of main() relies on this axiom. This is only true if the ptr::read call is correct; but the compiler isn’t able to verify this, and relies on the programmer to assert that it is.

Unsafe functions

Now let’s instead consider an unsafe fn; in particular, Vec::set_len():

pub unsafe fn set_len(&mut self, new_len: usize) {
    debug_assert!(new_len <= self.capacity());

    self.len = new_len;
}

This defines neither a theorem nor axiom. Generally, the fact that it’s an unsafe fn suggests some uses are sound and others are not; without clarifying what uses or allowing the compiler to check this.

But the documentation does provide what could be considered an axiom:

Safety

  • new_len must be less than or equal to capacity().
  • The elements at old_len..new_len must be initialized.

Or, restated:

Axiom: Suppose new_len <= capacity(). Suppose elements at old_len..new_len are initialized. Then it follows that unsafe { v.set_len(new_len); } is sound.

Given this axiom, we can “prove” our use of set_len is correct; but the compiler cannot verify these requirements.

Ideal world: encoding and proving invariants in Rust

Ideally we’d be able to encode these requirements in Rust. Then our caller could provide some proof that the invariants are met, and call the function without using unsafe {}. Similarly, our set_len() and other Vec methods could have soundness proofs, assuming the stated constraints are met.

Thus, use of unsafe {} could be limited further to only cases that are correct but impossible to prove (under the provided system) or where it would be too difficult to prove.

I’m not the first one to think such a feature would be nice, but it would be rather complicated. So for now, we rely on the programmer verifying these properties themself. It’s probably not something the language will even have, but perhaps some external tool could provide some form of this for users who require it.

But without that, it’s still important to include this information in the documentation. A well-written library should provide an “axiom” of this sort in the documentation of any unsafe fn (at least in the public API). Rust’s ability to encapsulate unsafely isn’t much use if it isn’t possible to determine what uses of the function are sound.

Implicit axioms

Our programs are dependent on unsafe code whether or not we realize it: at least in the standard library. As we see above with Vec, the standard library contains unsafe {} blocks, so our list of axioms include those defined by these usages.

Perhaps less obviously, the language itself also implicitly defines various axioms. The exact behavior doesn’t have a mathematical specification, but we could list some easily enough:

Axiom: It is sound to consecutively execute two sound blocks of code
Axiom: In safe code, the lifetime checker correctly prevents lifetime violations
Axiom: An if statement is sound if the condition, body, and else block are sound

And we could continue, until we have a fairly large but finite set of axioms.

A soundness bug in the standard library or compiler thus can make or program unsound. This is rare, but occasionally happens.

Limitation: unsafety extends beyond function block

Our model is made a little less neat by a common subtly of unsafety, as mentioned in the Nomicon:

Because it relies on invariants of a struct field, this unsafe code does more than pollute a whole function: it pollutes a whole module. Generally, the only bullet-proof way to limit the scope of unsafe code is at the module boundary with privacy.

The example there is different, but applies to the implementation of Vec as well. Which you might have been wondering about, if you noticed that the implementation for set_len() above (unlike many uses of unsafe fn) would still compile without the unsafe keyword. All it does is set the .len member, with fairly ordinary safe code.

But this could lead to problems in combination with other code. The push() code above would be unsound if, for instance, another function had set self.len to a bogus value. So even though setting .len is safe, carelessly changing it could cause methods like .push() to produce undefined behavior.

So my neat idea that a function can correspond one-to-one with a soundness theorem or axiom isn’t quite perfect, at least without more assumptions, since different methods of a type like Vec interact in ways that could allow safe code in one to trigger undefined behavior in the unsafe code of another. But perhaps the idea is still somewhat useful.

Limitation: not formally specified/verified

Rust does not have a formal specification, so there is no official list of axioms like this. For true mathematic rigor, we would also need a formally verified compiler proven to generate code conforming to these axioms. Computer-verified proofs can be done written and checked software like Coq, which provides an elaborate type system that can prove theorems using the Curry-Howard correspondence.

There have been efforts toward formal verification in Rust. But a full formal specification and formally verified compiler, especially one that is also performant, probably isn’t really feasible.

This doesn’t invalidate the general reasoning here, but we have to be careful not to be too bold to claim things are really “proven” in a rigorous mathematical sense.

Conclusion

In an ideal world, a major aim of compilers is proof checking. The type checker proves that types are used correctly. The optimizer creates smaller, more efficient code and proves it’s behavior is equivalent. The lifetime checker in Rust proves the liveness of references.

In practice the mathematical rigor involved here is lower than in an ideal (and unrealistic) world. But we can still reason about these things in terms of the concept of mathematical proof.