Translation of Rust's core and alloc crates to Coq for formal verification

fsloth
19 replies
7h5m

Totally not an expert on formal verification. If Rust’s base libraries get formally verified, and one does not use unsafe code, does that mean all Rust programs that use the formally verified libs get de-facto formally verified quality regarding their memory handling?

Galanwe
14 replies
6h22m

Rust safety has very little to do with bugs.

Rust has its own definition of what "safe" means, which is arguably a subset of memory safety.

You can have perfectly safe Rust code with data races, deadlocks, memory exhaustion, etc. Not even talking about logical errors.

dist1ll
13 replies
6h11m

> You can have perfectly safe Rust code with data races

No, you can't. Data races are explicitly one type of race condition that Rust protects you from. Anything else would indicate an unsound library or a compiler bug. For reference, you can take a look at the non-exhaustive list of undefined behaviors[0]. None of those behaviors should be possible to trigger from safe Rust.

[0] https://doc.rust-lang.org/reference/behavior-considered-unde...
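
A minimal illustration of that guarantee (my own sketch, not from the linked reference): an unsynchronized concurrent write is rejected at compile time, while the atomic version is accepted.

    use std::sync::atomic::{AtomicU64, Ordering};
    use std::thread;

    fn main() {
        let mut plain = 0u64;
        thread::scope(|s| {
            s.spawn(|| plain += 1); // first mutable borrow of `plain`
            // Uncommenting the line below does not compile:
            // error[E0499]: cannot borrow `plain` as mutable more than once
            // s.spawn(|| plain += 1);
        });

        // The synchronized version compiles: atomics make the concurrent
        // writes well-defined, so no data race is possible.
        let counter = AtomicU64::new(0);
        thread::scope(|s| {
            s.spawn(|| { counter.fetch_add(1, Ordering::Relaxed); });
            s.spawn(|| { counter.fetch_add(1, Ordering::Relaxed); });
        });
        assert_eq!(counter.load(Ordering::Relaxed), 2);
    }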

Galanwe
8 replies
4h36m

You can have both data races and race conditions in a perfectly safe Rust program.

Rust can only offer its safety for the lifetimes it tracks, and this tracking is limited to a single instance of your process.

In other words, if you perform concurrent accesses through multiple threads spawned from the same process, you're safe from data races but at risk of race conditions. If you perform concurrent accesses through multiple processes, you're at risk of both.

That implies that even in a world where everything is safe Rust, you cannot guarantee that two Rust processes will not data race with each other.

bigstrat2003
4 replies
3h3m

I don't think most people expect language safety guarantees to extend to other processes. That's a pretty unrealistic expectation you are bringing.

Galanwe
3 replies
1h28m

I don't think that's unrealistic; it all depends on which sphere of development you're navigating in.

In highly concurrent, long-lived systems, you often end up using a multiprocess approach between your producers and consumers rather than multithreading, because it allows you to dynamically spawn new processes that consume from shared memory without going through a bastion process managing threaded consumers. E.g., it allows you to spawn newly developed consumers at will without redeploying and restarting the producers. Note that this does not mean a different codebase.

As for the expectations, I think it's fair to highlight them, because people tend to think that a world of applications fully developed in Rust would somehow imply no data races. That's not the case: on a fully fault-free operating system, with only 100% safe Rust applications running, you can, and will, still run into data races, because in the real world applications cross process boundaries.

steveklabnik
2 replies
59m

What is your definition of "data race"?

While I'm aware of some differences of opinion on the details, all of the definitions I'm aware of exclusively refer to multiple threads of execution writing to the same memory location, which makes the concept of a multi-process data race impossible. For example, Regehr: https://blog.regehr.org/archives/490

> A data race happens when there are two memory accesses in a program where both:
>
> - target the same location
> - are performed concurrently by two threads
> - are not reads
> - are not synchronization operations

Galanwe
1 replies
46m

The subtle differences (or lack thereof) between threads and processes don't really matter IMHO. A data race happens when two concurrent accesses to the same memory location happen, one of these accesses is a write, and there is no (correct) synchronization in place. The means by which this memory location was shared don't really matter: whether both processes shared the same initial memory space, whether the memory space was mapped later on, or even whether both accesses went through a mapped space. You could have two threads mmap the same shared memory to communicate, for that matter.

steveklabnik
0 replies
43m

Okay, at least there's definitional agreement here. However,

> The means by which this memory location was shared don't really matter.

It does in the context of this conversation, because you are claiming that safe Rust can create data races. Is there a safe API that lets you map addresses between processes? The closest thing I'm aware of is mmap, which is unsafe, for exactly this reason.

oconnor663
2 replies
4h25m

> If you perform concurrent accesses through multiple processes

How would you do that in safe code? If I understand correctly, the problem you're referring to is the exact reason that mmap APIs in Rust are unsafe: https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safe...
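
For concreteness, a sketch using the memmap2 crate linked above (hypothetical file name, minimal error handling; assumes the file exists and is non-empty): the mapping itself demands an unsafe block precisely because another process may concurrently mutate the underlying file.

    use memmap2::MmapMut;
    use std::fs::OpenOptions;

    fn main() -> std::io::Result<()> {
        let file = OpenOptions::new().read(true).write(true).open("shared.bin")?;
        // `map_mut` is `unsafe`: the compiler cannot rule out concurrent
        // modification through other mappings or processes, so the caller
        // must promise that no such aliasing occurs.
        let mut map = unsafe { MmapMut::map_mut(&file)? };
        map[0] = 42; // after that promise, accesses look like safe code
        map.flush()
    }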

nindalf
1 replies
3h59m

I don't agree with the point GP is making, but presumably something like two processes writing a file and reading their changes back. If the writes and reads interleave, you'd get a race condition.
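
A sketch of that scenario (hypothetical counter file, my own example): the lost update is a race condition, but every individual access is well-defined, so it is not a data race.

    use std::fs;

    // Two processes running this concurrently can interleave between the
    // read and the write, losing an increment. That is a race condition;
    // no memory access is undefined, so it is not a data race.
    fn increment_counter(path: &str) -> std::io::Result<()> {
        let n: u64 = fs::read_to_string(path)?.trim().parse().unwrap_or(0);
        fs::write(path, (n + 1).to_string())
    }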

couchand
0 replies
3h13m

That's decidedly not a data race, however.

hu3
3 replies
5h36m

As per https://doc.rust-lang.org/nomicon/races.html

> However Rust does not prevent general race conditions.
>
> This is mathematically impossible in situations where you do not control the scheduler, which is true for the normal OS environment.
>
> For this reason, it is considered "safe" for Rust to get deadlocked or do something nonsensical with incorrect synchronization: this is known as a general race condition or resource race. Obviously such a program isn't very good, but Rust of course cannot prevent all logic errors.
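
For instance, here is a deadlock in 100% safe Rust (a minimal sketch of the Nomicon's point, not taken from the page itself): each thread holds one lock while waiting for the other.

    use std::sync::{Arc, Mutex};
    use std::thread;
    use std::time::Duration;

    fn main() {
        let a = Arc::new(Mutex::new(0));
        let b = Arc::new(Mutex::new(0));
        let (a2, b2) = (Arc::clone(&a), Arc::clone(&b));

        let t = thread::spawn(move || {
            let _ga = a2.lock().unwrap(); // holds `a`...
            thread::sleep(Duration::from_millis(50));
            let _gb = b2.lock().unwrap(); // ...then waits for `b`
        });

        let _gb = b.lock().unwrap(); // holds `b`...
        thread::sleep(Duration::from_millis(50));
        let _ga = a.lock().unwrap(); // ...then waits for `a`: deadlock
        t.join().unwrap();
    }
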
couchand
2 replies
5h16m

On the one hand, we like to encourage learning here. On the other, we prefer not to copy-paste a bunch of possibly-irrelevant content. Well, forgive me for pasting in a StackOverflow answer that may be relevant here: https://stackoverflow.com/questions/11276259/are-data-races-...

Are "data races" and "race condition" actually the same thing in context of concurrent programming

No, they are not the same thing. They are not a subset of one another. They are also neither the necessary, nor the sufficient condition for one another.

The really curious thing here is that the Nomicon page also describes this distinction in great detail.

hu3
1 replies
4h31m

Ahh the uncalled for snark. A trademark of the Rust community.

I knew of the distinction. Posted anyway for clarity.

So races can still happen. Just not a subset of them.

And as per another responder, that subset is smaller than it looks at first glance, even for data races.

And who's we?

-----

original msg I replied to:

> On the one hand, we like to encourage learning here. On the other, we prefer not to copy-paste a bunch of possibly-irrelevant content. Well, forgive me for pasting in a StackOverflow answer that may be relevant here: https://stackoverflow.com/questions/11276259/are-data-races-...
>
> Are "data races" and "race condition" actually the same thing in context of concurrent programming
>
> No, they are not the same thing. They are not a subset of one another. They are also neither the necessary, nor the sufficient condition for one another.
>
> The really curious thing here is that the Nomicon page also describes this distinction in great detail.

couchand
0 replies
3h9m

I apologize if my comment came off as snark. Your comment was nothing but pasted text which omitted relevant detail, so it was not clear what the intent was. In context, to me, it did not seem to be illuminating; it actually seemed to be introducing confusion where there previously was none.

Data races are not possible in safe Rust. Race conditions are. The distinction is made clear in the Nomicon article, but commenters here are really muddying the waters...

raphlinus
1 replies
6h17m

I think that to some extent represents the dream, with caveats. A few pieces need to fall into place.

1. You need to formalize the semantics of unsafe Rust. Ralf Jung's pioneering RustBelt[1] work was a huge step in that direction, but it is not yet complete. One element that's emerging as tricky is "pointer provenance" (see the sketch after this list).

2. Part of that is making a formal model of the borrow checker. Stacked borrows[2] was a good attempt, but has some flaws. Tree borrows[3] may correct those flaws, but it's possible something even more refined may emerge.

3. Another part of that is a formal memory model, which is mostly about the behavior of atomics and synchronization (and is thus critically important for standard library components such as Arc). It is widely understood that Rust's memory model should be similar to the C++ one (and that they should interoperate), but there are some remaining flaws in that model, such as the "out of thin air" problem, and some missing functionality like seqlocks (part of why the Linux kernel still has its own memory model).

4. You need a proof that the safety guarantees compose well; in particular if you have sound code written in unsafe Rust, composed with other code written in safe Rust, the safety guarantees still hold. There are good results so far but it needs to be proved over the entire system.

5. You do need to close all the remaining soundness bugs in Rust[4] for such a proof to hold for all code. Progress on this is slow, as many of these problems are theoretical, or are only really relevant for adversarial code[5].
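
To make point 1 concrete, a small sketch of the provenance subtlety (my own example, not RustBelt's): two pointers can be numerically equal while only one of them may legally access a given allocation.

    fn main() {
        let a = [1u8, 2, 3];
        let b = [4u8, 5, 6];
        // One-past-the-end of `a`: a legal pointer to *hold*, but its
        // provenance covers only the allocation of `a`.
        let p = a.as_ptr().wrapping_add(3);
        let q = b.as_ptr();
        // Even if `p` and `q` happen to share an address, reading `b`
        // through `p` would be undefined behavior under models like
        // stacked or tree borrows. Pinning down exactly this is what a
        // formal semantics for unsafe Rust has to do.
        println!("same address? {}", p as usize == q as usize);
    }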

When all this is done, you still only have a partial guarantee. Huge amounts of the surface area to interface with the system are based on unsafe code. If you're doing only pure computation, that's one thing, but if you're building, say, a graphical UI, there's a massive amount of stuff that can still go wrong.

Even so, there is a practical path forward, and that leads to a much better situation than the usual state of things today, which is of course systems riddled with vulnerabilities.

[1]: https://people.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf

[2]: https://plv.mpi-sws.org/rustbelt/stacked-borrows/

[3]: https://www.ralfj.de/blog/2023/06/02/tree-borrows.html

[4]: https://github.com/rust-lang/rust/issues?q=is%3Aissue+is%3Ao...

[5]: https://github.com/Speykious/cve-rs

clarus
0 replies
4h42m

Thanks for the explanations and all the links!

steveklabnik
0 replies
56m

Most of the interest in formalizing Rust is specifically about formalizing the model that unsafe code is supposed to uphold.

weinzierl
12 replies
10h8m

What I don't quite understand with these efforts is this: if we have to translate the code to Coq manually or semi-manually, isn't the likelihood that we make mistakes there much higher than what we ultimately gain with the formal verification? In other words, how do we know that what we prove is still valid for the original code?

bennofs
5 replies
9h50m

Yes, that is a limitation. But this limitation is not too bad.

In many cases, a bug in the translation simply makes the proof impossible. Somebody then investigates why the proof does not go through and finds the bug in the translation.

We only have a problem if the bug in the translation specifically cancels out a bug in the original code. If there is no systematic risk, it is quite unlikely that two bugs cancel each other in this way.

user2342
3 replies
9h29m

In the case of Coq-to-OCaml: is it feasible to do an extraction to OCaml of the translated code and compare it with the original?

user2342
1 replies
7h30m

Yes, I know, I mentioned the extraction.

My question was whether it can help detecting translation errors from the first step.

cccbbbaaa
0 replies
6h49m

I'm not sure which first step you are talking about. Typically, one would write the program directly in Coq and use the extracted code as-is.

im3w1l
0 replies
3h58m

Let's say you want to check if use after free can ever occur. Your translation is bugged and translates the whole program to a single "nop". Then the verifier happily asserts that the translated program does not cause use after free.

I doubt the translation would be that bad, but it could have more subtle issues of the same kind.

muldvarp
2 replies
9h46m

> In other words, how do we know that what we prove is still valid for the original code?

We don't. We will always have to trust _something_ (the hardware, the compiler, the specification, Coq's trusted kernel, ...). Formal verification is sadly often discussed as if it removed the possibility of bugs entirely. In practice, formal verification only makes bugs much less likely.

That said, yes, automatic translation between Rust and Coq massively reduces the complexity of the things you have to trust and should therefore be preferred over manual translation.

epolanski
1 replies
6h57m

Every tool can only guarantee absence of some categories of bugs.

The most common ones, business logic bugs, often escape verification simply because of wrong assumptions.

Simple example: some websites use the browser's default language to set your locale, some use your IP. Thus I travel to Poland and suddenly YouTube shows me Polish content and ads (this can often be changed, but not always; see the Google Docs idiocy). Such heuristics will all unavoidably lead to business logic bugs because the assumptions are wrong in the first place.

jlokier
0 replies
53m

That's why some industries have separate "verification" and "validation" (V&V) activities.

Verification is checking against formal or at least known requirements like those being discussed in this thread.

Validation is a set of processes for ensuring the specifications actually correspond to how the product ought to work. "How it ought to work" is open-ended and meant to include real-world aspects you didn't know beforehand, for example business logic assumptions. But despite being open-ended, it is possible to formalise and track the process of looking for issues systematically.

In your example of YouTube showing Polish content when you travel, verification is the process of checking the website uses the IP correctly because using the IP is specified somewhere, but validation is the process of asking "what's the right thing to do when a user travels, is it really to use the IP?" and "let's systematically research what issues are affecting users due to incorrect specifications that we should change".

clarus
2 replies
10h1m

The code is translated automatically with coq-of-rust! When issues are found in the translation they can be fixed once in the coq-of-rust tool, and all the translations are updated.

weinzierl
1 replies
9h30m

Ok, what I think I do not understand is what they mean by "tedious and error prone"? Is it tedious to write the automated translation (aka the coq-of-rust tool in this case) or does the translation of a concrete piece of code (e.g. the core crate) still involve manual work?

clarus
0 replies
8h19m

The "tedious and error prone" code was what we were doing before, when the translation of the standard library was not yet working automatically with coq-of-rust. Now, this is automatic. Maybe the explanations on the blog post were not clear.

thaliaarchi
11 replies
10h51m

That's really impressive.

Automatic translation like this shifts the trust to the tool. coq-of-rust itself is written in Rust, not in Coq. The recursive nature is somewhat boggling, but I think a proof of its correctness is possible by a process similar to David A. Wheeler's “Countering Trusting Trust through Diverse Double-Compiling” (2009) [0] (which circumvents Ken Thompson's Trusting Trust attack by using a second compiler), mixed with a CompCert-style approach.

To verify it, you'd use coq-of-rust to convert the coq-of-rust translator from Rust to Coq. That translation is not trusted, because it was performed in Rust, but it doesn't matter. Once in Coq, you prove the desired correctness properties—crucially, that it preserves the semantics of the Rust program when it translates a program to Coq.

As in the article, it is likely easier to work with more functional definitions in proofs instead of generated ones, so you'd undertake the same process as they do with the stdlib of proving equivalence between definitions. Since the current line count for the coq-of-rust translator (specifically, lib/ [1]) is 6350 lines of Rust, it even seems feasible to write a full translator in Coq and prove its equivalence to the generated one.

Then, you execute the proven-correct Coq coq-of-rust translator on the Rust source of the coq-of-rust translator. The Coq definitions it outputs should match the output of the Rust coq-of-rust translator that you started with.

As an aside, it's nice to see industry funding for work like this. I'm often cynical of cryptocurrency, but its correctness constraints really push for improvements in areas I like (Rust, Coq, funding for masters students I know, etc.).

[0]: https://dwheeler.com/trusting-trust/wheelerd-trust.pdf

[1]: https://github.com/formal-land/coq-of-rust/tree/main/lib

clarus
6 replies
10h32m

Thanks for the comment! One of the authors here.

Indeed, this would be a nice process to verify coq-of-rust. Also, although the code is rather short, we depend on the Rust compiler to parse and type-check the input Rust code. So that would also need to be verified, or at least formally specified without doing the proofs, and the API of rustc is rather large and unstable. It could still be a way to get more assurance.

deredede
2 replies
9h50m

Is there a Coq formalisation for enough of Rust that would be usable here? I thought people were still figuring that out.

clarus
1 replies
8h23m

The formalization work for Rust was done mostly at the MIR level, which is one step lower than the THIR level we use here. See, for example, the https://plv.mpi-sws.org/rustbelt/ project. MIR should be more amenable to formal specification, as this language is more compact than THIR and aims to be more stable.

However, we also lose some information going to MIR, as there are no expressions/loops from the original code anymore. There are still ways to reconstruct these, but we preferred to use the THIR representation directly.

deredede
0 replies
13m

That makes sense and fits what I had in mind: there is no formalization at the source level. Thanks for the details!

thaliaarchi
1 replies
9h49m

I didn't touch on that, but I did assume trust of the Rust toolchain, with verification starting at THIR. Verifying rustc would be a monumental undertaking, though I think some people are working on it.

Since we don't have a verified rustc (a la CompCert [0]), I wonder if an approach like the translation validation of seL4 [1] would work. They prove that the artifact (ARM machine code) produced by an existing compiler (gcc) for a chosen program (seL4) matches the source semantics (C). Thus you could circumvent trusting rustc, but it only works to verify a specific output of a chosen program. If the chosen program was coq-of-rust, I don't think this would be easier than the approach I detailed above. The seL4 kernel is 9,500 lines of C, while their Isabelle/HOL specification is over 200,000 lines, so the technique doesn't seem to scale to a large chosen source like rustc.

Isn't bootstrapping fun?

[0]: Xavier Leroy. 2008. “Formal verification of a realistic compiler”. https://xavierleroy.org/publi/compcert-CACM.pdf

[1]: Thomas Sewell, Magnus Myreen, and Gerwin Klein. PLDI 2013. “Translation Validation for a Verified OS Kernel”. https://sci-hub.st/10.1145/2491956.2462183

deredede
0 replies
9h12m

Since you contrast the two approaches you might be interested in learning that CompCert also uses translation validation in some part of the compiler (notably for register allocation), see Jean-Baptiste Tristan's papers with Xavier.

Gajurgensen
0 replies
4m

Very interesting work! I'm curious how you handle loops/recursion? I imagine the `M` monad seen in the examples has a special primitive for loops?

im3w1l
1 replies
8h35m

> As an aside, it's nice to see industry funding for work like this. I'm often cynical of cryptocurrency, but its correctness constraints really push for improvements in areas I like (Rust, Coq, funding for masters students I know, etc.).

That's part of it, but another part, I think, is that cryptocurrencies simply transferred a lot of wealth to people interested in CS (and with a wouldn't-it-be-cool-if mindset in general). And they are spending that money on CS research not just because it benefits them but because it's philanthropy aligned with their hobby.

noneeeed
0 replies
10h14m

It reminds me of when I used to work in SPARK Ada. On a number of projects where there was no supported Ada target (especially very small devices), the code would be converted to C and then compiled for the target. That allowed us to perform the various forms of static analysis in SPARK land.

It obviously introduced issues around verifying the output of the transpiler, but I think the resulting C code was quite readable and limited in style for verification purposes, and the tools had a high degree of trust.

The SPARK static analysis was only ever a part of the whole verification and validation process, and there was plenty of testing and other activities that provided additional levels of trust.

The whole approach seemed to work pretty well.

hedora
0 replies
2h47m

I’m not sure how that approach prevents the Rust compiler that’s used to build binaries from injecting a malicious payload. (You could build A with B and B with A, and diff the binaries, I guess.)

Another approach is proof-carrying code: the Rust compiler would emit a Coq proof that its executable output matches the semantics of the source code input.

Of course, you could also write a compiler for a subset of Rust (perhaps without the borrow checker or optimizations) in machine code for one architecture, then bootstrap everything with that.

vlovich123
4 replies
2h46m

Is writing a formal verification specification similar to writing a more complicated property test? I’ve found writing property tests to be kinda difficult and time consuming for non-trivial programs.

Jtsummers
3 replies
2h15m

Similar, not always the same. Property tests generally work at the interface of a system (either down to the function/procedure level or higher level) where you specify a description of the inputs and test the outputs satisfy some property or set of properties.

This will look a lot like what you do with formal verification at the same level. Formal verification tools can go deeper, though, like answering questions about the state of the system internal to a computation. Like "I have this loop invariant, can I prove (or have a system prove) that it actually holds across all iterations?" or "Can I prove that while doing this computation there is never underflow/overflow in any intermediate computation?" The former could be done with property testing if you pull the loop kernel out into its own procedure and then test the invariant property by executing the kernel with a large number of intermediate states. The latter is harder to do without really, really tearing your program apart down to the level of making each line separately executable.
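
A small sketch of that difference, assuming the proptest crate (the function and its invariant are my own illustration): the property test samples the interface, while the loop invariant is the kind of internal fact a deductive tool would prove for all iterations and all inputs.

    use proptest::prelude::*;

    /// Sum 1..=n, with the loop invariant written out. Here it is merely
    /// *checked* on the iterations that actually run; a formal tool would
    /// *prove* it holds on every iteration for every `n`.
    fn sum_to(n: u64) -> u64 {
        let mut acc = 0u64;
        for i in 1..=n {
            acc += i;
            debug_assert_eq!(acc, i * (i + 1) / 2); // loop invariant
        }
        acc
    }

    proptest! {
        // Interface-level property: many sampled inputs, not a proof.
        #[test]
        fn matches_closed_form(n in 0u64..10_000) {
            prop_assert_eq!(sum_to(n), n * (n + 1) / 2);
        }
    }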

vlovich123
2 replies
1h58m

I think my main point is that writing the specification is difficult enough with property tests for a non-trivial program, with many places tending to avoid it if they can. If formal verification is even more difficult, it doesn't matter that it has more power and can make the code even more correct: the cost is a huge blocker.

clarus
0 replies
1h11m

There are some specifications that are always there, like the absence of reachable panics, or backward compatibility between releases for stable entry points.
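
As a tiny illustration of the panic-freedom specification (my own example):

    // `data[0]` panics on an empty slice, so a proof that no panic is
    // reachable would reject this function (or demand a precondition)...
    fn first_byte(data: &[u8]) -> u8 {
        data[0]
    }

    // ...and accept this total version.
    fn first_byte_total(data: &[u8]) -> Option<u8> {
        data.first().copied()
    }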

Otherwise you can outsource this work to specialized companies such as Formal Land or others!

Jtsummers
0 replies
1h41m

It depends on how you do the formal verification. There are systems like SPARK/Ada, which brings the specification into the language you're developing in (it supports only a subset of Ada, but every year they're expanding what it covers), JML, which does something similar for Java but expresses the specification in its own notation inside Java comments, and others.

If you use these kinds of systems you can be more judicious in what you try to prove. Whole program proofs are infeasible (or at least impractical) generally, but you pick the subsets you care about most and you can add annotations and prove them (automatically ideally, but you can bring in an external proof assistant as well). You leave the rest of the system to testing which has proven remarkably effective over the decades (when taken seriously, at least).

This isn't an all-or-nothing proposition, you choose how to spend your time and effort with formal verification just like you do with testing or other work.

clarus
0 replies
7h54m

It should be possible. A specific feature of Coq that we use is impredicative Set. I do not know if this is the case in F*.

xavxav
1 replies
8h58m

Can you explain how your approach contrasts with Aeneas or RustHornBelt? How do you handle pointers and mutable borrows?

clarus
0 replies
8h2m

I do not know how RustHornBelt works. We are focusing on safe code, although we still generate a translation for unsafe blocks as a "best effort".

Compared to Aeneas, the goal is very similar, as we want to verify Rust programs using interactive theorem provers. However, with coq-of-rust we write the purely functional version of the code (the one on which we do the proofs) by hand, or with the help of GitHub Copilot as this is rather repetitive, and prove it equivalent to the automatic translation. In Aeneas the aim is to directly generate a functional version.

We handle all the pointers as if they were mutable pointers (the `*` type). We do not use any information from Rust's borrow checker, which simplifies the translation, but we pay for that at proof time.

To reason about pointers in the proofs, we let the user provide a custom allocator that can be designed depending on how the memory will be used. For example, if the program uses three global mutable variables, the memory can be a record with three entries. These entries are initially `None` to represent when they are not yet allocated. We do not yet know how this technique can scale, but at least we can avoid separation logic reasoning for now. We hope that most of the programs we will verify have a rather "simple" memory discipline, especially on the application side.
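
As a rough Rust analogy of that memory encoding (my sketch, not the actual coq-of-rust representation): for a program with three global mutable variables, the proof-side heap can be just a record with one optional slot per variable.

    // The whole "heap" for a program that only uses three globals: no
    // pointer arithmetic, just three named slots, `None` until allocated.
    #[derive(Default)]
    struct Memory {
        counter: Option<u64>,
        buffer: Option<Vec<u8>>,
        flag: Option<bool>,
    }

    // "Allocation" fills a slot; reads and writes name the slot directly,
    // which keeps reasoning record-level instead of separation logic.
    fn alloc_counter(mem: &mut Memory, init: u64) {
        assert!(mem.counter.is_none(), "double allocation");
        mem.counter = Some(init);
    }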

the8472
0 replies
4h3m

Did you find any bugs in the crates?

pron
0 replies
6h33m

The size of programs (as opposed to libraries, which could be easier because their code is not nearly as self-interacting) that have been end-to-end verified with semi-automated deductive proof systems like Coq is quite small. In fact, the size of programs that can be verified in this way has grown at a slower pace than the growth in the average size of programs overall. While sound software verification (i.e., 100% conformance to a spec) certainly has its place (usually to prove the correctness of some core algorithms), it doesn't scale well. That is why research has been shifting toward unsound methods: the cost of 100% assurance can be 10x that of 99.9999% assurance, and the difference in probability may fall below the probability of non-software failures (after all, a physical system cannot be proven to conform to a spec) or of a spec that doesn't conform well enough to reality.

p4bl0
0 replies
9h35m

Note to interested readers: I submitted this blog post because it is less directly focused on cryptocurrency-related stuff than other posts from the same blog, but there are many more technically interesting posts there, especially the last two, which discuss the same approach but for Python instead of Rust.

konstantinua00
0 replies
5h19m

I remember a lecture [1] that mentioned a fuzzer finding bugs in a formally verified C compiler, because the formal verification didn't cover the frontend and backend.

I know people raise questions about trust in Coq itself and in the translation (and I question how it will stay in sync with Rust updates), but even perfect formal verification doesn't mean 100% correctness from start to finish.

[1] https://youtu.be/Ux0YnVEaI6A

Havoc
0 replies
10h25m

Didn’t know that was even feasible.

I wonder if efforts like this could speed up the adoption of Rust in key parts of the kernel.