That sounds ... hard. Especially as idiomatic Rust as written by skilled programmers looks nothing like C, and most interesting code is written in C++ anyway.
Isn't it equivalent to statically determining the lifetimes of all allocations in the C program, including those that are implemented using custom allocators or which cross into proprietary libraries? There's been a lot of research into this sort of thing over the years without much success. C/C++ programs can do things like tie allocation lifetimes to what buttons a user clicks, without ref counting or other mechanisms to ensure safety. It's not a good idea, but, they can do it.
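A contrived sketch of the kind of thing I mean (all names made up):

    #include <stdlib.h>

    /* The buffer's lifetime is decided by which buttons the user clicks,
       and in what order -- nothing in the source pins it down statically. */
    static char *scratch = NULL;

    void on_button_open(void)  { scratch = malloc(4096); }
    void on_button_close(void) { free(scratch); scratch = NULL; }
    void on_button_type(void)  { if (scratch) scratch[0] = 'x'; }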
The other obvious problem with trying to write such a static analysis is that the programs you're analyzing are by definition buggy and the lifetimes might not make sense (if they did, they wouldn't have memory safety holes and wouldn't need to be replaced). The only research I've seen on this problem of statically detecting what lifetimes should be does assume the code being analyzed is actually correct to begin with. I guess you could try and aim for a program that detects where lifetimes can't be worked out and asks the developer for help though.
Hard for humans. But it's DARPA, is it hard for AI? Image classification used to be hard also, today cars drive themselves.
I'd say it's good timing.
You can attach about a hundred asterisks to that.
If anything, I think the failure to hit L5 self-driving after billions of dollars and millions of man hours invested is probably reflective of how automatic C to Rust translation will go. We'll cruise 90% of the way, but the last 10% will prove insurmountable with current technology.
Think about the number of C programs in the wild that rely on compiler-specific or libc-specific or platform-specific behavior, or even undefined behavior plus the dumb luck of a certain brittle combination of {compiler version} ∩ {libc version} ∩ {linker version} ∩ {build flags} emitting workable machine code. There's a huge chunk of C software where there's not enough context within the source itself (or even source plus build scripts) to understand the behavior. It's not even clear that this is a solvable problem in the abstract.
None of that is to say that DARPA shouldn't fund this. Research isn't always about finding an industrial strength end product; the knowledge and expertise gained along the way is important too.
Not in San Francisco. There are about 300 Waymo cars safely driving in one of the most difficult urban environments around (think steep hills, fog, construction, crazy traffic, crazy drivers, crazier pedestrians). Five years ago this was "someday" science fiction. Frankly I trust them much more than human drivers and envision a future utopia where human drivers are banned from urban centers.
To get back on topic, I don't think automatic programming language translation is nearly as hard, especially since we have a deterministic model of the machines it runs on. I can see a possible approach where AI systems take the assembler code of a C++ program, then translate that into Rust, or anything else. Can they get 100% accuracy and bit-for-bit compatibility on output? I would not bet against it.
Opinions about automated driving systems vary. Just from my own experience doing business all around San Francisco I have seen at least a half dozen instances of Waymo vehicles making unsafe maneuvers. Responders have told me and local government officials that Waymo vehicles frequently fail to acknowledge emergency situations or respond to driving instructions. Driving is a social exercise which requires understanding of a number of abstractions.
they're not perfect, sure, but they're out there, just driving around all autonomously and all, contrary to GGP's assertion that they don't exist.
GGGP talked about L5 self-driving, isn't Waymo L4?
San Francisco, for all its challenges, mostly has traffic laws that people follow. This is not true throughout the world.
Isn't 100% accuracy (relatively) easy? c2rust already does that, or at least comes close, as far as I know.
Getting identical outputs on safe executions, catching any unsafe behavior (at translation-time or run-time), and producing efficient, maintainable code all at once is a million times harder.
Limited to specific areas during specific hours, and have caused crashes (at least when I lived there till last summer).
This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or... Self-driving cars will be solved. There is at least one general-purpose computer that can solve it already (a human brain), so a purpose-built computer can also be made to solve it.
In 10 (or 2 or 50 or X) years when Chevy, Ford, and others are rolling out cheap self driving this argument stops working. The important thing is that this argument stops working with no change in how hard C to Rust conversion is.
We really should be looking at the specifics of both problems. What makes computer language translation hard? Why is driving hard? One needs to be correct while inferring intent and possibly reformulating code to meet new restrictions. The other needs to make snap judgments and, in realtime, avoid hitting things, even if that just means stopping to prefer safety over motion. One problem can be solved piecewise without significant regard to time; the other must be solved in realtime, as it happens, without producing unsafe output.
These problems really aren't analogous.
I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.
I am not saying this problem is easy, just that it seems solvable with sufficient effort.
I'd put money on the solutions to said problems looking largely the same though - big ass machine learning models.
My prediction is that a tool like copilot (but specialized to this domain) will do the bulk of source code conversions, with a really smart human coming behind to validate.
Which are things that took 20 or 50 years longer than expected in some cases.
But C to Rust translation is a big and only partially solved problem.
Ok, but if it's a direct, no-pain bridge for, like, 90% of small projects, that can be a huge win.
Even if it can handle, say, 90% of the transition for any project, this is still interesting. Unlike cars on the road, most code transition projects out there don't need to be 100% fine to provide some useful value.
Even if every project can only be 90% done, that’s a huge win. Best would be if it could just wrap the C equivalent code into an unsafe block which would be automatically triaged for human review.
Just getting something vaguely Rust shaped which can compile is the first step in overcoming the inertia to leave the program in its current language.
c2rust exists today, and pretty much satisfies this. I've used it to convert a few legacy math libraries to unsafe rust, and then been able to do the unsafe->safe refactor in the relative comfort of the full rust toolset (analyser + IDE + tests)
There is real utility in slowly fleshing out the number of transforms in a tool like c2rust that can recognise high-level constructs in C code and produce idiomatic safe equivalents in rust.
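To give a flavor of that unsafe->safe refactor (a made-up sketch, not actual c2rust output):

    // Roughly what mechanical translation gives you:
    pub unsafe fn sum_mechanical(p: *const i32, n: usize) -> i32 {
        let mut acc = 0i32;
        let mut i = 0usize;
        while i < n {
            acc += *p.add(i);
            i += 1;
        }
        acc
    }

    // ...and what the same function looks like once refactored to safe rust:
    pub fn sum_refactored(xs: &[i32]) -> i32 {
        xs.iter().sum()
    }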
In addition to the other replies, this is a one-time project. After everything (or almost everything) has been translated, you're done, you won't be running into new edge cases.
Well, Claude 3.5 can do translation from one language to another in a fairly competent manner if the languages are close enough. I've used it for that task myself with success (Java -> JavaScript).
But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.
Personally I think the most pragmatic way to make C/C++ memory safe quicker is one of two approaches:
1. Incrementally. Make std::vector[] properly bounds checked (still not done even in chrome!), convert allocations to allocations that know their own size and do bounds checking e.g. https://issues.chromium.org/issues/40285824
2. Or, go the whole hog and use runtime techniques like garbage collection and runtime bounds checks.
A good example of approach (2) is Managed Sulong, which extends the JVM to execute LLVM bitcode directly whilst exposing a virtualized Linux syscall interface to the C/C++/FORTRAN code. The whole piece of code can be sandboxed with permissions, and memory safety errors are caught at runtime. The compiler tries to optimize out as many bounds checks as possible. The interesting thing about this approach is that it doesn't require big changes to the source code (as long as it's already been ported to Linux), which means the work of making something safe can be done by teams independent of the original authors. In practice "rewrite it in Rust" will usually mean a fork, which introduces lots of complicated technical, cultural and economic issues.
Managed Sulong is also a research project and has a bunch of problems to solve, for instance it needs to lose the JITC dependency and go fully AOT compiled (doable, there's no theoretical issue with it and much of the needed infra already exists). And performance/memory usage can always be improved of course, it regresses vs the original C. But those are "just" systems engineering problems, not rewrite-the-world and solve-static-analysis problems.
Disclosure: I do work part time at Oracle Labs which developed Managed Sulong, but I don't work on it.
std::vector[] has had bounds checking since forever if you set the correct compiler flag. Since they aren't using it, this is a choice; presumably they prefer the speed gain.
You mean _GLIBCXX_DEBUG? It's got some issues. Linux only, it doesn't always work [1] and it's all or nothing. What's really needed is the ability to selectively opt-out on a per-instantiation level so very hot paths can keep the needed performance whilst all the rest gets opted into safety checks.
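Something like this hypothetical wrapper (nothing like it ships today):

    #include <cstddef>
    #include <vector>

    // Hypothetical: checked by default, with per-instantiation opt-out
    // for the hot paths.
    template <typename T, bool Checked = true>
    struct vec : std::vector<T> {
        using std::vector<T>::vector;
        T& operator[](std::size_t i) {
            if constexpr (Checked)
                return this->at(i);               // bounds-checked
            return std::vector<T>::operator[](i); // unchecked, for hot paths
        }
    };

    // vec<int> v;             // checked everywhere by default
    // vec<int, false> hot;    // explicit, reviewable opt-out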
Microsoft has this:
https://learn.microsoft.com/en-us/cpp/standard-library/safe-...
but it doesn't seem to actually make std::vector[] safe.
It's frustrating that low hanging fruit like this doesn't get harvested.
[1] "although there are precondition checks for some string operations, e.g. operator[], they will not always be run when using the char and wchar_t specializations (std::string and std::wstring)."
With MSVC you can use _CONTAINER_DEBUG_LEVEL=1 to get a fast bounds check that can be used in release builds. Or just use it in development to catch errors.
Interesting thanks. Seems the reason I couldn't find anything on that is because it's internal only and not a feature you're actually meant to use?
https://github.com/microsoft/STL/issues/586
> We talked about this at the weekly maintainer meeting and decided that we're not comfortable enough with the (lack of) design of this feature to begin documenting it for wide usage.
What you want should be _ITERATOR_DEBUG_LEVEL instead, that is the public macro for bounds checking configuration.
As far as I am aware, the standard doesn't mandate bounds checking for std::vector::operator[] and probably never will for backwards compatibility reasons. Most standard library implementations have opt-out std::vector[] bounds checking in unoptimized builds, but not in optimized builds.
I tried a toy example with GCC [1], Clang [2], and MSVC [3], and none of them emit bounds checks with basic optimization flags.
[1] https://godbolt.org/z/W5e3n5oWM
[2] https://godbolt.org/z/Pe8nPPvEd
[3] https://godbolt.org/z/YTdv3nabn
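A minimal sketch of the kind of code I tried:

    #include <cstddef>
    #include <vector>

    int get(const std::vector<int>& v, std::size_t i) {
        return v[i];  // with -O1/-O2 and default flags, no bounds check is emitted
    }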
As I said, you need the correct flag set. With MSVC, use _CONTAINER_DEBUG_LEVEL=1; it can be used in release builds. They have had this feature since 2010 or so, though the flag name has changed.
The correct name is _ITERATOR_DEBUG_LEVEL.
Add a "#define _ITERATOR_DEBUG_LEVEL 1" on top for VC++.
That might not be too bad.
A combination of a formal system and an LLM might work here. Suppose we see a C function along these lines:
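    void somefn(char *buf, int n);   /* exact types guessed for illustration */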
First question: is "buf" a pointer to an array, or a pointer to a single char? That can be answered by looking at what the function does with "buf", and what callers pass to it. If it's an array, how big is it? We don't have enough info to know that yet. But a reasonable guess, and one that an LLM might make, is that the length of buf is "n".
Following that assumption, it's reasonable to translate this to Rust as something like:
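    fn somefn(buf: &mut [u8])   // mutability is a guess; &[u8] if somefn only reads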
and, if n is needed within the function, use buf.len(). The next step is to validate that guess. The run-time approach is to prefix every call to "somefn" with something like assert_eq!(n, buf.len()). Maybe formal methods can prove the assert true, and we can take it out. Or if a SAT solver or a fuzz tester can generate a counterexample, we know that the guess was wrong and this has to be done the hard way, implying more subscript checks inside "somefn". The idea is to recognize common C idioms and do clean translations to Rust for them. This should handle a high percentage of cases.
Yes, this is similar to what IntelliJ does for Java->Kotlin. Do a first pass that's extremely non-idiomatic and mechanical, then do lots of automated refactoring to bring it closer to idiomatic.
But if you're going to do it that way, the right place to start is probably a safer form of C++, not Rust. That way code can be ported file-at-a-time or even function-at-a-time, and you'll have a chance to run the assertions in the context of the original code. Which of course may not have good test coverage, as C codebases often don't, so you'll have to be testing your assertions in production.
There's something to be said for that. You're going to need at least an internal representation that's a safe C/C++.
Most compilers do have flags to turn this on, which I use all the time.
The issue is the "performance trumps safety" culture that pushes back against using them.
As a reminder, DARPA has funded self-driving car research since at least the 1980s with the Autonomous Land Vehicle (ALV) project, plus the DARPA Grand Challenges, and more.
It's very hard; DARPA likes to fund hard things[1] :-).
This isn't, however, DARPA's first foray into automatic program translation, or even automatic translation into Rust[2].
[1]: https://www.urbandictionary.com/define.php?term=DARPA%20hard
[2]: https://c2rust.com/
in this case it seems to me the hard task that DARPA has chosen is to get me to forget how much they spent on pushing Ada.
You hate jumbo jets, high-speed trains, air traffic control, and satellites?
Do you know what fear is? Getting in an airplane where the flight controls use NPM.
I have enough fears about features in the entertainment system, and that performance options are accessed through that same touch screen UX.
ada does not require 'pushing'.
once the maturity of the users advances to a sufficient point, then ada is the only solution.
"ada. used in creating reliable software since 1983"
when i first saw ada, i didn't understand the why. now i understand the why, but ada is effectively gone.
-- old fortran / C / Assembly programmer
Ada is still around, at a big enough level to keep 7 commercial vendors selling compilers.
Something unheard of, paying for software tools in 2024, who would imagine that.
it was depressing when RH dropped ada support. sure, it was gcc, but it was so nice to have an ada compiler part of the default gcc installation.
gnat needs money. well deserved. but adoption needs a free, easy to install compiler.
5 years ago i had the pleasure of resurrecting a dead system. it was about 30k lines of ada, let's call it ada 87 (!). unknown compiler, 32 bit, 68K processor, 16 MB memory, unknown OS.
code was compiling in 2 days, running in 2 weeks. i needed to change from using 32 bit floats to 64 bit floats (seems positional data is a little more accurate in 2020). 1 declaration in 1 package spec and a recompile, and all my positions are good.
i love that language!
Oh, it's around, but laypeople never see those codebases.
I can't find any clear references to DARPA (or ARPA) being involved in Ada's development. It was a DoD program but, well, the DoD is notoriously large and multi-headed.
(But even if DARPA was involved in Ada: I think it's clear, at this point, that Ada has been a resounding success in a small number of domains without successfully breaking into general-purpose adoption. I don't have a particular value judgment associated with that, but from a strategic perspective it makes a lot of sense for DARPA to focus program analysis research on popular general-purpose languages -- there's just more labor and talent available.)
Too lazy to look it up, but I'm pretty sure DARPA was involved, and certain that DoD contracts prioritized Ada for a long time.
Too bored to pass up a challenge to refute somebody who is too lazy to look it up.
I looked it up. DARPA was not involved.
DARPA is basically a state-sponsored VC that optimizes for completely different things. Instead of looking for 100x financial returns, they want technical advantages for the United States. The "moat" is the hardness of developing and operationalizing those technologies first.
DARPA's commercialization track record is decidedly mixed, so the VC comparison is unexpectedly apt :-)
(But yes: DARPA's mandate is explicitly to discover and develop the next generation of emerging technologies for military use.)
If you count by number of attempts, sure.
If you count by impact, it's hard to come up with many things more impactful than the Internet...?
Yeah, I meant by number. But also: ARPA didn't commercialize the Internet! They explicitly refused to commercialize it; commercialization only happened after an Act of Congress induced interconnections between NSFNET and commercial networks.
Decades ago, as my father explained to me, ARPA (no "D" at that time) was happy if 1% of their projects went all the way through to successful deployment. If they had a higher success rate it would mean they weren't aiming high enough.
To be pedantic, In-Q-Tel is the literal state-sponsored VC.
DARPA is a step closer to traditional research labs but there is obviously some overlap.
https://en.wikipedia.org/wiki/In-Q-Tel
> DARPA is a step closer to traditional research labs but there is obviously some overlap.
It's more like the NSF but focused on commercial grantees with project management thrown on top to orchestrate everything.
The really unique part is how much independence each program manager has and the term limits that prevent empire building.
I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.
If you think it's hard wrestling with the borrow checker, just imagine how much harder it is to write an automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all its celebrated undefined behavior. A classic problem of writing compilers is that the space of valid programs is much larger than the space of programs which will compile.
A quick web search reveals some other efforts, such as c2rust [1]. I wonder how TRACTOR differs.
[1] https://github.com/immunant/c2rust
That’s not what they are aiming for. FTA: “The goal is to achieve the same quality and style that a skilled Rust developer would produce”
Nitpick: undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.
(Doing that translation in such a way that the behavior remains what gcc, clang or “most C compilers” do may be harder, but I’m not sure of that)
That's the kind of language lawyer approach that caused a rebellion in the last decade amongst C programmers against irresponsible compiler optimizations. "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".
I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't. But hell, even if the result is unreliable in practice, I suppose that if somebody gets to claim "it works" then the incentives are aligned to produce garbage.
If your program invokes undefined behaviour, it's invalid and non-portable. Out of bounds array accesses are UB, yet a program containing them may just happen to work. It won't be portable even between different compiler versions.
The C standard is a two-way contract: the programmer doesn't produce code that invokes undefined behaviour, and the compiler produces a standard-conforming executable.
If undefined behavior is invalid, then reject the program instead of "optimizing" it. This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious. Null pointer check deletion can turn bugs into exploitable vulnerabilities.
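The classic shape of the problem (a minimal made-up example):

    #include <stddef.h>

    int f(int *p) {
        int x = *p;        /* UB if p is NULL */
        if (p == NULL)     /* the compiler may delete this check: p was      */
            return -1;     /* already dereferenced, so it "cannot" be NULL   */
        return x;
    }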
Undefined behavior is usually the result of a runtime situation; it is usually not obvious from the code alone whether it could or could not happen, so the compiler cannot reject the program.
The 'UB-based' optimization is just the assumption that the code is correct and therefore the UB situation cannot happen at runtime.
Usually, but not always. For example, the removal of an empty, effect-free infinite loop. This should be an error.
The C++ forward progress guarantee enables more optimizations since it allows the compiler to reason more easily about loops. A sketch of the kind of loop in question (my example):
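    // No side effects, and termination is unprovable -- yet C++'s
    // forward-progress rule lets the compiler assume the loop terminates,
    // or remove it entirely.
    unsigned collatz(unsigned x) {
        while (x != 1)
            x = (x % 2) ? 3 * x + 1 : x / 2;
        return 1;
    }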
But yeah, that's one of the more foot-gunny UB rules, and Rust does not have it. It does mean Rust doesn't mark functions as `mustprogress` in LLVM IR, which means it misses out on whatever optimizations that enables.
You significantly underestimate how much UB people write and overestimate the end-result if the current approach would not be taken.
The C standard with its extensive undefined behavior causes programmers and compiler writers to be at odds. In a sane world, "undefined behavior" wouldn't be assumed to mean "the programmer must have meant for me to optimize this whole section of code away". We aren't on the same team, even if I believe that all parties are acting with the best of intentions.
I don't feel that the Rust language situation incentivizes such awful conflict, and it's one of many reasons I now try really hard to avoid C and use Rust instead.
A funny thing about this problem is that it gets worse the more formally correct your implementation is. Undefined behavior is undefined, so it's outside the model, and if your program is a 100% correct implementation of a model then how can it know what to do about something outside it?
But I don't think defining all behavior helps. The defined behavior could be /wrong/, and now you can't find it because the program using it is valid, so it can't be detected with UBSan.
Doing one funny thing on platform A and a different funny thing on platform B when an edge case arises is way better than completely deleting the code on all platforms with no warning.
I don’t see any way it can do otherwise. As a simple example, what would one translate this C statement to:
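    n = abs(i);   /* where i is an int that can be INT_MIN */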
I would expect TRACTOR to generate (assuming 64-bit integers) something like `let n = i.abs();`. However, that can panic in debug mode and return a negative number in release mode (https://doc.rust-lang.org/stable/std/primitive.i64.html#meth...), and there's no way for TRACTOR to know whether that makes the program "work as intended". That code may have worked fine (or fine enough) for decades because its standard library returns zero for abs(INT_MIN). It's possible to preserve the semantics of the original program using unsafe Rust. [1] Something in this spirit, guessing at the shape (the real code is behind the link):
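    // A guess at the shape of [1], not the actual playground code: make the
    // INT_MIN case explicit, and defer to the platform's abs() for the UB case.
    let n = if i == libc::INT_MIN {
        unsafe { libc::abs(i) }   // whatever the C library does, UB and all
    } else {
        i.abs()
    };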
That's grotesque, but it is idiomatic Rust insofar as it lays bare many of the assumptions in the C code and gives the programmer the opportunity to fix them. It is what I would personally want TRACTOR to generate if it could not prove that `i` can never take on the value `libc::INT_MIN`. Given that generated code, I could then piecemeal migrate the unsafe bits to cleaner, idiomatic safe rust: possibly your code, but more likely `i.wrapping_abs()` or similar.
What will TRACTOR choose? At least for this example, they don't have to choose inappropriate pruning of undefined behavior. They claim the following: "The goal is to achieve the same quality and style that a skilled Rust developer would produce."
If they're going to uphold the same "quality", the translation you presented doesn't cut it. But you may be right and they will go down the path of claiming that a garbage translation is technically valid under undefined behavior and therefore "quality" — if so, I will shun them.
[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...
You assume that the compiler can determine what behavior is undefined. It can't. C compilers don't just look at some individual line of the program and say "oh, that's undefined, unleash the nasal demons". C compilers look at code, reason that if such-and-such variable has a certain value (say, a null or invalid pointer), then such-and-such operation is undefined (say, dereferencing that variable), and therefore on the next line that variable can be assumed not to have that bad value. Despite all the FUD, this is a very limited power. C compilers don't usually know the actual values in question, all they do is exclude some invalid ones.
I (not the person you are replying to) do understand that's how compilers interact with UB. However, a wealth of experience has shown us that the assumption "UB doesn't occur" is completely false. It is, in my opinion, quite irresponsible for compiler writers to continue to use a known-false assumption when building the optimizer. I don't really care how much speed it costs, we need to stop building software on a shaky foundation like that.
Soon (or actually, already) we'll have MTE and CHERI, and then that C undefined behavior will be giving you security improvements as well as speed improvements.
Can't design a system that 100% crashes on invalid behavior if you've declared that behavior is valid, because then someone is relying on it.
Write tests for your C code. Run c2rust (mechanical translation), including the tests. Let an LLM/MCTS/verifier loop go to town. Verifier here means it passes compiler checks, tests, sanitizers and miri.
Additional training data can be generated by running mrustc or by inlining unsafe code (from std/core/leaf crates) into safe code and running semantics-preserving mechanical refactorings on the code.
This can be closer to AlphaProof than ChatGPT
You could already use ASAN + UBSan, or Frama-C.
I did mention using sanitizers in the verification step of the optimization loop. The optimization goal here would be reducing the lines of `unsafe` while preserving program semantics.
Essentially neural program synthesis
speaking of hard, the DOE actually funds a project that has been around for 20+ years now (ROSE) that involves (among other things) doing static analysis on and automatically translating between C/C++/Cuda and even high level languages like Python as well as HPC variants of C/C++. They have a combined AST that supports all of those languages with the same set of node types essentially. Quite cool. I got to work on it when I was an intern at Livermore, summer of 2014.
and it's open source as well! http://rosecompiler.org/ROSE_HTML_Reference/index.html
I can't believe that works at all. I'll take a look for sure.
Most of what they use it for is static analysis, but the funding comes from its ability to translate old simulation code to HPC-ready code. I think they even support fortran IIRC
I have already seen legacy projects that were designed using Rational Rose, but for some reason I thought it was only a commercial name, not an actual system. Thanks, I learned something today!
presumably dan wouldn't have gotten darpa funding if it were obviously feasible, and success wouldn't give him anything publishable academically
Just to be clear to others, Dan is the darpa PM on this - he convinced darpa internally it was worth funding other people to do the work, so he himself / his research group won't be doing this work. He's on leave from Rice for a few years to be a PM at DARPA's I2O.
And while DARPA doesn't directly care about research publications as an outcome, there's certainly a publishable research component to this, as well as a lot of lower papers-per-$ engineering and validation work. A lot of the contracts they hand out end up going to some kind of contractor prime (BBN, Raytheon, that kind of company) with one or more academic subs. The academic subs publish.
thank you for the correction; I didn't realize he was the darpa pm
what you describe is exactly my experience as a darpa performer (on a program which dan is apparently now the pm for!)
If the IRS could have more timely funding, all their COBOL would be translated to Java by now.
COBOL migrations are tar pits of replicating 40+ years of undocumented niche business logic for a given field, edge cases included, that was "commonly understood" by people who are now retired or dead. Don't get your hopes up.
MicroFocus has COBOL compilers for Java and .NET, as do other COBOL vendors still in business.
Usually the biggest issue is that most of the porting attempts don't start there; rather, they go for the rewrite from scratch, and let's not pay the licenses for those cross-compilers.
Can't most c++ be machine-lowered to C?
Lowering is typically easier than lifting (or brightening). When you lower, you can erase higher-level semantics that aren't relevant; when you lift, you generally want to compose lower-level program behaviors into their idiomatic (and typically safer) equivalent.
Yes, that is after all how C++ started.
How good the resulting performance would be, though, is another matter.
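For a flavor of what that lowering looks like, here's the classic cfront-style transformation, sketched (names made up):

    /* C++ source:
           struct S { int x; int get() const { return x; } };
       lowers to roughly this C: */
    struct S { int x; };
    int S_get(const struct S *self) { return self->x; }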
I have to think the approach will be something like "AI summarizes the features of the program into some kind of technical language, then the AI synthesizes Rust code that covers the same feature set".
It would be most interesting if the approach was not to feed the program the original program but rather the manual for the program. That said it's rare that a manual captures all of the nuances of the program so a view into the source code is probably necessary, at least for getting the ground truth.
More like:
"AI more or less sort of summarizes the features of the program into some approximate kind of technical language, then the AI synthesizes something not too far from Rust code that hopefully covers aspirationally the same feature set".
Projects are termed DARPA-hard for a reason.
You're just asking for people to bring out their pitchforks :P
Man, I want to upvote this but…
Really?! The Linux kernel is a _pretty enormous_ counterexample, as are many of the userland tools of most desktop Linux distros.
I am also a key developer of an entirely-written-in-C tool (https://gitlab.com/openconnect/openconnect) which I'd venture a large fraction of desktop Linux users in corporate environments use on a regular basis.