return to table of content

Translating All C to Rust (TRACTOR)

mike_hearn
90 replies
1d2h

That sounds ... hard. Especially as idiomatic Rust as written by skilled programmers looks nothing like C, and most interesting code is written in C++ anyway.

Isn't it equivalent to statically determining the lifetimes of all allocations in the C program, including those that are implemented using custom allocators or which cross into proprietary libraries? There's been a lot of research into this sort of thing over the years without much success. C/C++ programs can do things like tie allocation lifetimes to what buttons a user clicks, without ref counting or other mechanisms to ensure safety. It's not a good idea, but, they can do it.

The other obvious problem with trying to write such a static analysis is that the programs you're analyzing are by definition buggy and the lifetimes might not make sense (if they did, they wouldn't have memory safety holes and wouldn't need to be replaced). The only research I've seen on this problem of statically detecting what lifetimes should be does assume the code being analyzed is actually correct to begin with. I guess you could try and aim for a program that detects where lifetimes can't be worked out and asks the developer for help though.

childintime
30 replies
1d2h

Hard for humans. But it's DARPA; is it hard for AI? Image classification used to be hard too, and today cars drive themselves.

I'd say it's good timing.

Calavar
14 replies
1d1h

today cars drive themselves

You can attach about a hundred asterisks to that.

If anything, I think the failure to hit L5 self-driving after billions of dollars and millions of man hours invested is probably reflective of how automatic C to Rust translation will go. We'll cruise 90% of the way, but the last 10% will prove insurmountable with current technology.

Think about the number of C programs in the wild that rely on compiler-specific or libc-specific or platform-specific behavior, or even undefined behavior plus the dumb luck of a certain brittle combination of {compiler version} ∩ {libc version} ∩ {linker version} ∩ {build flags} emitting workable machine code. There's a huge chunk of C software where there's not enough context within the source itself (or even source plus build scripts) to understand the behavior. It's not even clear that this is a solvable problem in the abstract.

None of that is to say that DARPA shouldn't fund this. Research isn't always about finding an industrial strength end product; the knowledge and expertise gained along the way is important too.

programd
6 replies
1d1h

> today cars drive themselves

You can attach about a hundred asterisks to that.

Not in San Francisco. There are about 300 Waymo cars safely driving in one of the most difficult urban environments around (think steep hills, fog, construction, crazy traffic, crazy drivers, crazier pedestrians). Five years ago this was "someday" science-fiction. Frankly I trust them much more than human drivers and envision a future utopia where human drivers are banned from urban centers.

To get back on topic, I don't think automatic programming language translation is nearly as hard, especially since we have a deterministic model of the machines it runs on. I can see a possible approach where AI systems take the assembler code of a C++ program, then translate that into Rust, or anything else. Can they get 100% accuracy and bit-for-bit compatibility on output? I would not bet against it.

m0llusk
2 replies
22h23m

Opinions about automated driving systems vary. Just from my own experience doing business all around San Francisco I have seen at least a half dozen instances of Waymo vehicles making unsafe maneuvers. Responders have told me and local government officials that Waymo vehicles frequently fail to acknowledge emergency situations or respond to driving instructions. Driving is a social exercise which requires understanding of a number of abstractions.

fragmede
1 replies
13h50m

they're not perfect, sure, but they're out there, just driving around all autonomously and all, contrary to GGP's assertion that they don't exist.

cuu508
0 replies
12h42m

GGGP talked about L5 self-driving, isn't Waymo L4?

saagarjha
0 replies
21h19m

San Francisco, for all its challenges, mostly has traffic laws that people follow. This is not true throughout the world.

creata
0 replies
22h25m

Isn't 100% accuracy (relatively) easy? c2rust already does that, or at least comes close, as far as I know.

Getting identical outputs on safe executions, catching any unsafe behavior (at translation-time or run-time), and producing efficient, maintainable code all at once is a million times harder.

TestingWithEdd
0 replies
13h31m

Limited to specific areas during specific hours, and have caused crashes (at least when I lived there till last summer).

sqeaky
2 replies
22h13m

This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or... Self driving cars will be solved. There is at least one general purpose computer that can solve it already (a human brain), so a purpose-built computer can also be made to solve it.

In 10 (or 2 or 50 or X) years when Chevy, Ford, and others are rolling out cheap self driving this argument stops working. The important thing is that this argument stops working with no change in how hard C to Rust conversion is.

We really should be looking at the specifics of both problems. What makes computer language translation hard? Why is driving hard? One needs to be correct while inferring intent and possibly reformulating code to meet new restrictions. The other needs to be able to make snap judgments and in realtime avoid hitting things even if it just means stopping to prefer safety over motion. One problem can be solved piecewise without significant regard to time and the other solved in realtime as it happens without producing unsafe output.

These problems really aren't analogous.

I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.

I am not saying this problem is easy, just that it seems solvable with sufficient effort.

mywittyname
0 replies
21h31m

These problems really aren't analogous.

I'd put money on the solutions to said problems looking largely the same though - big ass machine learning models.

My prediction is that a tool like copilot (but specialized to this domain) will do the bulk of source code conversions, with a really smart human coming behind to validate.

lmm
0 replies
17h2m

This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or...

Which are things that took 20 or 50 years longer than expected in some cases.

I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.

But C to Rust translation is a big and only partially solved problem.

psychoslave
2 replies
1d1h

Ok, but if, say, 90% of small projects can use it as a direct, no-pain bridge, that can be a huge win.

Even if it can only "handle well 90%" of the transition for any given project, this is still interesting. Unlike cars on the road, most code transition projects out there don't need to be 100% fine to provide some useful value.

0cf8612b2e1e
1 replies
1d

Even if every project can only be 90% done, that’s a huge win. Best would be if it could just wrap the C equivalent code into an unsafe block which would be automatically triaged for human review.

Just getting something vaguely Rust shaped which can compile is the first step in overcoming the inertia to leave the program in its current language.

swiftcoder
0 replies
9h11m

c2rust exists today, and pretty much satisfies this. I've used it to convert a few legacy math libraries to unsafe rust, and then been able to do the unsafe->safe refactor in the relative comfort of the full rust toolset (analyser + IDE + tests)

There is real utility in slowly fleshing out the number of transforms in a tool like c2rust that can recognise high-level constructs in C code and produce idiomatic safe equivalents in rust

D-Coder
0 replies
1d1h

In addition to the other replies, this is a one-time project. After everything (or almost everything) has been translated, you're done, you won't be running into new edge cases.

mike_hearn
13 replies
1d1h

Well, Claude 3.5 can do translation from one language to another in a fairly competent manner if the languages are close enough. I've used it for that task myself with success (Java -> JavaScript).

But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.

Personally I think the most pragmatic way to make C/C++ memory safe quicker is one of two approaches:

1. Incrementally. Make std::vector[] properly bounds checked (still not done even in Chrome!), and convert allocations to allocations that know their own size and do bounds checking, e.g. https://issues.chromium.org/issues/40285824

2. Or, go the whole hog and use runtime techniques like garbage collection and runtime bounds checks.

A good example of approach (2) is Managed Sulong, which extends the JVM to execute LLVM bitcode directly whilst exposing a virtualized Linux syscall interface to the C/C++/FORTRAN code. The whole piece of code can be sandboxed with permissions, and memory safety errors are caught at runtime. The compiler tries to optimize out as many bounds checks as possible. The interesting thing about this approach is that it doesn't require big changes to the source code (as long as it's already been ported to Linux), which means the work of making something safe can be done by teams independent of the original authors. In practice "rewrite it in Rust" will usually mean a fork, which introduces lots of complicated technical, cultural and economic issues.

Managed Sulong is also a research project and has a bunch of problems to solve; for instance, it needs to lose the JITC dependency and go fully AOT compiled (doable; there's no theoretical issue with it, and much of the needed infra already exists). And performance/memory usage can always be improved, of course; it regresses vs the original C. But those are "just" systems engineering problems, not rewrite-the-world and solve-static-analysis problems.

Disclosure: I do work part time at Oracle Labs which developed Managed Sulong, but I don't work on it.

TinkersW
8 replies
1d1h

std::vector[] has had bounds checking since forever if you set the correct compiler flag. Since they aren't using it, this is a choice; presumably they prefer the speed gain.

mike_hearn
3 replies
1d1h

You mean _GLIBCXX_DEBUG? It's got some issues: Linux only, it doesn't always work [1], and it's all or nothing. What's really needed is the ability to selectively opt out on a per-instantiation level, so very hot paths can keep the needed performance whilst all the rest gets opted into safety checks.

Microsoft has this:

https://learn.microsoft.com/en-us/cpp/standard-library/safe-...

but it doesn't seem to actually make std::vector[] safe.

It's frustrating that low hanging fruit like this doesn't get harvested.

[1] "although there are precondition checks for some string operations, e.g. operator[], they will not always be run when using the char and wchar_t specializations (std::string and std::wstring)."

TinkersW
2 replies
17h53m

With MSVC you can use _CONTAINER_DEBUG_LEVEL=1 to get a fast bounds check that can be used in release builds. Or just use it in development to catch errors.

mike_hearn
1 replies
10h31m

Interesting, thanks. Seems the reason I couldn't find anything on that is that it's internal only and not a feature you're actually meant to use?

https://github.com/microsoft/STL/issues/586

> We talked about this at the weekly maintainer meeting and decided that we're not comfortable enough with the (lack of) design of this feature to begin documenting it for wide usage.

pjmlp
0 replies
6h24m

What you want should be _ITERATOR_DEBUG_LEVEL instead, that is the public macro for bounds checking configuration.

Calavar
3 replies
1d1h

As far as I am aware, the standard doesn't mandate bounds checking for std::vector::operator[] and probably never will for backwards compatibility reasons. Most standard library implementations have opt-out std::vector[] bounds checking in unoptimized builds, but not in optimized builds.

I tried a toy example with GCC [1], Clang [2], and MSVC [3], and none of them emit bounds checks with basic optimization flags.

[1] https://godbolt.org/z/W5e3n5oWM

[2] https://godbolt.org/z/Pe8nPPvEd

[3] https://godbolt.org/z/YTdv3nabn

TinkersW
1 replies
17h51m

As I said, you need the correct flag set. With MSVC, use _CONTAINER_DEBUG_LEVEL=1 and it can be used in release. They have had this feature since 2010 or so, though the flag name has changed.

pjmlp
0 replies
6h23m

The correct name is _ITERATOR_DEBUG_LEVEL.

pjmlp
0 replies
6h20m

Add a "#define _ITERATOR_DEBUG_LEVEL 1" on top for VC++.

Animats
2 replies
22h0m

But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.

That might not be too bad.

A combination of a formal system and an LLM might work here. Suppose we see a C function

   void somefn(char* buf, int n);
First question: is "buf" a pointer to an array, or a pointer to a single char? That can be answered by looking at what the function does with "buf", and what callers pass to it.

If it's an array, how big is it? We don't have enough info to know that yet. But a reasonable guess, and one that an LLM might make, is that the length of buf is "n".

Following that assumption, it's reasonable to translate this to Rust as

   fn somefn(buf: &[u8])
and, if n is needed within the function, use

   buf.len()
The next step is to validate that guess. The run-time approach is to write all calls to "somefn" with

   assert!(buf.len() == n);
   somefn(buf, n);
Maybe formal methods can prove the assert true, and we can take it out. Or if a SAT solver or a fuzz tester can generate a counterexample, we know that the guess was wrong and this has to be done the hard way, as

   fn somefn(buf: &[u8], n: usize)
implying more subscript checks inside "somefn".

The idea is to recognize common C idioms and do clean translations to Rust for them. This should handle a high percentage of cases.
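
A hedged sketch of that idiom (names and the placeholder body are hypothetical, not TRACTOR output):

    // Guessed translation: the C `void somefn(char *buf, int n)` becomes a
    // slice, and the old `n` is recovered as buf.len() where needed.
    fn somefn(buf: &[u8]) -> u32 {
        // placeholder body standing in for the translated C code
        buf.iter().map(|&b| b as u32).sum()
    }

    fn call_site(buf: &[u8], n: usize) -> u32 {
        // run-time validation of the guessed invariant buf.len() == n,
        // removed once a prover shows it always holds
        assert!(buf.len() == n);
        somefn(buf)
    }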

mike_hearn
1 replies
10h33m

Yes, this is similar to what IntelliJ does for Java->Kotlin. Do a first pass that's extremely non-idiomatic and mechanical, then do lots of automated refactoring to bring it closer to idiomatic.

But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust. That way code can be ported file-at-a-time or even function-at-a-time, and so you'll have a chance to run the assertions in the context of the original code. Which of course may not have good test coverage, as C codebases often don't, so you'll have to be testing your assertions in production.

Animats
0 replies
1h6m

But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust.

There's something to be said for that. You're going to need at least an internal representation that's a safe C/C++.

pjmlp
0 replies
11h50m

Make std::vector[] properly bounds checked

Most compilers do have flags to turn this on, which I use all the time.

The issue is the "performance trumps safety" culture that pushes back against using them.

eesmith
0 replies
1d1h

As a reminder, DARPA has funded self-driving car research since at least the 1980s with the Autonomous Land Vehicle (ALV) project, plus the DARPA Grand Challenges, and more.

fsckboy
11 replies
22h33m

in this case it seems to me the hard task that DARPA has chosen is to get me to forget how much they spent on pushing Ada.

reaperducer
3 replies
21h38m

in this case it seems to me the hard task that DARPA has chosen is to get me to forget how much they spent on pushing Ada.

You hate jumbo jets, high-speed trains, air traffic control, and satellites?

9659
2 replies
21h30m

Do you know what fear is? Getting in an airplane where the flight controls use NPM.

warkdarrior
0 replies
21h17m

   npm ERR! install Couldn't read dependencies
   npm ERR! package.json ENOENT, open '/boeing/787-9/flaps-up.json'
   npm ERR! package.json This is most likely not a problem with npm itself.
   npm ERR! package.json npm can't find a package.json file in your current directory.

tracker1
0 replies
3h30m

I have enough fears about features in the entertainment system, and that performance options are accessed through that same touch screen UX.

9659
3 replies
21h31m

ada does not require 'pushing'.

once the maturity of the users advances to a sufficient point, then ada is the only solution.

"ada. used in creating reliable software since 1983"

when i first saw ada, i didn't understand the why. now i understand the why, but ada is effectively gone.

-- old fortran / C / Assembly programmer

pjmlp
1 replies
11h57m

Ada is still around, at a big enough level to keep 7 commercial vendors selling compilers.

Something unheard of, paying for software tools in 2024, who would imagine that.

9659
0 replies
5h31m

it was depressing when RH dropped ada support. sure, it was gcc, but it was so nice to have an ada compiler part of the default gcc installation.

gnat needs money. well deserved. but adoption needs a free, easy to install compiler.

5 years ago i had the pleasure of resurrecting a dead system. it was about 30k of ada, lets call it ada 87 (!). unknown compiler, 32 bit, 68K processor, 16 MB memory, unknown OS.

code was compiling in 2 days, running in 2 weeks. i needed to change from using 32 bit floats to 64 bit floats (seems positional data is a little more accurate in 2020). 1 declaration in 1 package spec and a recompile, and all my positions are good.

i love that language!

Avamander
0 replies
9h6m

Oh, it's around, but laypeople never see those codebases.

woodruffw
2 replies
22h29m

I can't find any clear references to DARPA (or ARPA) being involved in Ada's development. It was a DoD program but, well, the DoD is notoriously large and multi-headed.

(But even if DARPA was involved in Ada: I think it's clear, at this point, that Ada has been a resounding success in a small number of domains without successfully breaking into general-purpose adoption. I don't have a particular value judgment associated with that, but from a strategic perspective it makes a lot of sense for DARPA to focus program analysis research on popular general-purpose languages -- there's just more labor and talent available.)

indolering
1 replies
13h33m

Too lazy to look it up, but I'm pretty sure DARPA was involved, and certain that DoD contracts prioritized Ada for a long time.

rerdavies
0 replies
10h45m

Too bored to pass up a challenge to refute somebody who is too lazy to look it up.

I looked it up. DARPA was not involved.

the_snooze
6 replies
1d1h

DARPA is basically a state-sponsored VC that optimizes for completely different things. Instead of looking for 100x financial returns, they want technical advantages for the United States. The "moat" is the hardness of developing and operationalizing those technologies first.

woodruffw
3 replies
1d1h

DARPA's commercialization track record is decidedly mixed, so the VC comparison is unexpectedly apt :-)

(But yes: DARPA's mandate is explicitly to discover and develop the next generation of emerging technologies for military use.)

VikingCoder
1 replies
21h35m

DARPA's commercialization track record is decidedly mixed...

If you count by number of attempts, sure.

If you count by impact, it's hard to come up with many things more impactful than the Internet...?

woodruffw
0 replies
21h28m

Yeah, I meant by number. But also: ARPA didn't commercialize the Internet! They explicitly refused to commercialize it; commercialization only happened after an Act of Congress induced interconnections between NSFNET and commercial networks.

pfdietz
0 replies
21h55m

Decades ago, as my father explained to me, ARPA (no "D" at that time) was happy if 1% of their projects went all the way through to successful deployment. If they had a higher success rate it would mean they weren't aiming high enough.

mburns
1 replies
22h22m

To be pedantic, In-Q-Tel is the literal state-sponsored VC.

DARPA is a step closer to traditional research labs but there is obviously some overlap.

https://en.wikipedia.org/wiki/In-Q-Tel

throwup238
0 replies
21h58m

> DARPA is a step closer to traditional research labs but there is obviously some overlap.

It's more like the NSF but focused on commercial grantees with project management thrown on top to orchestrate everything.

The really unique part is how much independence each program manager has and the term limits that prevent empire building.

rectang
16 replies
22h39m

I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.

If you think it's hard wrestling with the borrow checker, just imagine how much harder it is to write automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all its celebrated undefined behavior. A classic problem of writing compilers is that the space of valid programs is much larger than the space of programs which will compile.

A quick web search reveals some other efforts, such as c2rust [1]. I wonder how TRACTOR differs.

[1] https://github.com/immunant/c2rust

Someone
15 replies
21h46m

I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.

That’s not what they are aiming for. FTA: “The goal is to achieve the same quality and style that a skilled Rust developer would produce”

just imagine how much harder it is to write automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all its celebrated undefined behavior

Nitpick: undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

(Doing that translation in such a way that the behavior remains what gcc, clang or “most C compilers” do may be harder, but I’m not sure of that)

rectang
11 replies
21h15m

undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

That's the kind of language lawyer approach that caused a rebellion in the last decade amongst C programmers against irresponsible compiler optimizations. "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".

I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't. But hell, even if the result is unreliable in practice, I suppose that if somebody gets to claim "it works" then the incentives are aligned to produce garbage.

atiedebee
8 replies
21h0m

Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".

If your program invokes undefined behaviour, it's invalid and non-portable. Out of bounds array accesses are UB, yet a program containing them may just happen to work. It won't be portable even between different compiler versions.

The C standard is a 2 way contract: the programmer doesn't produce code that invokes undefined behaviour, and the compiler returns a standard conforming executable

matheusmoreira
4 replies
19h2m

If undefined behavior is invalid, then reject the program instead of "optimizing" it. This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious. Null pointer check deletion can turn bugs into exploitable vulnerabilities.

zajio1am
2 replies
16h16m

If undefined behavior is invalid, then reject the program instead of "optimizing" it.

Undefined behavior is usually the result of a runtime situation; it is usually not obvious from the code alone whether it could or could not happen, so the compiler cannot reject the program.

The 'UB-based' optimization is just the assumption that the code is correct and therefore the UB situation cannot happen at runtime.

grumpyprole
1 replies
10h34m

Usually but not always. For example, the removal of an empty effect free infinite loop. This should be an error.

the8472
0 replies
10h10m

The C++ forward progress guarantee enables more optimizations since it allows the compiler to reason more easily about loops:

The standards added the forward progress guarantees to change an optimization problem from "solve the halting problem" to "there will be observable side effects in the forms of termination, I/O, volatile, and/or atomic synchronization, any other operation can be reordered". The former is generally impossible to solve, whereas the latter is eminently tractable.

But yeah, that's one of the more foot-gunny UB rules that Rust does not have. But it does mean it doesn't mark functions as `mustprogress` in LLVM IR which means it misses out on whatever optimizations that enables.
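
As a minimal, hedged illustration of that difference: the empty loop below is well defined in Rust and cannot be assumed to terminate, whereas the equivalent side-effect-free infinite loop in C++ may be assumed to make progress and optimized away.

    // Well-defined in Rust: the compiler may not assume this loop terminates,
    // so it cannot be removed (no `mustprogress` assumption is emitted).
    fn spin_forever() -> ! {
        loop {}
    }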

Avamander
0 replies
9h0m

This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious.

You significantly underestimate how much UB people write, and overestimate how good the end result would be if the current approach were not taken.

rectang
1 replies
20h48m

The C standard with its extensive undefined behavior causes programmers and compiler writers to be at odds. In a sane world, "undefined behavior" wouldn't be assumed to mean "the programmer must have meant for me to optimize this whole section of code away". We aren't on the same team, even if I believe that all parties are acting with the best of intentions.

I don't feel that the Rust language situation incentivizes such awful conflict, and it's one of many reasons I now try really hard to avoid C and use Rust instead.

astrange
0 replies
16h50m

A funny thing about this problem is that it gets worse the more formally correct your implementation is. Undefined behavior is undefined, so it's outside the model, and if your program is a 100% correct implementation of a model then how can it know what to do about something outside it?

But I don't think defining all behavior helps. The defined behavior could be /wrong/, and now you can't find it because the program using it is valid, so it can't be detected with UBSan.

Asooka
0 replies
20h38m

Doing one funny thing on platform A and a different funny thing on platform B when an edge case arises is way better than completely deleting the code on all platforms with no warning.

Someone
1 replies
3h32m

I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't.

I don’t see any way it can do otherwise. As a simple example, what would one translate this C statement to:

  int i;
  …
  i = abs(i);
? I would expect TRACTOR to generate (assuming 64-bit integers):

  let mut i: i64;
  …
  i = i.abs();
However, that can panic in debug mode and return a negative number in release mode (https://doc.rust-lang.org/stable/std/primitive.i64.html#meth...), and there’s no way for TRACTOR to know whether that makes the program “work as intended”. That code may have worked fine (or fine enough) for decades because its standard library returns zero for abs(INT_MIN).

rectang
0 replies
1h28m

It's possible to preserve the semantics of the original program using unsafe Rust. [1]

    unsafe {
        let mut i: std::os::raw::c_int
            = std::mem::MaybeUninit::uninit().assume_init();
        // ...
        i = libc::abs(i);
    }
That's grotesque, but it is idiomatic Rust insofar as it lays bare many of the assumptions in the C code and gives the programmer the opportunity to fix them. It is what I would personally want TRACTOR to generate if it could not prove that `i` can never take on the value `libc::INT_MIN`.

Given that generated code, I could then piecemeal migrate the unsafe bits to cleaner, idiomatic safe Rust: possibly your code, but more likely `i.wrapping_abs()` or similar.
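
As a quick sanity check of those safer replacements (a sketch, assuming the 64-bit case from above):

    fn main() {
        let i = i64::MIN;
        // matches the C release-mode behavior: the result stays negative
        assert_eq!(i.wrapping_abs(), i64::MIN);
        // surfaces the edge case explicitly instead of wrapping or panicking
        assert_eq!(i.checked_abs(), None);
    }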

What will TRACTOR choose? At least for this example, they don't have to choose inappropriate pruning of undefined behavior. They claim the following:

The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs.

If they're going to uphold the same "quality", the translation you presented doesn't cut it. But you may be right, and they will go down the path of claiming that a garbage translation is technically valid under undefined behavior and therefore "quality"; if so, I will shun them.

[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...

derdi
2 replies
20h54m

undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

You assume that the compiler can determine what behavior is undefined. It can't. C compilers don't just look at some individual line of the program and say "oh, that's undefined, unleash the nasal demons". C compilers look at code, reason that if such-and-such variable has a certain value (say, a null or invalid pointer), then such-and-such operation is undefined (say, dereferencing that variable), and therefore on the next line that variable can be assumed not to have that bad value. Despite all the FUD, this is a very limited power. C compilers don't usually know the actual values in question, all they do is exclude some invalid ones.

bigstrat2003
1 replies
19h17m

I (not the person you are replying to) do understand that's how compilers interact with UB. However, a wealth of experience has shown us that the assumption "UB doesn't occur" is completely false. It is, in my opinion, quite irresponsible for compiler writers to continue to use a known-false assumption when building the optimizer. I don't really care how much speed it costs, we need to stop building software on a shaky foundation like that.

astrange
0 replies
16h48m

Soon (or actually, already) we'll have MTE and CHERI, and then that C undefined behavior will be giving you security improvements as well as speed improvements.

Can't design a system that 100% crashes on invalid behavior if you've declared that behavior is valid, because then someone is relying on it.

the8472
3 replies
23h16m

Write tests for your C code. Run c2rust (mechanical translation), including the tests. Let an LLM/MCTS/verifier loop go to town. Verifier here means it passes compiler checks, tests, sanitizers and Miri.

Additional training data can be generated by running mrustc or by inlining unsafe code (from std/core/leaf crates) into safe code and running semantics-preserving mechanical refactorings on the code.

This can be closer to AlphaProof than ChatGPT.
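
A hedged sketch of what the verifier half could look like (all names hypothetical): a mechanically translated, unsafe function paired with a semantics-preservation test that any proposed safe rewrite has to keep passing under the compiler, sanitizers and Miri.

    // Hypothetical output of a mechanical C-to-Rust pass: faithful but unsafe.
    unsafe fn sum_raw(ptr: *const u32, len: usize) -> u32 {
        let mut total = 0u32;
        for i in 0..len {
            // SAFETY: the caller must pass a valid pointer to `len` readable u32s
            total = total.wrapping_add(unsafe { *ptr.add(i) });
        }
        total
    }

    // A candidate safe rewrite the optimization loop might propose.
    fn sum_slice(data: &[u32]) -> u32 {
        data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn rewrite_preserves_semantics() {
            let data = [1u32, 2, 3, u32::MAX];
            let expected = unsafe { sum_raw(data.as_ptr(), data.len()) };
            assert_eq!(sum_slice(&data), expected);
        }
    }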

astrange
1 replies
16h47m

You could already use ASAN + UBSan, or Frama-C.

the8472
0 replies
10h16m

I did mention using sanitizers in the verification step of the optimization loop. The optimization goal here would be reducing the lines of `unsafe` while preserving program semantics.

TestingWithEdd
0 replies
13h34m

Essentially neural program synthesis

sam0x17
3 replies
1d1h

speaking of hard, the DOE actually funds a project that has been around for 20+ years now (ROSE) that involves (among other things) doing static analysis on, and automatic translation between, C/C++/CUDA and even high-level languages like Python, as well as HPC variants of C/C++. They have a combined AST that supports all of those languages with essentially the same set of node types. Quite cool. I got to work on it when I was an intern at Livermore, summer of 2014.

and it's open source as well! http://rosecompiler.org/ROSE_HTML_Reference/index.html

andrewflnr
1 replies
14h46m

a combined AST that supports all of those languages with the same set of node types essentially.

I can't believe that works at all. I'll take a look for sure.

sam0x17
0 replies
13h28m

Most of what they use it for is static analysis, but the funding comes from its ability to translate old simulation code to HPC-ready code. I think they even support fortran IIRC

seren
0 replies
10h59m

I have already seen legacy projects that were designed using Rational Rose, but for some reason I thought it was only a commercial name, not an actual system. Thanks, I learned something today !

kragen
2 replies
22h30m

presumably dan wouldn't have gotten darpa funding if it were obviously feasible, and success wouldn't give him anything publishable academically

dgacmu
1 replies
20h47m

Just to be clear to others, Dan is the darpa PM on this - he convinced darpa internally it was worth funding other people to do the work, so he himself / his research group won't be doing this work. He's on leave from Rice for a few years to be a PM at DARPA's I2O.

And while DARPA doesn't directly care about research publications as an outcome, there's certainly a publishable research component to this, as well as a lot of lower papers-per-$ engineering and validation work. A lot of the contracts they hand out end up going to some kind of contractor prime (BBN, Raytheon, that kind of company) with one or more academic subs. The academic subs publish.

kragen
0 replies
20h20m

thank you for the correction; I didn't realize he was the darpa pm

what you describe is exactly my experience as a darpa performer (on a program which dan is apparently now the pm for!)

downrightmike
2 replies
1d1h

If the IRS could have more timely funding, all their Cobol would be translated to Java by now

psunavy03
1 replies
20h49m

COBOL migrations are tar pits of replicating 40+ years of undocumented niche business logic for a given field, edge cases included, that was "commonly understood" by people who are now retired or dead. Don't get your hopes up.

pjmlp
0 replies
11h52m

MicroFocus has COBOL compilers for Java and .NET, as do other COBOL vendors still in business.

Usually the biggest issue is that most of the porting attempts don't start there; rather, they go for the rewrite from scratch, and let's not pay the licenses for those cross-compilers.

01HNNWZ0MV43FF
2 replies
1d2h

Can't most C++ be machine-lowered to C?

woodruffw
0 replies
1d2h

Lowering is typically easier than lifting (or brightening). When you lower, you can erase higher-level semantics that aren't relevant; when you lift, you generally want to compose lower-level program behaviors into their idiomatic (and typically safer) equivalent.

pjmlp
0 replies
11h55m

Yes, that is after all how C++ started.

How good the resulting performance would be like, that is another matter.

jandrese
1 replies
1d1h

I have to think the approach will be something like "AI summarizes the features of the program into some kind of technical language, then the AI synthesizes Rust code that covers the same feature set".

It would be most interesting if the approach was not to feed the program the original program but rather the manual for the program. That said it's rare that a manual captures all of the nuances of the program so a view into the source code is probably necessary, at least for getting the ground truth.

munificent
0 replies
1d1h

More like:

"AI more or less sort of summarizes the features of the program into some approximate kind of technical language, then the AI synthesizes something not too far from Rust code that hopefully covers aspirationally the same feature set".

mattgreenrocks
0 replies
1d2h

Projects are termed DARPA-hard for a reason.

elromulous
0 replies
14h53m

and most interesting code is written in C++ anyway.

You're just asking for people to bring out their pitchforks :P

dlenski
0 replies
1h48m

Man, I want to upvote this but…

most interesting code is written in C++ anyway.

Really?! The Linux kernel is a _pretty enormous_ counterexample, as are many of the userland tools of most desktop Linux distros.

I am also a key developer of an entirely-written-in-C tool which I'd venture that [a large fraction of desktop Linux users in corporate environments use on a regular basis](https://gitlab.com/openconnect/openconnect).

PreInternet01
61 replies
1d2h

The one link for those who think that 'Rewrite it All in Rust' will, well, settle any debates: https://github.com/rust-lang/miri/

calebpeterson
30 replies
1d2h

Genuine question:

Would you mind explaining to a dev that doesn’t know much (anything) about Rust, how does this settle any debate?

PreInternet01
22 replies
1d2h

Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.

Yet, the linked repository shows a huge list of cases in which simple, documented use of Rust can cause Undefined Behavior (a.k.a. 'UB')

Pretty much every argument of Rust advocates against C/C++ boils down to either 'but memory safety' or 'but UB'.

Yet there are many convincing counter-arguments that boil down to 'but CompCert' or similar, and, as the linked repository shows, there might be at least some truth in there?

steveklabnik
13 replies
1d2h

No serious person claims that Rust solves every problem ever.

Also, many people cite things like Cargo as a reason to prefer Rust over C and C++, as well as other things. UB is a big part of it, of course, but it isn’t the only thing.

PreInternet01
11 replies
1d1h

No serious person claims that Rust solves every problem ever

No, but there are a lot of people claiming that Rust cannot ever have any problems.

Just look at this thread. I merely linked to MIRI, and am currently at, like, -10 just for that.

Lots of people claiming that it just applies to 'unsafe Rust': is that true or not?

Regardless of anything else: can you, as a Rust community leader, please state clearly: is UB in generally safe Rust possible or not?

steveklabnik
10 replies
1d1h

No, people are not claiming Rust cannot have any problems.

UB is not possible in safe Rust, by design. The root cause of UB is always in unsafe code. Miri is useless if your code is 100% safe Rust.

The only exception to this is bugs in the compiler, of which there are a few. They’ll be fixed.

PreInternet01
8 replies
1d

UB is not possible in safe Rust, by design

You're available as an expert witness to that fact?

Because, eh, well, in at least one of the Rust-related situations that I'm involved in right now, someone might soon very well require the services of a person both as wise and reluctant-to-offer-any-kind-of-compromise as yourself...

steveklabnik
5 replies
1d

Yes, it is a core design tenet of the language. It's as benign a statement as "C# has garbage collection." That's not "reluctant to offer compromise."

PreInternet01
4 replies
1d

OK, you truly seem not to understand how much damage you're dealing to the general population using absolutist statements like this, do you? Nor do you seem to understand "compromise", like, at all, because you seem to equate it with "tit for tat", which is unsurprising, but still... disappointing.

In any case, I'm truly done here, in all senses of the word, but I still wish you and your acolytes the absolute best.

keybored
2 replies
22h40m

Calling Steve Klabnik (of all Core Rust background people, literally all of them) an “absolutist” proves how unreasonable you’re being.

chiggsy
1 replies
4h7m

Why do you feel it is unreasonable for this person to have human failings? What label would you find suitable?

keybored
0 replies
4h3m

You’re either reframing the statement to be about human failings overall—the lack thereof—or you’re assuming the conclusion.

superb_dev
0 replies
11h38m

Man all you had to do was bring proof, like maybe a code snippet with UB?

jcranmer
0 replies
22h36m

The situation you've alluded to in another thread seems to involve an unsafe block (since it's using a type which is only usable in an unsafe block).

Let me be even more explicit than steveklabnik here. If your code, including any libraries you link to, is 100% Rust and free of any unsafe blocks, then (barring compiler bugs) it is impossible to execute undefined behavior. If your code has an unsafe block, then it is possible to execute undefined behavior. Note that it is possible for safe code to execute undefined behavior, IF there was an unsafe block that did an operation that requires the programmer to promise something was true that was not true.

For example, there is an unsafe method that will let you convert a pointer to a reference with an arbitrary lifetime. If you wrap that in a safe function, you can return a reference to an object whose lifetime has ended, and cause undefined behavior when that reference is used -- the use can even happen outside the unsafe block. But were that lifetime-upgrading unsafe block not present, you couldn't cause the later undefined behavior to happen.

In short, an unsafe block is where the compiler can no longer guarantee that the conditions that prevent the ability to observe undefined behavior are present, and it is up to the programmer to ensure that these conditions are met, and even and especially ensure that they continue to be met after the unsafe block completes. I do worry that too many programmers are blasé about the last bit, and it sounds like your coworker may fall into that category. But Rust has always maintained this principle.
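
A minimal sketch of that lifetime-upgrading pattern (the function name here is made up): the unsafe block is the only place the false promise is made, but the bad read happens in entirely "safe" code.

    // UNSOUND: wraps an unsafe lifetime extension in a safe-looking API.
    fn leak_ref() -> &'static i32 {
        let x = 42;
        // The raw-pointer round trip erases the real lifetime of `x`.
        unsafe { &*(&x as *const i32) }
    } // `x` is dropped here, so the returned reference dangles

    fn main() {
        let r = leak_ref();   // no unsafe keyword in sight...
        println!("{}", *r);   // ...yet this read is undefined behavior
    }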

Slyfox33
0 replies
22h35m

What are you talking about? Yes, it's impossible to have UB in safe Rust unless there's some obscure compiler bug or something. This isn't a controversial statement.

chiggsy
0 replies
3m

I have no faith in this statement. Let's see how it plays out.

galangalalgol
0 replies
1d1h

I selected it for performance reasons myself; the UB protection was a nice benefit that was expected. Cargo wasn't expected and is extremely nice coming from the cmake, conan, vcpkg and duct tape world I came from.

timeon
5 replies
1d1h

Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.

This is obvious example of strawman. Why are you doing this?

PreInternet01
4 replies
1d1h

Towards general mental health. I'm just a C# wage slave, and I'll admit, when prompted, that my language, its vendor, its runtime environment, and its general approach are, to put it kindly, flawed.

However, as evidenced by the arguments and voting in this thread, Rust proponents will take no criticism, whatsoever.

I linked to a GitHub repository that documents many, many instances in which generally safe Rust causes UB.

The same kind of UB that recently hit one of my coworkers, caused a 3-day outage and now (despite all my counseling to the contrary!) will burn them out permanently.

My only request: can you guys please back off just a little bit? Programming is already hard enough without the purity wars you're stoking all the time...

keybored
3 replies
22h42m

Stoking language flame wars based on hysterical exaggeration has never promoted mental health.

fargle
2 replies
20h22m

to be fair, from his perspective, it's often the rusty crowd who is stoking the flame wars - this sounds like a reaction to them.

how often do we hear something like "C and C++ are horribly flawed and completely unsafe. it's basically a crime against humankind and gross negligence to use them"?

i get weary of that kind of thing too. i wouldn't approach it by reacting in the same way as the GP comment, but i get it. and it's not really that much of a strawman. it's more exasperation and sarcasm.

personally, i'm very interested in rust. but every time someone at best "overhypes" it or at worst outright dogs on other languages, it's a negative point toward dealing with the whole rust ecosystem.

keybored
0 replies
4h11m

I don’t buy it.

People can, in the most neutral way possible, point out facts about how safe or unsafe Rust is compared to C and C++. People will STILL complain about how the Rust zealots are bullying their language. This is how it plays out every time.

You can look at this thread. The “exasperation and sarcasm” is stupid and one-sided. “But”, they always say, “that’s just a reaction to a previous debate”, because the Rust zealots are always in the rear-view mirror, never in front of them.

How about complaining about something in Rust… that is bad? Like how un-ergonomic Async is? Or how pointy and awkward the syntax can be? Instead they choose to fight the losing battle over how C and Rust are equally unsafe or how actually Rust’s safety doesn’t matter, depending on the phase of the moon. Then they whine about tone and zealotry when they realize arguing against Rust safety from the C and C++ side is a losing battle and they have run out of arguments.

jamincan
0 replies
4h50m

In all honesty, I don't see that sort of thing posted, except maybe the overly naive, excited "omg I love rust" post in /r/rust from someone just learning it, which no one should take as credible.

I do, however, see people trot out the oft-repeated "rust evangelists want to rewrite everything in rust" or "rust people say programming C++ is a crime against humanity", but it seems to me that's the only place I see this argument. In other words, it's a simple strawman.

superb_dev
0 replies
1d1h

Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.

No, that’s not the consensus. This is a strawman.

leftyspook
0 replies
1d1h

Well, the general 'Rewrite All in Rust' consensus is that it solves all general programming problems, ever.

a) There is no such consensus. The actual consensus is that even if Rust solved all problems, it would not be financially feasible to rewrite pretty much any substantial project.

b) While Rust does solve many problems, it is nowhere close to solving all safety problems, otherwise there would be no `unsafe` keyword. Alas, fully proving safety in an impure, Turing-complete language is mathematically impossible.

c) The only reason you would think that there's some sort of woke Rust lobby is if you spend way too much time subjecting yourself to the opinions of literal sixteen-year-olds on Twitter.

mrweiden
2 replies
1d2h

From the original post:

> It’s not enough to rely on bug-finding tools

From the Miri github:

> Miri is an Undefined Behavior detection tool for Rust.

keybored
0 replies
1d2h

Darpa is already ahead of you all with the hedging:

The preferred approach is to use “safe” programming languages

“Safe”. Terms and conditions may apply.

Sharlin
0 replies
1d2h

There is no contradiction. The fact that UB-finding tools alone are not sufficient doesn't mean they aren't useful even with a safe(r) language.

In other words, from "safer languages are necessary" it does not follow that "safer languages are sufficient".

jerf
1 replies
1d2h

I believe it goes something like, "I have constructed a strawman that Rust claims that all code written in it is automatically safe by all conceivable definitions of safe, but look, ha ha, here's something that detects unsafe code in Rust!", and I don't mean "code marked in unsafe blocks".

It's a concatenation of several logical fallacies in a row: equivocation, straw manning, binary thinking about safety, and several others. It's hard to pick the main one, but I'd go with the dominant problem being a serious case of binary thinking about what "safety" is. Of course, if the commenter is using anything other than Idris for all their programming, they're probably not actually acting on their own accusations.

marcosdumay
0 replies
22h21m

Of course, if the commenter is using anything other than Idris

I'm sure the Idris compiler has bugs somewhere too. If the OP actually programs, they are violating their rationale (I'm quite sure assembly or assembled binary aren't ok either).

spease
0 replies
1d1h

They are claiming that because code in ‘unsafe’ blocks in Rust can have undefined behavior, the language is no safer than C.

This does not settle the debate because unsafe is rarely needed for a typical Rust program. In addition, the presence of an unsafe block also alerts the reader that the set of possible errors is greatly increased for that part of the code and more careful auditing is needed.

It’s a little like saying traffic lights are useless because emergency responders need to drive through them sometimes, so we should just leave intersections completely unsignaled and expect drivers to do better.

Rust is by default restrictive and requires you to explicitly make it unsafe; C/C++ are by default unsafe and require you to explicitly make them restrictive.
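
As a hedged illustration of that contrast (a hypothetical function, not from any particular codebase): the unsafe block stays small, documents why it is sound, and re-establishes the safety contract for every caller.

    fn first_byte(buf: &[u8]) -> Option<u8> {
        if buf.is_empty() {
            None
        } else {
            // SAFETY: index 0 is in bounds because `buf` is non-empty.
            Some(unsafe { *buf.get_unchecked(0) })
        }
    }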

leftyspook
0 replies
1d1h

It is a tool for checking that your unsafe code doesn't cause UB. It doesn't really settle anything, but the commenter uses it as a gotcha to say "rust is no better than C, because you still can compile code that contains UB".

keybored
28 replies
1d2h

You linked an interpreter for some kind of internal compiler representation that the Rust compiler uses.

What on Earth do you mean?

PreInternet01
24 replies
1d2h

What on Earth do you mean?

That documented use of safe Rust can easily lead to UB, which this infernal 'internal compiler representation' demonstrates.

I'm not even sure what is even remotely confusing about that?

commodoreboxer
13 replies
1d2h

Miri is an Undefined Behavior detection tool for Rust. It can run binaries and test suites of cargo projects and detect unsafe code that fails to uphold its safety requirements.

... detect unsafe code that fails ...

Show me the documented safe Rust code that causes UB without using any unsafe blocks outside of the standard library.

steveklabnik
12 replies
1d2h

There are some soundness holes in the implementation that can cause this. Just like any project, the compiler can have bugs. They’ll be fixed just like any bug.

PreInternet01
9 replies
1d1h

Ah, a voice of sort-of sanity, at long last.

So, the reason I posted my original reply, is that at one of my $DAYJOBs, we recently had a 3-day outage on some service, related to Rust. Something like using AVX to read, like, up to 7 bytes too many from an array.

Nothing really major -- we have a 10-day backup window, and the damage was limited to 4 days, so we were able to identify and fix all identified cases. But the person-to-Git-blame for this issue happened to be one of my mentees, and... they were blown away by it.

As in: literally heartbroken. Unable to talk about it. "But the compiler said it was okay!", crying. One of my coworkers pointed at MIRI, which correctly warned about the issue-at-hand, at which point I recommended incorporating that tool into the build pipeline, as well as (the usual advice in cases such as this) improving unit tests and focusing on X-1 and X+1 cases that might be problematic.

To this day, I'm truly worried about my mentee. I'm just a C# wagie, and I fully accept that my code, my language, my compiler, and my runtime environment are all shit.

But, as evidenced by my experience and supported by the voting in this thread, it seems that Rust users self-identify with the absolute infallibility of anything related to the language, and react quite violently and self-destructively to any evidence to the contrary.

As a community leader, do you see any room for improvement there? And if not, what would it take to convince you?

steveklabnik
3 replies
1d1h

using AVX

This would require using unsafe code.

As in: literally heartbroken. Unable to talk about it.

I would hope that this person improves as an engineer, because this isn't particularly professional behavior, from the way you describe it.

"But the compiler said it was okay!"

Given that you'd have to use unsafe to do this, the compiler can't say it was okay. It sounds like this person may not fully understand Rust either.

it seems that Rust users seem to self-identify with the absolute infallibility of anything relate to the language, and react quite violently and self-destructively to any evidence to the contrary.

I don't see how this generalizes. You had one (apparently junior, given "mentee"?) person make a mistake and respond poorly to feedback. You also barged into this thread and made incorrect statements about Rust, and were downvoted for it. That doesn't mean that Rust users think everything is perfect.

As a community leader, do you see any room for improvement there?

I do think sometimes enthusiastic people who don't understand things misrepresent the thing they're enthusiastic about, but that's a human problem, not a Rust problem. I do not think there's a way to fix that, no.

mike_hearn
2 replies
10h18m

It'd require using unsafe code somewhere in the stack. Not necessarily by the mentee. It's possible that the AVX code wasn't properly hidden behind a safe abstraction in a library.

steveklabnik
1 replies
4h17m

That still means the unsafe code is at fault.

mike_hearn
0 replies
3h6m

Yes, but if a developer can't trust the abstractions then isolating unsafe code behind them is of no value.

n_plus_1_acc
1 replies
1d1h

The Rust community as a whole very much promotes the idea of trusting the compiler. Which is a very useful thing, especially for folks coming from other languages like C. It's not perfect of course, as the compiler has bugs, but I think it's still a good thing to teach.

astrange
0 replies
16h41m

You should never do this if you work at a company large enough to have a compiler team, btw, because they're going to fork the compiler and put bugs in it.

Conversely, if you never encounter bugs in a component, it means it's not being improved fast enough.

keybored
1 replies
22h45m

I'm just a C# wagie, and I fully accept that my code, my language, my compiler, and my runtime environment are all shit.

What is shit about those things for C#? That’s the application programming language that seems to get the least flak out of all of them.

If I’m using an alpha or beta compiler, I might suspect a compiler bug from time to time… not really when I’m working in a decades-old, very established language.

astrange
0 replies
16h31m

Java is an underpowered clone of ObjC and C# is a slightly less underpowered clone of Java.

So they fixed the biggest issues (at least it has value types), but it has nullable classes, collection types are mutable, integer overflow doesn't trap, it doesn't have nearly enough program verification features (aka dependent types), etc.

Worst of all it was written by enterprise programmers, who think programs get better designed when you put all their types four namespaces deep. I assume whoever named System.Collections.ArrayList keeps everything in their house in one of those filing cabinets with the tiny drawers.

neonsunset
0 replies
1d

Don't worry, your language and especially the runtime and compiler are great. Particularly so in the last few years. I wouldn't worry about the noise, maybe it concerns C++, but C# is a strict productivity upgrade for general-purpose applications despite some* of the dated bits in the language (but not the runtime).

* like un-unified representation of nullable reference types and structs under generics for example, or just the weight of features over the years, still makes most other alternatives look abysmal in comparison

commodoreboxer
1 replies
1d2h

Yes, in particular some interactions with LLVM have caused some frustrating UB. But those are considered implementation bugs, rather than user bugs, and all the conditions Miri states at the top are relevant primarily in unsafe code, which contradicts the OP's point, which is that there are tons of documented cases of UB in safe Rust. This is not true. There are a few documented cases, and most have been fixed. It's nowhere close to the world of C or C++'s UB minefield.

steveklabnik
0 replies
1d1h

For sure, just making sure to acknowledge this is the case, before someone responded to your post with cve-rs. :)

keybored
7 replies
1d2h

Indeed. There have been UB bugs in the standard library caused by unsafe blocks.

Those are bugs. They are faults in the code. They need to be fixed. They are not UB-as-a-feature like in C/C++. “Well watch out for those traps every time you use this.”

This is like getting mad that a programming language boasts that it produces great binaries and yet the compiler has a test suite to catch bugs in the emitted assembly. That’s literally what you are doing.

Calavar
6 replies
1d1h

Those are bugs. They are faults in the code. They need to be fixed. They are not UB-as-a-feature like in C/C++.

Rust has UB-as-a-feature too. They could have eliminated UB from the language entirely, but they chose not to (for very valid reasons in my opinion).

UB is a set of contracts that you as the author agree to never violate. In return, you get faster code under the assumption that you never actually encounter a UB condition. If you violate those contracts in Rust and actually encounter UB, that's a bug, that's a fault in the code. If you violate those contracts in C++, that's a bug, that's a fault in the code. This is the same in both languages.

It's true that Rust UB can only arise from unsafe blocks, but it is not limited to unsafe blocks. Rust UB has "spooky action at a distance" the same way C++ UB does. In other words, you can write UB free code in Rust, but if any third party code encounters UB (including the standard library), your safe code is now potentially infected by UB as well. This is also the same in both languages.
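
A minimal sketch of that "infection" (made-up names; the unsound function stands in for buggy third-party or standard-library code):

    // The signature is safe, so the compiler lets any safe code call it, but the
    // missing bounds check means the unsafe block's contract can be violated.
    fn element_at(v: &[u8], i: usize) -> u8 {
        unsafe { *v.as_ptr().add(i) } // BUG: no check that i < v.len()
    }

    fn main() {
        let data = vec![1u8, 2, 3];
        // 100% safe Rust at the call site, yet this is an out-of-bounds read:
        // the fault lives in the unsafe block above, the symptom appears here.
        let _ = element_at(&data, 10);
    }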

There are good reasons to favor Rust's flavor of UB over C++'s, but I keep seeing these same incorrect arguments getting repeated everywhere, which is frustrating.

keybored
4 replies
1d

There are good reasons to favor Rust's flavor of UB over C++'s, but I keep seeing these same incorrect arguments getting repeated everywhere, which is frustrating.

Tell me what I wrote that was incorrect. I called them UB bugs in the standard library. If they were trivial bugs that caused some defined-behavior logic bug when used outside of the standard library, then it wouldn't rise to the level of being called a UB bug.

Calavar
3 replies
23h46m

They are not UB-as-a-feature like in C/C++.

That's the part that's incorrect. That, plus the implication that UB is a bug in Rust, but not in C++. As I said, the existence of UB is a feature in both languages and actually encountering UB is a bug in both languages. You can play with the semantics of the word "feature" but I don't think it's possible to find a definition that captures C++ UB and excludes Rust UB without falling into a double standard. Unfortunately double standards on UB are pretty common in conversations about C++ and Rust.

keybored
2 replies
22h53m

You’re done editing the comment now?

Do you think UB-as-feature is something that someone would honestly describe C or C++ as? It's a pretty demeaning way of framing things. Indeed it's a tongue-in-cheek remark, a whimsical exaggeration of the by-default UB of those languages, tacked onto the end of a completely factual description of the role that finding UB in the safe-Rust subset of the standard library serves.

Of course one cannot, from the Rust side so to speak, use tongue-in-cheek, off-hand remarks in these discussions; one must painstakingly add footnotes and caveats, and list every trivial fact like "you can get UB in unsafe blocks"[1], or else you have a "double standard".

[1] Obligatory footnote: even though all participants in the discussion clearly know this already.

Calavar
1 replies
19h18m

Do you think UB-as-feature is something that someone would honestly describe C or C++ as?

Yes. That's how I describe it. That's also how Ralf Jung (long time Rust contributor and one of the main people behind Miri) describes UB in both Rust and C++ (although he says C++ overdoes it) [1]

The thing I edited out of my comment was "motte and bailey fallacy" because, after reflecting a bit, I thought it was unfair. But now you're actually trying to retroactively reframe it as a joke.

[1] https://blog.sigplan.org/2021/11/18/undefined-behavior-deser...

keybored
0 replies
4h6m

Yes. That's how I describe it. That's also how Ralf Jung (long time Rust contributor and one of the main people behind Miri) describes UB in both Rust and C++ (although he says C++ overdoes it) [1]

Okay. Then I was wrong about that.

The thing I edited out of my comment was "motte and bailey fallacy" because after reflecting a bit I thought it was unfair. But now you're actually trying to retroactively reframe as a joke.

What a coincidence. I had written on a post-it note that you were going to pull out an Internet Fallacy. (I guess it’s more about rhetoric.)

I guess you've never seen someone explain after the fact that they were being tongue in cheek (it's not a joke, it's an exaggeration)? Because jokes and sarcastic remarks are always clearly labelled and unambiguous? Okay then. I guess it was a Motte and Bailey.

oconnor663
0 replies
23h55m

It's true that Rust UB can only arise from unsafe blocks, but it is not limited to unsafe blocks.

This is correct, and it's hard to teach, and I agree that a lot of folks get it wrong. (Here's my attempt: https://jacko.io/safety_and_soundness.html.) But I think this comment is understating how big of a difference this makes:

1. Rust has a large, powerful safe subset, which includes lots of real-world programs. Unsafe code is an advanced topic, and beginners don't need to learn about it to start getting their work done. Beginners can contribute to big projects without touching the unsafe parts (as you clarified, that means the module privacy boundaries that include unsafe code, not just the unsafe blocks), and reviewers don't need to be paranoid about every line.

2. A lot of real-world unsafe Rust is easy to audit, because you can grep for `unsafe` in a big codebase and zoom right to the parts you need to look at. Again, as you pointed out, those blocks might not be the whole story, and you do need to read what they're doing to see how much code they "infect". But an experienced Rust programmer can audit a well-written codebase in minutes. It's not always that smooth of course, but it's a totally different world when that's even possible. A sketch of the kind of block a reviewer is looking for follows below.
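
A minimal sketch (hypothetical function, not from any particular codebase) of the shape reviewers look for: a small unsafe block, a SAFETY comment, and the invariant it relies on checked right next to it:

    /// Returns the element at `i`, or None if out of bounds.
    fn element_at_checked(v: &[u8], i: usize) -> Option<u8> {
        if i < v.len() {
            // SAFETY: `i` is in bounds, checked on the line above, and `v` is a
            // live borrow for the duration of the read.
            Some(unsafe { *v.as_ptr().add(i) })
        } else {
            None
        }
    }

(Of course, a real reviewer would point out that `v.get(i).copied()` does this without any unsafe at all; the point is only how localized the audit is.)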

woodruffw
0 replies
1d2h

Miri is a MIR interpreter aimed at unsafe Rust, not safe Rust. Using the fact that it operates on an internal representation is a very weird swipe; almost all static and dynamic analysis tools work on some kind of IR or decomposed program representation.

estebank
0 replies
21h7m

That documented use of safe Rust can easily lead to UB

The only thing that comes to mind that this could be referring to are the open bugs at https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Ais.... Are these what you're referring to?

this infernal 'internal compiler representation'

What makes MIR "infernal"?

I'm not even sure what is even remotely confusing about that?

You posted a link to a tool that executes pure rust libraries and evaluates memory accesses (both from safe and unsafe rust code) to assert whether they conform to the rust memory model. It sits in the same space as valgrind. You left it open to interpretation with really no other context. We can be excused for not knowing what you were trying to say. I personally still don't.

nequo
2 replies
1d2h

It's the old trope that some Rust code uses unsafe blocks so all Rust code is as unsafe as C.

melling
0 replies
1d2h

I don't know Rust, but even if the Rust is just as unsafe in certain blocks, simply being translated to Rust removes a lot of corporate resistance to adopting the language.

Getting people to adopt a new language can be a lot of work. I remember people claiming they missed header files in Swift, so they wanted to stick with Objective-C.

keybored
0 replies
1d2h

Of course. I should have expected the Nirvana Fallacy. :)

the8472
0 replies
23h19m

The trophy case entries for Miri are about bugs in unsafe code. Yes, you can write UB with unsafe code. This should not be news.

And Miri is a blessing. There is even a known case where someone found a bug in C by translating it to Rust and then running it through Miri.

nanolith
46 replies
1d2h

I'm personally not a fan of "rewrite the world in Rust" mentality, but that being said, if one is planning to port a project to a new language or platform, mechanical translation is a poor means of doing so. Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece. Don't build a castle in the sky, because it will never reach the ground. If you've decided to use Rust for this system, that's fine. But, write Rust. Don't try to back-port C into Rust.

I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety. One gets the same safety as a gradual Rust rewrite, but the code base, knowledge base, and developers can be maintained.

pdimitar
23 replies
1d1h

I'm personally not a fan of "rewrite the world in Rust" mentality

There is no such mentality anywhere. There is a ton of software that's much better off left alone in a dynamic language, or a statically typed language with a garbage collector (like Golang). Good engineers understand the idea of using the right tool for the job.

The push is to start reducing those memory safety CVEs because they have been proven to be a real problem, many times over.

mechanical translation is a poor means of doing so

Agreed. If we could automatically and reliably translate C/C++ to Rust it would have been done already.

Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece.

OK, I am just saying that somewhere along that process people might get a bout of confidence and tell themselves "oh, we're doing C much better now, we no longer write memory safety bugs, can't we stop here?" and they absolutely will. Cue another hilarious buffer overflow CVE 6 months later.

I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.

A huge investment. If you are going to do that then you might as well just move to Rust.

One gets the same safety as a gradual Rust rewrite

Maybe, but that sounds fairly uncertain or far from a clear takeaway to me.

uecker
16 replies
22h18m

Rewriting is rarely a good idea in general. Rust proponents like to pretend that it is impossible to avoid safety issues in C while it is automatically given in Rust. But it is not so simple in reality.

pdimitar
15 replies
22h8m

I don't like generalizations... in general. :D (Addressing your "rewrites are rarely a good idea in general" here.)

My experience tells me that if a tech stack supports certain safety guarantees by default, switching to that stack leads to a measurable reduction of those safety problems. People love convenient defaults; that's a fact of life.

The apparently inconvenient truth is that most programmers are quite average and you can't rely on them going above and beyond to reduce memory safety errors.

So I don't buy the good old argument of "just hire better C programmers". We still have a ton of buffer overflow CVEs regardless.

And I never "pretended it's impossible to avoid safety issues in C". I'll appreciate if you don't clump me in some imaginary group of "Rust proponents".

What I'm saying is this: use the right tool for the job. The C devs have been given decades and yet memory safety CVEs are still prevalent.

What conclusion would you arrive at if you were in my place -- i.e. not coding C for a living for like 18 years now but still witnessing it periodically crapping the bed?

I'm curious of your take on this. Again, what other conclusion would you arrive at?

uecker
14 replies
20h11m

I am complaining about the usual phrases which are part of the Rust marketing, like the "just hiring better C programmers did not work" or the "why are there still CVEs" pseudo-arguments, etc.

For example, let's look at the "hire better C programmers does not work" argument. Like all good propaganda, it starts with a truism: in this case, that even highly skilled C/C++ programmers will make mistakes that could lead to exploitable memory safety issues. The problem comes from exaggerating this into the idea that "all hope is lost and nothing can be done". In reality one can obviously do a lot of things to improve safety in C/C++. And even one short look at CVEs should make it clear that there is often huge room for improvement even with relatively simple measures. For example, a lot of memory safety bugs in C/C++ come from open-coded string or buffer manipulation. But it is not exactly rocket science to abstract this away behind a safer interface. Once this is understood, the obvious conclusion is that addressing some of these low-hanging fruit would be far more effective in improving safety than wasting a lot of time and effort on rewriting in Rust.

pdimitar
13 replies
18h29m

In reality one can obviously do a lot of things to improve safety in C/C++.

That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.

I don't think anyone reasonable is disputing that we can indeed improve C/C++ programming. The argument that I and many others are making is: "a lot can be done, but for one reason or another it is STILL NOT being done". Probably the classic cost cutting, but there are likely other factors at play as well.

But once this is understood, the obvious conclusion is that addressing some of these low-hanging fruits would be far more effective in improving safety than wasting a lot of time and effort in rewriting in Rust.

Explain why this has not been done yet. Explain why Microsoft, Google, and various intelligence agencies attribute between 60% and 75% of all CVEs and demonstrable exploits they are aware of to memory safety bugs.

Please do, I am listening. Why has almost nothing been done yet?

Secondly, "wasting a lot of time and effort in rewriting in Rust" is an empty claim. To demonstrate why, I ask you this: at which point the continued cost of investing in endlessly patching C/C++ and all its glorious foot-guns becomes bigger than the cost a rewrite?

Surely at some point endlessly throwing money at something that gives you a 1% return on investment (in terms of getting more stable and less dangerously buggy) does indeed get more expensive than starting over?

I have no clear answer because it depends on the organization, the tenure of C/C++ and the devs in the org, and many others. It's strange that you pretend to have the answer.

nanolith
12 replies
17h53m

That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.

That's because while the technology exists, it is not widely communicated. That's not a fault of C, and that's not something that any language can solve.

Explain why this has not been done yet.

See above.

The technology to make C and C++ safer is not yet widely used. But, it exists and it is being used. I use it on every firmware and OS project that I currently work on. The code we produce is free of memory errors, integer errors, API misuse errors, resource management errors, cryptography errors, confused deputization errors, and a host of other errors that our specifications are designed to catch. That goes well beyond what Rust or any other language can provide on its own. But, to be fair, Rust developers can do this using similar tooling.

It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.

pjmlp
5 replies
9h15m

The proof that said technology has failed its purpose, because the C and C++ culture keeps resisting its adoption, is that all CPU vendors are now integrating hardware memory tagging as the ultimate weapon against memory corruption exploits.

Solaris has already been doing it since 2015, ARM more recently; we have Microsoft putting big bucks into CHERI (including custom FPGA boards for testing) and the new Copilot+ PC architecture with Pluton, and while AMD/Intel attempts like MPX weren't quite right, they will surely do something for x64 as well.

nanolith
4 replies
7h0m

The proof that said technology has failed its purpose,

How, because other solutions are being explored? That's not due to a failure of one thing, but because both defense in depth and a desire to fix existing systems with no additional engineering are paths that security researchers and vendors explore. Not everyone will converge on a single solution, even when that solution is practical.

Just because something is not being used universally doesn't mean that it has failed. Moreover, it is not widely known about, and rumors persist that it requires extraordinary effort, often reinforced by well-meaning but rather outdated advice.

pjmlp
3 replies
6h31m

Hardware is the ultimate castle wall when nothing else fixes the problem at the software level.

nanolith
2 replies
6h2m

That's a rather cynical interpretation of these initiatives. CHERI, for instance, has been in development for twenty years. It predates the general availability of open source tools like CBMC or languages like Rust. But, that doesn't make the concept better or obsolete. It makes it complementary.

Hardware security is complementary to software security. Mitigations at the hardware level, the hypervisor level, and the operation system level complement architectural, process, and tooling decisions made at the software level.

Defense in depth is a good thing. There can always be errors in one layer or another, regardless of software solution, operating system, hypervisor, or hardware. I can wax poetic about current CPU vulnerabilities that must be managed in firmware or operating systems.

pjmlp
1 replies
4h15m

Complementary, as the ultimate defence wall.

Many of the issues caused by C are solved by Modula-2, Object Pascal, and Ada; we didn't need to wait for Rust. But those aren't the languages that come for free with UNIX.

Or even better, they would be solved by C itself, if WG 14 cared even a little about providing proper support for slices, proper arrays, and proper string types, even if only as library vocabulary types.

But what can one expect, when even Dennis Ritchie wasn't able to get his approach to slices taken up by WG 14.

So hardware memory tagging, and sandboxed enclaves it is.

nanolith
0 replies
2h46m

There is nothing wrong with defense in depth. But, this is not where things stop.

I make extensive use of bounded model checking in my C development. I also use privilege separation, serialization between separate processes, process isolation, and sandboxing. That's not because bounded model checking has somehow failed, but because humans are fallible. I can formally verify the code I write, but unless I'm running bare metal firmware, I also have to deal with an operating system and libraries that aren't under my direct control. These also have vulnerabilities.

That's not a trivial thing. The average software stack running on a server -- regardless of whether it is written in C, Rust, Modula-2, Pascal, Ada, or constructively proven Lean extracted to C++ -- still goes through tens of millions of lines of system software that is definitely NOT safe. All of that code is out of a developer's control for now. Admins can continually apply patches, but until those projects employ similar technology, they are themselves a risk.

One day, hopefully, all software and firmware will go through bounded model checking as a matter of course. Until then, we work with what we can, and we fix what we can. We can also rely on hardware mitigations where applicable. That's not failure as you have claimed, but practical reality.

pdimitar
3 replies
17h35m

That's not a fault of C, and that's not something that any language can solve.

If you say so. Rust clearly does, and before you go saying "but `unsafe` exists!", I'll have to remind you that (1) scarcely any Rust devs reach for it and (2) it still keeps quite a lot of guarantees and only relaxes some. Some, not all. Not even most.
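
A tiny illustration of "relaxes some, not most" (plain toy code, nothing more):

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];
        unsafe {
            // Even inside `unsafe`, the borrow checker still runs: uncommenting
            // the next line is a compile error, because `first` is still live.
            // v.push(4);
            let p: *const i32 = first; // &i32 -> *const i32 is a safe coercion
            println!("{}", *p);        // the one extra power used: raw-pointer deref
        }
        v.push(4); // fine here: `first` is no longer used
    }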

It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.

Well, now we are on the same page. I never said "ONLY Rust will save us"; I am saying that Rust clearly can get us further than we are right now. If there's something even more accessible, less verbose, and without such a cobbled-together Frankenstein async implementation as Rust's, I'll start using it tomorrow.

nanolith
2 replies
17h12m

Rust clearly does

Until it exists at the kernel layer, the firmware layer, the runtime library layer, and the application layer, these issues still exist. CVEs come out weekly for memory errors in Linux, in firmware, in operating system libraries, and in application libraries. We need to think beyond rewriting code in one language or platform, and instead think about technologies that we can apply to all languages and platforms, including C and Rust.

I am saying that Rust clearly can get us further than we are right now.

As can bounded model checking, without having to teach developers a new language with new idioms.

If there's something even more accessible, less verbose, and with not such a cobbled together Frankenstein async implementation...

Indeed there is. Reach for the bounded model checker that works with your existing language or platform. Pore over the manual, and look at existing practical examples.

If you like Rust, feel free to use it. But, if you prefer C/C++, Pascal, Ada, Python, C#, Java, or Modula2, that's fine. Either use an existing bounded model checker for that language or port CProver / GOTO to that platform. Rust developers ported CProver to Rust via Kani, because they also recognize that writing safer code can't be done by language alone.

I don't think it's necessary to push people to use different languages or platforms to write safer code. They just need to use or port existing tooling and learn safer coding practices. If I come at firmware developers or old school OS developers with "we need to use Rust", the conversation is immediately shut down and I'm considered a fool. If, instead, I show them tooling that allows them to maintain their existing code base and make it safer, I get much further.

adgjlsfhk1
1 replies
15h19m

bounded model checking is a new language with new idioms

nanolith
0 replies
14h49m

Respectfully, that's a rather extraordinary claim. There are model checkers that use separate specification languages, but there are also model checkers embedded in the host language.

CBMC translates C -- the same language -- into a formula handed to a SAT/SMT solver. A different target, but the same source language.

It is true that new idioms will often be discovered along the way of converting existing C to pass the bounded model checker in every branch condition and in every case. However, software that is already relatively safe will require very little modification. I've seen it go both ways. Simpler code bases can pass model checks relatively unscathed. More complex code bases require refactoring to pass model checking.

To my point, the code base can remain in C, and can be model checked gradually. It doesn't have to be ported to a different language or platform. But, it will require added assertions and some refactoring to make the execution of code more clear. It's still in C. The specifications are specified in C using regular assertions. The only thing that changes is that one will often use shadow methods -- still written in C but simpler than the functions they are shadowing -- in order to model check other functions.

Other bounded model checkers like JBMC, Kani, or PolySpace work in similar ways.

roca
1 replies
12h31m

Bounded model checking is not a silver bullet. If you want to prove it is, verify a Web browser and blog about it.

nanolith
0 replies
6h49m

There are no silver bullets. But, that doesn't mean that we should dismiss tooling that is not well understood in order to chase unrealistic goals, like rewriting extant code bases in a different language to achieve security goals. Or, worse, as this article suggests, using mechanical translation to somehow capture the features of error-prone software without carrying over the errors.

Better process and better tooling allows us to write better software. Bounded model checking is an incredibly useful bit of tooling that allows us, within context of the software, to demonstrate that certain conditions do not arise. This includes memory errors, resource errors, and other classes of errors. The limitation is the faithfulness of the translation to SMT and the complexity of the code being modeled. The former has gotten quite good with CBMC 6, and the latter can be managed through careful refactoring and shadow function substitution.

Is it magic? There is no such thing. But, it is a practical tool that is available for use today.

One need not wait until an entire web browser is verified using it. It can scale to this, but given the unreasonable size and scope of web browsers with respect to this challenge -- they are basically operating systems and suites of software in one these days -- that's like saying, "verify all software, then blog about it."

nanolith
2 replies
22h30m

A huge investment. If you are going to do that then you might as well just move to Rust.

People say that, but the people who say this rarely have any practical experience using CBMC. It's very straight-forward to use. I could teach a developer to use it reliably, on practical software, in a month.

pdimitar
1 replies
22h27m

I am not denying it, nor am I claiming that "just move to Rust" is an universal escape hatch.

What I am saying is that if it were as simple as "just learn CBMC", then maybe Microsoft and Google would not have published their studies demonstrating that 60%-75% of all CVEs are memory safety errors like buffer under-/over-flows.

nanolith
0 replies
22h22m

These studies aren't wrong. But, that's also because neither Microsoft nor Google makes use of practical formal methods in practice. Both have research teams and pie-in-the-sky projects, not dissimilar to this DARPA project. But, when it comes down to the nitty-gritty development cycle, both companies use decades-old software development practices.

andrewflnr
2 replies
18h54m

There is no such mentality anywhere.

There definitely is. Mainstream and official Rust community material is generally sane, but the meme did not come from nowhere. The rewrite-everything people are out there.

pdimitar
1 replies
18h40m

The rewrite-everything people are out there.

Meh, there are zealots in every community -- we're not even talking programming language communities only. Not even programming either. Everywhere.

No idea why people over-reacted so much to that particular 0.1% of fanatics. It's a pretty normal state of affairs. Point me at your hobby group, and even if it is only 20 people I can bet my balls at least 1 of them is a fanatic.

andrewflnr
0 replies
16h2m

Overreacting to fanatics is also a normal state of affairs, so don't act surprised. :) By their nature fanatics almost always make a disproportionate amount of noise, and if you're outside the community you often can't tell the difference: you don't know which of the loudmouths, if any, the actual members pay attention to, etc. And even more broadly, a small number of people can cause a lot of damage.

pjmlp
11 replies
11h49m

Modern C still has the same security exploits in arrays and strings as classical C; nothing has changed in 50 years.

Y_Y
5 replies
10h5m

The programmers have changed, the machines have changed, the literature has changed, the compilers have changed a lot. You can still write and run the old insecure code, but you'll get warnings and hit stack canaries and your colleagues will gasp at you and your merge requests will be rejected.

pjmlp
3 replies
9h29m

Meaningless changes, as proven by the CVE database, or the kernel corruption by a bad pointer caused by Crowdstrike.

Y_Y
2 replies
6h48m

I respectfully disagree. GP claimed that nothing has changed [regarding string and array security bugs in C] in 50 years. I responded that many relevant factors have changed, such that people tend to write different code now which is less susceptible to those bugs. Of course the same old bugs are possible, and sometimes good coders will still write them. Still I argue that there has been meaningful change since there are more protections against writing bugs in the first place, less incentive to write dangerous code, and more security for when (some) bugs still appear.

pjmlp
1 replies
6h33m

ISO C89 is exactly like ISO C23 in that regard.

The CVE database proves that those kinds of errors keep coming up in 2024, regardless of those changes.

Not only do they keep coming up, the monetary cost of fixing those issues has risen to a level where even governments are now looking into this.

Y_Y
0 replies
5h22m

You've made three true statements, but I don't agree if you're implying that they prove that "nothing has changed". Bugs still appear, but they are significantly less common (per project or per line, not per year) and not as damaging when they occur. This is a non-trivial change for the better in the realm of C application quality.

There are more slaves in the world now than ever before in history, but global society has still made great progress on eliminating slavery over the last thousand years.

Avamander
0 replies
8h45m

All that has changed but we still got the libcue code execution bug.

I could not find an open-source static analyzer (including -analyzer) that would actually pick up the flaw before someone tries to exploit it.

And that's a simple example.

We can't tame the dragon that C is; empirically, nobody can.

nanolith
4 replies
7h9m

Bounded model checking has changed things. C on its own can't solve these problems. Likewise, Rust on its own -- while it can solve memory errors -- can't demonstrate safety from all errors that lead to CVEs.

Practical formal methods using a tool like CBMC can make C safer. The existing code base can be made safer without porting it to a new language or using experimental mechanical translation. This isn't just something for C. Such tools exist for many languages now, including Rust, so that even Rust can be made safer.

pjmlp
3 replies
6h34m

Getting people to use stuff like CBMC is like trying to boil the ocean.

WG14 could solve those problems; they decided it isn't their priority to fix C.

nanolith
2 replies
6h24m

Getting people to use stuff like CBMC is like trying to boil the ocean.

That's like saying, "Getting everyone to use Rust or TDD or X is like trying to boil the ocean."

It's impossible to solve all things for all people at once. But, that doesn't mean that we can't advocate for tooling that can be used today to build safer software. This goes beyond C, as such tools and techniques are being ported to many languages and platforms.

Rust is a solution that works for some people. Modern C with bounded model checking is another solution that works for some other people. I'm certainly not going to change the minds of folks who have decided to port a project to Rust and who are willing to spend the engineering budget for this. But, hopefully, I can convince someone to try bounded model checking instead of maintaining the status quo. Because, the status quo is where we are with projects like the Linux kernel. Linux may pay lip service to Rust folks and allow them to write some components in that language, but the majority of the kernel is still in C and is not being properly vetted for these vulnerabilities, as we can see with the stream of CVEs coming out weekly.

WG14 could solve those problems; they decided it isn't their priority to fix C.

WG14 must maintain some semblance of backwards compatibility with previous versions of C. It's no good to make a feature that breaks older code. This happens from time to time -- old school K&R C won't work in a C18 or C23 compliant compiler -- but efforts are made to keep that legacy code compiling, for good or ill.

pjmlp
1 replies
4h12m

50 years is more than enough time to improve C's security story.

nanolith
0 replies
3h0m

Yep, but we have to deal with what we have. For better or for worse, C remains where it is. We can either use process and tools to improve existing C, or throw our hands up.

I prefer to work toward fixing what is. We are unlikely to see things like array slices in C, and even if such features were added, that would do nothing to fix the billions of lines of legacy code out there.

IshKebab
5 replies
1d1h

I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.

No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?

I agree replacing with a hand-architected Rust version is clearly the better solution but also more expensive. I think they're going for an RLBox style "improve security significantly with little-to-no effort" type product here. That doesn't mean you shouldn't do a full manual rewrite if you have the resources, but it's better than nothing if you haven't.

nanolith
4 replies
22h32m

No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?

Yes. Every day. It's actually quite easy to do. Write shadow methods covering the resources and function contracts of called functions, then verify the function. Repeat all of the way up and down the stack. It adds about 30% overhead over just TDD development.

PhilipRoman
3 replies
21h39m

Last time I tried CBMC, it ended up running out of memory for relatively small programs; do you encounter any resource usage issues with it? I'm learning Frama-C and I find it more predictable, although the non-determinism of solvers shocked me when I first tried to prove non-trivial programs. I guess ideally I would like something even more explicit than Frama-C.

nanolith
2 replies
21h25m

CBMC works best on functions, not programs. You want to isolate an individual function, then provide shadows of the functions it calls. The shadows should have nondeterministic behavior (cover every possible error condition) and otherwise follow the same memory and resource rules as the original function. For instance, if shadowing a function that reads a buffer, the shadow should ensure full buffer access as part of its assertions.

The biggest issues you will run into with bounded model checking are recursion and looping. In these cases, you want to refactor the code to make it easier to formally verify outside of the loop. Capture and assert on loop variants / invariants, and feed these forward as assertions in the code.

There's no way I can capture all of this in an HN comment, but to get CBMC to work, you need to break down your code.
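
For readers more at home on the Rust side of this thread, the same per-function harness idea is what Kani (the Rust port of CProver mentioned earlier) gives you. A minimal sketch with a made-up function, runnable via `cargo kani`:

    // The function under verification (a stand-in example, not from a real codebase).
    fn clamp_index(i: usize, len: usize) -> usize {
        if i >= len { len - 1 } else { i }
    }

    #[cfg(kani)]
    #[kani::proof]
    fn check_clamp_index() {
        let i: usize = kani::any();   // nondeterministic input, like nondet_* in CBMC
        let len: usize = kani::any();
        kani::assume(len > 0);        // precondition, like __CPROVER_assume
        assert!(clamp_index(i, len) < len); // postcondition checked on every path
    }

In CBMC itself the workflow is the same shape: a harness function with nondeterministic inputs, assumptions for preconditions, assertions for the contract, and shadow functions standing in for callees you've already verified.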

PhilipRoman
1 replies
21h16m

Thanks, that was really helpful. Relying on getting shadow functions right does seem icky, but I guess the improved productivity of CBMC should make up for it. Definitely going to give it another chance!

nanolith
0 replies
21h5m

You're welcome. I've been meaning to write a blog article on the subject, because it is a subtle thing to get working.

Think of shadow functions as the specifications that you are building. Unlike proof assistants or Frama-C, you write specifications in C itself, and they work similarly to code. Often, the same contracts you write in these specifications can be shared by both the shadow functions and the real functions they shadow.

I take a bottom-up approach to model checking. I'll start by model checking the lowest level code, then I'll shadow this code to model check code that depends on it. In this way, I can increase the level of abstraction for model checking, focusing just on the side effects and contracts of functions I shadow, and move up the stack toward more and more general code.

usrusr
2 replies
22h43m

If you have dormant code, as in running everywhere but not getting worked on anywhere, a "translate to shitty Rust before ever touching it again" approach has a certain appeal. Not the appeal of an obviously good idea: chances are the "shitty Rust" created through translation would be much worse to work on than C with some background noise of bugs (bugs that would also be present in the "shitty Rust", thanks to faithful translation). In C, people have an idea of how to deal with those problems. In "shitty Rust", it's, well, shitty, because Rust people are not used to that stuff.

But there's a non-zero chance that someone could develop a skillset for iteratively cleaning up into something tolerable.

And then there are non-goal things that could grow out of the project, e.g. some form of linter feedback: "can't translate into tolerable Rust because of x, y and z". C people could look into that, and once the code is translatable into good Rust, why, translate.

If that was an outcome of the project, some people might find it easier to describe their solution in runnable C and let the "translator/linter" guide them to a non-broken approach.

I'd certainly consider all these positive outcomes quite unlikely, but isn't it pretty much the job description of DARPA to do the occasional dark horse bet?

suprjami
1 replies
20h46m

In my experience (supporting a machine-translated codebase which resulted in shitty Java) your theory doesn't play out.

If you give developers a shitty codebase then those developers will leave to work somewhere else.

After a few years of working on this codebase we had 88% turnover. 1 in 10 developers remembered the original project's design philosophy and intention.

It wasn't a good situation.

dcsommer
0 replies
19h0m

GP was proposing a different situation where the source code is not changing or changing very rarely. If you have a high churn codebase, obviously the maintenance experience will worsen dramatically after machine translation (at least with many current tools), so your experience is not unexpected.

Apofis
0 replies
23h11m

This is definitely a pie-in-the-sky DARPA challenge that would be great to have around as we migrate away from legacy systems. However, even taking your functions/methods in one language, giving them to ChatGPT, and asking it to translate them to a different language generally doesn't work. Asking ChatGPT to solve the initial problem you're trying to solve works more frequently, but still generally doesn't work. You still need to do a lot of tinkering and thinking to get even basic things that it outputs to work.

jcalvinowens
11 replies
1d1h

This isn't some "pie in the sky" thing, Immunant has a working C to Rust transpiler and it's really interesting: https://github.com/immunant/c2rust

Animats
6 replies
23h28m

I've tried that thing. The Rust that comes out is terrible. It converts C into a set of Rust function calls which explicitly emulate C semantics by manipulating raw pointers. It doesn't even convert C arrays to a Vec. It's a brute-force transliteration, not a translation.

I and someone else ran this on a JPEG 2000 decoder that sometimes crashed with a bad memory reference. The Rust version crashed with the same bad memory reference. It's bug-compatible.

What comes out is totally unreadable and much bigger than the original C code. Manual "refactoring" of that output is hopeless.

tialaramex
2 replies
18h13m

It doesn't make sense to convert a C array to a Vec: the Vec type is a growable array, but a C array isn't growable. It makes sense to convert to Rust's array type, which has a fixed size, and then we realise there's a problem at API boundaries, because C's arrays decay to pointers, so the moment we touch an API boundary all safety is destroyed.

Animats
1 replies
12h13m

Depends on how the array is created. If it comes from "malloc" or C++ "new", it may need to be created as a "Vec".

tialaramex
0 replies
2h8m

Firstly, that's not an array. C has actual arrays, even though they decay to pointers at API edges, and what you've made with malloc is not an array. I'll disregard C++ new and new[].

But also, it's definitely not a growable array. Box::new_uninit_slice makes the thing you've got here, a heap allocation of some specific size, which doesn't magically grow (or shrink) and isn't initialized yet.
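
A small sketch of the distinction (assuming a recent toolchain where Box::new_uninit_slice and assume_init are stable):

    use std::mem::MaybeUninit;

    fn main() {
        // The closest analogue to a malloc'd buffer: fixed size, not yet initialized.
        let mut raw: Box<[MaybeUninit<u32>]> = Box::new_uninit_slice(8);
        for slot in raw.iter_mut() {
            slot.write(0);
        }
        // SAFETY: every element was initialized in the loop above.
        let fixed: Box<[u32]> = unsafe { raw.assume_init() };
        assert_eq!(fixed.len(), 8);

        // A Vec is a different thing entirely: it owns spare capacity and can grow.
        let mut growable: Vec<u32> = fixed.into_vec();
        growable.push(9);
    }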

marcosdumay
1 replies
22h37m

Any automatic translation is bug-compatible with the original. Did you expect it to divine some requirements?

It still leaves you with Rust code that you can improve piecewise. The only question is whether something like it is better than calling the C code over FFI.

bornfreddy
0 replies
22h21m

Any automatic translation is bug-compatible with the original. Did you expect it to divine some requirements?

That would be useless when translating C to Rust. Yes, I would expect the tool to point out the flaws in the original memory handling and only translate the corrected code. This is far from easy, since some information (intent) is missing, but a good coder could do it on decent codebases. The question is, can an automated tool do it too? We'll see.

jcalvinowens
0 replies
19h10m

I ran this on a JPEG 2000 decoder that sometimes crashed with a bad memory reference. The Rust version crashed with the same bad memory reference. It's bug-compatible.

Of course it is. The README says it generates unsafe rust in the first paragraph, what did you expect?

I think it's a really fascinating experiment, and IMHO it's pretty remarkable what it can do. This is an incredibly difficult problem after all...

steveklabnik
1 replies
1d1h

Their work was also previously sponsored by DARPA, though I do not know if it was under this program or something else.

woodson
0 replies
2h41m

It must have been a different program, as this one hasn't started yet; perhaps it was an earlier program run by the same program manager.

jcranmer
0 replies
22h53m

As I mentioned elsewhere (https://news.ycombinator.com/item?id=41113257), that tool is pretty much useless unless you have some checkbox that says "no C code allowed anywhere". It's not even a feasible starting point for refactoring because the code is so far from idiomatic Rust.

bubberducky
0 replies
17h56m

It seems easy (relatively speaking) to directly translate C to Rust if you're allowed to use unsafe and don't make an effort to actually verify the soundness of the code. But if you need to verify the soundness and fix bugs while translating? That's really hard, and that's what it sounds like TRACTOR wants to do.

Using "unsafe" doesn't automatically make Rust useless, of course, but the example on the c2rust website itself doesn't make any effort to verify its usage of unsafe (you can easily read memory out of bounds just by changing "n" to "n + 1" in the example loop). Sadly, that is a much, much harder problem to solve even for fairly basic C programs.

0xbadcafebee
8 replies
1d1h

Every tool has its own specific quirks. Over many years of using a tool, "expertise" is the intimate knowledge of those quirks and how to use that tool most effectively. Changing tools requires you to gain expertise again. You're going to be less proficient in the new tool for a long time, and make a lot of mistakes.

Considering we already know how to make C/C++ programs memory safe, it's bizarre that people would ditch all of their expertise, and the years and years of perfecting the operation of those programs, and throw all that out the window because they can't be bothered to use a particular set of functions [that enforce memory safety].

If you're going to go to all of the trouble to gain expertise in an entirely new tool, plus porting a legacy program to the new tool, I think you need a better rationale than "it does memory safety now". You should have more to show for your efforts than just that, and take advantage of the situation to add more value.

wffurr
6 replies
1d1h

But even proficient C and C++ programmers continue to produce code with memory safety issues leading to remote code execution exploits. This argument doesn’t hold up to the actual experience of large C and C++ projects.

0xbadcafebee
5 replies
21h4m

They aren't trying to prevent them. It's trivial to prevent them if you actually put effort into it; if you don't, it's going to be vulnerable. This is true of all security concerns.

woodruffw
2 replies
20h32m

"You aren't trying hard enough" isn't a serious approach to security: if it was, we wouldn't require seatbelts in cars or health inspections in restaurants.

(It's also not clear that they aren't trying hard enough: Google, Apple, etc. have billions of dollars riding on the safety of their products, but still largely fail to produce memory-safe C and C++ codebases.)

hgs3
1 replies
15h24m

In the case of OpenSSL, Big Tech clearly neglected proper support until after the Heartbleed vulnerability. Prior to Heartbleed, the OpenSSL Software Foundation only received about $2K annually in donations and employed just one full-time employee [1]. Given the project's critical role in internet security, Big Tech's neglect raises concerns about their quality assurance practices for less critical projects.

The OpenSSL Foundation is not exempt from criticism despite the inadequate funding. Heartbleed was discovered by security researchers using fuzz testing, but proactive fuzz testing should have been a standard practice from the start.

[1] https://arstechnica.com/information-technology/2014/04/tech-...

woodruffw
0 replies
14h30m

OpenSSL is not a great example, either before or after funding — it’s a notoriously poorly architected codebase with multiple layers of flawed abstractions. I meant things more like Chromium, WebKit, etc.: these have dozens to hundreds of professional top-bracket C and C++ developers working on them, and they still can’t avoid memory corruption bugs.

wffurr
1 replies
16h56m

No True C Programmer writes code with buffer overflows in it. It's pretty clear this is not a serious take.

defrost
0 replies
16h46m

FWIW, "True C Programmers" deliberately coded "buffer overflows" all the time back in the day.

The practice of using variable sized structures that began with type and size info and ended with a char[1] was commonplace.

https://hex-rays.com/blog/igors-tip-of-the-week-94-variable-...

Good True C Programmers had guard rails | canary bytes | etc. to detect and avoid actual buffer overflow (into unallocated memory) rather than technical buffer overflow (reading|writing past the end of a char|byte array).

bigstrat2003
0 replies
19h8m

Considering we already know how to make C/C++ programs memory safe...

I think that the legion of memory bugs which still occur in C/C++ programs are proof of one of two things:

1. We (the industry as a whole) do not actually know how to make these programs memory safe, or

2. Knowing how to make programs memory safe in C/C++ is not sufficient to prevent memory safety issues.

Either way, it seems clear that something needs to be done and that the status quo in C/C++ programming is not enough. I'm not saying Rust will be the right answer in the end (I do like it, but there's a ton of hype and hype makes me distrustful), but I can't fault people for wanting to try something new.

deepsun
7 replies
1d1h

They didn't explain why they've chosen Rust. There are a lot of memory-safe languages besides Rust, especially in application-level area (not systems-level like Rust).

woodruffw
2 replies
1d1h

There are a lot of memory safe languages; there are fewer that have (1) marginal runtime requirements, (2) transparent interop/FFI with existing C codebases, (3) enable both spatial and temporal memory safety without GC, and (4) have significant development momentum behind them. Rust doesn't have to be unique among these qualifications, but it's currently preeminent.

deepsun
1 replies
1d1h

Yes, but you assume all their projects need all 4 of these. I like Rust, but it's a bad choice for many areas (e.g. aforementioned application-level code). I'd expect serious decisions to at least take that into account.

woodruffw
0 replies
1d1h

I'm not assuming anything of the sort. These are just properties that make Rust a nice target for automatic translation of C programs; there are myriad factors that guarantee that nowhere close to 100% of programs (C, application-level, or otherwise) will be suitable for translation.

oconnor663
2 replies
1d

Apart from runtime/embedded requirements, there's the big question of how you represent what C is doing in other languages that don't have interior pointers and pointer casting. For example, in C I might have a `struct foo*` that aliases the 7th element of a `struct foo[]` array. How do you represent that in Java or Python? I don't think you can use regular objects or regular arrays/lists from either of those languages, because you need assignments through the pointer (of the whole `struct foo`, not just individual field writes) to affect the array. Even worse, in C I might have a `const char*` that aliases the same element and expects every write to affect its bytes. To model all this you'd need some Frankenstein, technically-Turing-complete, giant-bytestring-that-represents-all-of-memory thing that wouldn't really be Java or Python in any meaningful sense, wouldn't be remotely readable or maintainable, and wouldn't be able to interoperate with any existing libraries.

In Rust you presumably do all of that with raw pointers, which leaves you with a big unsafe mess to clean up over time, and I imagine a lot of the hard work of this project is trying to minimize that mess. But at least the mess that you have is recognizably Rust, and incremental cleanup is possible.
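
A sketch of what that "recognizably Rust but unsafe" starting point looks like for the interior-pointer case above (`Foo` is a made-up stand-in for the C struct):

    #[derive(Clone, Copy, Debug)]
    struct Foo {
        x: i32,
    }

    fn main() {
        let mut arr = [Foo { x: 0 }; 16];
        // The aliasing interior pointer, roughly as translated code would hold it:
        let p: *mut Foo = unsafe { arr.as_mut_ptr().add(7) };
        // Whole-struct assignment through the pointer is visible in the array,
        // matching the C semantics described above.
        unsafe { *p = Foo { x: 42 } };
        assert_eq!(arr[7].x, 42);
        // The incremental cleanup a human would do later is to replace the raw
        // pointer with an index or a slice borrow:
        arr[7] = Foo { x: 43 };
    }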

wolfspider
1 replies
15h50m

I've spent the past few months translating a C library heavy in pointer arithmetic to TypeScript. Concessions have to be made here and there, but I ended up making utility classes to capture some of the functionality. Structs can be represented as types, since they can also be expressed as unions similar to structs. These const types can have fields updated in place and inherit properties from other variables, similar to passing by reference, which JS can do (pass by sharing), or use a deep clone to copy.

As far as affecting the underlying bytes as a type, I've come up with something I call byte type reflection: a union type which does self-inference on the object properties in order to flatten itself into a byte array, so that the usual object indexing and length properties automatically apply only to the byte array as it has been expressed (the underlying object remains as well). C does this automatically, so there is some overhead here that cannot be removed. Pointer arithmetic can be applied with an iterator class which keeps track of the underlying data object, but that sadly does count as another copy. Array splicing can substitute for creating a view of a pointer array, which is not optimal, but there are some Kotlin-esque utilities that create array views which can be used. Surprisingly, the floating point values, which I expected to be way off and can only express as a number type, are close enough.

I use Deno FFI, so there's plenty of room to go back to unmanaged code for optimizations, and WASM can be tapped into easily. For me those values are what is important, and it does the job adequately. The code is also way more resilient to runtime errors, as opposed to the C library, which has a tendency to just blow up.

TL;DR: Don't let it stop you until you try, because you might just be surprised at how it turns out. If the function calls of a library are only 2-3 levels deep, how much "performance" are you really gaining by keeping it that way? Marshalling code is the usual answer, and Deno FFI does an amazing job at that.

oconnor663
0 replies
3h10m

Very cool! Is your work all by hand, or have you been able to automate some of it?

galangalalgol
0 replies
1d1h

If you have C in your crosshairs, then you want a language that can do whatever C does. That makes the list of memory-safe languages a lot shorter.

commodoreboxer
7 replies
1d1h

A lot of people are reading this as a call or demand to translate all C and C++ code to Rust, but (despite the catchy project name), I don't read the abstract in that way. There are two related but separate paragraphs.

1. C and C++ just aren't safe enough at large. Even with careful programming and good tooling, so many vulnerabilities are caused by their unsafe by default designs. Therefore, as much code as possible should be translated to or written in "safe" languages (especially ones that guarantee memory safety).

2. We are funding and calling for software to translate existing C code into Rust.

It's not a consensus to rewrite the world in Rust. It's a consensus to migrate to safe languages, which Rust is an example of, and a program that targets Rust in such migration.

akira2501
6 replies
22h8m

or written in "safe" languages

So when those languages have 'unsafe' constructs, what are the rules going to be around using those? Without a defined set of rules to use here, you're just going to end up right back where you started.

to migrate to safe languages, which Rust is an example of

Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.

Meanwhile you have tons of garbage collected languages that don't even let the programmer touch pointers. Why aren't those considered? The reason is performance. And because Rust programmers "care" so much about performance you're not ever going to solve the fundamental problem with that language.

Do you want performance or safety? You can't have both.

timeon
3 replies
21h59m

To do anything interesting you will require unsafe blocks. This will not get you very much.

This is not true.

akira2501
2 replies
21h46m

This is not true.

Burying unsafe blocks in unevaluated cargo modules does not make this true. You're just taking the original problem and sweeping it under the rug.

commodoreboxer
1 replies
17h35m

You can do tons of stuff with purely safe Rust. The main things that you can't do are FFI, making self-referential structures, and dereferencing raw pointers.

And unsafe isn't a problem. It's a point of potential danger to be heavily audited, tested, and understood. Having the entire language unsafe by default is an obviously worse situation. This is throwing the baby out with the bathwater, like rallying against seat belts because you can still die while wearing one. An improvement is still an improvement. I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.

techbrovanguard
0 replies
11h29m

I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.

if you've convinced yourself that you're special and all problems with c are solved by trying harder, clearly everyone else is just lazy. with that line of logic, there's nothing to fix with c. rust is not just redundant, but also aggravating, since its popularity causes the cognitive dissonance to start creeping in.

maybe i can make mistakes? should we improve tooling somewhat? no, it's the children who are wrong.

techbrovanguard
0 replies
14h49m

To do anything interesting you will require unsafe blocks

this is just flagrantly false, have you no shame?

bigstrat2003
0 replies
19h58m

Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.

1. There are plenty of interesting programs which don't require unsafe.

2. Even if your program does require unsafe, Rust still limits where the unsafety is. This lets you focus your scrutiny on the small section of the program which is critical for safety guarantees to hold. That is still a win.

Animats
6 replies
1d

It's good to see DARPA pushing on this. It's a hard problem, but by no means impossible. Translating to safe Rust, though, is going to be really tough. There's a C to Rust translator now, but what comes out is horrible Rust, which just rewrites C pointer manipulation as unsafe Rust struct manipulation. The result is less maintainable than the original.

So what would it take to actually do this right? The two big problems are 1) array sizes, and 2) non-affine pointer usage. Pointer arithmetic is also hard, but rare. Most pointer arithmetic can be expressed as slices.

Every array in C has a size. It's just that the compiler doesn't know what it is.

Where is this being discussed in detail?

jcranmer
2 replies
22h57m

I once tried to use c2rust as a starting point for rustification of code and... it's not even good at that. The code is so freakishly literal to the original C semantics that you can't even take the non-pointery bits, strip off the unsafe block, and use that as a basis.

(To give you a sense, it translates something like a + 1 to a.wrapping_add(1i32), and my recollection is that for (int i = 0; i < 10; i++) gets helpfully turned into a while loop instead of a for loop.)
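
To illustrate the flavor, here's a hand-written sketch in the same spirit (not verbatim c2rust output):

    // Original C, for reference:
    //   int sum(const int *a, int n) {
    //       int s = 0;
    //       for (int i = 0; i < n; i++) s += a[i];
    //       return s;
    //   }

    // Roughly what a literal, C-semantics-preserving translation looks like:
    unsafe fn sum_literal(a: *const i32, n: i32) -> i32 {
        let mut s: i32 = 0;
        let mut i: i32 = 0;
        while i < n {
            s = s.wrapping_add(*a.offset(i as isize));
            i = i.wrapping_add(1);
        }
        s
    }

    // Versus what a human translation of the same function might be:
    fn sum_idiomatic(a: &[i32]) -> i32 {
        a.iter().copied().fold(0, i32::wrapping_add)
    }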

In general, the various challenges that all need to be solved that aren't solved yet are:

a) when is integer overflow intentional in the original code so that you know when to use wrapping_op instead of regular Rust operators?

b) how to convert unions into Rust enums

c) when pointers are slices, and what corresponds to the length of the slice

d) convert pointers to references, and know when they're mutable or const references

e) work out lifetime annotations where necessary

f) know when to add interior mutability to structs

g) wrap things in Mutex/RwLock/etc. for multithreaded access

We're a very long way from having full-application conversion workable, and that might be sufficiently difficult that it's impossible.
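
As a concrete illustration of point (c), here's a hand-written sketch (not tool output) of the same function before and after the pointer-plus-length pattern is recognized as a slice:

    // A literal translation keeps the C calling convention: raw pointer plus length.
    unsafe fn fill_raw(dst: *mut u8, len: usize, value: u8) {
        // SAFETY: the caller must guarantee that dst points to at least `len` bytes.
        let dst = std::slice::from_raw_parts_mut(dst, len);
        dst.fill(value);
    }

    // Once the tool (or a human) establishes that (dst, len) describe one buffer,
    // the bound moves into the type and the unsafety disappears at this layer.
    fn fill(dst: &mut [u8], value: u8) {
        dst.fill(value);
    }

The hard part is the "establishes" step: proving that len really is the length of the buffer behind dst at every call site.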

Animats
1 replies
21h37m

That doesn't mention the affine type problem. Rust references are restricted to single ownership. If A has a reference to B, B can't have a reference to A. Bi-directional references are not only a common idiom in C, they're an inherent part of C++ objects.

Rust has to use reference counts in such situations. You have an Rc wrapped around structs, sometimes a RefCell, and .borrow() calls that panic when you have a conflict. C code translates badly into that kind of structure.

Static analysis might help find .borrow() and .borrow_mut() calls that will panic, or which won't panic. It's very similar to finding lock deadlocks of the type where one thread locks the same lock twice.

(If static analysis shows that no .borrow() or .borrow_mut() on a RefCell will panic, you don't really need the RefCell's runtime checks. That's worth pursuing as a way to allow Rust to have back references.)
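
A small sketch of the shape this takes (names are made up for illustration), including the runtime-checked borrows that can panic:

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Parent {
        children: Vec<Rc<RefCell<Child>>>,
    }

    struct Child {
        // The back reference has to be Weak, or the reference counts form a cycle and leak.
        parent: Weak<RefCell<Parent>>,
    }

    fn visit_children(parent: &Rc<RefCell<Parent>>) {
        // Every .borrow()/.borrow_mut() is checked at runtime and panics on conflict,
        // e.g. if code reached from here tried to borrow the same Child mutably twice.
        let p = parent.borrow();
        for child in &p.children {
            let _c = child.borrow_mut();
            // ... mutate the child ...
        }
    }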

jcranmer
0 replies
21h20m

I'd lump that analysis somewhere in the d-g, because you have to remember that &mut is also noalias and work out downstream implications of that. It's probably presumptive of me to assume a particular workflow for reconstructing the ownership model to express in Rust, and dividing that into the steps I did isn't the only way to do it.

In any case, it's the difficulty of that reconstruction step that leaves me thinking that automated conversion of whole-application to Rust is a near-impossibility. Conversion of an individual function that works on plain-old-data structures is probably doable, if somewhat challenging.

An off-the-cuff idea I just had is to implement a semi-automated transformation, where the user has to input what a final conversion of a struct type should look like (including all Cell/Rc/whatever wrappers as needed), and the tool can use that to work out the rest of the translation. There's probably a lot of ways that can go horribly wrong, but it seems more feasible than trying to figure out what all of the wrappers need to be.
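
For instance, the user-supplied piece might be no more than the target shape of one type, which the tool then propagates outward (a purely hypothetical input format):

    // C struct, for reference:
    //   struct node { struct node *next; struct node *prev; int value; };

    // Hypothetical user-provided target that the tool would work backwards from:
    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        next: Option<Rc<RefCell<Node>>>,
        prev: Option<Weak<RefCell<Node>>>,
        value: i32,
    }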

steveklabnik
1 replies
1d

Where is this being discussed in detail?

In my understanding, this is a call for proposals to do the work, there is no detailed discussion yet. That will come when there's actual responses to this call.

Animats
0 replies
1d

Right, there's a call, and a project day with an in-person meeting coming up.

clintfred
0 replies
19h56m

Even if just all the unsafe areas were marked, wouldn't that be valuable? At least it would focus review efforts on the parts with the most risk?

thibran
3 replies
1d1h

Porting the Linux kernel to 100% Rust should be the benchmark for AGI.

... and when done, please port SQLite too :)

0cf8612b2e1e
2 replies
1d

I am fully in the RIIR koolaid, but SQLite would be near the absolute bottom of my prioritization list. Care to explain? SQLite is extensively tested, has requirements to run on ~every platform, be backwards compatible, and has a relatively small blast radius if there is a C derived bug. There is much more fertile ground in any number of core system services (network, sudo, dns, etc)

xbar
0 replies
19h0m

Fair. But perhaps it has a narrow attack surface.

sans-seraph
3 replies
1d

I have been aware of this proposed initiative for some time and I find it interesting that it is now becoming public. It is a very ambitious proposal and I agree that this level of ambition is appropriate for DARPA's mission and I wish them well.

As a Rust advocate in this domain I have attempted to temper the expectations of those driving this proposal with due respect to the feasibility of automatic translation from C to Rust. The fundamental obstacle that I foresee remains that C source code contains less information than Rust source code. In order to translate C code to Rust code that missing information must be produced by someone or something. It is easy to prove that it is impossible to infallibly generate this missing information for the same reason that scaling an image to make it larger cannot infallibly produce bits of information that were not captured by the original image. Instead we must extrapolate (invent) the missing information from the existing source code. To extrapolate correctly we must exercise judgement and this is a fallible process especially when exercised in large quantities by unsupervised language models. I have proposed solutions that I believe would go some way towards addressing these problems but I will decline to go into detail.

Ultimately I will say that I believe it is possible for this project to achieve a measure of success, although it must be undertaken with caution and with measured expectations. At the same time it should be emphasized that it is also possible that no public result will come of this project, so I caution those here against reading too much into this at this time. In particular I would remind everyone that the government is not a singular entity, so I would not interpret this project as a blanket denunciation of C, nor, conversely, as a blanket blessing of Rust. Each agency will set its own direction and timelines for the adoption of memory-safe technologies. For example, NIST recommends Rust as well as Ada SPARK in addition to various hardened dialects of C/C++.

the8472
0 replies
9h47m

In order to translate C code to Rust code that missing information must be produced by someone or something.

If you don't go for preserving the formal semantics of C code and instead only require the test-suite to still pass after translation that can provide a lot of wiggle room for the translation. This is how oxidation projects often work in practice. Fuzzers can also help with generating additional test data to get good branch coverage.

steveklabnik
0 replies
1d

As a Rust advocate in this domain I have attempted to temper the expectations of those driving this proposal

Thank you!

PaulHoule
3 replies
23h26m

Whatever happened to Ada?

wffurr
2 replies
23h21m

It languished in government work behind a wall of extremely expensive compilers and contractors. Never heard anyone suggest RiiA - Rewrite it in Ada.

nvy
1 replies
23h4m

GCC contains `gnat` which is a libre Ada compiler.

I think Ada has a lot of technical merit but it's just not fashionable the way Rust is, for lots of uninteresting reasons.

PaulHoule
0 replies
21h55m

I remember Ada getting pushed in a time when there were many in the computer industry that were pushing Pascal as both a systems and a teaching language. Ada was a lot like Pascal which I think caused an immediate violent reaction in some people. (e.g. the implementers of every other programming language were pissed that BASIC was so hegemonic but they never asked "Why?" or if their alternatives were really any better)

In the early 1980s, microcomputer implementations such as UCSD Pascal were absolutely horrific in terms of performance plus missing the features you'd need to do actual systems programming work. In the middle of the decade you saw Turbo Pascal which could compile programs before you aged to death and also extended Pascal sufficiently to compete with C. But then you had C, and the three-letter agencies were still covering up everything they knew about buffer overflows.

simon_void
2 replies
23h56m

a) If every C program could be translated into an equivalent safe Rust program, that would mean that each C program is as safe as its safe Rust equivalent.

b) Since there are C programs that are open to memory corruption in a way safe Rust isn't, that corruptibility would need to be translated into partially unsafe Rust. Congrats, you now have a corruptible Rust program; what's the point again?

c) So DARPA must be trying to fix/change what the program is doing when switching to Rust. But how do you discern which behaviour is intended and which is not? Doesn't this run directly into the undecidability/uncomputability of the halting problem?!

im3w1l
0 replies
36m

Memory corruption is undefined behavior and means the compiler is free to do anything it wants.

Anything it wants... and that includes doing something entirely safe and reasonable.

If you write out of bounds, the compiler is allowed to shut the program down in a controlled manner. It's allowed to transparently resize the array for you. Etc.

Hence a Rust translation can do these things.
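
A minimal sketch of both options (illustrative only, nothing the standard mandates):

    // C, for reference:
    //   buf[i] = 0;   /* undefined behavior if i is past the end of buf */

    // One legal rendering: shut the program down in a controlled way on a bad index.
    fn store(buf: &mut Vec<u8>, i: usize, v: u8) {
        buf[i] = v; // panics instead of corrupting memory
    }

    // Another legal rendering: transparently grow the buffer instead.
    fn store_growing(buf: &mut Vec<u8>, i: usize, v: u8) {
        if i >= buf.len() {
            buf.resize(i + 1, 0);
        }
        buf[i] = v;
    }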

Arnavion
0 replies
20h51m

Doesn't this run directly into the undecidability/uncomputability of the halting problem!?!

The programmer gets to decide. DARPA does not expect the translator program to autonomously output a perfect Rust program. It just wants a "high degree of automation towards translating legacy C to Rust" (from the sam.gov link in the submission, emphasis mine).

TinkersW
2 replies
1d2h

Good luck with that... Also, shouldn't the target be C++ to Rust? Is there really that much pure C still being written?

surfingdino
0 replies
1d1h

IoT, embedded systems still use it. There's loads of them.

riku_iki
0 replies
22h36m

AGI may find a much simpler, more robust/performant, and safer language.

thesuperbigfrog
0 replies
1d2h

Direct link to Proposer's Day info [PDF]: https://sam.gov/api/prod/opps/v3/opportunities/resources/fil...

"The purpose of this event is to provide information on the TRACTOR technical goals and challenges, address questions from potential proposers, and provide an opportunity for potential proposers to consider how their research may align with the TRACTOR program objectives."

plasticeagle
1 replies
21h47m

If

1) Rust contains no memory bugs

2) C can be automatically translated to it

Then all memory bugs can be fixed automatically, which is almost certainly untrue. This task is very likely completely impossible in the general case.

warkdarrior
0 replies
21h11m

Since you did not specify that you wish to preserve all behaviors of the C code, there are trivial solutions to this problem. For example, one could replace all dynamic memory allocations with fixed buffers (set at translation time), and reject all inputs that do not fit in those buffers.
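
A toy sketch of that (deliberately crude) strategy:

    // Buffer size fixed at translation time; inputs that don't fit are rejected.
    const MAX_LEN: usize = 4096;

    fn copy_input(input: &[u8]) -> Result<[u8; MAX_LEN], &'static str> {
        if input.len() > MAX_LEN {
            return Err("input exceeds the translation-time buffer size");
        }
        let mut buf = [0u8; MAX_LEN];
        buf[..input.len()].copy_from_slice(input);
        Ok(buf)
    }

Not a faithful translation of the original program's behavior, but it shows how much slack there is once you drop that requirement.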

niemandhier
1 replies
1d2h

Is this supposed to be automatic? And if so, wouldn't any program that can automatically port C to Rust, by necessity, contain all the functionality needed to make the C code itself safe?

gpm
0 replies
1d1h

I don't think a reasonable reading of the statement implies "fully automated", at which point the answer to the question is no.

Obviously some C code isn't just "not verifiably correct" but "actually wrong in a memory-unsafe way". That code isn't going to be automatically translated without human intervention, because (how could it be?) there is no correct equivalent code. The tooling is going to have to have an escape hatch where it says "I don't know what this code is meant to do, and I know it isn't meant to do what it does do (violate promises to the compiler), help me human".

On a theoretical level it's not possible for that escape hatch to only be used when undefined behaviour does occur (Rice's theorem). On a practical level it's probably not even desirable to try, because obtuse enough code shouldn't just be blindly translated.

So what I imagine the tooling ends up looking like is an interactive tool that does the vast majority of the work for you, but is guided by a human, and ultimately as a result of that human guidance doesn't end up with exactly equivalent code, just code that serves the same purpose.

luke-stanley
1 replies
19h51m

Surely this could be better pitched to researchers as just another AI benchmark, a bit like ARC Prize? ;) There could be some existing C projects that are already public, with tests for feedback during development iteration and some holdout tests, and some holdout projects too, with a leaderboard and prizes. For preferences about converted code quality, both automated assessment and human preferences could be ranked with Elo? Kaggle is made for this sort of thing I think? I'm sure Google DeepMind and others have some MCTS agents that could do a great job with a bit of effort.

woodson
0 replies
2h36m

And as with most other competitions/benchmarks, the result is likely to be optimizing for the benchmark and not the wider goal ;-). It's difficult to get a serious effort without people trying to game the benchmark.

ksp-atlas
1 replies
21h18m

Technically, Zig has this functionality built in via translate-c, but it's designed for reading by a C compiler, not a human

deepsun
0 replies
19h6m

Well, the main idea is memory-safety. Zig is certainly better, but not as memory-safe.

PS: Java or even JavaScript are memory-safe :)

b20000
1 replies
14h30m

“ the software engineering community has reached a consensus” … hahaha no sorry, I don’t think so

Your priority should be to learn how to build better software and not force a new language onto people.

do you remember the age old saying about nature and fools?

stkdump
0 replies
11h26m

I guess it is a consensus like `goto considered harmful` or `numbering should start at zero`, which is not a perfect consensus, but as much of a consensus as you can reach for such a disparate community.

tpoacher
0 replies
10h5m

Is this part of the ongoing debate about farmers wanting to be able to fix their own TRACTORs? /s

sim7c00
0 replies
23h7m

I like the idea, but I struggle to see how one can go about doing 'safe' disk reads, having 'safe' ways to manage global resources in kernel land (page tables, descriptor tables, etc.), and a lot of other stuff. Perhaps if those devices also have Rust in their firmware they can reply safely? Genuinely curious, because I went back to C from Rust in my OS; I could not figure it out (maybe I am not a DARPA-level engineer, but I did work at a similar place doing similar things).

I'd be excited if this gets solved. Rust is a lot more comfy for higher-level kernel stuff.

rpoisel
0 replies
23h2m

I think we have to take that literally: They only translate C code to Rust. Not C++.

ristos
0 replies
21h0m

I get the idea of moving to more memory safety, but the whole "rewrite everything in Rust" trend feels really misguided, because if you're talking about being able to trust code and code safety:

- Rust's compiler is 1.8 million lines of recursively compiled code, how can you or anyone know that what was written is actually trustworthy? Also memory safety is just a very small part of being able to actually trust code.

- C compiles down to straightforward assembly, almost like a direct translation, so you can at least verify that smaller programs that you write in C actually do compile down to assembly you expect, and compose those smaller programs into larger ones.

- C has valgrind and ASAN, so it's at least possible to write safe code with good coding discipline, and plenty of software has been able to do this for decades.

- A lot of (almost all) higher level programming languages are written in C, which means that those languages just need to make sure they get the compiler and GC right, and then those languages can be used for general purpose, scripting, "low level" high level code like Go or OCaml, etc.

- There are many C compilers and only one Rust compiler, and it's unclear whether it'll really be feasible to have more than one Rust compiler due to the complexity of the language. So you're putting a lot of trust into a small group of people, and even if they're the most amazing, most ethical people, surely if a lot of critical infra is based on Rust they'll get targeted in some way.

- Something being open source doesn't mean it's been fully audited. We've seen all sorts of security vulnerabilities cause a world of hurt for a lot of people, all coming from open source code, and often from very small libraries that should actually be much easier to audit than codebases with millions of lines of code.

- Similarly, Rust does not translate to straightforward assembly, and again would seem to be impossible to do given the complexity of the language.

- There was an interesting project I came across called CompCert, which aims to have a C compiler that's formally verified (in Coq) to translate into the assembly you expect. Something like a recursively compiled CompCert C -> OCaml -> Coq -> CompCert would be an interesting undertaking, which would make OCaml and Coq themselves built on formally verified code, but I'm not sure if that'll really work and I suspect it's too complicated.

- I think Rust might be able to solve some of these problems if they have a fully formally verified thing, and the formally verified thing is itself formally verified, and the compiler was verified by that thing, and then you know that you can trust the whole thing. Still, the level of complexity and the inability to at least manually audit the core of it makes me suspect it's too complicated and would still be based on trust of some sort.

- I still think that static analysis and building higher level languages on top of C is a better approach, and working on formal verification from there, because there are really small C compilers like tinycc that are ~50k LOCs, which can be hand verified. You can compile chibi-scheme with tinycc, for example, which is also about ~50k LOCs of C, and so you get a higher level language from about 100k LOCs (tcc and chibi), which is feasible for an ordinary but motivated dev to manually audit to know that it's producing sound assembly and not something wonky or sketchy. Ideally we should be building compilers and larger systems that are formally verified, but I think the core of whatever the formally verified system is has to be hand verifiable in some way in order to be trustworthy, so that you can by induction trust whatever gets built up from that, and I think that would need to require a straightforward translation into assembly, with ideally open source ISA and hardware, and a small enough codebase to be manually audited like the tinycc and chibi-scheme example I gave.

- Worst case, everyone kind of shrugs it all off and just trusts all of these layers of complexity, which can look like C -> recursively compiled higher-level lang -> coffeescript-like layer on top -> framework (which is apparently a thing now), and just hopes that all of these layers of millions of lines of code don't explode in some weird way, intentionally or unintentionally.

- Best case of the worst case is that all of our appliances are now "smart" appliances, and then one day they just transform into robots that start chasing you around the house, all the while the Transformers cartoon theme is playing in the background, which would match up nicely with the current trend of everything being both terrifying and hilarious in a really bizarre way.

pizlonator
0 replies
19h56m

Or you could just use Fil-C.

nickpsecurity
0 replies
20h54m

I think this is indirectly a great argument for automated test generation and equivalence checking. The reason is that these translations might change the function of the code. Automated testing would show whether or not that happened. It also reveals many bugs.

So, they should solve total, automated testing first. Maybe in parallel. Then, use it for equivalence checks.

llm_trw
0 replies
18h6m

I'm reminded of DARPA's old plans for Ada. I expect we'll see the same issues come up as the last time they tried this.

kernal
0 replies
21h4m

I'm working on something similar that just wraps the C code in an Unsafe block.

jll29
0 replies
1d1h

Difficult: most C programs I know would convert to one single large "unsafe" block...

One might argue that re-writing from scratch is the safer option; and a re-write is also an opportunity to do things differently (read: improve the architecture by using what one has learned), despite the much-feared "second system" syndrome.

But nothing wrong with spending some research dollars on tooling for "assisted legacy rewrites". DARPA and her sister agency IARPA fund step innovation (high risk, high reward), and this is an area where good things could potentially come of it.

anticristi
0 replies
1h29m

It sounds to me near-impossible to convert C or C++ into as-safe-as-possible Rust code, because the original intentions of the developer are missing. However, I wonder if some clever generative AI could be taught to recognize enough C programming patterns in relevant code bases to make the problem tractable.

account42
0 replies
4h42m

the software engineering community has reached a consensus

lol

Thro4l31
0 replies
13h58m

The problems come with maintaining the translated code bases:

1. A code base written in C and a team of C engineers that have a good mental model of the code base to be able to maintain it.

2. An automatically translated Rust code base. Potentially (I'd say probably, but that is just my gut feeling) harder to read and understand than the original one.

3. Now you need a team of Rust engineers that have a good mental model of the code base that was generated.

If you already have that team of Rust engineers, I'd rather let them rewrite the code manually as they can improve it and have the correct mental model from the start.

Sytten
0 replies
14h15m

Would be nice if they could first hire all the smart engineers (that Mozilla laid off) to continue working on the language itself.

Async is still a half finished mess even for people that use it every day. And that is my main annoyance but there are many (trait specialization, orphan rule limits, HKT, etc.)

Aissen
0 replies
10h14m

Isn't it weird that they don't mention the already-existing open-source project they funded, c2rust? And there's no mention of the company behind it, Immunant, either.

29athrowaway
0 replies
18h26m

Why not Ada or Zig?