
Writing a Rust compiler in C

prologist11
40 replies
20h50m

This is super cool but what's interesting is that this same kind of bootstrapping problem exists for hardware as well. What makes computers? Previously built computers and software running on them. The whole thing is really interesting to think about.

durumu
16 replies
20h19m

Which came first, the computers or the code?

nine_k
10 replies
20h15m

(The code, of course; the code drove music boxes and looms centuries before computers. Same for chicken and egg: eggs are maybe a billion years older.)

867-5309
9 replies
19h31m

so..the code drove computers

codetrotter
8 replies
19h25m

Correct. And the chicken was written in COBOL.

saghm
5 replies
19h8m

Probably off-topic, but the chicken and the egg "paradox" always seemed silly to me in the context of evolution. We know that there were birds long before chickens, so at some point, the first bird that we would consider to be in the species "chicken" had to hatch from an egg from a bird that was _not_ a chicken, so the egg came first. (This assumes that the question is specifically about chicken eggs; it's even simpler if you count non-chicken eggs from the ancestors of the first chicken, but the logic still works even if you don't).

nurettin
1 replies
8h51m

There is no paradox, because there was never a non-chicken parent which was so different that we could consider the newborn chicken a new species. It takes thousands of generations to say such things, not one.

tialaramex
0 replies
7h25m

And this "it's a chicken" versus "it's not a chicken" distinction is ours. Mother Nature doesn't care whether these are chickens or not; the chickens do not make such a distinction. Same with particle/wave duality: Mother Nature doesn't care whether light is a particle or not. That's our model, and if it doesn't work too well, that's our fault.

867-5309
1 replies
18h45m

the chicken is just an example of an egg-laying and egg-borne animal. substitute it with the first such animal

OJFord
0 replies
18h29m

I think that changes the answer by GP's logic though, since then the first egg-layer obviously came before its egg.

OJFord
0 replies
18h32m

Or to take it another direction - how do they gestate? At what point can we call it a chicken and when does the shell (assuming that's what would make us call it an egg) develop?

ed_elliott_asc
1 replies
19h11m

So that is how it crossed the road

AnimalMuppet
0 replies
19h3m

No. It was running on a mainframe. It was JCL that let it cross the road.

wongarsu
1 replies
19h26m

Ada Lovelace is often credited as the first computer programmer. She died in 1852. Programmable electronic computers didn't come along until the mid-1900s.

Though it obviously depends a bit on what you are willing to count as computer, or as code.

hackermailman
0 replies
6h8m

We all know why the Lovelace myth still persists http://projects.exeter.ac.uk/babbage/ada.html "It is often suggested that Ada was the world's first programmer. This is nonsense: Babbage was, if programmer is the right term. After Babbage came a mathematical assistant of his, Babbage's eldest son, Herschel, and possibly Babbage's two younger sons. Ada was probably the fourth, fifth or six person to write the programmes. Moreover all she did was rework some calculations Babbage had carried out years earlier. Ada's calculations were student exercises. Ada Lovelace figures in the history of the Calculating Engines as Babbage's interpretress"

hughesjj
1 replies
20h15m

Code, unless you count the abacus etc

sim7c00
0 replies
12h34m

i heard the first assembler was written in machine code, and then that was used to create a compiler. machine code you can just chuck into the cpu. it's a little less trivial than assembly because it's harder to remember, but if you know assembly you can learn it easily enough :>. i don't feel this is an unrealistic path, so i chose to believe it without any evidence :D

stevefan1999
0 replies
17h58m

It has to be the code, since code is the information/ideas that you've written down on some medium, such as a whiteboard or paper, better known as "algorithms".

Also keep in mind the older use of "computer": in the past, real humans, often hired in huge batches, computed log and sine lookup tables by hand (the earliest case of human SIMD, by the way). Some were even set to breaking encryption by reversing code boxes. Hence they were called "computers", and I reckon many of them were women.

massysett
7 replies
18h1m

Interesting to think about even at a human-civilization level. What if humans somehow went back to the Stone Age, but in the present day? Could we build back to what we have now?

Kind of a bootstrapping problem. For example, current oil reserves are harder to get than they were a century ago. Could we bootstrap our way into getting them?

adrianN
5 replies
9h31m

Would we want to build the same stuff again? Why bootstrap to oil if you can directly go for renewable alternatives?

massysett
4 replies
9h6m

But we used simpler energy sources like oil to bootstrap renewables. For instance, making solar panels takes lots of energy. Would it be possible to go straight to renewables?

bluGill
2 replies
5h35m

Wind power existed for hundreds of years before we started drilling for oil. I doubt you can make useful solar cells, but you can make useful windmills, rechargeable batteries, light bulbs (incandescent), and motors.

However, just the above list needs a large base of industry to pull off. Can you make a wire? What about a ball bearing? They are made by the millions at insane levels of precision, and they are cheap. All those little details are why you can't pull it off. Sure, given all the parts you can pull off the next step, but there are so many steps that you can't do it.

zamubafoo
1 replies
1h1m

I've thought a lot about these problems, and you eventually hit the need for stronger-than-natural magnets. Without electricity that's a hard challenge, but without magnets, creating electricity even at a simple bench scale is a lot harder.

I ended up thinking that you'd need a chemical battery to bootstrap electricity, then use that electricity to drive an electromagnet to create stronger magnets, and then iterate from there.

Your next stumbling block from there would be optics, as everything else can be made with horrible tolerances. Even lathes and similar machinery can be made to pretty good tolerances without optics. But when you start needing timekeeping or miniaturized components for improved efficiency, it becomes a blocking issue.

You also need to discover photo-reactive compounds to do lithography, but that's a lot easier, since it's just silver nitrate, and you'd already have the components from working towards the initial bootstrap battery.

fragmede
0 replies
36m

would you need to rediscover the periodic table and atomic theory in your version of things? There's a lot of scientific learning we take for granted that is actually important when building a new civilization from scratch.

namibj
0 replies
4h3m

If you include Coppicing for charcoal and building wood, along with modern knowledge, it should be possible to go straight to wind power and rush solar.

znpy
0 replies
6h47m

I'm not sure the actual problem would be bootstrapping. I think the main problems (not sure in which order) would be: discovery (how do you know who has the necessary skills?), logistics (how do we get all the people into the same place to work together, and how do we extract and transport the necessary resources to that place?), and ultimately time (how do we do a minimal technological bootstrap before the people currently holding the knowledge die of old age?).

underbooter
2 replies
12h14m

And who makes people?

dailykoder
1 replies
12h3m

Storks

itishappy
0 replies
4h46m

Crass joke time!

Little Timmy came into his parents room one afternoon and said "mommy, daddy, where do babies come from?"

His parents were surprised; he's a little young for that. So they sat him down and explained gently: "when two people love each other very much, sometimes a stork flies in carrying a baby wrapped in blankets in its bill, and it leaves the baby on the new parents' doorstep!"

Little Timmy scrunches up his face, confused, then asks "well then who fucks the stork?"

ch33zer
2 replies
19h0m

I used to work at a company that built data centers. They were trying to get their software to the point that you could turn up an entire data center from a laptop. Why? So that you could work with European companies and prove to regulators that there were no backdoors. It was a fascinating problem but very difficult. My team was only tangentially involved, but we did some work to forward our data to a proxy that ensured all our data was auditable and that we weren't sending anything we shouldn't. I left before it finished, but I heard it was scrapped as too difficult.

technofiend
0 replies
14h46m

Anecdotally, I've used software that was capable of it, if your hardware could be netbooted, preferably with PXE/iPXE. I used RackN, and there are other vendors like MAAS with purportedly the same abilities.

RackN is good enough it'll let you build virtualization on top of bare metal and then keep going up the stack: building VMs, kubernetes, whatever. You just set up rules for pools, turn on dhcp and let auto discovered equipment take on roles based on the rules you set. Easy to do although I wouldn't envy anyone building a competitor from scratch.

justahuman74
0 replies
39m

There are a few prerequisites that make this all very realistic if the time is put in:

* An LTE remote access box connected to a few switches management ports so you can configure the switches yourself

* Ensuring that the vendors pre-cable the racks and provide port-mapping data

* Ensuring that the vendors set the machines to PXE boot

* Ensuring the vendors double-check the list of MAC addresses of the HW provided for both in-band and oob

akira2501
2 replies
20h16m

Then you look at the assembly for the old Cray-1 computers (octal opcodes) and the IBM System/360 computers (word opcodes), and you realize, they made it so amazingly simple you can mostly just write the opcode bytes and assemble by hand if you like.

Then x86 came along, without the giant budgets or the big purchasers, and so they made that assembly as efficient and densely packed as possible; unfortunately, you lose the conveniences you might otherwise have on other machines.

bluGill
0 replies
18h38m

x86 is the same if you stick to the original 4-bit subset. However, it has been extended so many times that you can't find the nice parts.

ScottBurson
0 replies
16h50m

I've read somewhere that Seymour Cray used to write his entire operating system in absolute octal. ("Absolute" means no relocation; all memory accesses and jumps must be hand-targeted to the correct address, as they would have to be with no assembler involved.)

quuxplusone
1 replies
16h19m

The same bootstrapping problem exists for everything. What makes roads? Construction equipment. How do you get that construction equipment to the job site, without a road already being there?

I actually met a person a few months ago who worked for a startup doing delivery/fulfillment of materials for construction projects. They pointed out that this requires special expertise beyond, say, Amazon, not only because these materials tend to have unusual and/or dangerous physical properties, but also because the delivery addresses tend to be... well, they tend not to have addresses yet! This is all solvable (apparently), but only with expertise beyond the usual for delivery companies in the modern age.

fragmede
0 replies
32m

fascinating! I suppose our normal modern systems aren't equipped to handle descriptive addresses - "take a right after the foo store and then go to the end of the road and give the equipment to the people at the end of the road so they can make more road"

grishka
1 replies
17h19m

Lithography masks for early integrated circuits were drawn by hand iirc.

sekuntul
0 replies
19h47m

yep

ekimekim
0 replies
19h29m

This is one of the coolest things about these kinds of bootstrapping projects + reproducible builds, IMO. One could imagine creating an incredibly simple computer directly out of discrete components. It would be big, inefficient, and slow as molasses, but it could in theory conform to a known instruction set architecture, and you could use it to run these bootstrap programs. You could then assert that you get the same result on your fully-understood slow computer as you get on not-fully-trusted modern hardware.

Someone
40 replies
20h18m

If I were to try bootstrapping rust, I think I would write a proto-rust in C that has fewer features than full rust, and then write a full rust compiler in proto-rust.

‘proto-rust’ might, for example, not have a borrow checker, may have limited or no macro support, may never free memory (freeing memory isn’t strictly needed in a compiler whose only goal in life is to compile a better compiler), and definitely need not create good code.

That proto-rust would basically be C with rust syntax, but for rust aficionados, I think that’s better than writing a rust compiler in “C with C syntax” that this project aims for.

Anybody know why this path wasn’t taken?

TwentyPosts
31 replies
20h1m

So now you're writing two compilers.

What did you actually gain from this, outside of more work?

returningfory2
26 replies
19h54m

Writing a small compiler in C and a big compiler in Rust is simpler than writing a big compiler in C.

antirez
14 replies
18h41m

Writing programs in Rust is not simpler than writing programs in C.

pornel
2 replies
4h24m

Rust feels impossible to use until you "get" it. It eventually changes from fighting the borrow checker to a disbelief how you used to write programs without the assurances it gives.

And once you get past fighting the borrow checker, it's a very productive language; with the standard containers and iterators, you can get a lot done with high-level code that looks more like Python than C.

antirez
1 replies
4h18m

I agree, but it's not so different from C with a decent library of data structures. And even when you become more borrow-checker aware and able to anticipate most of the issues, there are still cases where the solution is either non-obvious or requires doing things in indirect ways compared to C or C++.

pornel
0 replies
50m

The quality difference between generics and proc macros vs the hoops C jumps through instead is pretty significant. The way you solve this in C is also unobvious, but doesn't seem like it when you have a lot of C experience.

I've been programming in C for 20 years, and didn't realize how much of using it productively wasn't a skilful craft, but busywork that doesn't need to exist.

This may sound harsh, but sensitivity to definition order, and the fragility of headers combined with a global namespace, are just a waste of time. These aren't problems worth caring about.

Every function having its own idea of error handling is also nuts. Having to be diligent about error checking and cleanup is not a point of pride, but a compiler deficiency.

Maintenance of build scripts is not only an unnecessary effort, but it makes everything downstream of them worse. I can literally not have build scripts at all, and be able to work on projects bigger than ever. I can open a large project, with an outrageous number of dependencies, and have it build on the first try, integrate with IDEs, generate API docs, run unit tests out of the box. Usually works on Windows too, because the POSIX vs Windows schism can be fixed with a good standard library and cross-platform dependency management.

Multi-threading can be the default standard for every function (automatically verified through the entire call graph including 3rd party code), and not an adventurous novelty.

adwn
2 replies
10h41m

Writing non-trivial programs is easier in Rust than in C, for people that are equally proficient in C as in Rust. Especially if you're allowed to use Cargo and the Rust crates ecosystem.

C isn't even in the same league as Rust when it comes to productivity – again, if you're equally proficient in Rust as in C.

uecker
0 replies
3h2m

This does not match my experience.

FullyFunctional
0 replies
2h54m

I have 40 years of C muscle memory, and it took me many tries and a real investment to get into Rust, but I don't do any C anymore (even for maintenance; I'd rather rewrite it in Rust first).

Rust isn't in a different class from C, it's a different universe!

GuB-42
1 replies
9h17m

You have to consider that those who write the Rust compiler are experts in Rust, but not necessarily experts in C. So even if writing programs in C may be simpler than writing programs in Rust for some developers, the opposite is more likely in this case, even before we compare the merits of the respective languages.

markusde
0 replies
3h47m

This is 100% the case. All of the honest-to-god Rust experts I know work on the compiler in some way. Same goes for Lean, which bootstraps from C as well.

umanwizard
0 replies
4h1m

Yes it is, why would anyone use it otherwise?

tux3
0 replies
18h23m

For compilers specifically, I think plenty of people would disagree.

It's not that it's exceedingly hard in C, but programming languages have evolved in the last millennium, and there are indeed language features that make writing compilers easier than it used to be.

I have the most fun when I write x86 MASM assembly. It's a pretty simple language all in all, even with the macro system. Much simpler than C.

But a simple language doesn't always make it simple to write complex programs like compilers.

tialaramex
0 replies
7h48m

This is probably true if you assume it doesn't matter whether the program is correct.

mananaysiempre
0 replies
4h35m

It is really remarkably sucky to process trees without algebraic datatypes and full pattern matching. Most of your options for that are ML progeny, and the rest are mostly Lisps with a pattern-matching macro. While it’s definitely possible to implement, say, unification in C, I wouldn’t want to—and I happen to actually like C.

Given the task is to bootstrap Rust, a Rust subset is a reasonable and pragmatic choice if not literally the only one (Mes, a Lisp, could also work and is already part of the bootstrappable ecosystem).

kstrauser
0 replies
15h45m

Writing programs that compile is much easier in C. It lets me accidentally do all sorts of ill-advised things that the Rust compiler will correctly yell at me about.

I don't remember it being any easier to write C that passes through a static analyzer like Coverity etc. than it is to write Rust. Think of rustc like a built-in static analyzer that won't let you ignore it. Sometimes that means it's harder to sneak bad ideas past the compiler.

ZoomZoomZoom
0 replies
18h20m

Sure, for you it isn't. It is for me. Especially if we're talking "working roughly as intended" programs.

remram
10 replies
18h38m

But writing a Rust compiler in Rust is already done.

wyager
4 replies
18h23m

How do you compile that on a new platform?

projektfu
1 replies
17h40m

One way would be to have an intermediate target that is easily recompiled or run on any hardware.

https://ziglang.org/news/goodbye-cpp/

maxdamantus
0 replies
17h11m

But that doesn't conform to the "Descent Principle" described in the article.

I haven't really been following Zig, but I still felt slightly disappointed when I learnt that they were just replacing a source-based bootstrapping compiler with a binary blob that someone generated and added to the source tree.

The thing that makes me uncomfortable with that approach is that if a certain kind of bug (or virus! [0]) is found in the compiler, it's possible that you have to fix the bug in multiple versions to rebootstrap, in case the bug (or virus!) manages to persist itself into the compilation output. The Dozer article talks about the eventual goal of removing all generated files from the rustc source tree, ie undoing what Zig recently decided to do.

If everything is reliably built from source, you can just fix any bugs by editing the current source files.

[0] https://wiki.c2.com/?TheKenThompsonHack

remram
0 replies
15h17m

Cross-compilation. There is no requirement of being able to run the compiler on the platform to compile for that platform.

It is much easier to add support for a platform to the compiler backend than to write a new, full compiler with its own, new bootstrapping method.

macjohnmcc
0 replies
17h41m

Cross compilation.

kelnos
3 replies
8h40m

Sure, but the small compiler that you write in C can't compile rustc. So you write a new Rust compiler that uses much simpler Rust that the small compiler in C can compile. Then that new Rust compiler can compile rustc.

And since that new Rust compiler might not have much of an optimizer (if it even has one at all), then you recompile rustc with your just-compiled rustc.

remram
2 replies
4h22m

No that makes no sense to me. Or are we pretending cross-compilation doesn't exist?

returningfory2
0 replies
1h46m

The post is about solving a specific same-architecture bootstrapping problem. Cross-compilation is irrelevant to this discussion.

jdbdbebe
0 replies
1h21m

There are two kinds of bootstrapping:

* Bootstrapping a language on a new architecture using an existing architecture. With modern compilers this is usually done via cross-compilation.

* Bootstrapping a language on an architecture _without_ using another architecture.

The latter is mostly done for theoretical purposes like reproducibility and guarding against "Reflections on Trusting Trust"-style attacks.

bastawhiz
0 replies
3h48m

Paring down a rust compiler in rust to only use a subset of rust features might not be a big lift. Then you only need to build a rust compiler (in C) that implements the features used by the pared-down rust compiler rather than the full language.

PyPy, for instance, implements RPython, which is a valid subset of Python, and the compiler is written in RPython. The compiler code is limited in features, but the bootstrap only needs to implement what RPython includes.

ok123456
0 replies
19h4m

You often write two compilers when trying to bootstrap a C compiler, as GCC used to do. Often, it's a very simple version of the language implemented in the architecture's assembly.

Someone
0 replies
8h24m

Even if it is a bit more work:

- you can write the bulk of your code in a language you prefer over C

- you end up with a self-hosting rust compiler

Etheryte
0 replies
19h56m

Two simpler pieces of work as opposed to one complex one. Even if the two parts might be more volume, they're both easier to write and debug.

umanwizard
4 replies
19h2m

FWIW mrustc, the existing state of the art non-rust rust compiler, already doesn’t have a borrow checker.

Removing the borrow checker doesn’t break any correct programs — it just makes it so a huge amount of incorrect programs can be compiled. This is fine, since we mainly want to use mrustc to compile rustc, and we already know rustc can compile itself with no borrow checker errors.

1vuio0pswjnm7
1 replies
14h54m

"Removing the borrow checker doesn't break any correct programs - it just makes it so a huge amount of incorrect programs can be compiled."

Not a user of Rust programs myself but am curious how users determine whether a Rust binary was compiled with mrustc or rustc.

umanwizard
0 replies
13h48m

You can assume that unless you have some specific information to the contrary, any Rust binary you encounter in real life was built with rustc. mrustc is not used for any mainstream purpose other than in the bootstrap chain of rustc in distros like Guix that care about reproducibility, and even then, the build of rustc they distribute to users will be re-built from the bootstrapped rustc, it won’t be one compiled by mrustc directly.

mikepurvis
0 replies
17h27m

And once you have yourself bootstrapped, you can presumably turn around and compile the compiler again, now with borrow-checking and optimizations.

In the very special case of proto-rust bootstrapping, the cost of not having borrow-checking can be paid back basically right away.

astrodust
0 replies
4h4m

There's an interesting degree of flexibility here: your compiler doesn't have to be feature-complete, it just has to be able to build a more feature-complete binary.

sjrd
2 replies
20h6m

This is what we did for Mozart/Oz [1]. We have a compiler for "proto-Oz" written in Scala. We use it to compile the real compiler, which is written in Oz. Since the Scala compiler produces inefficient code, we then recompile the real compiler with itself. This way we finally have an efficient real compiler producing good code. This is all part of the standard build of the language.

[1] https://github.com/mozart/mozart2

pabs3
1 replies
12h51m

Is it possible to bootstrap Scala?

nitwit005
13 replies
20h32m

I'm not sure I see the point. To generate functional new binaries on the target machine, rustc will need to support the target. If you add that support to rustc, you can just have it build itself.

jeffparsons
12 replies
20h7m

It's about having a shorter auditable bootstrap process much more than it is about supporting new architectures.

bawolff
5 replies
19h47m

Regardless, the process is so long that it seems unauditable in practice.

Like, I guess I can see the appeal of starting from nothing as a kind of cool achievement, but I don't think it helps with auditing code.

codetrotter
4 replies
19h27m

But with the Rust compiler in C, the audit path would be much shorter, it sounds like, and therefore more auditable.

Plus, OP also wrote in the post that a goal was to be able to bootstrap Rust without first having to bootstrap C++, so that other things can be written in Rust earlier in the process. That could mean more of the foundation of everything being bootstrapped being written in Rust, instead of in C or C++.

bawolff
3 replies
18h27m

What good is being slightly shorter if it is still nowhere remotely close to practical?

It's kind of like saying 100 years is a lot shorter than 200 years. It might be true, but if all the time you have to dedicate is a few hours, it really doesn't matter.

jeffparsons
1 replies
16h16m

It doesn't need to be _perfectly_ auditable to be worthwhile — it just needs to be more auditable than the alternatives available today.

dwattttt
0 replies
7h22m

I dunno about that; suppose dozer completes its goal, and 1 year later you want to audit the bootstrap chain. Latest Rust probably won't be able to be compiled by it, so you now need to audit what, 6 months of changes to the Rust language? How many months is short enough to handle?

If dozer _does_ keep getting maintained, the situation isn't exactly better either: you instead have to audit the work dozer did to support those 6 months of Rust changes.

mkesper
0 replies
10h36m

It will be hex editor -> assembler -> tinycc -> dozer -> latest rust so should absolutely be doable or am I missing something?

dathery
4 replies
19h45m

Not dismissing the usefulness of the project at all, but curious what the concrete benefits of that are -- is it mainly to have a smaller, more auditable bootstrap process to make it easier to avoid "Reflections on Trusting Trust"-type attacks?

It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?

teo_zero
1 replies
18h47m

It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?

Sorry, but TFA explains very well how to go from nothing to TinyCC. The author's effort now is to go from TinyCC to Rust.

dathery
0 replies
18h32m

Right, but I was trying to understand the author's motivation, and this was me handwaving about if it could be about compiler trust. The article discusses bootstrapping but not explicitly why the author cares—is it just a fun exercise (they do mention fascination)? Are they using an obscure architecture where there is no OCaml compiler and so they need the short bootstrap chain? _Is_ it about compiler trust?

(Again since it can come off wrong in text, this was just pure curiosity about the project, not dismissiveness.)

johnklos
1 replies
17h49m

Let me make a small example that may illustrate the issue.

You can download the NetBSD source tree and compile it with any reasonable C compiler, whether you're running some sort of BSD, macOS, or Linux. Some OSes have a much older gcc (Red Hat, for instance), some have modern gcc, some have llvm. The source tree first compiles a compiler, which then compiles NetBSD. It's an automatic, easy-to-understand, easy-to-audit, two-step process that's really nice and clean.

With rust, if you want to compile current rust, you need a pretty modern, up to date rust. You can usually use the last few versions, but you certainly can't use a version of rust that's even a year old. This, to some of us, is ridiculous - the language shouldn't change so much so quickly that something that was brand new a year ago literally can't be used today to compile something current.

If you really want to bootstrap rust from c, you'd have to start with rust from many years ago, compile it, then use it to compile newer rust, then use that to compile even newer rust, perhaps a half a dozen times until you get to today's rust. Again, this is really silly.

There are many of us who'd like to see rust be more directly usable and less dependent on a chain of compilers six levels deep.

quectophoton
0 replies
8h57m

perhaps a half a dozen times until you get to today's rust.

Perhaps? It was already more than that in 2018: https://guix.gnu.org/blog/2018/bootstrapping-rust/

That was back in 2018. Today mrustc can bootstrap rustc 1.54.0, but the current rustc version is 1.80.1. So if the number of steps still scales similarly, today we're probably looking at ~26 rustc compilations to get to the current version.

And please read that while keeping in mind how long Rust compilation times are.

quectophoton
0 replies
9h4m

It's about having a shorter auditable bootstrap process

Yeah, in 2018 the chain looked like this[1]:

    g++ -> mrustc@0.8.0 -> rust@1.19.0 -> rust@1.20.0 -> rust@1.21.0 -> rust@1.22.1 -> rust@1.23.0 -> rust@1.24.1 -> rust@1.25.0 -> rust@1.26.2 -> rust@1.27.2 -> rust@1.28.0

Though for me it's less the auditable part, and more that I would be able to build the compiler myself if I wanted, without jumping through so many unnecessary hoops. For the same reason I like having the source code of programs I use, even if most of the time I just use my package manager's signed executable.

And if someone open sources their program, but then the build process is a deliberately convoluted process, then to me that starts to smell like malicious compliance ("it's technically open source"). It's still a gift since I'd get the code either way, so I appreciate that, but my opinion would obviously be different between someone who gives freedoms to users in a seemingly-reluctant way vs someone who gives freedoms to users in an encouraging way.

[1]: https://guix.gnu.org/blog/2018/bootstrapping-rust/

perching_aix
11 replies
18h20m

I'm a bit confused.

It's a bit difficult to dissect, but long story short, in the middle of the post the author finally provides the reason for them embarking on the journey mentioned in the title:

The main issue (...) is that, by the time C++ is introduced into the bootstrap chain, the bootstrap is basically over. So if you wanted to use Rust at any point before C++ is introduced, you’re out of luck. So, for me, it would be really nice if there was a Rust compiler that could be bootstrapped from C. Specifically, a Rust compiler that can be bootstrapped from TinyCC, while assuming that there are no tools on the system yet that could be potentially useful.

However, this contradicts the premise they lay out earlier in the post:

Every new version of rustc was compiled with the previous version of rustc. So rustc version 1.80.0 was compiled with rustc version 1.79.0. Which was, in turn, compiled with rustc version 1.78.0. And so on and so forth, all the way back to version 0.7 of the compiler. At that point, the compiler was written in OCaml. So all you needed was an OCaml compiler to get a fully functioning rustc program. (...) There is a project that can *successfully* compile the OCaml compiler using Guile, which is one of the many variants of Scheme, which is one of many variants of Lisp. Not to mention, Guile’s interpreter is written in C.

The contradiction, of course, is that there already is a path without C++, just as they want; it's simply not the one that the rustc team uses day-to-day. The author even claims that it actually works (see the emphasis I placed).

So I'm ultimately not entirely sure about the motivation here. Is the goal to create a nicer C based bootstrapping process? Is the goal to do that and have that eventually become the day-to-day way rustc is bootstrapped? Why does the author want to get rid of the C++ stage? Why does the author prefer to have a C stage?

The only thing that's clear then is that the author just wants to do this period, and that's fine. But otherwise, even after reading through their fairly lengthy post, I'm none the wiser.

ludocode
10 replies
17h58m

While it is technically possible to bootstrap Rust from Guile and the 0.7 Rust compiler, you would need to recompile the Rust compiler about a hundred times. Each step takes hours, and you can't skip any steps because, like he said, 1.80 requires 1.79, 1.79 requires 1.78, and so on all the way back to 0.7. Even if fully automated, this bootstrap would take months.

Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway. If you have a C++ compiler, you might as well compile mrustc. Currently, mrustc only supports rustc 1.54, so you'd still have to compile through some 35 versions of it.

None of this is practical. The goal of Dozer (this project) is to be able to bootstrap a small C compiler, compile Dozer, and use it to directly compile the latest rustc. This gives you Rust right away without having to bootstrap C++ or anything else in between.

0x0203
6 replies
16h59m

This is accurate. I'm an OS/kernel developer and a colleague was given the task of porting rust to our OS. If I remember correctly, it did indeed take months. I don't think mrustc was an option at the time for reasons I don't recall, so he did indeed have to go all the way back to the very early versions and work his way through nearly all the intermediate versions. I had to do a similar thing porting java, although that wasn't quite as annoying as porting rust. I really do wish more language developers would provide a more practical way of bootstrapping their compilers like the article is describing/attempting. I've seen some that do a really good job. Others seem to assume only *nix and Windows exist, which has been pretty frustrating.

kragen
2 replies
16h5m

that's interesting! what kind of os did you write? it sounds like you didn't think supporting the linux system call interface was a good idea, or perhaps even feasible?

0x0203
1 replies
14h21m

It's got a fairly Linux-like ABI, though we don't aim or care to be 1-to-1 compatible, and it has/requires our own custom interfaces. Porting most software that was written for Linux is usually pretty easy. But we can't just run binaries compiled for Linux on our stuff. So for languages that require a compiler written in its own language, where they don't supply cross compilers or bootstrapping compilers built with the lowest common denominator (usually C or C++), things can get a little trickier.

kragen
0 replies
3h3m

interesting! what applications are you writing it for?

elcritch
1 replies
13h42m

Nim uses a smaller bootstrap compiler that uses pre-generated C code to then build the compiler proper. It's pretty nifty for porting.

pabs3
0 replies
12h45m

The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.

ben-schaaf
0 replies
12h43m

I'm curious as to why you need to bootstrap at all? Why not start with adding the OS/kernel as a target for cross-compilation and then cross-compile the compiler?

perching_aix
1 replies
17h48m

Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway. If you have a C++ compiler, you might as well compile mrustc. Currently, mrustc only supports rustc 1.54, so you'd still have to compile through some 35 versions of it.

Not sure I follow - isn't rustc still only a compiler frontend to LLVM, like clang is for C/C++? So if you have any version of rustc, haven't you at that point kind of "arrived" and started bootstrapping it on itself, meaning mission complete?

Ultimately, from what I glean, the answer really is just that this would be made nicer with Dozer, but I still wish this was explicitly stated by the author in the post. It's not like the drudgery of the OCaml route escapes me.

mbrubeck
0 replies
16h47m

Not sure I follow - isn't rustc still only a compiler frontend to LLVM, like clang is for C/C++?

The rustc source tree currently includes LLVM, GCC, and Cranelift backends: https://github.com/rust-lang/rust/blob/c6db1ca3c93ad69692a4c...

(Cranelift itself is written in Rust.)

umanwizard
0 replies
3h8m

Building rustc doesn't take hours on a modern machine. Building it 100 times would take on the order of a day, not months.

Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway.

This is a more legit point.

nijaar
8 replies
20h23m

if this works would this make the rust compiler considerably smaller / faster?

josephg
4 replies
20h16m

Smaller? Yes. Faster? Almost certainly not.

It really doesn't make sense to optimize anything in a bootstrapping compiler. Usually the only code that will ever be compiled by this compiler will be rustc itself. And rustc doesn't need to run fast - just fast enough to recompile itself. So, the output also probably won't have any optimisations applied either.

nijaar
3 replies
14h59m

If it is smaller, doesn't that mean it has less code to execute, and hence should be faster? Trying to understand better -- this is something completely new for me.

josephg
1 replies
6h35m

Why would a program run faster just because it’s smaller?

FullyFunctional
0 replies
2h42m

Example: this is a small program

int main() { for(;;); }

duped
0 replies
14h25m

Not necessarily, in fact one of the most important optimizations for compilers is inlining code (copy-pasting function bodies into call sites) which results in more code being generated (more space) but faster wallclock times (more speed). Most optimizations tradeoff size for speed in some way, and compilers have flags to control it (eg -Os vs -O3 tells most C compilers to optimize for size instead of speed).

Where optimizing for size is optimizing for speed is when it's faster (in terms of wall clock time) for a program to compute data than to read it from memory, disk, i/o etc, because i/o bandwidth is generally much slower than execution bandwidth. That means the processor does more work, but it takes less time because it's not waiting for data to load through the cache or memory.

kelnos
2 replies
8h25m

No, this won't change rustc at all. The purpose of this project is to be able to bootstrap a current version of rustc without having to do hundreds of intermediate compilations to go from TinyCC -> Guile -> OCaml -> Rust 0.7 -> ...... Rust current. (Or bootstrap a C++ compiler to be able to build mrustc, which can compile Rust 1.56, which will give you Rust current after another 25 or so compilations.)

Ultimately the final rustc you get will be more or less identical to the one built and distributed through rustup.

nequo
1 replies
4h53m

will be more or less identical

What could cause differences between the bootstrapped rustc and rustup’s?

comex
0 replies
54m

In theory there shouldn’t be any. The official Rust builds, I believe, have one level of bootstrapping: the previous official build is used to build an intermediate build of the current version, which is then used to build itself. So the distributed binaries are the current version built with the current version. A longer from-source bootstrapping process should also end by building the current version with the current version, and that should lead to a bit-for-bit identical result.

In practice, you’ll have to make sure the build configuration, the toolchain components not part of rustc itself (e.g. linker), and the target sysroot (e.g. glibc .so file) are close or identical to what the official builds are using. Also, while rustc is supposed to be reproducible, and thus free of the other usual issues with reproducible builds (like paths or timestamps being embedded into the build), there might be bugs. And I’m not sure if reproducibility requires any special options which the official builders might not be passing. Hopefully not.

See also: https://github.com/rust-lang/rust/issues/75362

jhatemyjob
8 replies
19h53m

This is why the aforementioned ABI (of the latter language in the title of this post) won't die for a long time. The name of the game is compatibility, not performance/security. Bell Labs was first.

Almondsetat
5 replies
19h26m

The C ABI won't die because it has a stranglehold on *NIX. Every new language you make has to conform in some way to C in order to use syscalls.

dpassens
4 replies
10h38m

That is not true on Linux, where you can just make syscalls yourself. They don't even use the C ABI, even if it's pretty similar (For syscalls, the fourth argument is passed in R10 instead of RCX, since that holds the return address for sysret).

Almondsetat
3 replies
8h26m

You can invoke a syscall with assembly, but the result it gives you follows C's formats. Maybe your language does integers in a different way, so you still have to abide by C's standard to adapt what the OS gives you.

dpassens
2 replies
7h48m

I'd argue that it follows the CPU's native integer representation, which C also does. Yes, if your language uses tagged integers or something, you'll have to marshal the syscall arguments/results from/to your own representation, but the same is true if you want to use those integers for arithmetic (beyond trivial additions or multiplying/dividing by a power of two, for which you can use lea).

Almondsetat
1 replies
7h27m

Mine was not a critique. Of course every OS needs to be programmed with a language and its syscalls will be formatted accordingly. And if you want to program using an OS's features, other than the compilation to assembly, you also have to worry about inter-operating with what the OS provides. I'm simply noting that for the foreseeable future, C's way of doing things will always have to be kept in mind when writing dev tools

dpassens
0 replies
7h13m

Sure, that makes sense. Out of curiosity, do you know of any way to design a syscall ABI that's not C-like that was either used in the past or would have some advantages if a new OS adopted it? I imagine that lisp machines did things differently, but a) I don't know whether they had syscalls as such or simply offered kernel services as regular functions and b) they did have microcode support for tagged integers and objects.

I'm asking since I want to get into (hobbyist) OS development at some point and would love to know if there's a better way to do syscalls.

cozzyd
1 replies
19h46m

Yes, I think rust made a big mistake by not going for a stable (or at least mostly stable like C++) ABI (other than the C one). The "statically link everything" approach is fine for desktops and servers, but not for e.g. embedded Linux applications with limited storage. It's too bad because things like routers are some of the most security sensitive devices.

JoshTriplett
0 replies
19h30m

or at least mostly stable like C++

The C++ ABI doesn't solve generics (templates); C++ templates live in header files, as source code, and get monomorphized into code that's embedded in the binary that includes that header. The resulting monomorphized code is effectively statically linked, and any upgraded version of a shared library has to deal with old versions of template code from header files, or risk weird breakage.

Swift has a more realistic solution to this problem: polymorphic interfaces (the equivalent of Rust "dyn"). That's something we're taking inspiration from in the design of future Rust stable ABIs.

but not for e.g. embedded Linux applications with limited storage

Storage is often not the limiting factor for anything running embedded Linux (as opposed to something much smaller).

The primary point in favor of shared libraries is to aid in the logistics of upgrading dependencies for security. It's possible to solve that in other ways, though.

IshKebab
8 replies
20h37m

For bootstrapping it still feels weird to target C. You could easily target a higher level language or just invent a better language. You don't care about runtime performance. Feels like you don't really gain that much by forcing yourself to jump through the C hoop, and the cost of having to write an entire compiler in C is huge.

Like, how hard would it be to go via Java instead? I bet you can bootstrap to Java very easily.

ronsor
4 replies
20h33m

Every platform, for better or worse, gets a C compiler first. Targeting C is the most practical option.

cozzyd
3 replies
19h57m

Right, but once you have C it's fairly straightforward to use an interpreted language implemented in C (python, perl, guile, lua, whatever).

Obviously such a compiler would likely be unusably slow, but that's not important here.

trueismywork
2 replies
19h53m

You overestimate the comprehensiveness of the C standard, with half the things being optional. It's not a given that Python will compile on a minimally conforming C compiler.

cozzyd
1 replies
19h44m

True, but Lua probably will :)

ronsor
0 replies
19h10m

Lua definitely will compile on an ANSI C compiler, without POSIX or Win32 extensions.

fsckboy
1 replies
19h58m

feels weird to target C

he's not targeting C, he's targeting rust; he's using C

it's an important distinction, because he's not writing the C compilers involved, he's leveraging them to compile his target rust compiler which will be used to compile a rust-on-rust compiler. The C compiler is the compiler he has available, any other solution he would have to write that compiler, but his target is rust.

IshKebab
0 replies
10h48m

Targeting C as the language to write his Rust compiler in. You knew that.

syntheticnature
0 replies
20h2m

I'd expect it to be harder. I used to work on a large embedded device that ran some Java code, and there was a specialist vendor providing Java for the offbeat processor platform.

After a little digging, I found a blog post about it, and it does sound denser than the poster's plans to bootstrap Rust: https://www.chainguard.dev/unchained/fully-bootstrapping-jav...

iTokio
6 replies
20h37m

It’s a huge project. I wonder if it wouldn’t be simpler to try to compile cranelift or mrustc to wasm (that’s still quite difficult), then use wasm2c to get a bootstrap compiler.

umanwizard
4 replies
20h32m

The resulting C would not be “source code”.

Edit to explain further: the point is for the code to be written (or at least auditable) by humans.

ncruces
3 replies
18h58m

As long both rust-to-wasm (or zig-to-wasm) and wasm2c are auditable, and every step reproducible, why do you need the generated C to be auditable?

umanwizard
0 replies
18h47m

The point is to shorten the minimal bootstrap path to Rust.

With your suggestion you can't use rust until after you already have rust-to-wasm transpiler available (which almost certainly itself already requires rust, so you are back where you started).

pabs3
0 replies
12h42m

The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.

ludocode
0 replies
17h9m

The generated C code could contain a backdoor. Generated C is not really auditable so there would be no way to tell that the code is compromised.

samsartor
2 replies
18h22m

The community has been very supportive of the gccrs (https://github.com/Rust-GCC/gccrs) project, which is the main project writing a Rust compiler in C.

jenadine
0 replies
14h41m

It's in C++, not C

Narishma
0 replies
18h8m

I wouldn't say very supportive at all. It often gets bashed whenever some news about it is posted on r/rust for example.

mustache_kimono
0 replies
18h36m

Remembered this article... https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-repl...

Remember, Drew Devault is the Fox News of programming bloggers. He exhibits the same sort of bad-faith obtuseness, and knee-jerk neckbeard tech conservatism, that makes me/many want to scream.

First, his thesis is risible. "Rust is not a good C replacement". Note, Drew does not mean replacing C code with Rust code, but Rust, the language, literally replacing C, the language. Ignoring, perhaps, that Rust doesn't want to "replace" C, because we have C!

Next, see the bulleted text. Upon each topic something interesting might be said re: Rust, but instead they all serve a garbage thesis that Rust can never be the 50 year old language that the tech world is currently built upon. Well, duh.

My least favorite, though, is the final bullet:

Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows.

And everyone wants to be a cowboy and watch things blow up when they are 8 years old.

Ar-Curunir
0 replies
18h59m

What does that article have to do with this article? The author of the latter article even says that they don’t enjoy writing C, which is kind of the opposite of what your article says

ericyd
5 replies
19h7m

Kind of annoying that I had to follow 4 links just to find a high level justification of the benefits of bootstrapping [0]. I was kinda hoping the "Why" part of this title would address that.

[0] https://bootstrappable.org/benefits.html

ludocode
4 replies
17h36m

It can be difficult to explain why bootstrapping is important. I put a "Why?" section in the README of my own bootstrapping compiler [0] for this reason.

Security is a big reason and it's one the bootstrappable team tend to focus on. In order to avoid the trusting trust problem and other attacks (like the recent xz backdoor), we need to be able to bootstrap everything from pure source code. They go as far as deleting all pre-generated files to ensure that they only rely on things that are hand-written and auditable. So bootstrapping Python for example is pretty complicated because the source contains code generated by Python scripts.

I'm much more interested in the cultural preservation aspect of it. We want to preserve contemporary media for future archaeologists, for example in the Arctic World Archive [1]. Unfortunately it's pointless if they have no way to decode it. So what do we do? We can preserve the specs, but we can't really expect them to implement x265 and everything else they would need from scratch. We can preserve binaries, but then they'd need to either get thousand-year-old hardware running or virtualize a thousand-year-old CPU. We can give them, say, a definition of a simple Lisp, and then give them code that runs on that, but then who's going to implement x265 in a basic Lisp? None of this is really practical.

That's why in my project I made a simple virtual machine, then bootstrapped C on top of it. It's trivially portable, not just to present-day architectures but to future and alien architectures as well. Any future archaeologist or alien civilization could implement the VM in a day, then run the C bootstrap on it, then compile ffmpeg or whatever and decode our media. There are no black boxes here: it's all debuggable, auditable, open, handwritten source code.

[0]: https://github.com/ludocode/onramp?tab=readme-ov-file#why-bo...

[1]: https://en.wikipedia.org/wiki/Arctic_World_Archive

kazinator
2 replies
14h10m

Say you start with nothing but "pure source code".

With what tool do you process that source code?

ludocode
1 replies
13h33m

The minimum tool that bootstrapping projects tend to start with is a hex monitor. That is, a simple-as-possible tool that converts hexadecimal bytes of input into raw bytes in memory, and then jumps to it.

You need some way of getting this hex tool in memory of course. On traditional computers this could be done on front panel switches, but of course modern computers don't have those anymore. You could also imagine it hand-woven into core rope memory for example, which could then be connected directly to the CPU at its boot address. There are many options here; getting the hex tool running is very platform-specific.

Once you have a hex tool, you can then use that to input the next stage, which is written in commented hexadecimal source code. The next tool then adds a few features, and so does the tool after that, and so on, eventually working your way up to assembly and C.

kazinator
0 replies
3h25m

From the point of view of trust and security, bootstrapping has to be something that's easily repeatable by everyone, in a reasonable amount of time and steps, with the same results.

Not to mention using only the current versions of all the deliverables or at most one version back.

ericyd
0 replies
5h45m

Yep, I think this would have been good context in the OP

amelius
5 replies
19h3m

Why not write the compiler in Rust, then compile it to assembly, and then use some disassembler/decompiler to compile that back to portable C?

mighmi
2 replies
19h1m

Wait, disassemblers will turn assembly into any language you want?

bluGill
0 replies
18h35m

Well they try. They tend to get lost on x86 where instructions are not fixed length.

pabs3
0 replies
12h41m

The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.

kelnos
0 replies
8h22m

Because that wouldn't be reasonable to audit. The program that compiles the new Rust compiler, as well as the programs that disassemble and decompile, could insert backdoors or other nefarious behavior into the generated C, in a way that could be difficult to detect.

The "ethos" (for lack of a better word) of these bootstrapping projects requires that everything be written by hand, and in a way that can be auditable.

metadat
4 replies
19h28m

  metadat@zukrfukr:/src/dozer$ wc -l $(find . -name '*.c')
     280 ./src/item.c
     851 ./src/lex.c
     166 ./src/parser.c
     107 ./src/libdozer.c
     103 ./src/resolve.c
     167 ./src/path.c
     219 ./src/traverse.c
      91 ./src/scope.c
     144 ./src/qbe.c
     134 ./src/map.c
    1045 ./src/expr.c
     266 ./src/nhad.c
     349 ./src/emit.c
      92 ./src/main.c
     231 ./src/type.c
     223 ./src/pattern.c
      97 ./src/typemap.c
     224 ./src/token.c
     148 ./src/util.c
     141 ./src/stmt.c
    5078 total
5kloc is pretty light for a `rustc', where are the tests showing what aspects of the grammar are supported so far in @notgull's crowning achievement? The article might be longer than the source code, which would be extremely impressive if the thing actually worked :)

I was not able to compile tokio with dozer.

For comparison, turn towards the other major lang HN submission today: a Golang interpreter written in PHP; it comes with extensive tests showing what works and what does not. Somehow even the goroutines are working... in PHP.

Golang interpreter written in PHP - https://github.com/tuqqu/go-php - https://news.ycombinator.com/item?id=41339818

Godspeed.

remram
1 replies
18h37m

The goroutines are working in the sense that they execute, but they are not concurrent.

metadat
0 replies
18h22m

It's only a `#include <pthread.h>' away. *grin*

umanwizard
0 replies
19h4m

The project isn’t complete. It can only build trivial examples, definitely not something like tokio.

csb6
0 replies
16h55m

From the article:

But so far, I have the lexer done, as well as a sizable part of the parser. Macro/module expansion is something I’m putting off as long as possible, typechecking only supports i32, and codegen is a little bit rough. But it’s a start.

So it is currently nowhere near complete (and the author never claims otherwise).

cranky908canuck
4 replies
19h30m

<mischief> Maybe the bootstrap process should use FORTH as part of the toolchain? </mischief>

Not mischief: I'd probably look at that option if I was taking this on.

endgame
2 replies
17h17m

From one of the guys heavily involved in all this bootstrapping stuff:

https://lobste.rs/s/fybdug/pulling_linux_up_by_its_bootstrap...

The answer to the question about FORTH is:

well we bootstrapped multiple FORTHs; no one actually was willing to actually do the bootstrapping steps in FORTH besides Virgil Dupras who did collapseOS and duskOS. (Which unfortunately neither currently have a path to GCC or Linux)
blacksqr
1 replies
13h39m

The ultimate answer given later in the above-linked comment is that bootstrapping with FORTH is a great idea but programming in FORTH isn't fun enough to follow up on the notion.

ropejumper
0 replies
10h19m

Bootstrapping with forth is a GREAT idea. I think it's one of the best languages to use for bootstrapping.

The reason is simple: forth can be almost the first thing in the chain, and it's so flexible that most of the rest of the bootstrap can be done by simply building up forth definitions.

The way the bootstrap chain generally builds up the level of abstraction is by compiling a somewhat more general language, multiple times, until you reach something usable. If you bootstrap forth you're basically there. You have clean, readable source code that can be run with a ridiculously simple interpreter/compiler. It's a very natural choice.

But of course forth is such a different paradigm that most people just don't want to learn how to write in it properly (in such a way that you end up with actually readable code). Which is fine. I guess it really isn't fun enough for most. But it's difficult to ignore just how great of a fit it is.

zellyn
0 replies
18h12m

IIUC, this is frequently suggested but never followed through on by someone who knows enough Forth to do it.

stevefan1999
3 replies
18h3m

Just for the lulz I'm writing a C compiler in Rust as a hobby, and it is humorously called "Small C Compiler", a callback to "Tiny C Compiler", because Rust is obviously more heavyweight than C.

It uses Cranelift as a back end, but the whole compiler architecture is pluggable and hackable with lots of traits throwing around. I do not intend to open source it unless it works on a somewhat functional stage to be able to handle printf("%s", "Hello World!"), so until then, it will never see the light of day.

I've not been able to make too much progress, but I've tried to implement the preprocessor and parser, and I have been involved with rust-peg and HimeCC because of the infamous typedef problem. I know that in the industry we just use a symbol table to keep the typedef context, but that has the limitation of not being able to read types declared further down. I wonder what the academic solution to that is as well, and I can only think of transactional memory.

Anything that helps would eventually make me open source it!

a_e_k
1 replies
16h33m

FWIW, (i.e., for some historical fun) Dr. Dobbs Journal published a program called "Small C Compiler" by Ron Cain back in 1980. [1]

Later, it was expanded by James Hendrix into a full book with a more complete implementation. [2] (As a kid, coming across this book in the bargain bin at CompUSA was what led to me learning C. I still have my copy!)

[1] https://archive.org/details/dr_dobbs_journal_vol_05_201803/p...

[2] https://www.amazon.com/Small-Compiler-Language-Theory-Design...

FullyFunctional
0 replies
2h51m

That was my first introduction to C and I hacked a lot on that code. A very enjoyable time was had.

My only regret with Rust is that a “Small Rust Compiler” will be an order of magnitude larger.

dmvdoug
0 replies
14h17m

Hope you name it not “SCC” but “SmaCC”.

wrs
2 replies
19h45m

Given that we're this far along, bootstrapping is purely an aesthetic exercise (and a cool one, to be sure -- I love aesthetic exercises). If it were an actual practical concern, presumably it would be much easier to use the current rustc toolchain to compile rustc to RISC-V and write a RISC-V emulator in C suitable for TinyC. Unless it's a trust exercise and you don't trust any rustc version.

0x0203
1 replies
17h18m

The practical concern for my colleagues and me is that we're OS/kernel developers for an operating system that isn't currently supported. I had to fight these kind of problems to get java ported to our OS, and a coworker had to do it for rust, which was much much harder. And he did end up having to start from one of the earliest versions and compile nearly every blasted version between then and now to get the latest. It's a royal pain and a major time sink. If there were a viable rustc that was written in C or even C++ at the time, we could have been done in a few days. Instead it took months.

foldr
0 replies
9h23m

As in your other comment there seems to be some confusion between bootstrapping and porting here? If you want to port Rust to a new OS then you ‘just’ need to add a new build target to the compiler and then cross-compile the compiler for your OS (from an OS with existing Rust support). That may indeed be a lot of work, but it doesn’t require bootstrapping the compiler, so this project wouldn’t be of any help in that scenario.

zellyn
0 replies
18h11m

I thought that was the whole point?

taneq
1 replies
16h18m

Why isn't anyone referring to bootstrapping a rust compiler as "rusting rust"? :D

namjh
1 replies
18h0m

Do we have a better method of verifying compilation output than just re-executing the compiler with the same source, then comparing the output? TEE attestation could be a thing (albeit it would be a "trusted" third party which occasionally gets broken).

mappu
0 replies
9h50m

Diverse double-compiling (DDC) can help.

zombot
0 replies
5h46m

It’s basically code alchemy.

More like archaeology. Alchemy was essentially magic, but there's nothing magic about bootstrapping from hex-punched assembly.

zellyn
0 replies
18h8m

Sometimes I fantasize about writing a C++ interpreter or compiler in scheme: going directly from scheme to current gcc would be a huge shortcut. But common wisdom is that writing a C++ compiler is basically impossible. Still, it’d be instructive!

xiaodai
0 replies
19h30m

Should've written it in LLVM IR

sylware
0 replies
9h15m

Dude, you are amazing. If the rust people are serious about anything, they should support you as much as they can.

You got it all right. Really all. QBE, C without extensions (you should lock the code to C99-ish though, or you will have kiddies introducing ISO C feature creeps then planned obsolescence into your C code on the long run).

C without extensions... where the linux kernel failed hard (and tons of GNU projects... like the glibc): it should have plain assembly replacement (and I mean not using any C compiler inline assembler) and that should compile out-of-the-box with a mere SDK configuration option.

This will be a nearly perfect binary seed for the rust programming language, and since you are using QBE, you even get some optimizations... guess what... I did my benchmarks (very basic) with CPROC/QBE and I get ~70% of the speed of latest gcc (tinyCC is 2 times slower than gcc, but its generated assembly code is "neat"/"clean").

All that to say, maybe this project will be much more than a binary seed if it becomes a "real life" rust compiler.

The main issue though is the rust syntax itself: I heard it is not that stable/finished, and it is on its way to that abomination of c++ syntax complexity. When I tried to read some of the latest real-life rust syntax, I did not understand anything, and I code mainly C (c++ when I was a young brain-washed fool) and assembly (rv64/x86_64); these are bad omens.

Oh, and don't forget to provide statically linked binaries for various platforms, you know: the binary seed.

nilslice
0 replies
17h7m

love the use of QBE for backend here. will be interesting to follow and see any comparisons against rust with llvm! good luck!

nickpsecurity
0 replies
17h50m

When I learned C a bit, I was looking up how people did C++-like stuff in C. I found objects, exceptions, concurrency, etc.

If mrustc is written in C++, could it be easier to use such C++-like primitives to port its working C++ to C? And do it a bit at a time using strong interoperability between C++ and C?

Before anyone says it, I know that would be a hard, porting effort with many pitfalls. Remember we’re comparing that to writing a Rust compiler in C, though. It might be easier to port the C++ to C.

This also reminds me of the C++ to C compilers that used to exist. I don’t know if they’re still around. I think both Rust to C/C++ and C++ to human-readable C compilers would be useful today. Especially to combine the safety benefits of one with the tooling of the other.

jdbdbebe
0 replies
7h44m

Nice hobby project, but in my opinion it's futile since rustc moves quite fast and uses the newest Rust features

Having a rust backend emitting C would be an easier way but at that point, just cross compile

fuhsnn
0 replies
7h42m

From there they can bootstrap yacc, basic coreutils, Bash, autotools, and eventually GCC ... it’s a fascinating process.

I would say about half of the list can be trimmed off if you manage to separate GCC 4 and binutils from their original build scripts; notice the sheer number of items that are just repeatedly rebuilding auto-stuff and their dependencies[1].

[1] https://github.com/fosslinux/live-bootstrap/blob/master/part...

fsckboy
0 replies
20h50m

TL;DR his goal is rust, but for bootstrapping a first rust compiler for a new environment, the work is already done for C

the article is interesting, and links to some interesting things, but that's what the article is about

his project is https://codeberg.org/notgull/dozer

he references bootstrappable builds (https://bootstrappable.org/), a systematized approach to starting from the ground up: a very simple 512-byte "machine coder" (more basic than an assembler) builds rudimentary tools, then a "small C subset compiler" which compiles a better C compiler, etc.; turtles all the way up.

foldr
0 replies
20h35m

Very cool project.

I'm not totally sold on the practical justification (though I appreciate that might not be the real driving motive here). This targets Cranelift, so it gives you a Rust compiler targeting the platforms that Rust already supports. You could use it to cross compile Rust code from a non-supported platform to a supported one, but then you'd be using a 'toy' implementation for generating your production builds (rather than just to bootstrap a compiler).

Ruq
0 replies
18h25m

It always comes back to C.