Write your own retro compiler

nils-m-holm
20 replies
2d3h

Here it is, my latest compiler book! Basically an expanded version of "Write Your Own Compiler", this time discussing code generation for CP/M on the Z80 (instead of ELF on modern systems), which simplifies some things a lot.

How much complexity do you need to self-compile a compiler in 10 minutes on a 4MHz Z80 system? Take a look and find out! The code is free (but the book is not).

7thaccount
17 replies
2d1h

Always look forward to hearing about what you're working on, Nils! I hope your business of doing this is profitable as well! One day I'm going to finally buy a copy of everything and work through it all. It seems like there is so little time, though, lol.

Edit: I'd also love to see you do a no-nonsense book on Forth and your take on it.

nils-m-holm
16 replies
2d

Thank you!

Books are my biggest source of income, but "big" is relative here: I am earning about $500 per month in revenues. This is mostly a problem of visibility, I think. Most people stumble across my books by accident. Reviews, presence on the front page of HN, etc. usually increase revenues significantly for one month.

Regarding FORTH, let's see. The code is already there: http://t3x.org/t3xforth/

pamoroso
7 replies
1d20h

Nils, regarding visibility, have you considered setting up a Mastodon account? Yesterday I shared on Mastodon a link to your book and my post got 28 reshares, 42 likes, and half a dozen comments. And I'm not even the author of the book. In the Fediverse people do notice you, read what you write, and click your links. Which is mind blowing for those used to traditional social platforms.

nils-m-holm
6 replies
1d20h

Thank you, but I do not even know what Mastodon or the Fediverse are. :) Could you point me to some resources that would get me started? Preferably to something for the social media-illiterate! Which software to use (on BSD), where to register (if that is a thing), etc. That would be cool!

pamoroso
2 replies
1d10h

Think of Mastodon as an open-source Twitter with multiple servers ("instances" in Mastodon-ese) instead of the single twitter.com. The servers interoperate through a common protocol, which allows any user on any server to follow and engage with any other user on any other server. The protocol actually makes it possible to interoperate with platforms other than Mastodon that are part of a larger system called the Fediverse, but you can ignore that for now.

The only software you need to get started is a web browser to use a web client, which is typically provided by the Mastodon server you create an account on. Picking a server is the only potentially confusing choice, so for retrocomputing enthusiasts I recommend creating an account at https://bitbang.social or https://oldbytes.space. Once you have an account on a server you can migrate to another one if needed.

For more on Mastodon see https://joinmastodon.org. For any other questions feel free to ask here, or follow me on Mastodon at @amoroso@fosstodon.org and ask there.

nils-m-holm
1 replies
1d8h

I see. Thanks for the explanation and the links! Unfortunately I pretty much do not know what Twitter is, either. So I looked at the instances (?) you suggested and pretty much saw a doom-scrolling wall of random postings, which I find exhausting.

So when adding an account, I would just add to that wall of postings? How would people interact on such a platform? I am coming from Usenet, which had the greatest user interface ever (IMHO): you just follow-up on a posting and replies will pop up in your stream of unread messages. How would this work on Mastodon?

pamoroso
0 replies
1d7h

On Mastodon (and similar socials like Twitter and Facebook to a certain extent) you mostly follow people instead of themed groups like Usenet.

You set up your "wall of postings" (the "feed" or "timeline" in the jargon of socials) by following (subscribing to) the people you're interested in. On Mastodon you can also follow topics by following "hashtags", i.e. the set of posts by any user that are tagged with a string preceded by a hash character. For example, following the hashtag #retrocomputing will bring posts about that topic into your timeline.

So, on socials, you typically scan your feed and the feeds of any additional hashtags you're interested in. For each post you can reshare it, reply (comment), or like it.

Some additional resources: 1) https://opensource.com/article/23/1/mastodon-beginners-guide 2) https://github.com/joyeusenoelle/GuideToMastodon

7thaccount
2 replies
1d13h

Fediverse refers to a collection of decentralized applications that use some kind of common protocol. Like instead of centralized Twitter, you could choose Mastodon and so on. My understanding is very basic though.

Long story short, I'd love to see you advertise some more. I'm not sure how you'd reach your typical audience though.

nils-m-holm
1 replies
1d8h

> Long story short, I'd love to see you advertise some more.

Let's see. I have had a look at Mastodon and all I see at the moment is a wall of random distractions. Maybe I am missing something, though. I will investigate further. On HN, for example, there are just headlines and it takes me maybe a minute to scan the front page. Mastodon looks pretty chaotic and time-consuming compared to that.

7thaccount
0 replies
1d7h

Oh, I agree and am of a similar mind. I was just answering the question above about Mastodon.

Regarding advertising... idk, but there should be some kind of option for you out there. HN is of course a good place for this kind of thing. Is that Lambda the Ultimate blog still a thing? Not sure how many people listen to the ArrayCast podcast, or if you're comfortable with that format... but I'm sure they'd be interested in your work on Klong and the implementation choices.

winter_blue
3 replies
1d23h

$500 isn’t really enough to live on — do you have a job that you use to support yourself (or do you live in a low cost-of-living country)?

nils-m-holm
2 replies
1d23h

I have multiple sources of income, but book revenues are the biggest one. The area I live in is a rather expensive one, but I pay no rent, which helps. By the standard of our country I qualify as "poor", but I do not mind much. A little bit of safety would be nice, but I do not need much to live comfortably.

tiberious726
1 replies
1d21h

This is great quality work; you'd crush it writing technical documentation or putting together internal corporate technical training programs.

nils-m-holm
0 replies
1d20h

I tried that some time ago, but I do not seem to be compatible with all the business stuff. Thanks anyway! :)

Archbtw97543
1 replies
2d

Thanks for being so transparent

nils-m-holm
0 replies
1d23h

Sure, no problem :)

7thaccount
1 replies
1d16h

Awesome Forth implementation! I'd love to read a high level book on the implementation too once you get around to it!

nils-m-holm
0 replies
1d8h

Thanks! I will keep it in mind!

N3Xxus_6
1 replies
1d16h

Nils, just wanted to say I love your work. I have your C compiler book and one of your Lisp compiler ones as well. I’ve learned a lot and your work has helped me appreciate compilers a lot more, so thanks!

nils-m-holm
0 replies
1d8h

This is good to hear! Thank you!

freedomben
12 replies
2d

Why such a focus on retro computing? As an oldie I think it's cool (though learning the parts that aren't applicable to modern stuff is a bit impractical), but my son is interested in learning operating systems, compilers, etc., and I could never get him to use something so "outdated."

To be clear, I'm not attempting to criticize with this question (my personal opinion is write about what interests you, even if nobody else will care), I'm assuming you choose older targets for a reason, and would like to understand those reasons :-)

That is, do you believe the retro targets are a lot simpler and easier to understand, so people can iterate/build in layers? Or do you just know the retro stuff better, so it makes for a better book?

nils-m-holm
8 replies
2d

My personal perspective is that computing has become much more complicated than necessary in the past decades. Of course abstraction will always create complexity, and some of that can hardly be avoided, but in computing these days complexity is really off the charts.

So for this book I chose a platform that is easy to understand and does not make you wade through tons of abstractions that are only loosely related to compiler construction (e.g. ELF object file format).

082349872349872
3 replies
1d23h

Here's one way to avoid going into detail on ELF: https://news.ycombinator.com/item?id=38592000

"(nearly) constant" means you can pick either (a) a constant blob, at the cost of a fixed image size, à la COM, or (b) patch up the length (one or two places, iirc) if you're feeling fancy.
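A rough sketch of option (b), assuming a 32-bit i386 executable with one PT_LOAD segment that maps the whole file (headers plus code) at the traditional base address 0x08048000. Everything except the two size fields is a constant template. The function name `build_elf32` is hypothetical, and this is only an illustration of the idea, not the header used by the T3X compilers.

```python
import struct

BASE = 0x08048000   # assumed load address (traditional i386 default)
EHDR_SIZE = 52      # size of an ELF32 file header
PHDR_SIZE = 32      # size of one ELF32 program header

def build_elf32(code: bytes) -> bytes:
    """Prepend a constant ELF32 header template to `code`, patching only the sizes."""
    entry = BASE + EHDR_SIZE + PHDR_SIZE        # code starts right after the headers
    # e_ident: magic, ELFCLASS32, ELFDATA2LSB, EV_CURRENT, then zero padding
    ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = ident + struct.pack(
        "<HHIIIIIHHHHHH",
        2,           # e_type: ET_EXEC
        3,           # e_machine: EM_386
        1,           # e_version
        entry,       # e_entry
        EHDR_SIZE,   # e_phoff: program header follows the file header
        0,           # e_shoff: no section headers
        0,           # e_flags
        EHDR_SIZE,   # e_ehsize
        PHDR_SIZE,   # e_phentsize
        1,           # e_phnum
        0, 0, 0,     # e_shentsize, e_shnum, e_shstrndx
    )
    total = EHDR_SIZE + PHDR_SIZE + len(code)   # the only value that varies
    phdr = struct.pack(
        "<IIIIIIII",
        1,           # p_type: PT_LOAD
        0,           # p_offset: map the file from its start
        BASE,        # p_vaddr
        BASE,        # p_paddr
        total,       # p_filesz  <- patched
        total,       # p_memsz   <- patched
        5,           # p_flags: PF_R | PF_X
        0x1000,      # p_align
    )
    return ehdr + phdr + code
```

Because `p_offset` is 0, the headers themselves are loaded too, which keeps the entry point a simple constant offset from `BASE`.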

nils-m-holm
2 replies
1d23h

In one of my other compiler books (http://t3x.org/t3x/book.html) I just use a template for the ELF header, but I still think it adds too much complexity. One reader complained about it.

082349872349872
1 replies
1d21h

found it: elfheader() in https://t3x.org/t3x/t.t.html

fwiw, I think you commented it very nicely; de gustibus!

Did you come up with if vs. ie ... else ... independently or inherit it from BCPL?

nils-m-holm
0 replies
1d21h

> fwiw, I think you commented it very nicely;

Thanks, I thought so, too, but I can also understand that the comments are not very helpful, if you know nothing about linkers, paging, and object files.

I think I adopted IE/ELSE from BCPL, but thought that IE is nicer than TEST, because it is itself short for If/Else and because it looks almost like IF.
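A side effect of having IF without ELSE plus a separate IE ... ELSE form is that the classic dangling-else ambiguity cannot arise: the keyword alone tells the parser whether an ELSE must follow. A minimal recursive-descent sketch, assuming a hypothetical token list in which conditions and simple statements are single tokens (this is not T3X's actual grammar or its compiler's code):

```python
def parse_stmt(toks, i=0):
    """Parse one statement starting at toks[i]; return (tree, next_index).

    Hypothetical mini-grammar:
      stmt := 'IF' cond stmt
            | 'IE' cond stmt 'ELSE' stmt
            | <any other token>            (a simple statement)
    """
    tok = toks[i]
    if tok == "IF":                        # IF never takes an ELSE
        cond = toks[i + 1]
        body, i = parse_stmt(toks, i + 2)
        return ("if", cond, body), i
    if tok == "IE":                        # IE always requires an ELSE
        cond = toks[i + 1]
        then, i = parse_stmt(toks, i + 2)
        if i >= len(toks) or toks[i] != "ELSE":
            raise SyntaxError("IE without ELSE")
        other, i = parse_stmt(toks, i + 1)
        return ("ie", cond, then, other), i
    if tok == "ELSE":                      # an ELSE can only follow an IE branch
        raise SyntaxError("ELSE without IE")
    return tok, i + 1                      # simple statement
```

For `IE a IF b x ELSE y` the ELSE can only belong to the IE, because the inner IF never looks for one.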

> de gustibus!

Funny, I just started to brush up my Latin! :)

mst
2 replies
1d3h

Having very much enjoyed LFN, I think your taste in where to simplify suits me, at least.

I doubt I'll ever get around to it but I'm sort of tempted to see if I can remember enough arm26 assembler to try and get it to compile to that (arm26 does have multiply but not divide, I'd have to dig out the 'standard' fast division asm block that was passed around amongst Archimedes authors back when that was my primary platform).

Just bought a copy so I should at least get as far as "I've read the book but not got around to the arm26 part" though :D

nils-m-holm
1 replies
1d3h

What is ARM26? I know ARMv6, and a T3X/0 back end for the ARMv6 would be seriously cool. So cool in fact that I might write it myself. :) Here is an unsigned divide for the ARMv6 that I borrowed for the SubC compiler: http://me.henri.net/fp-div.html In fact the sources for the SubC compiler might serve as a pretty good foundation for porting T3X/0 to the ARMv6.
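Both the ARM2 and the Z80 lack a divide instruction, so division ends up in a runtime routine. The textbook approach is restoring shift-and-subtract division, producing one quotient bit per step. A minimal Python sketch of the algorithm, assuming 32-bit unsigned operands; it is not a transcription of the routine linked above.

```python
def udiv32(n: int, d: int) -> tuple[int, int]:
    """Restoring shift-and-subtract division of 32-bit unsigned ints.

    Returns (quotient, remainder). Processes one dividend bit per step,
    the way a software divide routine on a CPU without DIV would.
    """
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r = 0, 0
    for bit in range(31, -1, -1):
        r = (r << 1) | ((n >> bit) & 1)   # bring down the next dividend bit
        q <<= 1
        if r >= d:                         # divisor fits: subtract, set quotient bit
            r -= d
            q |= 1
    return q, r
```

In assembly the loop maps onto a shift through carry, a compare/subtract, and a conditional set of the low quotient bit; unrolled variants trade code size for speed.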

mst
0 replies
2h58m

The ARM 2 chip in the early Archimedes machines was a 32-bit chip with 26-bit addressing so it got referred to as arm26 assembler (I'm not sure if that's technically -correct-, but I'm pretty sure it was in the title of the book I learned from and there's a directory in linux kernels that support that arch called asm-arm26 at least, so my memory isn't -just- making it up ;).

sponaugle
0 replies
1d2h

I would align with this philosophy as it relates to learning. Complexity is likely a necessity of technological advancement, but a hindrance to 'from-scratch' learning. Abstractions hide many details that may not be needed to operate, but are needed to understand.

The entire retro-computing field is interesting and a bit surprising in its strength. I would not have guessed that in 2023 an IMSAI 8080 would be selling for ~$3-5k, and that there would be such an active community of people doing things in CP/M. Nor would I have guessed that I would be working on either of these!

One of the most enjoyable personal projects that I have worked on was the creation and implementation of a CPU, starting with the concept ISA and building the microarchitecture, making lots of cool mistakes that require rework, and eventually having a booting CPU running code in an assembly language of my own. While the performance of a CPU like this is closer to the computational power of my coffee maker, it is still an excellent experience.

I ordered my copy of your book and am looking forward to reading it!

unoti
0 replies
1d16h

For me the interest in retro computing is that the computer is so much simpler that it is indeed possible to understand. The set of assembly instructions is small enough that a person can understand every opcode fairly easily, the amount of memory is limited, and the list of things that your installed ROM can do is relatively accessible. Contrast that with modern machines and operating systems: no mere mortal can understand everything that is happening in the machine.

Another thing to love about retro computers is how transferable the knowledge is to modern machines. Once you know the essence of assembly language on a retro computer, you have a good basis for learning RISC-V or something else modern.

Depending on where your son is at, he might enjoy this video where I explain the essence of assembly language-- it's the video I wished someone had shown to me when I was a kid and thought I wasn't smart enough to learn it.

https://www.youtube.com/watch?v=ep7gcyrbutA

tcmart14
0 replies
22h14m

It's sort of like how, in a lot of universities, a computer architecture course usually focuses on MIPS, RISC-V, or LC-3 and not on something like a modern x86_64 processor. Tons of books exist that teach computer architecture using these architectures. They are also small, compact, and straightforward. You can understand everything in a 6502 or a simple MIPS processor, from the logic gates and the instruction set to how instruction pipelining and branch prediction work. I could be wrong, but if I had to guess, not even an engineer at Intel can break down their current processors and fit all of that into their head.

Retro stuff also presents a challenge. There is a guy who writes some crazy software for, I believe, the Mac II. It can be a great test of skill to make an old machine capable of doing modern things, because it requires getting knee-deep in the weeds on optimization and such. One issue I think we have in software is that modern machines are really powerful, to the point that even shitty, unoptimized code can run decently well in average use cases. Targeting these kinds of machines can help build optimization skills because, well, you actually have to optimize to get these older machines to do modern things.

Archbtw97543
0 replies
2d

Newer compilers and other low-level programs have become extremely complex. There are loads of features or optimizations which are not strictly necessary but still add a lot of complexity. Attempting to explain how modern compilers work would most likely result in lots of confusion.

amelius
7 replies
2d

This book looks fun. But I'm still waiting for a worthy successor to The Dragon Book, one that discusses optimizations for modern CPUs (and perhaps GPUs) and also how to design/write a modern VM with a fast concurrent GC (something that some might say is even harder than writing the compiler!)

tralarpa
4 replies
2d

I remember that I didn't like the Dragon Book when I was a student. But I don't remember anymore why :) (I think I found it poorly structured, and with too many details for some topics and not enough details for others).

If you already have some base knowledge, you might like this:

https://www.cs.cmu.edu/~janh/courses/411/18/schedule.html

I particularly liked how they introduced SSA form.
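For readers meeting SSA for the first time: in static single assignment form every variable is assigned exactly once, so each redefinition gets a new version name. In straight-line code (no branches, so no φ-functions are needed) conversion is a single renaming pass. A minimal sketch, assuming a hypothetical `(dest, op, a, b)` three-address tuple format; it is not taken from the course materials.

```python
def to_ssa(code):
    """Rename straight-line three-address code into SSA form.

    Each instruction is (dest, op, a, b); operands are variable names or ints.
    Branches and phi-functions are deliberately out of scope here.
    """
    version = {}                      # variable -> latest version number

    def use(x):
        # A use refers to the most recent definition; ints pass through,
        # and variables never defined in this block keep their names.
        if isinstance(x, str) and x in version:
            return f"{x}_{version[x]}"
        return x

    out = []
    for dest, op, a, b in code:
        a, b = use(a), use(b)         # rename uses before the new definition
        version[dest] = version.get(dest, 0) + 1
        out.append((f"{dest}_{version[dest]}", op, a, b))
    return out
```

For `x = a + 1; x = x * 2; y = x - a` this yields `x_1 = a + 1; x_2 = x_1 * 2; y_1 = x_2 - a`, where every name now has exactly one definition.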

More advanced topics:

https://www.cs.cmu.edu/~15745/handouts.html

kopecs
2 replies
1d19h

I also disliked the Dragon Book as a student. I found it to have too much of an emphasis on lexing/parsing and not enough discussion of optimizations/analysis for my liking. I liked Advanced Compiler Design and Implementation by Muchnick, although it does have some warts (ICAN; less discussion of SSA than I would've liked) and I think it is a bit dated now.

FYI: replacing the 18 with 23 in your link to 411 gets you a slightly updated version.

mst
1 replies
1d4h

I suspect that the emphasis is an artifact of when the Dragon Book was written - getting a decent parser was rather more of a challenge, and heavily optimising compilers a lot less common.

'Tis one of those things, I guess.

trealira
0 replies
19h14m

> heavily optimising compilers a lot less common.

Not just that, but the theory that drives modern compilers, like graph coloring register allocation and static single assignment form, wasn't conceived until the mid-1980s (1984 and 1986, respectively), and better implementations of those theories were written about in the 90s. Linear scan register allocation was first written about in 1999.
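Linear scan is compact enough to sketch. A minimal Python version following the outline of Poletto and Sarkar's 1999 paper, assuming live intervals have already been computed as `(name, start, end)` triples. When registers run out it spills the interval that ends last. Real allocators add interval splitting, fixed registers, and spill-code insertion.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals.

    intervals: list of (name, start, end) with start <= end.
    Returns {name: register index, or "spill"}.
    """
    result = {}
    active = []                               # (end, name) of intervals holding a register
    free = list(range(num_regs))
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Expire intervals that ended before this one starts; free their registers.
        for iv in sorted(active):
            if iv[0] >= start:
                break
            active.remove(iv)
            free.append(result[iv[1]])
        if free:
            result[name] = free.pop(0)
            active.append((end, name))
        else:
            # No register free: spill whichever interval ends last.
            last_end, last_name = max(active)
            if last_end > end:
                result[name] = result[last_name]   # take over its register
                result[last_name] = "spill"
                active.remove((last_end, last_name))
                active.append((end, name))
            else:
                result[name] = "spill"
    return result
```

The single sorted sweep is what makes it attractive for JITs: there is no interference graph to build, at the cost of somewhat worse allocations than graph coloring.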

You can compare these books:

Engineering a Compiler: VAX-11 Code Generation and Optimization, which was published in 1982, talks about the design and implementation of the PL/I compiler for the DEC VAX-11.

You can compare it to Bob Morgan's Building an Optimizing Compiler, which was published in 1997. The techniques he discusses there are a lot closer to how LLVM works today than the 1982 book.

tomcam
0 replies
1d23h

Fantastic resource, thank you.

trealira
0 replies
19h32m

Bob Morgan's book Building an Optimizing Compiler is entirely focused on modern compiler optimizations (and it was published in 1997). It goes over building a control flow graph, various optimizations you can do with that, alias analysis, static single assignment form, CFG dominator-based optimizations, instruction scheduling, register allocation, and emitting object code. It doesn't talk about lexing, parsing, NFAs/DFAs, etc. like most compiler books.
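Dominator-based optimizations start from the dominator sets. A minimal sketch of the classic iterative data-flow formulation, Dom(n) = {n} ∪ ⋂ Dom(p) over all predecessors p of n, assuming a CFG given as a dict of successor lists with a single entry node. Production compilers usually use faster algorithms such as Lengauer-Tarjan or the one by Cooper, Harvey and Kennedy.

```python
def dominators(succ, entry):
    """Compute dominator sets for every node reachable from `entry`.

    succ: dict mapping each node to a list of successor nodes.
    Returns {node: set of nodes that dominate it}.
    """
    # Collect reachable nodes and build predecessor lists.
    nodes, stack = {entry}, [entry]
    while stack:
        for s in succ.get(stack.pop(), []):
            if s not in nodes:
                nodes.add(s)
                stack.append(s)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succ.get(n, []):
            preds[s].append(n)

    # Start with "every node dominates every node", then iterate to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

In a diamond A → B, A → C, B → D, C → D, node D ends up dominated only by A and itself, since neither branch is on every path to it.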

Static Program Analysis[1] also seems helpful to someone trying to write a compiler that does optimizations that require advanced analyses: https://cs.au.dk/~amoeller/spa/

For garbage collection, there's The Garbage Collection Handbook[2]. I'm not aware of better additional resources, though.

[1]: https://cs.au.dk/~amoeller/spa/

[2]: https://gchandbook.org/

freedomben
0 replies
2d

Same, that's been my hope too. Stuff is getting so complex nowadays with modern microcode/firmware and such, and there are so many things that seem like "magic" to me. Feeling like something is "magic" is my internal signal that somebody figured out some clever way to get around what I think are/were the limitations, and I love discovering people's clever hacks. Recently I've been reading about Fake Bass (which is how small speakers seemingly violate the laws of physics by producing deeper bass than they are capable of) and how it is accomplished by (ab)using harmonics to trick the brain into hearing deeper notes than are actually there. Fascinating stuff!

AlexeyBrin
6 replies
2d2h

This looks really interesting, however a disadvantage is that the reader needs to know or learn a new programming language first: T3X. I wonder if one could start from scratch on a CP/M system: write and develop the compiler on a retro system that has no connection to the outside world except the keyboard and display.

nils-m-holm
2 replies
2d2h

> however a disadvantage is that the reader needs to know or learn a new programming language first

This is a good point, and I have thought about it a lot before starting the book. What finally made me choose T3X is that its compiler is much smaller[1] than my smallest C-subset compiler and (IMHO) T3X is easier to learn or understand.

[1] SubC: 3815 lines, T3X/0: 2330 lines.

Of course you could start on CP/M without any outside tools, but then you would have to write your bootstrapping compiler in assembly language. Time-consuming, but certainly manageable. I doubt that it would make interesting reading, though.

JonChesterfield
1 replies
1d22h

You'd have some enthusiastic readers for writing compilers in assembly. Especially if you went down the route of progressively more capable assemblers. But "some" might be fewer than five.

ramilefu
0 replies
1d16h

Make that six! I’m in!

mst
2 replies
1d3h

I don't see that as a disadvantage because (a) if you're planning to write a compiler, 'new programming language' is probably not something you're troubled by, and (b) having a simple+clean 'toy' language probably makes for better pedagogy.

I'm -slightly- surprised it isn't using a C-like syntax rather than ALGOL-like but that's probably my own biases, and mentally mapping 'DO' and 'END' to '{' and '}' isn't much of a hardship.

nils-m-holm
1 replies
1d3h

T3X dates back to the early 1990s, when C-style syntax was not as ubiquitous as it is now.

mst
0 replies
2h56m

I'd apparently misguessed that the minilanguage was created for the book.

Though if you'd care to share a link or something, I'm now curious how it -did- come to exist.

scrawl
3 replies
2d2h

I have a physical copy of Practical Compiler Construction 2nd Ed. and like it a lot. I recommend Nils' books to anyone who may be interested.

nils-m-holm
2 replies
2d2h

Thank you! The 2nd Ed. was quite an endeavor, I am glad you like it!

kqr2
1 replies
2d1h

You may need to update the thumbnail image on your index page. It still shows the first edition:

https://t3x.org/index.html

nils-m-holm
0 replies
2d

Oops, good catch. :) Thank you!

mati365
2 replies
2d3h

Recently I made a multipass C compiler (and assembler) in TypeScript for such old x86 CPUs.

https://github.com/Mati365/ts-c-compiler

nils-m-holm
1 replies
2d2h

Cool!

The one in the book is for the Z80, which is a bit older and does not even have multiply or divide instructions. The compiler can also output code for the 8086, though. And the 386.
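Without a multiply instruction, a Z80 compiler has to call a small runtime routine for `*`. A minimal Python sketch of the common shift-and-add method for 16-bit unsigned values. A Z80 version might keep the partial product in HL, doubling it with ADD HL,HL and adding the multiplicand with ADD HL,DE, but the routine used by the book's compiler may well differ.

```python
def mul16(a: int, b: int) -> int:
    """Multiply two 16-bit unsigned values using only shifts and adds.

    Scans the multiplier from its most significant bit, doubling the
    partial product each step and adding the multiplicand when the bit
    is set. The result is truncated to 16 bits, as in a register pair.
    """
    product = 0
    for bit in range(15, -1, -1):
        product = (product << 1) & 0xFFFF        # double the partial product
        if (b >> bit) & 1:
            product = (product + a) & 0xFFFF     # add multiplicand for a 1 bit
    return product
```

Truncating at every step gives the same result as truncating once at the end, because arithmetic modulo 2^16 is consistent under both shifts and additions.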

vanderZwan
0 replies
1d20h

Do you happen to have books that cover the latter two targets?

EDIT: should have taken a look at the rest of your website first. Clearly you do, hahaha

http://t3x.org/index.html

jdwithit
1 replies
1d15h

Apart from the actual book content, I enjoy your appropriately retro and minimalist web design. It's giving me a huge nostalgia hit. Fond memories of hand crafting my own sites with a 6 inch thick book titled something like "HTML 3.2 UNLEASHED!!!!" on my desk :)

nils-m-holm
0 replies
1d2h

Good times!

It is all fun until you want to apply global changes to some 1500 static pages. :) So far the pain is not sufficiently intense to make me write a CMS, though. And even then the design would stay the same!

eterps
1 replies
2d2h

It would also be interesting to have a book on writing your own CP/M-like OS.

retrac
0 replies
2d1h

That's Andrew Tanenbaum's Operating Systems Design and Implementation.

Now, yes, that shows you how to write a Unix-like microkernel OS; just skip everything except the file system chapter. And don't follow the advice about tree data structures. Just use flat tables, and don't bother to implement exact file sizes. Presto: CP/M.
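To make "flat tables, no exact file sizes" concrete: CP/M records file lengths as a count of 128-byte records, which is why text files mark their real end with a Ctrl-Z (0x1A) byte. A deliberately simplified sketch of that idea. The `DirEntry` layout and the `write_text`/`read_text` helpers are hypothetical and omit the extents, user numbers, and allocation maps that a real CP/M directory entry has.

```python
from dataclasses import dataclass

RECORD = 128      # CP/M counts file data in 128-byte records
EOF_MARK = 0x1A   # Ctrl-Z: logical end of a text file

@dataclass
class DirEntry:
    name: str       # up to 8 characters, e.g. "HELLO"
    ext: str        # up to 3 characters, e.g. "TXT"
    records: int    # file length in records, not bytes
    data: bytes     # simplified: contents stored inline

directory: list[DirEntry] = []    # one flat table, no subdirectories

def write_text(name: str, ext: str, text: str) -> None:
    raw = text.encode("ascii") + bytes([EOF_MARK])
    pad = -len(raw) % RECORD                     # fill up the last record
    raw += bytes([EOF_MARK]) * pad
    directory.append(DirEntry(name.upper(), ext.upper(), len(raw) // RECORD, raw))

def read_text(name: str, ext: str) -> str:
    for e in directory:                          # linear search of the flat table
        if (e.name, e.ext) == (name.upper(), ext.upper()):
            raw = e.data[: e.records * RECORD]
            return raw.split(bytes([EOF_MARK]), 1)[0].decode("ascii")
    raise FileNotFoundError(f"{name}.{ext}")
```

A 128-character text file occupies two records here, because the terminating Ctrl-Z spills into a second one.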

(I prefer the 2nd edition. The 3rd edition needlessly complicates things, IMO, mostly so the demo Minix code will work on a late 1990s PC instead of a 1980s PC.)

Max-q
1 replies
1d23h

This comment is not meant to be negative, just some insight that might be valuable.

I read the free chapter. One thing I noticed right away is that some things may be hard for people without much knowledge about the topic: under each headline, it explains a concept from the ground up, no knowledge required. Like "the syntax of a language is...". But just a few sentences in, advanced topics are touched, like assembly instructions, not explained. It feels a bit like "the curse of knowledge", where it's hard to know what the other party knows. But if the reader needs to learn what syntax means, they will probably not understand the next sentences.

So, I think more consistency could improve the product.

This is of course just my opinion and interpretation of the text; it might not be relevant. But maybe something to keep in mind for your next masterpiece :)

nils-m-holm
0 replies
1d22h

> under each headline, it explains a concept from the ground up, no knowledge required. Like "the syntax of a language is...". But just a few sentences in, advanced topics are touched, like assembly instructions, not explained.

As the blurb of the book states: no prior knowledge in the field of compiler construction is required, but the reader should be familiar with at least one procedural language and one assembly language. So I thought it would be OK to assume that the reader knows about things like assembly instructions.

Then the appendix of the book has a short introduction to Z80 assembly (which still assumes that you know the basics of assembly language).

Every book starts somewhere. It would be hard to write a compiler construction book that assumes zero knowledge about computer programming.

I am not saying that the curse of knowledge is not a thing, though, so I will definitely keep this in mind!

thinkmassive
0 replies
2d3h

pests
0 replies
1d13h

A compiler tutorial that gets past the lexing and parsing stages? First of its kind.

I kid, but it is a common stopping point. Gonna pick this up.