The lex and yacc utilities are part of POSIX.2; is there any reason not to reach for them first?
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/l...
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/y...
All the POSIX.2 standards for shell utilities can be found here:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/
The original introduction to lex and yacc was in the book by Kernighan and Pike:
https://scis.uohyd.ac.in/~apcs/itw/UNIXProgrammingEnvironmen...
As the author of a POSIX standard utility, I would advise you to only reach for such utilities when portability is the most important thing.
POSIX utilities are not great. Lex and Yacc included.
Is your criticism of the relationship of the lexer and the parser, or something more fundamental? Is an LR parser expressed in BNF notation unsatisfactory?
I say this only as it has been many years since I have written one, but I thought this OCaml presentation on a POSIX shell held the structure in high regard.
https://archive.fosdem.org/2018/schedule/event/code_parsing_...
The UX of the BNF notation.
I am partial to hand-coded recursive descent purely because I struggled with Lex and Yacc too much when I tried.
I commiserate, it can be unforgiving, and the use of C is admittedly out of vogue.
Oh, I love C. [1] But yes, unforgiving.
And a lot of magic, with special variables, macros, and functions that you must know. And unclear scoping.
[1]: https://gavinhoward.com/2023/02/why-i-use-c-when-i-believe-i...
Are POSIX utilities even portable? They tend to have poor Windows support. And they also tend to target C, which has a high ceiling for portability, but also a high bar for making things portable (whereas more modern languages are often just portable by default).
My take on POSIX utilities would be only to use them on Linux platforms where they effectively form a "native" part of the platform.
I was mostly speaking about POSIX, since that was the focus of GGP.
Yes, you should only use them on POSIX.
My utility does build natively on Windows, though.
I disagree here. The syntax can be infuriating at first, but once you understand it, Lex/Yacc are rock solid and speed up development significantly.
Does a single production compiler use lex and yacc to generate any part of its system, beyond maybe a first pass to test out syntax before it gets rewritten into a handwritten parser/lexer? I'm not going to say none exist since I don't know literally every production compiler ever written, but I have never heard of one that used them for the final code.
I think Ruby does? IIRC its source code includes a 10,000+ line “parse.y” file which is converted to C code using Yacc.
Excitingly, Ruby 3.3+ shipped with a new alternative parser called Prism[0]! See also: [1][2].
[0]: https://github.com/ruby/prism
[1]: https://railsatscale.com/2023-06-12-rewriting-the-ruby-parse...
[2]: https://railsatscale.com/2024-04-16-prism-in-2024/
Just went and looked in the GitHub repo. 16,000+ line .y file. I can't imagine that is pleasant to maintain.
Interesting, I'll have to look into this because I'd never heard that before and now I'm curious.
A language with as much going on as Ruby was not one I would have picked as a candidate for yacc.
Python notably switched from its own LL(1) parser generator (pgen) to a PEG-based parser generator in Python 3.9 (PEP 617).
But indeed, last I checked, recursive descent was the most common choice overall.
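For anyone who hasn't written one, here is roughly what hand-coded recursive descent looks like: one function per grammar rule, each calling the rules below it. This is only a sketch for a toy arithmetic grammar that I made up for illustration; the names (`tokenize`, `Parser`, `evaluate`) are hypothetical and not taken from any compiler mentioned in this thread.

```python
import re

# Toy grammar, assumed for illustration only:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr -> term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term -> factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor -> NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return value
```

With this, `evaluate("1 + 2 * 3")` gives 7 and `evaluate("(1 + 2) * 3")` gives 9. Precedence falls out of which function calls which, so there is no table to maintain, but every new precedence level means another function.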
Yes, OCaml with its very complex syntax and hundreds of features uses the OCaml equivalents of Lex/Yacc. It is a myth that one cannot use Lex/Yacc in production.
For a few reasons.
One, because actually building the lexer and parser from scratch is a useful exercise in a learning context, which is what this is.
Two, because the book wants to teach Pratt parsers, rather than LALR parsers. There’s a lot of literature out there on LALR parsers and generated parsers in general, but precious little on handrolled parsers. Covering material that hasn’t been covered to hell and back is a Good Thing.
Three, because lex and yacc generate C, and the book has two implementations of the interpreter, in C and Java. You could’ve used ANTLR to generate both parsers, but lex/yacc would only cater to the C version.
And finally, because not everybody is on a POSIX system. Before WSL, using anything Unix-y on Windows was miserable. These days it’s mostly fine, but using WSL means you’re not actually building Windows-native stuff anymore.
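For anyone who hasn't met the Pratt technique from the second point, here is a minimal sketch in Python. To be clear, this is not the book's code (the book works in Java and C). The operator set, the binding-power numbers, and the names `tokenize`, `parse`, and `parse_expr` are all assumptions of mine for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/^()])")

# Hypothetical binding powers as (left, right) pairs. A right power lower
# than the left power makes the operator right-associative, as with '^'.
INFIX = {
    "+": (10, 11),
    "-": (10, 11),
    "*": (20, 21),
    "/": (20, 21),
    "^": (31, 30),
}
PREFIX_MINUS = 25  # unary minus binds tighter than '*' but looser than '^'


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, pos=0, min_bp=0):
    """Parse one expression starting at pos; return (ast, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    pos += 1

    # Prefix position: a number, a parenthesized group, or unary minus.
    if tok.isdigit():
        left = int(tok)
    elif tok == "(":
        left, pos = parse(tokens, pos, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    elif tok == "-":
        operand, pos = parse(tokens, pos, PREFIX_MINUS)
        left = ("neg", operand)
    else:
        raise SyntaxError(f"unexpected token {tok!r}")

    # Infix position: keep absorbing operators that bind at least as
    # tightly as the caller requires.
    while pos < len(tokens):
        op = tokens[pos]
        if op not in INFIX:
            break
        left_bp, right_bp = INFIX[op]
        if left_bp < min_bp:
            break
        right, pos = parse(tokens, pos + 1, right_bp)
        left = (op, left, right)
    return left, pos


def parse_expr(text):
    """Parse a whole string into a nested-tuple AST."""
    tokens = tokenize(text)
    ast, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at token {pos}")
    return ast
```

Here `parse_expr("1 + 2 * 3")` returns `("+", 1, ("*", 2, 3))` and `parse_expr("2 ^ 3 ^ 2")` returns `("^", 2, ("^", 3, 2))`. All the precedence and associativity lives in the one table, so adding an operator is a one-line change rather than a new grammar rule.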
Parser generators create more problems than they solve.
Is there any reason to reach for them first?
The Unix Programming Environment tutorial for building hoc doesn’t even come close to what Nystrom gets into in Crafting Interpreters. Hoc is a very fun little language, but as Nystrom describes several times in the book, parsing just isn’t the interesting part of the game.
Bob addresses this in the introduction^:
^ https://craftinginterpreters.com/introduction.html#the-code