scrapscript.py

I have to admit this broke my brain. This is the first time I'm hearing about content-addressable languages, and once you get over that barrier, a distributed language doesn't seem far-fetched.

As a big fan of functional programming, I have to ask: is this something that is going to end up being an esoteric language? Don't get me wrong, I absolutely love the authors' vision, but after being bitten by the Elm bug and watching it crash and burn, I'm cautious about getting invested in new languages and tools.

Interesting idea, but isn't code addressable already in most languages?

We call them modules/libraries and we pip/npm install them from GitHub and you can keep track of changes/versions/PRs.

Content addressable has a very specific meaning: https://en.wikipedia.org/wiki/Content-addressable_storage

Modules and libraries are addressed by their names or URIs.

"Unison eliminates name conflicts. Many dependency conflicts are caused by different versions of a library "competing" for the same names. Unison references definitions by hash, not by name, and multiple versions of the same library can be used within a project." https://www.unison-lang.org/docs/what-problems-does-unison-s...

"Here's the big idea behind Unison, which we'll explain along with some of its benefits:

Each Unison definition is identified by a hash of its syntax tree.

Put another way, Unison code is content-addressed. Here's an example, the increment function on Nat:

  increment : Nat -> Nat
  increment n = n + 1

While we've given this function a human-readable name (and the function Nat.+ also has a human-readable name), names are just separately stored metadata that don't affect the function's hash. The syntax tree of increment that Unison hashes looks something like:

increment = (#arg1 -> #a8s6df921a8 #arg1 1)

Unison uses 512-bit SHA3 hashes, which have unimaginably small chances of collision.

If we generated one million unique Unison definitions every second, we should expect our first hash collision after roughly 100 quadrillion years! " https://www.unison-lang.org/docs/the-big-idea/
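To make the quoted idea concrete, here is a minimal Python sketch that uses Python's own `ast` module as a stand-in for Unison's syntax tree. The function name and parameter names are treated as metadata and dropped before hashing, so two definitions that differ only in naming get the same SHA3-512 digest. This is not Unison's actual scheme (per the quote, Unison also replaces references such as `Nat.+` with their hashes); `definition_hash` and `_Anonymize` are hypothetical helpers, and the sketch ignores shadowing and keyword-only parameters.

```python
import ast
import hashlib


class _Anonymize(ast.NodeTransformer):
    """Rename a function's parameters to positional placeholders (#arg0, #arg1, ...)."""

    def __init__(self, params):
        self.mapping = {name: f"#arg{i}" for i, name in enumerate(params)}

    def visit_Name(self, node):
        # Only parameters are renamed; other names keep their spelling here,
        # whereas Unison would swap them for the hashes of their definitions.
        if node.id in self.mapping:
            return ast.copy_location(ast.Name(id=self.mapping[node.id], ctx=node.ctx), node)
        return node


def definition_hash(source: str) -> str:
    """Hash a single function definition by its structure, ignoring its name."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    params = [a.arg for a in func.args.args]
    body = [_Anonymize(params).visit(stmt) for stmt in func.body]
    # Only the parameter count and the anonymized body go into the hash.
    canonical = f"lambda/{len(params)}:" + ";".join(ast.dump(stmt) for stmt in body)
    return hashlib.sha3_512(canonical.encode()).hexdigest()
```

With this sketch, `def increment(n): return n + 1` and `def add_one(x): return x + 1` hash identically, while changing the constant to `2` produces a different 128-hex-digit (512-bit) digest.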

Seems like identifying your library with a git tag would drop that risk to zero.

I guess what I'm not understanding here is the utility. Why is it useful to include multiple versions of a library in a project? Is this a limitation I've been coding around without knowing it?

One reason for multiple versions of a library in a project is that the project wants to use 2 different dependencies, which themselves depend on incompatible versions of a third library.
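A toy Python sketch of why hash-based references sidestep that conflict, assuming a made-up in-memory store (`store`, `publish`, `load` and `parse_date` are all hypothetical). Two incompatible versions of a function with the same name are stored side by side, and each dependency pins the hash it was built against instead of the name.

```python
import hashlib

# A toy content-addressed store: hash -> source text of a definition.
store: dict[str, str] = {}


def publish(source: str) -> str:
    """Store a definition and return its content hash, which is its only address."""
    h = hashlib.sha3_256(source.encode()).hexdigest()
    store[h] = source
    return h


def load(h: str):
    """Materialize a stored definition by hash in its own namespace."""
    namespace: dict = {}
    exec(store[h], namespace)
    return namespace["parse_date"]


# Two incompatible versions of the "same" library function, both named parse_date.
v1 = publish("def parse_date(s): return tuple(map(int, s.split('/')))")  # expects Y/M/D
v2 = publish("def parse_date(s): return tuple(map(int, s.split('-')))")  # expects Y-M-D

# Dependency A was built against v1 and dependency B against v2. Because each pins
# a hash rather than the name "parse_date", both coexist in one project.
dep_a_parse = load(v1)
dep_b_parse = load(v2)
```

Here `dep_a_parse("2024/01/31")` and `dep_b_parse("2024-01-31")` both return `(2024, 1, 31)`, with no single "winning" version of `parse_date` that one of the dependencies has to be forced onto.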

ok, yep, that's one I've had myself. Thanks.

Have you ever had problem where two of your dependencies are each using a different version of the same library? Or have you ever wanted to incrementally upgrade an API so that you don’t have to change your entire code base in one fell swoop? That is where things like Unison or scrapscript can make it very easy.

Ok, I can see "incremental upgrade" as a use-case. Thanks.

Tags are not immutable.
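A small Python illustration of the difference, assuming a made-up `tags` dictionary in place of a real Git remote: a tag is a movable name (in Git it can be force-updated and force-pushed), while a recorded content hash lets a consumer detect that the name now points somewhere else. `digest` and `is_unchanged` are hypothetical helpers.

```python
import hashlib


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


# A tag is just a name -> content mapping that whoever controls the repo can move.
tags = {"v1.0": b"def greet(): return 'hello'"}
pinned = digest(tags["v1.0"])  # what a content-addressed reference records

# Later, someone force-updates the tag to point at different code.
tags["v1.0"] = b"def greet(): return 'pwned'"


def is_unchanged(tag: str, expected: str) -> bool:
    """True only if the tag still resolves to exactly the content that was pinned."""
    return digest(tags[tag]) == expected
```

After the retag, `is_unchanged("v1.0", pinned)` is `False`. The tag name alone gives no such guarantee, which is the point of addressing code by its content.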

I recommend reading the benefits section in the Unison docs[0].

0: https://www.unison-lang.org/docs/the-big-idea/#benefits

I think it is something like Hoogle for Haskell, but instead of searching by the types of functions, you search by a hash of some canonical encoding of the definition. So it is like an encoded knowledge graph, but you would have to give rules for constructing that graph in a canonical way.

Edit: What I thought was wrong. Anyway, the idea above could be useful for something like Copilot to complete definitions.

Content-addressable, not code addressable. It's kind of like global, distributed memoization (IIUC).

edit: not memoization, just hashing the AST of a function.

Content is by definition content-addressable: x = 42 is a hardlink to every other instance of x = 42, if you will. What this does is more compact and practical content addressing, like Nix or Git. But realizing that there is always more than one way to express the same logic (with different hashes no matter how you canonicalize) makes me doubt it is a killer feature.
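A quick Python sketch of that objection, using Python's `ast` module as an assumed stand-in for a real canonicalizer (`structural_hash` is hypothetical). Renaming a parameter is easy to canonicalize away, but expressions that behave the same on integers and have different trees still get different hashes.

```python
import ast
import hashlib


def structural_hash(expr: str, params: list[str]) -> str:
    """Hash an expression after renaming its parameters to positional placeholders."""
    tree = ast.parse(expr, mode="eval")
    rename = {p: f"_{i}" for i, p in enumerate(params)}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in rename:
            node.id = rename[node.id]
    return hashlib.sha3_512(ast.dump(tree).encode()).hexdigest()


# Alpha-equivalent: only the parameter name differs, so canonicalization merges them.
a = structural_hash("x + 1", ["x"])
b = structural_hash("y + 1", ["y"])
# Same result for integer x, but a different tree, so a different hash.
c = structural_hash("1 + x", ["x"])
d = structural_hash("x - -1", ["x"])
```

Here `a == b`, but `a`, `c` and `d` are all distinct. A smarter canonicalizer could normalize commutativity, but deciding general program equivalence is undecidable, so some duplicates will always slip through.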

That does not sound like it could make any money though...

In case you don't know about this, this is kind of what they are trying to achieve: https://www.unison-lang.org/

I still don't get it, could someone smarter than me explain?

  helloWorld : '{IO, Exception} ()
  helloWorld _ = printLine "Hello World"

The example above is followed by the explanation "{IO, Exception} indicates which abilities the program needs to do I/O and throw exceptions." Well, which abilities does it need then? No idea.

> Well, which abilities does it need then? No idea.

Abilities called IO and Exception.

I am sure you are familiar with effect systems and algebraic effects, right? Abilities are what algebraic effects are called in Unison: https://www.unison-lang.org/docs/fundamentals/abilities/

So, in Haskell you would have an IO monad and an Exception monad, but in Unison you have an IO ability and an Exception ability.

If you want to know more: https://www.unison-lang.org/docs/language-reference/abilitie... and: Convent, L., Lindley, S., McBride, C. and McLaughlin, C., 2020. Doo bee doo bee doo. Journal of Functional Programming, 30, p.e9. https://arxiv.org/pdf/1611.09259.pdf
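Python has no algebraic effects, but a rough analogue of the shape can be sketched with generators, under loudly stated assumptions: the program only *requests* effects by yielding request objects, and a separate handler decides what those requests mean. `PrintLine`, `Raise`, `hello_world`, `risky` and `run_collecting` are hypothetical names chosen to echo `{IO, Exception}`; this is not how Unison implements abilities.

```python
from dataclasses import dataclass
from typing import Any, Generator


@dataclass
class PrintLine:
    """An IO-style request: print a line of text."""
    text: str


@dataclass
class Raise:
    """An Exception-style request: abort with a message."""
    message: str


Program = Generator[Any, Any, Any]


def hello_world() -> Program:
    # The program describes the effect it needs; it does not perform it.
    yield PrintLine("Hello World")


def risky(n: int) -> Program:
    if n < 0:
        yield Raise("negative input")
    yield PrintLine(f"got {n}")
    return n * 2


def run_collecting(program: Program) -> tuple[list[str], Any]:
    """A handler that 'prints' into a list and turns Raise into an error result."""
    output: list[str] = []
    try:
        request = next(program)
        while True:
            if isinstance(request, PrintLine):
                output.append(request.text)
                request = program.send(None)
            elif isinstance(request, Raise):
                program.close()  # abandon the rest of the computation
                return output, ("error", request.message)
            else:
                raise TypeError(f"unhandled effect: {request!r}")
    except StopIteration as done:
        return output, ("ok", done.value)
```

Swapping in a different handler (say, one that really calls `print`) changes what the effects do without touching `hello_world` or `risky`. That separation is the part abilities make first-class and type-checked.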

> I am sure you are familiar with effect systems and algebraic effects, right?

Probably not, based on their question!

These are pretty esoteric concepts. I think it was one of a few bullet points on "other interesting ideas" in the functional programming portion of my programming languages course, and I doubt that most working programmers have taken an academic PL course like that at all.

But effects are indeed an awesome concept, and thanks for the excellent links! The parent is one of today's lucky 10,000: https://xkcd.com/1053/

As a junior who didn't study CS at uni, a lot of stuff around here goes right over my head. I often feel that I might never catch up. I barely understand functional programming beyond a one-line definition, let alone the concepts that fall under it. To be fair, I only really use object-oriented programming.

Thank you!

> I still don't get it, could someone smarter than me explain?

It is not about being smart; you probably just haven't encountered these concepts before.

Thank you, this looks awesome. I already have a use case in mind, so will explore unison since it seems more mature.

Nothing crashed and burned. Elm is in a state where it's fully usable and has all the features you need. I use it every day!

Since it’s a DSL to create HTML + JS + CSS websites, you still get all the new features of browsers!

Even though I can absolutely see the arguments for doing so, I think locking down the compiler to no longer accept external native extensions was a huge mistake community-wise. A lot of the people advocating for Elm were the sort of early adopter who really, really wants an escape hatch, because they're in the habit of getting into situations where they need one no matter what tools they're using.

Certainly that describes me, and when all the people who seemed to be like me got told to go sit on a cactus by the Elm core developers and bailed out to work with something else, my experiments got pretty much immediately shelved and Elm moved into the "interesting place to steal ideas from, actively hostile to my actually using it" category.

This may be unfair, but I'm pretty sure it's a reasonable description of what -did- happen, fair or not.

Except for the compiler bugs, the lack of self-hosted package management, the lack of tooling improvements, etc. I use it regularly too, but it is frustrating to see it in a state of decay.

They can definitely claim the above points and more are not goals, and Evan absolutely has every right to do so. But don’t be surprised when devs see it as dead.

Making python scripts into an 'Actually Portable Executable' is what really interested me here.

I'm a little confused by this part:

> This executable is theoretically runnable on all major platforms without fuss. And the Docker container that we build with it [...]

It sounds like they're putting an APE inside a Docker container, but why would you want both?

perhaps to integrate with tooling that wants to work with containers

Yep, to deploy on fly.io

No, they build the APE with a docker container. The APE itself is... actually portable.

We do both. We also deploy a Docker container that runs scrapscript.com

Because some people throw a tantrum if it doesn't come pre-dockerized, I suppose

Not just python scripts, you can package any C code as well. We've used it to compile up a "python.com" APE file with a python 3.11 interpreter that has lots of packages (including C extensions) that we can just drop straight into old airgapped lab instrument computers and get a modern python data analysis suite up and running.

Now integrate Shed Skin, the Python-to-C++ compiler. :)

Thanks, TIL. Now we can combine this with the xlcalculator package and transpile models built in Excel right down to C code and build it as a portable executable.

That's both terrifying and wonderful and I hope to see a write up of it on the front page one day :)

That sounds very interesting! I probably could hack it together myself, but do you happen to have a writeup on that, or maybe some pointers on how to include the numpy-scipy stack into the executable?

Is this the reason why there is only one 5+ KLOC module (which includes the tests)? I personally prefer short / shorter modules with clear responsibilities.

No. This is not a limitation of APE/Cosmopolitan. This is just my personal preference because imports get tricky in Python land unless you either have a single file or go Full Package Mode. There's probably a world where we split the tests out, though.

Well, just to better understand how the code is organized (and experiment a bit with it), I have forked it: https://github.com/sfermigier/scrapscript/tree/trunk/src/scr...

Can someone explain what this is? Why is it a good thing? What is it for? I have to admit, based on reading the post, I have absolutely no idea.

I should probably write a longer post about this, but scrapscript is an attempt to fix a lot of the "in-between" problems in software engineering.

Instead of working on "real" problems, I find myself battling untyped/undocumented YAML/JSON configurations, syncing JSON encoders/decoders, massaging incompatible dependencies, writing unholy SQL, etc.

I obviously don't have all the answers, but a system with the following properties seems like a worthwhile pursuit: (1) small enough to be used like JSON yet powerful enough to be used like JavaScript, (2) cryptographic guarantees that code is compatible over time, (3) a compiler that checks live servers for compatibility before deploying, (4) a simple but expressive type system, (5) a package manager that facilitates all of this at a granular level... and so on.

On top of all that, I think these properties lend themselves to some grand ambitions like "a new internet" and a "google-docs live coding editor experience". Maybe I'm just full of myself though haha

scrapscript.py is the first real attempt at making scrapscript a reality, so some folks who feel these pains are getting excited to see some movement on the project.

EDIT: Here’s my recent scrapyard demo, if you want to see it in action: https://www.youtube.com/watch?v=SngOLU5G1Eg

> I find myself battling untyped/undocumented YAML/JSON configurations, syncing JSON encoders/decoders, massaging incompatible dependencies

I feel your pain on having to manage so many dependencies. I write primarily in Python, and the various pip / Pipenv / pipx / PDM / Poetry dependency managers drive me pretty crazy. That's not even accounting for the multiple Python versions I need!

That said, I'm surprised that you're trying to _alleviate_ this by implementing your FP language in Python. The Python ecosystem is full of half-documented config files, incompatible dependency trees, etc.

Have you considered implementing it in any other languages after the Python one proves its worth? For example, if the language becomes strong enough, would you consider writing a scrapscript compiler in scrapscript, itself?

None of these config/dependency problems are present in scrapscript.py because it has no external dependencies and is written in one file. This is intentional!

Yeah, I'm not a huge fan of Python, but Max and Chris are world-class in that domain, so that's what we're doing for now.

Max has already started working on a meta scrapscript compiler: https://github.com/tekknolagi/scrapscript/pull/100

One thing I think we all agree on is that the implementations should be simple enough to easily port themselves to other languages. For example, one could probably port the existing scrapscript.py to Rust or Javascript using GPT in a single weekend.

You can see echoes of what I'm talking about in my tiny JS POC: https://github.com/tekknolagi/scrapscript/blob/trunk/scrapsc...

Some languages like Rust and Go put a lot of weight on the "official" implementation. I think scrapscript can be more like Lisp/JSON, where the spec guides parallel implementations. There are obvious downsides to this in general, but I think that content-addressability makes some of those problems moot.

I assume you're well aware of: https://www.unison-lang.org/ - as well as 9P and union mounting from Plan 9.

Yes, I'm aware :) I actually built the first scrapscript demo in 2018, drawing on inspiration from Ethereum's Solidity. Somebody pointed me toward Unison when I attended Strange Loop in 2019, and I chatted with Paul Chiusano, and it seemed like Unison and Scrapscript had incompatible design goals. Even now, I don't see much overlap outside of content-addressability. Unison is super cool though, and I wish their team the best!

> (2) cryptographic guarantees that code is compatible over time,

What does this mean? You hash dependencies?

> (3) a compiler that checks live servers for compatibility before deploying,

Why does a compiler need to talk to a server? Why should it? Seems like a huge step backwards in what a compiler is and expecting it to work later on.

> What does this mean? You hash dependencies?

Yes, but everything is hashed at the expression-level rather than at the file-level, which prevents a few classes of errors.
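A toy Python sketch of expression-level hashing, assuming a made-up mini-codebase (the `defs` bodies are an invented syntax, not scrapscript's, and `hash_all` is hypothetical). Each definition is hashed with the names of the definitions it uses replaced by their hashes, Merkle-style, so editing one definition changes its own hash and its dependents' hashes while unrelated definitions keep theirs. It assumes there are no cyclic references.

```python
import hashlib
import re

# Hypothetical mini-codebase: name -> body, where bodies mention other definitions by name.
defs = {
    "double": "x -> x * 2",
    "quad": "x -> double (double x)",
    "greet": 'name -> concat "hi " name',
}


def hash_all(defs: dict[str, str]) -> dict[str, str]:
    """Hash each definition with its dependencies' names replaced by their hashes."""
    hashes: dict[str, str] = {}

    def visit(name: str) -> str:
        if name not in hashes:
            body = defs[name]
            deps = [d for d in defs if d != name and re.search(rf"\b{re.escape(d)}\b", body)]
            for dep in deps:
                # Referencing a dependency by hash means its changes propagate upward.
                body = re.sub(rf"\b{re.escape(dep)}\b", "#" + visit(dep), body)
            hashes[name] = hashlib.sha3_256(body.encode()).hexdigest()
        return hashes[name]

    for name in defs:
        visit(name)
    return hashes
```

Changing `double` to `x -> x + x` changes the hashes of `double` and `quad` but leaves `greet` untouched. With file-level hashing, any edit to the file would invalidate everything in it.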

> Why does a compiler need to talk to a server? Why should it? Seems like a huge step backwards in what a compiler is and expecting it to work later on.

Imagine if JavaScript tooling could throw an error when a client implementation diverges from the server's expected input/output types:

  > const res = await fetch("https://example.com/api", { method: "POST", body: JSON.stringify([1, 2, 3]) });

  ERROR: You're sending this REST endpoint a list of integers, but it expects a string!

Wouldn't that be nice in some applications?
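A hypothetical Python sketch of that kind of check, assuming the server publishes a description of each endpoint's expected input (modeled here as a local `SERVER_SCHEMAS` dict rather than anything fetched over the network). `type_of` and `check_call` are invented names, and the tiny type language is only for illustration; this is not how scrapscript's compiler works.

```python
from typing import Any

# Hypothetical: what a server might publish about an endpoint's expected input.
# A real system would fetch this and pin it by hash; here it is a local dict.
SERVER_SCHEMAS = {"https://example.com/api": {"input": "string"}}


def type_of(value: Any) -> str:
    """Describe a JSON-like value in a tiny type language."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        inner = {type_of(v) for v in value}
        return f"list[{inner.pop()}]" if len(inner) == 1 else "list[any]"
    return "unknown"


def check_call(url: str, payload: Any) -> list[str]:
    """Return the errors a build step could report before any request is sent."""
    expected = SERVER_SCHEMAS[url]["input"]
    actual = type_of(payload)
    if actual != expected:
        return [f"You're sending {url} a {actual}, but it expects a {expected}!"]
    return []
```

`check_call("https://example.com/api", [1, 2, 3])` reports the `list[int]` vs `string` mismatch, while passing `"hello"` returns no errors. The hard parts in practice, such as versioning the published types and trusting them, are left out.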

It might be interesting to include a comparison with Dhall and Jsonnet while you're writing docs.

I'd been kind of interested in https://yglu.io/ and now ingy's new piece of insanity https://yamlscript.org/ - Helm appears to let you inject your own script to template charts, and I was wondering about trying a wrapper around one of those (because text templating an indentation-sensitive language like YAML makes me itch).

I think scrapscript is a really interesting idea, mind, this isn't a "here's an alternative" type comment, it's a "here's things that I think are neat in a similar way to how I think scrapscript is neat" :)

Edit: I forgot something! https://trout.me.uk/lisp/termite-r7rs.pdf is a paper on adding library support to the cross-network (kinda erlangish) termite scheme extensions - and leans heavily on content addressable-ness. Termite itself has gone the way of small lisp projects but I kept this around specifically for the content addressable stuff having been solidly worked out in a language I understood; maybe that'll come in handy for ideas for you as well.

Still not sure I fully understand, but that is more than likely down to my ignorance. I really appreciate your effort in explaining here. I should mention I'm not a full time developer and certainly not a webdev so this might be why I'm not grokking this. Thanks.

From reading https://scrapscript.org/ it sounds like its main feature is that things can be split up, put on platforms like IPFS, and distributed allowing you to access them from wherever.

Interesting tidbit: the book series that this website is named for is actually spelled berEnstAin bears, emphasis on the letters that everyone (including me) remembers being spelled the other way. I literally learned this yesterday.

I concur! This is considered part of the Mandela effect^1, isn't it?

The Wikipedia article on the aforementioned bears even has a section on it —

https://en.wikipedia.org/wiki/Berenstain_Bears#Name_confusio...

^1 - which was the Mandala effect, in my original universe, I’m sure.

Why would it be the Mandala effect? It's named for the mass false memory of Nelson Mandela dying in prison in the 80s.

One high profile archetype of the mandala is the sand mandala, where practitioners painstakingly construct an intricate mandala out of sand over the course of days and then ritually sweep it away once it's complete, leaving no trace, as a meditation on impermanence or something like that.

Much like how in the Mandela effect the original universe is wiped at least partially away, leaving no trace of what was a complex and fully featured aspect of the timeline, other than what remains in your memory. Other people say "no, that's always been a table" while you remember the sand that was on top of it. Or something along those lines! For some people the resonance is strong enough for the mandala imagery to potentially overwrite the Mandela etymology. Especially if you're a person who's never experienced the effect about Mandela himself.

I think the website name is a pun based on the last name of its author.

Correct!

> everyone

The books were a very minor part of my childhood, but I noticed immediately, and my family always pronounced it correctly.

I enjoyed the link to the language checklist https://www.mcmillen.dev/language_checklist.html

> Programming in this language is an adequate punishment for inventing it

I was already laughing hard by this final punchline. Bravo.

Same here. It's the first time I've ever seen it.

This is about how scrapscript is implemented.

There was a popular Show HN about 9 months ago, about scrapscript itself: https://news.ycombinator.com/item?id=35712163

Since this uses Cosmopolitan and the build script already downloads portable binaries from https://cosmo.zip, has any thought been given to wrapping other portable binaries in scrapscript or downloading them?

Small, pure, functional, content-addressable and network-first sounds a lot like a mini Nix+ca-derivations [1]

[1] https://www.tweag.io/blog/2021-12-02-nix-cas-4/

This reminds me of a talk Tim Berners-Lee did in 2002 (at the 10th Python conference):

https://www.w3.org/2002/Talks/0206-python/ ("Webizing Python")

I wasn't there but I remember hearing that this wasn't well received by the participants.

Also, TBL references a post by Aaron Swartz at the end of his slides: https://web.archive.org/web/20050208021219/logicerror.com/we... (also titled "Webizing Python")

Elegant and pure.

I also like the Javascript lambda calculus this is a fork of.

Like early Haskell, when it was just for fun, before the meta-monadic library sprawl that upped the learning curve.