Vesuvius Challenge 2023 Grand Prize awarded: we can read the first scroll

When I first came across this project on HN (early last year), I was taken aback by how impossible it looked and how smart the people working on it were. Despite seeing a few intelligent names behind the project, I subconsciously believed that it would take at least 5-10 years before a breakthrough.

Today I sit with the same amazement, taken aback again, appreciating how ridiculously awesome this is. Congratulations to the winners and everyone involved!

So many things that look insane are becoming a reality. You look at those scrolls burnt to a crisp and the idea of reading them is nonsense.

The fact I have a computer writing flowery alt text descriptions of my photos with unnerving accuracy is something I would not have predicted for another 20 years. But, here we are...

Right? Imagine trying to explain some of it to one of the ancients - so, you have this quartz sand, see?...

That would be like explaining a baking recipe by starting with protons and electrons.

State-of-the-art machine learning architectures aren't actually that complex. Diffusion models and transformers can be explained to a bright high schooler. I'm sure Archimedes and Euclid would have no problem understanding them.

What they might have a problem understanding (or even imagining) is the mind-boggling amount of computation required to make those systems do anything useful. Getting Llama to produce a single token of text takes more calculations than all of humanity did by hand during all of Classical Antiquity.

I think the quartz sand metaphor was to illustrate how advanced our silicon-based technology has become, not just the ML parts.

Imagine all the stuff... transistors, Turing/Von Neumann machines, lithography, theoretical computer science, OS and compilers, the Internet... and lastly there's modern day machine learning that builds on top of all the above.

The base level stuff isn't exactly protons and electrons, but given the nanometer scale of our chips, it's not that far from the truth, and we (humanity) have somehow built amazing stuff on top of that.

The basic concept of how such a thing is powered (electricity) is so far removed from anything people did in ancient times that it would be hard to get them to understand that neither gods nor magic are involved.

Smart "intellectual" people would certainly be willing to challenge basically everything they assume about nature, but I don't think your run of the mill farmer would be able to do that.

Electricity isn't much different from water. The concept of a water mill has been around a long time, so understanding the functional ideas wouldn't be all that difficult. As far as gods go, I think you'll have a tough time minimizing their involvement even today.

Well we are wizards who speak in arcane languages to thinking rocks to convince them to do our bidding. We speak to golems.

That would be like explaining a baking recipe by starting with protons and electrons

It would be like explaining a baking recipe just talking about wheat and flour and heat. The point is that from first principles, ML is a huge jump. From first principles, baking is not.

Sufficiently advanced technology is indistinguishable from magic...

Kudos for caring about alt tags. Blind user here. While we're at it: I was also thinking about how I could make good use of the new vision models. And after a while...

https://github.com/mlang/tracktales

The fact I have a computer generate spoken narration for my MPD playlist with descriptions of album art included just blows my mind. 2023 was indeed a fucking milestone.

That's awesome. GPT vision is unreal, to me. It's neat that you can finally get the computer to describe the album covers, something that no-one would have thought to bother with previously as it seems so trivial. A lot of album covers are surreal, so you're really going to put GPT to work there!

My grandma was blind, and I just spent 6 months looking after a guy who was blind (but has now had surgery and beat me in the eye test at the doctor's!), so I think about blindness a lot when I design.

It's all about incentives. $1 million is a lot of money. The vast majority of hard problems don't have much brainpower dedicated to them, because the bang/buck ratio doesn't work out. Machine learning, math, and adjacent fields already have many careers that pay very well, so getting top-notch experts to dedicate their attention to what might be a futile endeavor is difficult.

And this isn't only about the monetary value itself, but also the fact that a large cash prize attached to a challenge boosts the prestige of finding a solution. Nobel Prizes come with about a million bucks on top of them, after all.

I'm quite confident that if someone offered $100 million for deciphering the Voynich manuscript or Linear A, we'd have a solution within 3 years.

Not to be argumentative, but $1M isn’t very much money, certainly not for a project of this scope. It’s a testament to the creativity, competence, and dedication of those involved they’ve gotten this far with such little funding. Hopefully their early success will attract more resources to this very worthy project.

It's $700k divided three ways, too. About $233k is well within FAANG compensation range, but you get to work on such an awesome project.

FAANG compensation range but no benefits and 100x the risk

Make it $8m or $12m and FAANG employees can actually start to justify working on it seriously from a money perspective

Most of them are not good enough to make a dent in this task.

Most developers are. They might not have the right background, so it might take an extra year or two to get up to speed with the difficult areas of this project. However, most developers I know have the "smarts" to switch to a different complex area and figure it out. Sure, most of them are just building standard CRUD apps that move a bit of data around, but that's because that's what we need a lot of, not because they can't do something else.

What do you mean by "of this scope"? The winning solution was produced by students and interns who coordinated over the Internet, in less than a year. The problem isn't scope, the problem is attracting lots of bright individuals to work on such a task (for free). And offering a substantial monetary incentive to the winner is probably the best way to do that.

And yes, $1 million is very substantial for an individual. And the cool thing about offering it as a prize (from the point of view of the organizers, that is) is they only have to pay one person or team, although potentially thousands ultimately contribute to the solution, directly or indirectly.

That's not at all what they did. They explicitly set up intermediate milestones at which to dole out prizes, to incentivize collaboration. That tactical aspect of the whole project, and how they set it up, is worth highlighting on its own.

I think most of all this is a testament to just how much raw talent and intellectual potential is locked up in the winner-takes-all dynamics and shortsightedness of the stock market. Imagine the exploits and results in a world where everyone had the baseline resources and opportunities for extra funding for pursuing niche interests.

You discount passion as motivator.

But it's not clear what current tech could help with. Machine learning can't be applied to something you don't have training data for.

I'm 90% sure the people who did this project did it because they got nerd-sniped by it and got to hang out with Nat while earning a reasonable salary.

One aspect of archaeology that I really find fascinating is the practice of leaving certain artifacts unexplored. The original discoverers of the scrolls tried to unroll a few, apparently found it was impossible without completely destroying the scroll, and then just left the rest undisturbed. Rather than pushing forward and destroying everything, they left these as a mystery for a future age. Two centuries (!!) later we can finally begin to understand these, with the aid of technology that would be utterly unthinkable to those people who very thoughtfully restrained themselves.

Rather than pushing forward and destroying everything

In the early days they wouldn't have accomplished anything by pushing forward, so it doesn't take all that much restraint.

I'm more impressed by people in, say, the 1990s or early 2000s. They might've had a shot but there was still too much risk, so they restrained themselves until it was a safer bet.

On the other hand, we ground up mummies for paint to the point that we ran out and used fresher corpses to meet demand.

It is a bit of a miracle that they were preserved, and not just discarded.

Where can I read more about this? Frankly I'm a bit surprised that I've never heard about this considering how shocking it sounds.

Wild, isn't it? Hands down my favorite historical fact learned in 2023.

https://en.m.wikipedia.org/wiki/Mummy_brown

Wild? Wait until you read about people actually eating mummies and corpses

https://www.nationalgeographic.com/history/article/mummy-eat...

Wow. I like aged cheese but that is a bridge too far.

The things Futurama made me look up on Wikipedia...

Worse than that, lots were ground up (and consumed) for medicines.

In one of the Futurama episodes, Fry eats one of Farnsworth's mini mummies and Farnsworth is upset because he wanted to eat it. Fry said it tasted like jerky I think.

Yeah, I can't give the King's men much credit here. They destroyed a lot of scrolls, and it was only because they weren't getting much of anything that they stopped and abandoned excavations or focused on digging out sculptures they could show off (many now in the Getty Museum - great museum, but I did feel a bit melancholy thinking about the scrolls while I was there in 2019).

Similarly, there are large sections of Pompeii, which remain unexcavated -- left for the future.

Herculaneum, where these scrolls are from, is 75% unexcavated! And it will likely remain this way for some time, as Naples sits right on top of it.

The town of Ercolano sits on top of it. Of course effectively it’s a suburb of Naples these days

An example of the same thing at a macro level:

https://www.smithsonianmag.com/smart-news/archaeologists-reb...

Bigger yet by far https://en.wikipedia.org/wiki/Mausoleum_of_the_First_Qin_Emp...

He of the terracotta army. Not excavated yet for fear of damage, but I would so love to know...

The feeling you get when you’ve gone into one of those aircraft-hangar-size buildings and then you see some of the information they’ve gotten from ground tests (radar, mercury, etc.) is wild. The site is huge.

One of the suppositions is that the main chamber contained a model of his entire kingdom, replete with rivers of mercury.

So yes. Archaeology is a bit destructive, and sometimes the destruction can go both ways. Proceed with caution.

One aspect of that time period is that they absolutely idolized the Romans. A lot of education at the time consisted of learning Latin, and at the same time people were well aware that only a fraction of the classical texts had been preserved. I find it very believable that they understood the significance of preserving and potentially unlocking these scrolls.

Over 2000 years ago a chap called Philodemus sits in the library of a luxurious villa owned by a rich guy who likes collecting art and writing. He writes his thoughts on pleasure, and the relationship between the quantity of something and the pleasure that might derive from it. The scroll goes on the shelves with the others. He writes lots. At some point later the villa is covered by lava from Mt. Vesuvius.

2000 years later we scan the carbonised scrolls with (basically) magic rays and use thinking machines to reconstruct what Philodemus wrote.

I wish we could tell him. Sounds like he was a thinker, he would really appreciate it.

I have too wild an imagination sometimes. I picture him with a shocked look on his face, similar to what one of the modern 'thinkers' would have if they forgot to clear their browser history and somehow someone restored it in the future.

"You recovered... uh... everything?"

That recovery is much easier though, it is using the device as intended.

And of course, rm just unlinks, doesn’t actually delete, so even going a step further and recovering deleted content is hardly magic.

This is more like if, sometime in the future, they somehow successfully reconstructed a snapshot of our computers’ volatile memory by examining the power supply, or something ridiculous like that.
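For anyone curious what "rm just unlinks" means in practice, here's a minimal sketch, assuming a POSIX system (Linux or macOS; Windows normally refuses to delete a file that is still open) and only Python's standard library. `os.unlink` wraps the same `unlink` system call family that `rm` uses: it removes the directory entry, while the data stays reachable through any descriptor that was already open. It does not demonstrate forensic recovery of blocks after the last reference is closed; that depends on the filesystem and, on SSDs, on whether TRIM has run.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, remove its only name with os.unlink,
    then read the bytes back through the descriptor opened beforehand."""
    fd, path = tempfile.mkstemp()  # mkstemp returns an open, read-write descriptor
    try:
        os.write(fd, payload)
        os.unlink(path)  # removes the directory entry only
        assert not os.path.exists(path)  # the name is gone from the filesystem...
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(payload))  # ...but the contents are still readable
    finally:
        # Closing the last reference is what lets the filesystem reclaim the blocks.
        os.close(fd)


print(read_after_unlink(b"charred but not gone"))
```

`read_after_unlink` is a hypothetical helper name chosen for illustration.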

Big if.

We’ll lose a lot of digital data simply because we won’t have the means to read it. CD readers aren’t manufactured in volume anymore. It’s easy to imagine society in 40 years not having any CD readers handy but having a bunch of CDs they want to read. Now multiply that by all the funny storage formats we’ve created over the years.

No need for a CD reader if you have a CT scanner and software that converts those ridges into bits. The bigger question is how well preserved those CDs will be.

I’m almost certain it is impossible to actually do what I said. But then again, I bet anybody 2000 years ago would say the same of reading scrolls that have been consumed by a volcano!

rm just unlinks, doesn’t actually delete,

On HDDs. On SSDs it'll lead to now-unused space getting TRIMed, which actually erases the blocks. Back to scraping the papyrus.

Your comment prompted me to go in search of something I'd seen several years ago: something about an advertisement in Pompeii for prostitutes, or something like that. Anyway, I couldn't find exactly what I went in search of, but I did stumble upon this oddly specific, yet interesting, Wikipedia entry:

https://en.wikipedia.org/wiki/Erotic_art_in_Pompeii_and_Herc...

Priapus had it goin' on! Reading the Priapeia for the first time is a treat...

something about an advertisement in Pompeii for prostitutes, or something like that

Maybe something along these lines?

https://en.wikipedia.org/wiki/Lupanar#Graffiti

Epicureans were atomists (https://en.wikipedia.org/wiki/Epicureanism#Physics), so if you explained it to him, he might not be nearly as shocked as most people of the era would be. Epicureans also tended to invoke extremely insubstantial 'images', composed of especially tiny gossamer arrangements of atoms, as explanations for things like dreams, and you can see how well that would work as an analogy for using X-rays to look at subtle changes in the arrangement of atoms in charred scrolls. (These are covered in https://en.wikipedia.org/wiki/De_rerum_natura; if you have some time, I highly recommend reading Stallings's rhyming translation. You'll be shocked by, and come to admire, the rationality and scientific character of Epicurean materialist atomist explanations of the world, even where they get it totally wrong.)

It's a testimony to the power of mind equivalent to Einstein's 'Gedankenexperimenten'. The only reason they got some of it totally wrong is that they lacked the scientific apparatus to test their hypotheses, but they were on the right track.

... and what would he think of us? 2000 years later, thought has led us to amazing advances in technology that would be inconceivable to him. But on the sort of things the people of his day thought about (virtue, happiness, how to live the right way), not a lot of progress has been made. He might be surprised by that. In his time, at the dawn of written thought for thought's sake, they might reasonably have expected that people would soon think their way to a golden age of happiness and contentment. But 2000 years later we have learnt that you don't seem to be able to think your way to happiness.

I think we have learned not that you can’t think your way to happiness, but that for the sake of the consumer economy and worker productivity it’s best that you don’t.

Ash, not lava.

Maybe in 20 years when resurrection is possible (singularity event) we'll be able to let him know?

This is the coolest thing I've read this year. It reads like science fiction. Who would even imagine it's possible to read text from a 2000 year-old rolled up burnt-crisp paper?

Speaking of "reading like sci-fi", what's that book where they scan an entire library of books destructively, by feeding them into a "book chipper"-like device that chops the books up into little pieces, vacuums those pieces up and scans them as they flow through, reconstructing the original text by putting the scanned results together like so many jigsaw puzzles? It was a subplot of the book, but I can't for the life of me remember what book it was.

FYI it seems ChatGPT could have answered this for you.

The book you're describing sounds like "Rainbows End" by Vernor Vinge. In this near-future sci-fi novel, set in 2025, one of the subplots involves a project called the "Library Project," where the UCSD (University of California, San Diego) library decides to digitize its entire collection. The process is somewhat as you described: books are destructively scanned by being shredded into tiny pieces, which are then scanned and digitized, with the text being reconstructed from the scans. This process is a part of the broader themes of the book, which include the effects of technology on society and the concept of "wearable computing" and augmented reality. Vernor Vinge, a retired San Diego State University professor of mathematics, computer scientist, and Hugo Award-winning author, is well-known for his works in the science fiction genre, especially for exploring the concept of the technological singularity.

I'm not surprised ChatGPT can answer it. I'm not sure why, but _Rainbows End_ is one of the most commonly asked-about SF books like that. Everyone remembers the book-tornado doing shotgun sequencing, but they can never remember its name or anything else that happens. I guess that's the problem with having a technology whose mental image is so compelling but also mostly disconnected from the rest of the book. (I know I can't tell you much about the rest without rereading the WP entry.)

Rainbows End, and Vernor Vinge in general, probably also fall into that category of often referenced, yet not actually read. Somebody actually reads the book, finds the one cool quote or example everybody likes, that idea gets repeated a huge amount, and most people never read Rainbows End. Yet you hear about it so much tangentially that you feel like you have (or must have).

http://www.technovelgy.com/ct/content.asp?Bnum=1109

Rainbows End by Vernor Vinge (ChatGPT helped with the search)

Vacuuming up an entire library of books, chopping them into little pieces, reconstructing the original text by putting together the little pieces like so many jigsaw puzzles?

That's not a sci-fi novel, that's OpenAI's business model!

I’m now wavering a bit on my earlier dismissal of people freezing their bodies for an indeterminate future revival. I could probably get into a science fiction story with this premise:

Instead of relying upon machinery, some zillionaire has their body dry-frozen and stashed in a lunar south pole crater, with a foundation funding interstellar propulsion research to move the body to the coldest stable points discovered along the way towards the Boomerang Nebula (about 1 K), and research to revive it from a burnt-crisp state.

The foundation incites all sorts of advancements along the way like working out practical fusion and ever more exotic energy generation, AGI, gravity manipulation, Drexlerian nanotech, Dyson swarm, star wisps, self-modifying bodies and so on, in its quixotic quest to fulfill its mandate.

I kinda wanna write a horror story about people freezing their bodies or heads only to be revived in the future bat-shit insane from the excruciating experience of existing for several decades in a sort of limbo...

Read The Jaunt by Stephen King.

Yeah big influence... I may need to change my idea further before writing anything otherwise it's too close to plagiarism. Black Mirror got away with it tho with the Cookie episode...

It’s a 270-year archaeological and technological culmination. The scrolls were dug up in 1752. It took the collective developments of the Industrial Revolution, the sciences, and all our engineering and manufacturing prowess to discover, preserve and scan the scrolls. Then the cherry on top: the current AI revolution, which can draw inferences and connections that are beyond the human mind to even understand. And out pops 2000-year-old wisdom of the ancients.

One other thing that was required: the patience not to ruin it.

So much has been lost to well-meaning archaeologists who dug up and threw away things that they didn't think were important. They tried cleaning and preservation techniques on artifacts without testing, sometimes ruining them in the process. They ripped things out of context, and "restored" them based on guesses that were sometimes flagrantly wrong.

Of course they couldn't be expected to know everything that would come in the future, so blame can sometimes perhaps be muted. But it's especially positive that they extricated these particular objects very carefully and just waited for a way to extract information that they could hardly have hoped for.

It is a bit fitting that it turns out the scroll is about the relationship between enjoyment and abundance.

It looks like, from what we can gather, the author decides that should something be hard to get, that doesn’t lead to greater enjoyment. But, it seems that the archaeologists have found an awful lot of joy in how “rare” access to these scrolls is!

I remember first reading about Herculaneum Papyri more than a decade ago, and pondered about them being read one day. After all, research into virtually unwrapping these scrolls had been ongoing since 2007 (https://en.wikipedia.org/wiki/Herculaneum_papyri#Virtual_unr...), but I certainly did not expect it to happen so soon. Exponential technology acceleration once again proves itself true.

We estimate that the scrolls we have in Naples contain more than 16 megabytes of text. Some members of our papyrology team say that revealing this text will be the greatest revolution in the classics since the Renaissance

Amazing achievement, let's hope the Italian government allows for additional excavation of the villa.

They likely would. Pompeii and Herculaneum are _still_ being excavated after two centuries; it's not as if work there has stopped.

But we have only read 5% of this scroll and there are a ton more already excavated, it will probably take years before we manage to process what we already have.

it will probably take years

In the direction things are going ... maybe a few months :)

The biggest bottleneck is getting the experts to read it. You need a decade or so of graduate level education and interpreting things takes apparently quite a long time.

Maybe that's another AI application.

As much as I'd like to endorse study of the classics, I'm almost certain that AI will be better at interpreting the texts than humans very soon.

GPT 4.0 isn’t even remotely close to being useful at all in this case so I have some doubts about ‘very soon’

I doubt it. Most universities have some professor in this area and a few graduate students. There is very little else for them to read (all the classical books we have would fit on a shelf), and most of them can read the original language. They will read this as soon as it is available, just because they don't have much other primary source material to read. Everything new is of great interest because there is so little to work with.

There’s more to processing than scanning. It has to be reviewed, transcribed and translated by linguistics experts, and then analyzed and studied by academics and researchers who can put it in context and integrate it with what we currently know about the history, culture, philosophy, etc. of the time.

If you can automate the input, you can probably automate much of the basic analysis (things that would be "revolutionary" to undergrads).

"ChatGPT, give me the highlights of these ancient Greek scrolls ..."

The big problem is that the Villa of the Papyri is underneath modern buildings. That doesn't mean that excavation without demolition is impossible (see the Scavi underneath the Basilica of Saint Peter), but it makes things far more difficult.

If there's a good chance of multiplying the total surviving classical works several times over, I doubt the money will be particularly hard to find?

David W. Packard (HP heir) has been trying to throw money at doing this for years, so the money isn't as much of an issue as you'd think. The larger issue is that the locals don't want digging underneath their buildings, no matter how careful the excavators are. Also, all the money that would be necessary to excavate has made the project a target for the mafia who wants to get their share.

the money isn't as much of an issue as you'd think. The larger issue is that the locals don't want digging underneath their buildings, no matter how careful the excavators are

This sounds like the money IS a huge issue. How expensive can it be to buy out the locals? We're talking about priceless cultural artifacts

Also, all the money that would be necessary to excavate has made the project a target for the mafia who wants to get their share.

I wonder how far Italy will go once (if) they get rid of the mafia. It's like trying to drive a car with the handbrake on.

“Any sufficiently advanced technology is indistinguishable from magic.”

Absolutely insane the level of wizardry being applied here to turn a lump of blackened, charred scrolls into readable text.

Having only cursory knowledge of machine learning: are some of the techniques used in the article only recently discovered, or have they been around for a while?

Is it due to us having reached an inflection point with these types of algorithms that they have become more popular and thus we are seeing new ways to apply them to old problems?

Absolutely insane the level of wizardry being applied here to turn a lump of blackened, charred scrolls into readable text.

Imagine what we'll be able to do to brains, dead or alive, in 100 years.

And in 10,000, maybe we'll be reconstructing the light cone. Maybe that's what we are right now. (Not serious, but it's a fun thought experiment.)

This is why I am going for cryopreservation if I ever have the luxury of choosing the way I die.

The part I don't understand there is: why would anyone/everyone else want you to be alive again? Don't get me wrong, I would very much be interested in talking to people from various ages. Specific ones, but also commoners. But why would someone want to restore thousands of random people from (e.g.) 1284 at their own cost? That only works if those people have big stashes of money that are legally still theirs. And while I understand that some may want to keep ownership after death, I think it is super dangerous to allow something like that. Just imagine 10% of the world today still belonging to Genghis Khan. That cannot be good.

Perhaps in a near-post-scarcity world, resurrecting dead people is seen as one of the great humanitarian projects of civilization.

This.

Not to make a low effort comment, but this.

But that'll depend on things going smoothly and non-evil, non-dictators winning in the end. It'd be horrific if evil and malicious entities won and decided they just wanted to fuck with everyone.

Reverse parenting.

You raise your parents from the dead, they raise their parents, and so on and so forth.

Doesn't apply to raising random people from a thousand years ago, but it's a reason people today might be resurrected in a couple of centuries.

Because death is the enemy. The Jean le Flambeur series by Hannu Rajaniemi touches on this in pretty good ways. Won't spoil the plot, though.

It does put into perspective though, how distracted and selfish our species can be; oh, let's fight each other because of skin colour, sexuality, ethnicity, religion, fighting wars over resources, etc. Meanwhile people are getting cancer, have disabilities like blindness and paraplegia and generally just...dying, especially when it comes early, after a hard life. It's just so sad and disappointing that we have the resources to give everyone a pretty decent life while we work on solving these bigger problems...but we just don't.

Well, that could go lots of ways. Maybe some rich trillionaire buys you and spawns you into an endless horror simulation. They might be into torture and get off on it.

("No real humans harmed.")

But if the future can reverse the light cone, nobody is immune to that fate.

Who knows what the future holds. These are just sci-fi flights of fancy.

What is this light cone you keep mentioning?

https://en.wikipedia.org/wiki/Light_cone

Basically reverse time.

I remember learning about ancestor simulations by the Vile Offspring in Accelerando, but reversing the light cone is quite chilling. Is there any sci-fi novel that deals with that you would recommend?

Argh, what has Black Mirror done to our sense of optimism?

(I agree with you).

There has definitely been a virtuous cycle between GP-GPU processing capability, algorithms, libraries and software that use that hardware, and researchers working with those tools.

Apparently the bad news is that the remaining scrolls most likely contain yet more Epicurean philosophy, maybe largely from not-top-rated guys like Philodemus. (Apparently it's possible that the library actually is, or incorporates, Philodemus' personal library.) https://twitter.com/DrFrancisYoung/status/175453630645602754...

That's not necessarily bad news.

Everyone interested in this story should read Stephen Greenblatt's The Swerve (https://www.pulitzer.org/winners/stephen-greenblatt).

It traces the story of a Renaissance humanist who tracked down and translated the Epicurean philosopher/poet Lucretius' De Rerum Natura, which Greenblatt describes as portraying a strikingly modern way of seeing the world.

In particular Lucretius and the Epicureans denied the existence of supernatural causes, were opposed to religious fear, and posited the ideas of atomism and biological evolution. Of course they're better known for their approach to living life, which Greenblatt shows is more sophisticated than sometimes caricatured, and which he portrays as a breath of fresh air compared to the oppressive moralism and hypocrisy of the Church at the time. (Jefferson and many of the American Founders described themselves as Epicureans.)

He goes on to imply that Epicureanism was influential and widespread in the ancient world but suppressed by the early Church, so that we now know little of it.

Anyway, one of the tantalizing parts of the book is where he describes the carbonized and unreadable Herculaneum scrolls, since they were the private library of a wealthy patron of the Epicureans. I think he thinks being able to read the scrolls will really change our understanding of the ancient world.

And remember: if they hadn't been carbonized, they would have crumbled to dust. That's why we only have the texts that managed to get copied. (Anthony Doerr's Cloud Cuckoo Land is a novel about the survival and 21st century rediscovery of an imaginary Greek play, and ... I'll let you read it yourself - https://www.anthonydoerr.com/books/cloud-cuckoo-land)

(Apologies for any errors above, as basically all I know about this subject is what I read in the book!)

Everyone interested in this story should read Stephen Greenblatt's The Swerve (https://www.pulitzer.org/winners/stephen-greenblatt).

It's interesting reading for a layperson, but as with any other pop-history book, one should read this with a heaping plate of salt at hand. (I'm... not sure what that metaphor actually means or if this is an appropriate way to extend it.)

Things are always more nuanced than can be laid out in a sweeping narrative format and the compression required can lose some critical information, even with the best of intentions. There's also just getting things wrong, which most non-historians do and many historians will do on topics that aren't their expertise.

I'd recommend reading this criticism from AskHistorians (not infallible, I know):

https://old.reddit.com/r/AskHistorians/comments/ejfxe5/comme...

The "grain of salt" phrase comes from an antidote recipe that included a grain of salt. The reduction to "handle with care" is modern.

So the extended metaphor makes no literal sense according to the Pliny text, but it makes sense according to our interpretation of it, which is what matters.

I realize citing Wikipedia risks some serious error, but my impression is that by late antiquity (after AD 200), the main philosophical systems in the Roman world were Christianity and Neoplatonism (itself heavily influenced by Christianity) and to a lesser extent Stoicism. Stoicism, Epicureanism, and Middle Platonism were more characteristic of Classical Antiquity (200 BC-200 AD). The Wikipedia page on Epicureanism[0] supports this impression: "By the late third century CE, however, there was little trace of its existence.[7] With growing dominance of Neoplatonism and Peripateticism, and later, Christianity, Epicureanism declined."

[0] https://en.wikipedia.org/wiki/Epicureanism

You might also be interested in the Charvaka school of ancient India [1], which is a close counterpart to Epicureanism. The Charvaka school was likewise influential and widespread, and it likewise became obscure over time, for reasons I don't know.

[1]: https://en.wikipedia.org/wiki/Charvaka

That’s quite premature, but even if we are looking at a personal library containing only personal writings, you’d be looking at a massive increase of information on the ancient world, like a neural map of a single ancient mind that contained all their experiences and thoughts.

The worst case would be that it was 800 copies of the same scroll waiting to be sold off to other libraries.

All the 847 chapters of Philodemus fan fiction of MLP (my little Plato)

It says, it says.... "Drink Ovalteenus"

In all seriousness though, I find this such an amazing project to follow regardless of the outcome(s)

Probably true, but there are more rooms in the Villa yet to be excavated. What we have is essentially a bookshelf in a larger library. If it is sorted alphabetically, it might be representative, but what if it is sorted by topic?

Here is the link to their "master plan" to read all of the excavated scrolls: https://scrollprize.org/master_plan

It looks like there are two main bottlenecks to reading more: the need for manual intervention in segmenting the scanned scrolls, and the cost in scanning new scrolls.

Funding is a huge one as well. Funding is the wheel that drives the project (source: I've been hanging around the project people for a little while).

If you know anyone who would help chip in for Phase 2 of the project (scaling up), please let Nat know! (I'm not directly affiliated with the project management team, just pointing to him as a great contact for that. <3 :') )

It seems "weird" that none of the mega-rich have committed a few million dollars to this; it looks like a very good way to build a legacy while benefiting humanity, and e.g. Bezos would probably find a million dollars behind the couch cushions.

Bezos funded the reading of the Archimedes palimpsest, didn't he? I guess this would be right up his alley, then. Unless, of course, he is only willing to finance the decryption of his own property...

It's almost nauseating to me that every month or so our nation demonstrates it is capable of and willing to collectively chip in enough money to turn one random nobody into a near-billionaire, much of which gets promptly vaporized on drugs and tacky status-symbol purchases for themselves and maybe some immediate family, when the same money would fund a hundred Vesuvius Challenges a year at several times the scale of this project.

What a refreshingly clear and thought-out plan. This project honestly gives me a lot of hope.

Yes. Now it can be done, but costs too much. Once they get a scanning unit near the scrolls, it will be much cheaper. The data reduction will probably get cheaper, too.

Scanning unit? It seems like the scanning was done using a synchrotron beamline. Maybe there is a suitable beamline at Elettra. I haven't looked closely at why the synchrotron is needed. A Sigray instrument might work here, or even something simpler.

As for scanning: $30mm doesn't seem like a ton of money to scan 800 scrolls with untold history and other works, compared to other uses of that amount of money I could name now. Maybe someone will donate that cost and perhaps all the scrolls can be transported at one time or in a few bigger groups to be closer to the particle accelerator. Another million bucks and I bet you could build a climate-controlled container to take them all at once, or something. If I had $30mm I would definitely donate to this cause, it seems like one of the best uses of that kind of money I can think of. That would bypass the need to research and develop a bench top scanner or another solution. You could even crowdfund this!

As for segmentation: get some sort of collective solution going, like SETI@home did, but for people who are bored as hell, instead of them scrolling Reddit or Twitter all day. Maybe do it like a CAPTCHA so you get it done for free? I'd segment for a few hours a month if I had the ability to do so.

This is a cool project that has taken a community to build to this point, why not try and open and expand the collective of humans working to understand the scrolls? Get millions of people involved and you don't need to rely on technological crutches and development, though that is not the worst way to go either.

At $30mm, they'll have billionaire philanthropists lining up around the block to get their name on this!

I was recently reading the Greek Myths series by Stephen Fry and he makes a point of how there are so many stories we know of but have been lost from ancient texts. Stories and authors that were famous enough to be mentioned by multiple other authors but which themselves have been lost. This collection of scrolls could contain some of those lost stories and the possibility of that is terribly exciting.

Not just works of fiction or mythology but also histories, and works of science and philosophy.

I want Aristotle's treatise on Comedy, and if I'm allowed to be terribly greedy, just one more play by Aeschylus.

This is the most exciting thing in the world to me right now, these scrolls, along with the thought that there might be literally thousands more still in the ground.

Might want to put on a pair of gloves and a respirator if you find that Aristotle volume. Some people were pretty offended by it, I understand.

stories we know of but have been lost from ancient texts

So many that there's a lengthy list on Wikipedia about it. It's fascinating reading ancients casually referencing works that we otherwise know nothing about. Without the careful, laborious (often imperfect) copying over the centuries, most things would've been lost completely. There are also other works, such as maps, that did not survive; the Tabula Peutingeriana, for example, is thought to derive from a map of the known world (to Romans) commissioned by Augustus, which is mentioned in a few works by historians of the time.

https://en.wikipedia.org/wiki/Lost_literary_work

A great example of lost work: much of what we know about Viking mythology was documented by pretty much a single man, Snorri Sturluson. What we know about Norse mythology is just a tiny piece of their mythos, as they didn't have the habit of writing down their tales/legends/stories, and most of it got lost after they converted to Christianity.

Suetonius' Lives of Famous Whores is a lost text I've always hoped we would recover at some point.

I would love to read the Telegony. Homer did such a good job with episodes I and II that I'm really curious how the story ends.

And those are just the known unknowns. Besides those, to borrow from Rumsfeld, there are the unknown unknowns.

I was ridiculously excited when I first read about this in October last year (if I remember correctly), when a few of the first results were beginning to pop out. I found the methodology fascinating: first the digital unwrapping of the scrolls, then the recognition that a crackle pattern in the papyrus was the sign of ink, and finally putting together a model to detect it, piece by piece. I need to look into the final repository to understand exactly what they did, but they seem to have used a TimeSformer. I'm confused by this choice, as I thought it was for video. How did they apply it to images? In the end, though, what a wonderful day for archeology. These young minds deserve a huge round of applause for what they have achieved.

My understanding is that the scan they did on the scrolls returned the layers themselves, like so:

```
xxxxxxxxxx <- The surface of the scroll
xxxxxxxxxx
...
xxxxxxxxxx <- The bottom of the scroll
```

So, by tiling the surface image you get data that is size_x * size_y * n_layers. It can therefore be seen as a video stream of size_x * size_y pixels with 1 channel and n_layers frames, where the layers replace the temporal dimension.
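A minimal NumPy sketch of that reshaping, assuming each layer arrives as a 2D grayscale array (for example one TIFF slice of the scan). The function name `layers_to_clip` and the (frames, channels, height, width) layout are assumptions for illustration, not taken from the winners' code; video models differ in which axis order they expect.

```python
import numpy as np


def layers_to_clip(layers, normalize=True):
    """Stack n_layers 2D slices (each size_y x size_x) into a video-like
    array of shape (n_layers, 1, size_y, size_x): frames, channels, H, W.

    The depth axis (which layer) plays the role of time."""
    volume = np.stack(layers, axis=0).astype(np.float32)  # (n_layers, H, W)
    if normalize:
        # Scale intensities to [0, 1]; raw CT slices are often 16-bit.
        lo, hi = volume.min(), volume.max()
        volume = (volume - lo) / (hi - lo) if hi > lo else np.zeros_like(volume)
    return volume[:, np.newaxis, :, :]  # insert a single grayscale channel


# Hypothetical example: 16 layers of a 64x64 patch with random intensities
rng = np.random.default_rng(0)
layers = [rng.integers(0, 65535, size=(64, 64), dtype=np.uint16) for _ in range(16)]
clip = layers_to_clip(layers)
print(clip.shape)  # (16, 1, 64, 64)
```

A video model that attends across frames then effectively attends across depth, which is one plausible reason a video architecture fits this data.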

The scans used for the grand prize look like this: https://scrollprize.org/img/grandprize/scroll1.mp4

It's a cut through the scroll, with the time dimension in this video representing the location of the cut along the scroll lengthwise.

As you can see from the mess, it's far from trivial to find the surface of any of the sheets in the scroll; often the layers are blended together.

You may be thinking of the scans they used of an unwrapped sheet, those were as you describe and were used to help figure out methods for the real challenge.

I wonder what that perfect circle gap is at 0:06, south-west from center?

That's amazing! I wonder if the solution technique has any bearing on potential improvements in diagnostic CT / MRI scans?

They explain it in the methodology sections. The scans result in a stack of TIFF images that can be rendered as videos of the scan or as 3D models.

ditto about being excited!

I still think this methodology can be tested by creating equivalent carbonised scrolls, where you write some specific text and check the result against what you wrote. I.e., you run your test artifacts through the scanner and software, and the process should return the text you wrote before carbonising the scroll. At that point you can be assured that the process is not making stuff up from noise.

But without that process, you can have no confidence that what is occurring is valid. The software might reliably see a letter in some noise, but so what? It doesn't mean the letter is actually there... One can't verify the scroll, and one hasn't verified the process.

I think this suggestion of mine was downvoted last time too, but I don't get why!

One would always want to test stuff in software development, especially if it is fraught and can easily be tested.

Mostly there are no tests to be undertaken in history, hence it's so much hearsay. But here is an opportunity to gain some genuine certainty, in a way that is normally unavailable! The implementors of this method should absolutely test their process!
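One way to score such a test, sketched in Python under the assumption that the ground truth is the text written before carbonisation and the output is a transcription of what the pipeline recovers: character accuracy as one minus the normalized Levenshtein edit distance. The function names are hypothetical, and this is just one plausible metric, not the one the project uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(written: str, recovered: str) -> float:
    """1 - normalized edit distance, measured against the known text."""
    if not written:
        return 1.0 if not recovered else 0.0
    return max(0.0, 1.0 - edit_distance(written, recovered) / len(written))


# Two misread characters out of 11: 9/11, about 0.82
score = character_accuracy("HELLO WORLD", "HELL0 W0RLD")
```

A threshold on this score would turn "the process is not making stuff up" into a pass/fail check on known scrolls.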

You're being downvoted because they literally did what you suggest already.

It does not say that in the article, neither this one nor the previous HN submission where I read about this.

However, you are right - if you go to Tutorials and Scanning there is reference to the creation of a 'campfire scroll'. And now we have some detail..... and the detail is problematic.

In tutorial 3, halfway down this page (https://scrollprize.org/tutorial3) there is a before and after comparison of the scroll. They ask a question "If you look back to the last page of the campfire scroll (before carbonization), can you see which area of the scroll this segment came from?".

This is meant to be obvious to answer, and one does have the 2 images to compare.

My thoughts on the comparison: yes, at a glance there is a section that appears to match up, the 'angular squiggle' next to the @ symbol. However, if I look more closely at the 'angular squiggle', I see features and spaces in the generated image that do not correlate with the photo taken before carbonisation. The troughs are too deep, the spaces are too big. It seems to be a superficial similarity only.

I wish I could show what I mean by referencing the images directly, but I will provide links for others to see what I mean when I say there is only a superficial correlation.

https://scrollprize.org/img/tutorials/vc-segment.png

https://scrollprize.org/img/tutorials/campfire-last-page.jpg

Final thought - why not map the 2 images one over the other on the site? Why ask a leading question rather than provide a proof? I hate that kind of presentation - it smacks of providing enough information for someone to make an incorrect snap judgement.

I don't know, man... implying people are publishing nonsense is what smacks the most here. If you think they are overfitting, show it; don't just air doubts idly.

AFAIK the scanning process is pretty expensive

The methods are incredible, but it seems like the text is ordinary and nothing to write home about

Not too surprising, but I like the random ancient texts

https://en.wikipedia.org/wiki/Complaint_tablet_to_Ea-n%C4%81...

There really is one for everything:

https://xkcd.com/2758/

We don't decipher ancient texts to have our minds blown about the universe but to know more about what life was like back then and to conduct historical research about it.

I dunno, I found it pleasurable.

Sturgeon's law would imply that any random unknown text we find from the ancient world _in situ_ is overwhelmingly likely not to be very good.

Most of what we have left from the ancient world is material that people felt worth copying for _centuries_ after they were written. That's a fairly amazing quality filter and gives people a skewed perspective on the overall quality of material written at the time.

A selection of work from a random library has also likely been filtered for quality to an extent, and for the most part, the "good stuff" is going to be works that we _already_ have copies of or fragments from. Anything we don't already have a copy of is most likely going to be something that there weren't many copies made of, and usually there's a reason for that.

Which isn't to say that anything we find isn't going to be interesting for other reasons -- even bad writing is going to be incredibly useful for historical research.

Given that AI is able to hallucinate, how can we be convinced the results are accurate? Did they create a new scroll, burn it, and compare the results to what was actually written?

read the article

Thanks. From the article, for anyone else skimming who had the same question:

    Technical reproduction. The Vesuvius Challenge Technical Review Team reproduced the winning submissions manually. We made sure to clearly understand every part of the code, and that when we run it independently we get similar output images. Since all code and training data is now open source, you can do the same!
    Multiple submissions of the same area. You might have noticed that all submission images above show the same area of the scroll. This is because we released 3d-mapped papyrus sheets within the CT-scan (“segments”) created by our segmentation team, which were then used by all contestants. The resulting output images — created by different ML models and training labels — have produced extremely similar results. This holds not just for the winners and runner ups, but also for the other submissions that we received.
    Small input/output windows. The ink detection models are not based on Greek letters, optical character recognition (OCR), or language models. Instead, they independently detect tiny spots of ink in the CT scan, the writing appearing later when these are aggregated. As a result, the text appearing in the images is not the imagined output of a machine learning model, but is instead directly tied to the underlying data in the CT scan.
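A toy illustration of the "small input/output windows" idea quoted above: each small window of a flattened segment is scored on its own, and overlapping scores are averaged into a 2D ink map. This is a sketch under assumptions, not the winners' pipeline; `predict_ink_map`, the window and stride sizes, and the brightness-threshold `toy_scorer` are hypothetical stand-ins for a trained model.

```python
import numpy as np


def predict_ink_map(volume, score_patch, window=16, stride=8):
    """Aggregate independent per-window ink predictions into a 2D map.

    volume: array of shape (n_layers, H, W), a flattened segment.
    score_patch: callable mapping a (n_layers, window, window) patch to an
        ink probability in [0, 1] (placeholder for a trained model).
    Overlapping windows are averaged; letters are never predicted directly,
    they only become visible once the small predictions are combined."""
    _, h, w = volume.shape
    total = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            p = score_patch(volume[:, y:y + window, x:x + window])
            total[y:y + window, x:x + window] += p
            count[y:y + window, x:x + window] += 1
    # Pixels never covered by a window stay 0 instead of dividing by zero.
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)


def toy_scorer(patch):
    # Hypothetical stand-in: call it "ink" if the patch is bright on average.
    return float(patch.mean() > 0.5)


vol = np.zeros((4, 64, 64))
vol[:, 16:48, 16:48] = 1.0  # a bright square standing in for an inked stroke
ink = predict_ink_map(vol, toy_scorer)
```

Because no model in this setup ever sees more than one small window, it has no way to "imagine" whole letters, which is the point the quoted review makes.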

tl;dr: cross-validation between competing submissions
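To put a number on "extremely similar results", one could threshold each submission's ink map and compute pairwise overlap. The sketch below uses the Dice coefficient; that choice and the function names are assumptions for illustration, since the quoted review describes the comparison rather than a specific metric.

```python
from itertools import combinations

import numpy as np


def dice(a, b) -> float:
    """Dice overlap of two binary ink masks: 1.0 = identical, 0.0 = disjoint."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = int(a.sum() + b.sum())
    if denom == 0:
        return 1.0  # both masks empty: treat as full agreement
    return 2.0 * int(np.logical_and(a, b).sum()) / denom


def pairwise_agreement(masks):
    """Dice score for every pair of submissions, keyed by index pair."""
    return {(i, j): dice(masks[i], masks[j])
            for i, j in combinations(range(len(masks)), 2)}


# Hypothetical thresholded maps from three submissions of the same segment
m0 = [[1, 1], [0, 0]]
m1 = [[1, 1], [0, 0]]
m2 = [[1, 0], [0, 0]]
scores = pairwise_agreement([m0, m1, m2])
```

High scores across independently trained models are evidence the ink is in the data rather than in any one model.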

Nobody used generative LLMs at any step of this process...

The ML models were trained without knowledge of the scrolls' language. The models extracted images, and human experts were able to read the images as text. There was no text corpus fed in that could be leaked into the output.

I was worried it would contain something boring like tax records, but it's even worse than that. Fingers crossed it gets a little more interesting than the average social media post.

I'm pretty sure there are academics who would bite their own legs off for tax records from this period.

Isn't that what most ancient Roman texts we do have are? Tax records.

Do we have any, or any significant number of, Roman tax receipts? Most Roman/Greek texts that we have had to survive until ~1000 AD. If a text was available back then, there is a reasonably good chance that we have it (or rather a medieval copy of it); why would anyone waste time copying tax receipts, though?

Herculaneum was one of the highlights of my trip to Italy with the wife. I didn't realize the scope of just how much ash and soil had to be removed for excavation. It was dozens of meters [1]. It's an absolute shame that the site is given a fraction of the attention that Pompeii receives, I thought it was vastly better preserved and truly awe-inspiring [2].

I highly recommend spending a few hours wandering the site, it is an absolute wonder.

1: https://www.icloud.com/photos/#08dJAA5eM9jpbhlEa3fzkl5ng 2: https://www.icloud.com/photos/#076Pof4FziA7WgcI8hZrGZmzg

The modern Italian town of Ercolano lies just over Herculaneum, so excavating the rest of the ancient town is a bit tricky. Only about a quarter has been excavated so far, in contrast to Pompeii, where about two-thirds has been uncovered.

I enjoyed the attention given to Herculaneum in a computer game called Rome: Pathway to Power (released in 1992). You start the game as a slave who has to escape Herculaneum before Vesuvius erupts. I loved the game as a kid. It's sort of like an isometric immersive sim (with a clunky interface). It got me interested in ancient Rome.

I hope to visit Herculaneum some day.

Quite incredible work, with the original breakthrough model being trained on a 1070: https://twitter.com/LukeFarritor/status/1754532281690243339

Large Language Models have skewed the perception on the amount of compute required to do useful things with ML.

my favorite fun fact about AI is that computer vision has been able to outperform humans at recognizing handwritten digits for more than 30 years

I couldn't help but wonder while reading this how everyone involved would feel if the "last step" involving the papyrologists were automated as well.

Amazing? Whoever made it would likely have created an incredibly powerful, generally useful tool. That would be much more exciting than probabilistically distinguishing layers of ink in soot.

We truly live in a golden age of physical and mathematical discovery. To get here required many thousands of years of technological development.

The developers of the transformer, as a group, should win some sort of significant prize; it has had more impact in a short time than anything I've seen before. Will we find better architectures in the near future?

Just don’t say something like this out loud near Gary Marcus.

First word discovered in unopened Herculaneum scroll by CS student - https://news.ycombinator.com/item?id=37857417 - Oct 2023 (207 comments)

The Vesuvius Challenge - https://news.ycombinator.com/item?id=35322809 - March 2023 (32 comments)

Vesuvius Challenge - https://news.ycombinator.com/item?id=35169869 - March 2023 (32 comments)

From today there's also this article, which maybe goes into more background (I haven't checked):

Can AI Unlock the Secrets of the Ancient World? - https://news.ycombinator.com/item?id=39261465 - Feb 2024 (1 comment)

and this tweet which presumably covers the same ground as OP:

The $700k Vesuvius Challenge prize has been won - https://news.ycombinator.com/item?id=39261933 - Feb 2024 (2 comments)

This is amazing and very, very exciting!

It is wild to me, though, that if I have an SSD fail it's essentially unrecoverable, but a 2,000-year old, rolled-up, lava-burnt scroll of Papyrus can be read using Technology™! I love to see it!

Are you sure it wouldn't be recoverable using something on a similar level to what they did (particle accelerator scanning)?

An awesome achievement. I don't know if it's possible but I wonder if they could do energy-resolved CT and try to look for a differential absorption edge in that ink. I have no idea what material it is actually made of, but if the IR photo is different, then there's a nonzero chance they could generate quite significant contrast by hitting one of the K-shell edges of a characteristic ion (iron?) present in the ink but not in the papyrus itself. I've been to Diamond in the past but don't know enough about this experiment. Some of them are definitely energy resolved.

Utterly brilliant. I'm so glad it appears to be a bit of scholarly writing too, which is what I know most classicists secretly love!

The ink is carbon-based, without iron. However there's plenty of contrast when viewed in infrared.

tl;dr

The general subject of the text is pleasure, which, properly understood, is the highest good in Epicurean philosophy. In these two snippets from two consecutive columns of the scroll, the author is concerned with whether and how the availability of goods, such as food, can affect the pleasure which they provide. Do things that are available in lesser quantities afford more pleasure than those available in abundance? Our author thinks not: “as too in the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant.” However, is it easier for us naturally to do without things that are plentiful? “Such questions will be considered frequently.”

I first looked at the results and thought "this is kinda cool". Then I proceeded to read about the whole competition and how thoughtfully it was organized and thought that "this is extremely cool". I wonder what else could be achieved with such a well designed incentive program.

So excited to see how much text is recoverable and to what extent this can bolster our collection of Epicurean writings!

TIL that I need to now shred my secret documents before throwing them into the burn barrel. Also, possibly, mix the resulting ashes into a slurry and dump into the sea.

My God, once in a while you need to read something like this, reflect, and ask the question "can I do better in my work?" Such an inspiring story of technical feat, persistence, and ingenuity.

This is really cool! Is the output resolution limited by the granularity of the CT scans?

> “as too in the case of food, we do not right away believe things that are scarce to be absolutely more pleasant than those which are abundant.” However, is it easier for us naturally to do without things that are plentiful? “Such questions will be considered frequently.”

This reminded me, since they're scarce but also abundant... Has anyone actually eaten these giant waterbugs at Nue in Seattle? Is that like, a reasonable thing to subject a date to?

Incredible effort. Text of scroll is essentially "drink more Ovaltine" which is on theme.

Netflix documentary soon please!

Big ups for the winners - this is so cool and hopefully can be replicated for deciphering many other lost manuscripts.

The scroll segmentation looks like it'd be a very manageable task for distributed volunteer work too, at least as a kickstart, if that's really the bottleneck now. Just 100 or 1,000 halfway dedicated volunteers (small by internet standards, for a useful and simple task) could make a big dent. I don't know how many scan layers are in a single scroll, but for instance Project Gutenberg's proofreading network has processed millions of pages, one by one.

Pretty cool engineering to pull this off

Lack of information retrieval technology ≠ loss of information.

It’s weird to see my home town on Hacker News; kinda nice, considering it's historically rich but not technically relevant.

This is amazing, sci-fi-like reality. I wonder, if they pull off an auto-segmentation breakthrough, whether the techniques might apply in other areas, like automating neuron mapping (e.g. see EyeWire).

This is awesome. But it makes me sad to think that our current DIGITAL artifacts might not survive even 20 years unless explicitly archived. I'm pretty sure a hard drive or SSD thrown in a landfill won't be able to be resurrected in 2000 years, even with alien tech, but maybe? Would be cool if someone studied this :)

But the main problem is probably not reading thrown-out drives; it's that stuff is just too transient nowadays. People put something up on the net and decide to withdraw it next year. archive.org can't store the walled gardens, and even if a social network, for example, wants to archive stuff, they might not be allowed to for legal reasons (not that it has prevented them before, but anyway...)

The rewriting history comment from Nat Friedman is super interesting. It'll be amazing once this data passes into the hands of historians.

Wow. I remember this being announced and thought it'd be a while or that it wouldn't be possible. Very happy to be wrong!

I'm sure the original authors didn't expect to be incinerated by a volcano (and I'm also sure that the future legibility of their writings would be the very least of their concerns!), but it really bends the mind to imagine their reaction if they could have known how this would all unfold (unroll? Sorry...)

I don't know if anyone who worked on this project is going to read this but if you are: good job! This looked like it was really hard.

I wish billionaires would spend their money on this, rather than the crap that the WEF comes out with.

This is great news for preservation efforts as we have a large set of stuff that can't be opened.

I wonder what this means for the maya codices, many of which are in similar shape: https://en.wikipedia.org/wiki/Maya_codices#Other_Maya_codice...

"4 passages of 140 characters each, with at least 85% of characters recoverable"

oh good grief, even back then, we were limited to this value

Even if I really do think this is an amazing effort, I'll spoil the techno-optimist party here by noting that it's a bit sad that these kinds of achievements are pulled off only by rich patrons who became millionaires doing something unrelated and just want to scratch a personal itch/curiosity. Yeah, patrons and Maecenases have been around forever, but I would really prefer that we as a society were mature enough to achieve such things collectively. We did it with space exploration, at least for a few decades, even if it came from the wrong incentives.

This is exciting, it's very unusual for new texts to be added to our collection of ancient literature.

My initial concern was that the details may have been hallucinated but it seems they accounted for that as well!

Great things can be achieved if some smart people organize others to work towards common goals.

I think the biggest success isn't the recovered text but organizing such an endeavor with such success.

Totally inspiring!

Better but less technical writeup: https://www.bloomberg.com/features/2024-ai-unlock-ancient-wo...