He coined the concept of the "singularity" in the sense of machines becoming smarter than humans. What a time for him to die, with all the advancements we're seeing in artificial intelligence. I wonder what he thought about it all.
The concept and the term "singularity" were popularized by Vernor Vinge first in 1983 in an article that claimed that once humans create intelligences greater than their own, there will be a technological and social transition similar in some sense to "the knotted space-time at the center of a black hole",[8] and later in his 1993 essay The Coming Technological Singularity,[4][7] in which he wrote that it would signal the end of the human era, as the new superintelligence would continue to upgrade itself and would advance technologically at an incomprehensible rate. He wrote that he would be surprised if it occurred before 2005 or after 2030.
Looks like he was spot on.
With respect, we don't know if he was spot on. Companies shoehorning language models into their products is a far cry from the transformative societal change he described. Nothing like a singularity has yet happened at the scale he describes, and it might not happen without more fundamental shifts or breakthroughs in AI research.
Imagine the first LLM to suggest an improvement to itself that no human has considered. Then imagine what happens next.
OK. I'm imagining a correlation engine that looks at code as a series of prompts used to generate more code from the corpus, whatever is statistically likely to follow.
And now I'm transforming that through the concept of taking a photograph and applying the clone tool via a light airbrush.
Repeat enough times, and you get uncompilable mud.
LLMs are not going to generate improvements.
Saying they definitely won't or they definitely will are equally over-broad and premature.
I currently expect we'll need another architectural breakthrough; but also, back in 2009 I expected no-steering-wheel-included self-driving cars no later than 2018, and that the LLM output we actually saw in 2023 would be the final problem to be solved on the path to AGI.
Prediction is hard, especially about the future.
GPT-4 does inference at 560 teraflops. The human brain goes 10,000 teraflops. NVIDIA just unveiled its latest Blackwell chip yesterday, which goes 20,000 teraflops. If you buy an NVL72 rack of the things, it goes 1,400,000 teraflops. That's what Jensen Huang's GPT runs on, I bet.
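Taking those figures at face value (the replies below dispute them), the ratios work out roughly like this, as a quick sanity check:

    # Ratios of the throughput figures quoted above (teraflops); the figures themselves are disputed.
    gpt4_inference_tf = 560
    brain_tf = 10_000
    blackwell_tf = 20_000
    nvl72_tf = 1_400_000

    print(brain_tf / gpt4_inference_tf)   # ~17.9x: quoted brain vs quoted GPT-4 inference
    print(blackwell_tf / brain_tf)        # 2.0x: one Blackwell chip vs quoted brain
    print(nvl72_tf / brain_tf)            # 140.0x: one NVL72 rack vs quoted brain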
AFAICT, both are guesses. The estimates I've seen for the human brain range from ~162 GFLOPS[0] at the low end to 10^28 FLOPS[1]; even just the model size for GPT-4 isn't confirmed, merely human inference from public information combined with a rumour widely described as a "leak", and likewise the compute requirements.
[0] https://geohot.github.io//blog/jekyll/update/2022/02/17/brai...
[1] https://aiimpacts.org/brain-performance-in-flops/
They're not guesses. We know they use A100s and we know how fast an A100 goes. You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me. That would explain why GPT is so much smarter and faster than people. The other estimates like 1e28 are measuring different things.
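For what it's worth, here is that back-of-the-envelope arithmetic written out (just the numbers from this thread, nothing authoritative):

    # Back-of-the-envelope brain compute estimates, using only the figures quoted in this thread.
    neurons = 100e9            # ~1e11 neurons
    connections = 1000         # synapses per neuron (Kurzweil's assumption)
    calcs_per_sec = 200        # calculations per connection per second (Kurzweil's assumption)

    kurzweil_style = neurons * connections * calcs_per_sec
    print(f"Kurzweil-style estimate: {kurzweil_style:.1e} FLOPS")     # 2.0e+16, i.e. tens of petaflops

    # Counting only spikes instead, at 5-50 firings per neuron per second:
    low, high = neurons * 5, neurons * 50
    print(f"Spike-counting estimate: {low:.1e} to {high:.1e} FLOPS")  # 5e11 to 5e12, i.e. 0.5-5 teraflops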
That assumes that you can represent all of the useful parts of the decision about whether to fire or not to fire in the equivalent of one floating point operation, which seems to be an optimistic assumption. It also assumes there's no useful information encoded into e.g. phase of firing.
Imagine that there's a little computer inside each neuron that decides when it needs to do work. Those computers are an implementation detail of the flops being provided by neurons, and would not increase the overall flop count, since that'd be counting them twice. For example, how would you measure the speed of a game boy emulator? Would you take into consideration all the instructions the emulator itself needs to run in order to simulate the game boy instructions?
Already considered in my comment.
Yah, there's -bajillions- of floating point operation equivalents happening in a neuron deciding what to do. They're probably not all functional.
BUT, that's why I said the "useful parts" of the decision:
It may take more than the equivalent of one floating point operation to decide whether to fire. For instance, if you are weighting multiple inputs to the neuron differently to decide whether to fire now, that would require multiple multiplications of those inputs. If you consider whether you have fired recently, that's more work too.
Neurons do all of these things, and more, and these things are known to be functional-- not mere implementation details. A computer cannot make an equivalent choice in one floating point operation.
Of course, this doesn't mean that the brain is optimal-- perhaps you can do far less work. But if we're going to use it as a model to estimate scale, we have to consider what actual equivalent work is.
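To make that concrete, here's a minimal sketch of the point (a toy integrate-and-fire-style neuron; the structure and parameters are illustrative, not a claim about real neurons):

    # Toy integrate-and-fire-style neuron: illustrative only, not a biological model.
    def neuron_step(inputs, weights, potential, threshold=1.0, leak=0.9, refractory=0):
        if refractory > 0:                     # fired recently: ignore inputs this step
            return False, potential * leak, refractory - 1
        # Weighted sum of inputs: one multiply-add per synapse, so N FLOPs before any decision.
        potential = potential * leak + sum(x * w for x, w in zip(inputs, weights))
        if potential >= threshold:             # the actual fire / no-fire decision
            return True, 0.0, 3                # reset and enter a short refractory period
        return False, potential, 0

    # Even this cartoon needs ~1000 multiply-adds per step for a neuron with ~1000 inputs,
    # which is the point: "one FLOP per firing decision" is an optimistic floor.
    fired, v, r = neuron_step([0.2, 0.7, 0.1], [0.5, 0.9, 0.3], potential=0.0)
    print(fired, v, r)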
I see. Do you think this is what Kurzweil was accounting for when he multiplied by 1000 connections?
Synapses might be akin to transistor count, which is only roughly correlated with FLOPs on modern architectures.
I've also heard in a recent talk that the optic nerve carries about 20 Mbps of visual information. If we imagine a saturated task such as the famous gorilla walking through the people passing around a basketball, then we can arrive at some limits on the conscious brain. This does not count the autonomic, sympathetic, and parasympathetic processes, of course, but those could in theory be fairly low bandwidth.
There is also the matter of the "slow" computation in the brain that happens through neurotransmitter release. It is analog and complex, but with a slow clock speed.
My hunch is that the brain is fairly low FLOPs but highly specialized, closer to an FPGA than a million GPUs running an LLM.
They might generate improvements, but I'm not sure why people think those improvements would be unbounded. Think of it like improvements to jet engines or internal combustion engines: rapid improvements followed by decades of very tiny improvements. We've gone from 32-bit LLM weights down to 16, then 8, then 4-bit weights, and then a lot of messy diminishing returns below that. Moore's law is running on fumes for process improvements, so each new generation of chips that's twice as fast manages to get there by nearly doubling the silicon area and nearly doubling the power consumption. There's a lot of active research into pruning models down now, but mostly better models == bigger models, which is also hitting all kinds of practical limits. Really good engineering might get to the same endpoint a little faster than mediocre engineering, but they'll both probably wind up at the same point eventually. A super smart LLM isn't going to make sub-atomic transistors, or sub-bit weights, or eliminate power and cooling constraints, or eliminate any of the dozen other things that eventually limit you.
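For a feel of what that 32 -> 16 -> 8 -> 4 bit progression means, here's a minimal sketch of symmetric round-to-nearest quantization (illustrative only; real schemes use per-group scales, outlier handling, etc.):

    import numpy as np

    def quantize(weights, bits):
        # Symmetric round-to-nearest quantization: a toy version, not a production scheme.
        qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8, 7 for int4
        scale = np.abs(weights).max() / qmax
        q = np.clip(np.round(weights / scale), -qmax, qmax)
        return q * scale                                # dequantized approximation

    w = np.random.randn(4096).astype(np.float32)        # stand-in for one weight tensor
    for bits in (16, 8, 4, 3, 2):
        err = np.abs(w - quantize(w, bits)).mean()
        print(f"{bits}-bit: mean abs error {err:.4f}")  # error roughly doubles per bit removed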
Bro, Jensen Huang just unveiled a chip yesterday that goes 20 petaflops. Intel's latest Raptor Lake CPU goes 800 gigaflops. Can you really explain 25,000x progress by the 2x larger die size? I'm sure reactionary America wanted Moore's law to run out of steam, but the Taiwanese betrayal made up for all the lost Moore's law progress and then some.
Saying that AI hardware is near a dead end because Moore's law is running out of steam is silly. Even GPUs are very general purpose; we can make a lot of progress in the hardware space via extreme specialization, approximate computing, and analog computing.
LLMs are so much more than you are assuming… text, images, code are merely abstractions to represent reality. Accurate prediction requires no less than usefully generalizable models and deep understanding of the actual processes in the world that produced those representations.
I know from firsthand experience that they can provide creative new solutions to totally novel problems… instead of assuming what they should be able to do, I experimented to see what they can actually do.
Focusing on the simple mechanics of training and prediction is to miss the forest for the trees. It's as absurd as saying: how can living things have any intelligence? They're just bags of chemicals oxidizing carbon. True but irrelevant: it misses the deeper fact that solving almost any problem deeply requires understanding and modeling all of the connected problems, and so on, until you've pretty much encompassed everything.
Ultimately it doesn't even matter what problem you're training for: all predictive systems will converge on general intelligence as you keep improving predictive accuracy.
LLM != AI.
An LLM is not going to suggest a reasonable improvement to itself, except by sheerest luck.
But for the next generation, where the LLM is just the language comprehension and generation model that feeds into something else yet to be invented, I have no guarantees about whether that will be able to improve itself. Depends on what it is.
Yes, eventually one gets a series of software improvements which result in the best possible performance on currently available hardware --- if one can consistently get an LLM to suggest improvements to itself.
Until we get to a point where an AI has the wherewithal to create a fab to make its own chips and then do assembly without human intervention (something along the lines of Steve Jobs's vision of a computer factory where sand goes in at one end and finished product rolls out the other), it doesn't seem likely to amount to much.
What we're seeing right now with LLMs is like music in the late '30s after the invention of the electric guitar. At that point people still had no idea how to use it, so they treated it like an amplified acoustic guitar. It took almost 40 years for people to come up with the idea of harnessing feedback and distortion to use the guitar to create otherworldly soundscapes, and another 30 beyond that before people even approached the limit of the guitar's range with pedals and such.
LLMs are a game changer and are going to enable a new programming paradigm as models get faster and better at producing structured output. There are entire classes of app that couldn't exist before because there was a non-trivial "fuzzy" language problem in the loop. Furthermore, I don't think people have a conception of how good these models are going to get within 5-10 years.
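As a sketch of what "a fuzzy language problem in the loop" could look like: the call_llm function and the ticket schema below are hypothetical placeholders, not any particular vendor's API.

    # Hypothetical sketch: constrain a fuzzy language task to structured output the rest
    # of the program can rely on. call_llm() is a stand-in, not a real API.
    import json
    from dataclasses import dataclass

    @dataclass
    class SupportTicket:
        category: str      # e.g. "billing", "bug", "feature_request"
        urgency: int       # 1 (low) to 5 (critical)
        summary: str

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for whatever model endpoint you use")

    def triage(email_body: str) -> SupportTicket:
        prompt = (
            "Classify this support email. Reply with JSON only, with keys "
            '"category", "urgency" (1-5), and "summary":\n\n' + email_body
        )
        data = json.loads(call_llm(prompt))   # the model handles the fuzzy language part
        return SupportTicket(**data)          # the rest of the app gets typed, structured data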
Pretty sure it's quite the opposite of what you're implying: people see LLMs that closely resemble actual intelligence on the surface but have some shortcomings. They then extrapolate this and think it's just a small step to perfection and/or AGI, which is completely wrong.
One problem is that converging to an ideal is obviously non-linear, so getting the first 90% right is relatively easy, and closer to 100% it gets exponentially harder. Another problem is that LLMs are not really designed in a way to contain actual intelligence in the way humans would expect them to, so any apparent reasoning is very superficial as it's just language-based and statistical.
In a similar spirit, science fiction stories set in the near future often tend to have spectacular technology, like flying personal cars, in-eye displays, beam travel, or mind-reading devices. In the 1960s it was predicted for the '80s, in the '80s it was predicted for the 2000s, etc.
This book
https://www.amazon.com/Friends-High-Places-W-Livingston/dp/0...
tells (among other things) a harrowing tale of a common mistake in technology development that blindsides people every time: the project that reaches an asymptote instead of completion. It can get you to keep spending resources and spending resources because you think you have only 5% to go, except the approach you've chosen means you'll never get the last 4%. It's a seductive situation that tends to turn the team away from Cassandras who have a clear view.
Happens a lot in machine learning projects where you don’t have the right features. (Right now I am chewing on the problem of “what kind of shoes is the person in this picture wearing?” and how many image classification models would not at all get that they are supposed to look at a small part of the image and how easy it would be to conclude that “this person is on a basketball court so they are wearing sneakers” or “this is a dude so they aren’t wearing heels” or “this lady has a fancy updo and fancy makeup so she must be wearing fancy shoes”. Trouble is all those biases make the model perform better up to a point but to get past that point you really need to segment out the person’s feet.)
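A sketch of that two-stage idea (localize the feet first, then classify only the crop); detect_feet and classify_shoes here are hypothetical placeholders for whatever models you'd actually train:

    # Hypothetical two-stage pipeline: find the feet, then classify only that crop,
    # so the classifier can't lean on "basketball court => sneakers" style shortcuts.
    from PIL import Image

    def detect_feet(image):
        # Placeholder for a keypoint/segmentation model returning a feet bounding box.
        raise NotImplementedError

    def classify_shoes(crop):
        # Placeholder for a classifier trained on foot crops only.
        raise NotImplementedError

    def shoe_type(path: str) -> str:
        image = Image.open(path)
        left, top, right, bottom = detect_feet(image)   # stage 1: where are the feet?
        crop = image.crop((left, top, right, bottom))   # throw away the biased context
        return classify_shoes(crop)                     # stage 2: what shoes are these?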
Singularity doesn't necessarily rely on LLMs by any means. It's just that communication is improving and the number of people doing research is increasing. Weak AI is icing on top, let alone LLMs, which are being shoe-horned into everything now. VV clearly adds these two other paths:
https://edoras.sdsu.edu/~vinge/misc/singularity.html

Blackwell.
Stable Diffusion.
iPhone and Android.
Cicero.
Trump.
Ingress.
Twitter.
Atom Limbs.
Neuralink.
---
https://justine.lol/dox/singularity.txt
It has, anyway, already had a profound effect on the IT job market.
He popularized and advanced the concept, but originally it came from von Neumann.
The concept predates von Neumann.
The first known person to present the idea was the mathematician and philosopher Nicolas de Condorcet, in the late 1700s. Not surprising, because he also laid out most of the ideals and values of modern liberal democracy as they are now. Amazing philosopher.
He basically invented the idea of ensemble learning (known as boosting in machine learning).
Nicolas de Condorcet and the First Intelligence Explosion Hypothesis https://onlinelibrary.wiley.com/doi/10.1609/aimag.v40i1.2855
That kind of niche knowledge is what I come to HN for!
Also "Darwin among the Machines"[0], written by Samuel Butler in 1863; that's 4 years after Darwin's "On the Origin of Species".
The Butlerian Jihad[1] is the war against machines in the Dune universe.
[0] https://en.wikipedia.org/wiki/Darwin_among_the_Machines
[1] https://dune.fandom.com/wiki/Butlerian_Jihad
Butler also expanded this idea in his 1872 novel Erewhon, where he described a seemingly primitive island civilization that turned out to have once had greater technology than the West, including mechanical AI, but abandoned it when they began to fear its consequences. A lot of 20th-century SF tropes, already there in the Victorian period.
https://en.wikipedia.org/wiki/Erewhon
That essay is written by a political scientist. His arguments aren't very persuasive. Even if they were, he doesn't actually cite the person he's writing about, so I have no way to check the primary materials. It's not like this is uncommon either. Everyone who's smart since 1760 has extrapolated the industrial revolution and imagined something similar to the singularity. Malthus would be a bad example and Nietzsche would be a good example. But John von Neumann was a million times smarter than all of them, he named it the singularity, and that's why he gets the credit.
There are some quotes, but the guy seems to be talking about improving humans rather than anything AI-like:
"...natural [human] faculties themselves and this [human body] organisation could also be improved?"
Check out "Sketch for a Historical Picture of the Progress of the Human Mind", by Marquis de Condorcet, 1794. The last chapter, The Tenth epoch/The future progress of the human mind. There he lays out unlimited advance of knowledge, unlimited lifespan for humans, improvement of physical faculties, and then finally improvement of the intellectual and moral faculties.
And this was not some obscure author, but a leading figure in the French Enlightenment. Thomas Malthus wrote his essay on population as a counterargument.
Just to clarify, the “singularity” conjectures a slightly different and more interesting phenomenon, one driven by technological advances, true, but its definition was not those advances.
It was more the second derivative of future shock: technologies and culture that enabled and encouraged faster and faster change until the curve bent essentially vertical… asymptoting to a mathematical singularity.
An example he spoke of was that, close to the singularity, someone might found a corporation, develop a technology, make a profit from it, and then have it be obsolete by noon.
And because you can’t see the shape of the curve on the other side of such a singularity, people living on the other side of it would be incomprehensible to people on this side.
Ray Lafferty’s 1965 story “Slow Tuesday Night” explored this phenomenon years before Toffler wrote “Future Shock”
Note that the "Singularity" turns up in the novel
https://en.wikipedia.org/wiki/Marooned_in_Realtime
where people can use a "Bobble" to freeze themselves in a stasis field and travel in time... forward. The singularity is some mysterious event that causes all of unbobbled humanity to disappear, leaving the survivors wondering, even tens of millions of years later, what happened. As such it is one of the best premises ever in sci-fi. (I am left wondering, though, if the best cultural comparison is "The Rapture" that some Christians believe in, making this more of a religiously motivated concept than sound futurism.)
I've long been fascinated by this differential equation for hyperbolic growth

    dy/dt = k * y^2

which has solutions that look like

    y(t) = 1 / (k * (t₀ - t))

which notably blows up at time t₀. It's a model of an "intelligence explosion" where improving technology speeds up the rate of technological progress, but the very low growth when t ≪ t₀ could also be a model for why it is hard to bootstrap a two-sided market, why some settlements fail, etc. About 20 years ago I was very interested in ecological accounting and wondering if we could outrace resource depletion and related problems, and did a literature search for people developing models like this further. I was pretty disappointed not to find much, although it did appear as a footnote in the ecology literature here and there. Even papers like https://agi-conf.org/2010/wp-content/uploads/2009/06/agi10si...
seem to miss it. (Surprised the lesswrong folks haven't picked it up, but they don't seem too mathematically inclined.)
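A quick numerical check of that blow-up behavior (a minimal sketch; k, the starting value, and the step size are arbitrary illustrative choices):

    # Euler integration of dy/dt = k*y^2: near-flat growth for a long time, then finite-time blow-up.
    k, y, t, dt = 0.1, 0.1, 0.0, 0.01
    while y < 1e6 and t < 200:
        y += k * y * y * dt
        t += dt
    print(f"y exceeds 1e6 around t = {t:.1f}")   # the exact solution blows up at t₀ = 1/(k*y0) = 100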
---
Note I don't believe in the intelligence explosion, because what we've seen in "Moore's law" recently is that each generation of chips is getting much more difficult and expensive to develop, whereas the benefits of shrinks are shrinking, and in fact we might be rudely surprised to find that the state-of-the-art chips of the near future (and possibly 2024) burn up pretty quickly. It's not so clear that chipmakers would have continued to invest in a new generation if governments weren't piling huge money into a "great powers" competition... That is, we might already be past the point of economic returns.
I'm also a bit sceptical of an intelligence explosion, but compute per dollar had been increasing in a steady exponential way long before Moore's law and will probably continue after it. There are ways to progress other than shrinking transistors.
Even though we understand a lot more about how LLMs work and have cut resource consumption dramatically in the last year, we still know hardly anything, so it seems quite likely there is a better way to do it.
For one thing, dense vectors for language seem kinda insane to me. Change one pixel in a picture and it makes no difference to the meaning. Change one letter in a sentence and you can change the meaning completely, so a continuous representation seems fundamentally wrong.
IMHO Marooned in Realtime is the best Vinge book. Besides being a dual mystery novel, it really explores the implications of bobble technology and how just a few hours of technology development near the singularity can be extreme.
Yep. I like it better than A Fire Upon the Deep, but I do like both of them. I didn't like A Deepness in the Sky, as it felt kinda grindy, like Dune. (I wish we could just erase Dune so people could enjoy all of Frank Herbert's other novels; I love even the bad ones.)
Here's a link to the full text of _Slow Tuesday Night_: https://web.archive.org/web/20060719184509/www.scifi.com/sci...
Being surprised is also an exciting outcome. Was he thinking about that too?