I think people might be missing what this enables. It can make plausible continuations of video, with realistic physics. What happens if this gets fast enough to work _in real time_?
Connect this to a robot that has a real-time camera feed. Have it constantly generate potential future continuations of the feed it's getting -- maybe more than one. You have an autonomous robot building a real-time model of the world around it and predicting the future. Give it some error correction based on how well each prediction models the actual outcome and I think you're _really_ close to AGI.
You can probably already imagine different ways to wire the output to text generation, to controlling its own motions, to predicting outcomes of actions it could plausibly take and choosing the best one.
It doesn't actually have to generate realistic imagery or imagery that doesn't have any mistakes or imagery that's high definition to be used in that way. How realistic is our own imagination of the world?
Edit: I'm going to add a specific case. Imagine a house-cleaning robot. It starts with an image of your living room. Then it creates an image of your living room after it's been cleaned. Then it interpolates a video _imagining itself cleaning the room_, then acts as much as it can to mimic what's in the video, then generates a new continuation, then acts, and so on. Imagine doing that several times a second, if necessary.
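Something like this loop, to make it concrete (every interface here is hypothetical -- no such API exists today, it's just the shape of the idea):

```python
# Hypothetical predict-act loop for the house-cleaning example.
# `robot` and `video_model` stand in for a future fast video model
# and a robot controller; none of these methods exist anywhere.

def cleaning_loop(robot, video_model, done):
    while not done():
        current = robot.camera_frame()                 # real-time feed
        goal = video_model.imagine_goal(current)       # e.g. "this room, but clean"
        plan = video_model.interpolate(current, goal)  # imagined video of the cleanup
        robot.mimic(plan[:10])                         # act out the first few frames
        # then loop: re-observe, re-imagine, re-act -- several times a second
```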
Sounds like simulation theory is closer and closer to being proven.
Except there is always an original at the root. There’s no way to prove that’s not us.
The root world can spawn many simulations and simulations can be spawned within simulations. It becomes far more likely that we exist in a simulated world than in the root world.
This argument always bothers me.
The probability of being in any given simulation is conditional on the level above it, so it necessarily decreases exponentially with depth. Any simulation running an equivalent simulation will do so much slower, so you get a geometric series of degrading probabilities.
The rate of decay will be massive: imagine how long it would take, and how many resources it would need, for us to simulate our universe, even in a hand-waved AI-plus-lazy-compute way that also spawns sub-consciousnesses. The inverse of that is the ratio of the series.
So even in theory, the probability of you being in any of the simulated universes is roughly P(we can simulate a universe) / (1 − 1/(time to simulate)) − P(we're in the top universe).
So the idea that nesting makes this probability overwhelming doesn't hold.
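To see it numerically, here's the back-of-the-envelope version in Python (the numbers are made up, only the shape of the series matters):

```python
# Toy version of the nesting argument: each level simulates the next
# at slowdown factor T, so the weight of level n decays like (1/T)**n.
p_sim = 0.1   # assumed P(a universe can and does spawn a simulation)
T = 1e6       # assumed slowdown per nesting level

# Geometric series: sum over levels n >= 1 of p_sim * (1/T)**(n-1)
total_simulated_weight = p_sim / (1 - 1 / T)
print(total_simulated_weight)  # ~0.1000001: nesting adds almost nothing
```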
That's not a good argument, because we have no way of knowing the ratio between our time and time in the universe where our simulation runs. Even if it takes hours to generate one second of our universe, we only experience our own time.
Besides, time is not absolute, and having it run slower near massive objects or when objects accelerate would be a neat trick to save on compute power needed to simulate a universe.
I wouldn’t have used the slowdown argument but rather the information-encoding argument. A simulation must necessarily encode less information than the universe being simulated, in fact substantially less. It wouldn’t be possible to encode the same information as the root universe in any nested simulation, even at the first level: that would require encoding the entire root universe. This information-encoding problem is what gets geometrically worse as you nest. At a certain point the simulation must be so simplified and lossy that it carries no meaningful information and isn’t representative of anything.
However - just because the probabilistic reasoning explodes and makes something look overwhelmingly likely doesn’t make it true.
First, the prior assumption is that the universe can be simulated in any meaningful way at any substantial scale. It’s not at all obvious that simulation can reach high enough fidelity to produce the complexity we see around us, short of some higher-dimensional universe simulating what we see, with the realities we observe achieved through dimensional reduction and absurdly powerful technology. This is also a term in the conditional probability, and I would not put it at 1.0. I would actually make it quite small, and as a prior its effect will be significant.
Second, there’s the prior that whatever root universe exists has yet to achieve the simulation within the flow of time, assuming time started at some discrete point. Our observations lead us to conclude time and space both emerged at a discrete point. The coalescence of the modern universe, the evolution of life itself, the emergence of intelligent beings, the technology required to simulate an enormous, highly complex universe in its entirety, etc., are all priors. These are non-trivial factors and they greatly reduce the likelihood of the simulation theory.
Third, it’s possible the clock rate of the simulation is fast enough that it runs much faster than time evolves in the root universe, but to the original post’s point, without enormous lossy optimizations the nested universes can’t run at a faster clock rate than the first-level simulation. This is partially related to the information-encoding problem, but not directly. I don’t agree it gets geometrically worse, but it doesn’t get better either without greatly reducing the quality of the simulation further. That means either the quality converges to zero very fast, or the layers run in lockstep with the first-level universe, requiring 1:1 time. Assuming it actually simulates the universe, and not just some sort of occlusion scoped to you as an individual, that might mean it takes billions of years within the first-level simulation for each layer of nesting. This seems practically unlikely even in a simulated universe, so either those layers never achieve a nesting, or they quickly converge to simulations that have lost so much fidelity they simulate nothing.
Well, first, you imply a base universe is finite. That is not a given at all.
You don't need to simulate the full universe. Just the experience of consciousness inside it. You don't even have to simulate full consciousness for every 'conscious' being. In fact, I've always seen the simulation argument as a thought experiment arguing for consciousness being more fundamental than matter. There is no need to imagine a human made computer simulating an entire universe in subatomic detail for this thought experiment to intrigue us.
Us being able to pinpoint a start of all time is actually a pretty good argument for it being simulated. Why would we be able to calculate a 'start time' for reality? It's not obvious that a base universe needs one at all. There are theoretical cosmologies out there that conceptualize a universe without that need.
The simulated universe doesn't have to run time faster than 'real time' in the base universe at all. In fact, running slower would be a feature if the beings in the base universe wished to escape into the simulation for whatever reason.
The only thing about the branching simulations is that they are likely simplified approximations. There’s no reason the nesting can’t continue, and no way for an approximation to observe that the level above it is strictly more complex than anything observable from inside its own simulation. That should be fundamentally impossible, meaning no branch can know whether it’s the root or a branch, only that it can create branches.
Haven't you heard of "turtles all the way down"?
Something something linked list with loop.
Our ability to build somewhat convincing simulations of things has never been proof of living in a simulation…
I mean, everyone's mind builds a convincing internal simulation of reality, and it's so good that most people think they're directly experiencing reality.
So what happens to someone suffering a psychotic episode, their reality gets distorted? But distorted relative to what reality, if it's all an internal simulation? I think there's an internal simulation of some aspects of reality, but there's a lot more to it.
The world model is not the world. It's the old map and territory thing.
Buddhist insight meditation actually proves that’s not true, fwiw.
In theory, yes. The problem is we've had AGI many times before, in theory. For example, Q-learning: feed the state of any game or system through a neural network, have it predict possible future rewards, iteratively improve the accuracy of the reward predictions, and boom, eventually you arrive at the optimal behavior for any system. We've known this since the late '80s at least (Watkins' Q-learning, 1989), though the underlying ideas go back further.
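For the record, the core of tabular Q-learning really is a one-line update rule. A minimal sketch, assuming a gym-style environment:

```python
import random
from collections import defaultdict

# Tabular Q-learning: iteratively improve predicted future reward Q(s, a).
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.99, 0.1  # learning rate, discount, exploration

def step(env, state, actions):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward, done = env.step(action)  # assumes a gym-like env
    # the Q-learning update: nudge Q toward reward + discounted best future
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return next_state, done
```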
I like to do experiments with reinforcement learning and it's always exciting to think "once I turn this thing on, it's going to work well and find lots of neat solutions to the problem", and the thing is, it's true, that might happen, but usually it doesn't. Usually I see some signs of learning, but it fails to come up with anything spectacular.
I keep watching for a strong AI in a video game like Civilization as a sign that AI can solve problems in a highly complex system while being practical enough for game creators to actually ship it. Yes, maybe, maybe, a team of experts could solve Civilization as a research project, but that's far from practical. Do you think we'll be able to show an AI a video of people playing Civilization and have the video model predict the best moves before the AI built into the game can?
I’ve been dying for someone to make a Civilization AI.
It might not be too crazy of an idea - would love to see a model fine-tuned on sequences of moves.
The biggest limitation of video game AI currently is not theory, but hardware. Once home compute doubles a few more times, we’ll all be running GPT-4 locally and a competent Civilization AI starts to look realistic.
I am 100% certain that the training of such an AI will result in winning a game without ever building a single city* and 1,000 other exploits before being nerfbatted enough to play a 'real' game.
(That doesn't mean I don't want to see the ridiculousness it comes up with!)
* https://www.youtube.com/watch?v=6CZEEvZqJC0
I knew it, I knew it! It would be a Spiffing Brit video.
That guy is a genius at finding exploits in computer games. I don't know how he does it, I think you need to play a fair bit of each game before you find these little corners of the ruleset.
Idk maybe he uses some sort of fuzzer
But wouldn't this be amazing for the developer to fix a lot of edge cases/bugs?
Maybe, maybe not. The stochastic, black-box nature of the current wave of ML systems gives me a gut feeling that using them like this is more of a Monkey's Paw wish granter than useful tool without a lot of refinement first. Time will tell!
If you train the model purely based on win rate, sure. Fortunately, we can efficiently use RLHF to train a model to play in a human-like way and give entertaining matches.
I think it's also a matter of "shape". Like, GPT4 solves one "shape" of problem, given tokens, predict the next token. That's all it does, that's the only problem it has to solve.
A Civilization AI would have many problem "shapes". What do I research? Where do I build my city, what buildings do I build, how do I move my units, what units do I build, what improvements do I build, when do I declare war, what trade deals do I accept, etc, etc. Each of those is fundamentally different, and you can maybe come up with a scheme to make them all into the same "shape", but then that ends up being harder to train. I would be interested to see a good solution to this problem.
You can constrain LLMs (like LLaMA) to only output tokens that match some schema (e.g. valid code syntax).
I don't see why you couldn't get an LLM to output something like "research tech332; build city3 building24".
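The usual trick is masking the logits so only schema-valid tokens can be sampled at each step. A rough sketch (the grammar tracker that produces `allowed_token_ids` is left as an assumption):

```python
import torch

def constrained_next_token(logits, allowed_token_ids):
    """Sample the next token from `logits`, restricted to `allowed_token_ids`.

    `allowed_token_ids` would come from a grammar/schema tracker, e.g. one
    that only permits tokens continuing "research tech<N>; build city<N> ...".
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0          # leave valid tokens untouched
    probs = torch.softmax(logits + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```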
Would love to see someone to make an AI that can predict our economy, perhaps by modeling all the actors that participate in the economy using AI agents.
Tbh I don't think an AI for Civ would be that impressive. My experience is that most of the time you can get away with making locally optimal decisions, i.e. growing your economy and fighting weaker states. The problem with the current Civ AI is that its economies are often structured nonsensically, but optimizing an economy is usually just a matter of stacking bonuses together into specialized production zones, which can be solved via conventional algorithms.
Maybe, but a lot of people would like better AIs in strategy video games, it only adds to the frustration when people say "it wouldn't be that impressive". It's like saying "that would be easy... but it's not going to happen." (And I'm not focused on Civilization, it's just a well known example, I'd like to see a strong AI in any similar strategy game.)
I think it might be harder than StarCraft or Dota. Civilization is all about slow decision making (no APM advantages for the AI), and all the decisions are quite different, and you have to make them in a competitive environment where an opponent can raid and capture your cities.
The problem with game AI is that it "cheats". It doesn't play like a human. The Civ AI straight up gets extra resources; AlphaStar in SC2 performed inhuman feats like giving commands in two different areas of the map simultaneously, or briefly spiking actions per minute to inhuman levels. But even with all of that, the AI still eventually loses. And then it starts losing consistently as players play more against it.
Why? Because AI doesn't learn on the fly. The AI does things a certain way and beating it becomes a puzzle game. It doesn't feel like playing against a human opponent (although AlphaStar in SC2 probably came pretty close).
Learning on the fly is probably the biggest thing that (game) AI is lacking in. I'm not sure there's an easy solution for it.
And even if it succeeds, it fails again as soon as you change the environment because RL doesn't generalise. At all. It's kind of shocking to be honest.
https://robertkirk.github.io/2022/01/17/generalisation-in-re...
You're talking about an agent with a world model used for planning. Actually generating realistic images is not really needed as the world model operates in its own compressed abstraction.
Check out V-Jepa for such a system: https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-jo...
V-Jepa is actually super impressive. I have nothing but respect for Yann LeCun & his team, they really have been on a rampage lately.
One more French brain we didn't manage to keep.
The drain is just crazy at this point.
What would fix the issue?
Based on talking to my coworkers from Europe on the West Coast (some have more nuanced positions, but some were outright "everyone in tech in their right mind moves away from Europe"), nothing short-term.
If you forget specifics, and consider at an abstract level what the differences are... Let's say there was an equal pile of "resources" per person available in Europe and the US. The way this pile is (abstractly) distributed in Europe is egalitarian and safety-net focused; in the US it is distributed more unequally, closer to some imperfect approximation of merit. Most of the (real) advantages and disadvantages people bring up for the US stem from that. The more of this approximation of merit you have, the more "resources" you'd have in the US. No matter what the specific slopes are, unless one place is much richer (might be the US anyway), at some point these lines cross. The higher a person is above this point, the more it makes sense for them to go to the US...
There are also 2nd order effects like other people above that point having already gone (not just from Europe, from everywhere in the world), making the US more attractive, probably. That might matter more for top talent.
And although this probably doesn't matter for the top talent, "regular" Europeans can actually have the cake and eat it too - make the money in the US, then (in old age or if something happens) move back home and avail themselves of the welfare state. A non-German guy who worked in Germany for a few years told me that's what he'd do if he was German - working in Germany sucks, but being lazy in Germany is wonderful, so he'd move to the US then move back ;)
Is it an issue that needs fixing?
Not sure.
It's a lot of things at this point:
- If you are skilled, the money is much better in the US, even after paying for health care yourself.
- French entities and investors are risk-averse. This means your original projects will get canned more often, funding is going to be super hard, and success will bring in less money.
- The French-speaking market is smaller, so whatever you try, if you try it in France, either you target a market outside of France, which is harder than being where your clients are, or you target France and French-speaking countries, with a much lower payoff.
- Customer and worker protection is stronger, and laws are everywhere. This is usually good for citizens, but of course it also means Uber or Airbnb could never have __started__ in France.
- The network effect means that if you go to the US, you will meet more opportunities, more skilled people and more interesting projects. There is also an energy there you won't find elsewhere.
- Administration is heavy. For companies it's of course a burden, but for universities it's a nightmare, and academics are really underpaid. Not to mention academics in France have a hard time promoting an idea, an innovation or anything they came up with, while in the US things have a catchy name before they are even proven to work.
All those things mean the US is professionally highly attractive, and it actively courts talent, with the resources to pay for it and the pressure of its market pushing it to do so.
Who is "we"?
Us.
The people of France presumably.
Do you have a list?
I don't, I just seem to have this moment of "oh, him as well" regularly.
And I get it, I went to the Valley as well for some time; the money is better, the taxes are lower, you get more opportunities, you meet more talented people, and the projects are way cooler.
What I find interesting is that because we have so much video data, we have this thing that can project the future in 2D pixel space.
Projecting into the future in 3D world space is actually the endgame for robotics, and I imagine, depending on how complex that 3D world model is, a working model for projecting into 3D space could be waaaaaay smaller.
It's just that the equivalent data is not as easily available on the internet :)
That's what estimation and simulation are for. Obviously that's not what's happening in TFA, but it's perfectly plausible today.
Not sure how people are concluding that realistic physics is feasible operating solely in pixel space, because it obviously isn't, and anyone with any experience training such models would instantly recognize the local optimum these demos represent. The point of inductive bias is to make the loss function as convex as possible by inducing a parametrization that is "natural" to the system being modeled. Physics is exactly the attempt to formalize such a model borne of human cognitive faculties, and it's hard to imagine that you can do better with less fidelity by just throwing more parameters and data at the problem, especially when the parametrization is so incongruent with the inherent dynamics at play.
There are also models that are trained to generate 3D models from a picture. Use it on videos, and also train it on output generated by video games.
Depth estimation has improved a lot as well, e.g. with Depth-Anything [0], but those models mostly produce relative depth instead of metric depth. Even when converted to metric, they still seem to produce a lot of stray points at the edges that have to be pruned - visible in this blog [1]. It looks like models trained on lidar or stereo depth maps have these limitations. I think we don't have enough clean training data for 3D, unless we train on synthetic data (then we can have plenty: generate realistic scenes in Unreal Engine 5 and train on rendered 2D frames).
[0] https://github.com/LiheYoung/Depth-Anything
[1] https://medium.com/@patriciogv/the-state-of-the-art-of-depth...
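If anyone wants to poke at this, relative depth is a few lines with the Hugging Face depth-estimation pipeline (the model id here is taken from the Depth-Anything repo, so treat it as an assumption):

```python
from transformers import pipeline
from PIL import Image

# Relative (not metric) depth, per the limitation noted above.
depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
result = depth(Image.open("frame.png"))
result["depth"].save("frame_depth.png")  # grayscale relative depth map
```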
Imagine it going a few dimensions further: what will happen when I tell this person 'this'? How will it affect the social graph and my world state :)
A 3d model with object permanency is definitely a step in the right direction of something or other but for clarity let us dial back down the level of graphical detail.
A Pacman bot is not AGI. It might eat all the dots correctly, whereas before, if something scrolled off the screen, it'd forget about it and glitch out - but you haven't fanned any flames of consciousness into existence yet.
Is a human that manages to eat all the skittles and walk without falling into deadly holes AGI? Why?
Object permanence is a necessary but not sufficient condition for spatial reasoning but the definition of consciousness remains elusive unless you have some news to share.
A blind human is AGI. So is a drunk or clumsy one that falls into deadly holes. This is super cool and a step on the way to... something.. but even the authors don't claim it is somehow the whole ballgame.
I totally agree that a system like Sora is needed. By itself, it’s insufficient. With a multimodal model that can reason properly, then we get AGI or rather ASI (artificial super intelligence) due to many advantages over humans such as context length, access to additional sensory modalities (infrared, electroreception, etc), much broader expertise, huge bandwidth, etc.
future successor to Sora + likely successor to GPT-4 = ASI
See my other comment here: https://news.ycombinator.com/item?id=39391971
I call bullshit.
A key element of anything that can be classified as "general intelligence" is developing internally consistent and self-contained agency, and then being able to act on that. Today we have absolutely no idea of how to do this in AI. Even the tiniest of worms and insects demonstrate capabilities several orders of magnitude beyond what our largest AIs can.
We are about as close to AGI as James Watt was to nuclear fusion.
A definition of general intelligence may or may not include agency to act. There is no consensus on that. To learn and to predict, yes, but not necessarily to act.
Does someone with Locked-In Syndrome (LIS) continue to be intelligent? I’d say yes.
Obviously, agency to act might be instrumental for learning and predicting especially early in the life of an AI or a human, but beyond a certain point, internal simulations could substitute for that.
This comment is brilliant. Thank you. I'm so excited now to build a bot that uses predictive video. I wonder what the simplest prototype would be? Surely one with a simple validation loop that can say: hey, this predicted video became true. Perhaps a 2D infinite-scrolling video game?
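The validation loop could be as dumb as a frame difference between what the model imagined and what the camera then saw. A sketch:

```python
import numpy as np

def prediction_error(predicted_frames, actual_frames):
    """Mean-squared error between imagined and observed frames.

    Low error means "this predicted video became true"; the score could
    gate which continuation the bot commits to next.
    """
    pred = np.asarray(predicted_frames, dtype=np.float32)
    real = np.asarray(actual_frames, dtype=np.float32)
    return float(np.mean((pred - real) ** 2))
```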
Imagine having real-time transfer of characteristics within your world in a VR/mixed reality setup. Automatically generating new views within the environment you are currently in could create pretty interesting experiences.
Imagine putting on some AR goggles
staring at a painting in a Museum
Then immediately jumping into an entire VR world based off the painting generated by an AI rendering it out on the fly
BlockadeLabs has been doing 3D text-to-skybox; it's not exactly runtime at the moment, but I have seen it work in a headset and it definitely feels like the future.
That’s how we think:
Imagine where you want to be (e.g., "I scored a goal!") from where you are now, visualize how you'll get there (e.g., a trick and then a shot), then do that.
There was an article a few months ago about how that's basically what the cerebellum does.
So basically a brain in a vat, reality as we experience it, our thoughts as prompts.
Figure out how to incorporate a quantum computer as a prediction engine in this idea, and you've got quite the robot on your hands. :)
(and throw this in for good measure https://www.wired.com/story/this-lab-grown-skin-could-revolu... heh)
This sounds like it has military applications, not that I’m excited at the prospect.
FWIW, you've basically described at a high level exactly what autonomous driving systems have been doing for several years. I don't think anyone would say that Waymo's cars are really close to AGI.
Adding to this: Sora was most likely trained on video that's more like what you'd normally see on YouTube or in a clip art or media licensing company collection. Basically, video designed to look good as a part of a film or similar production.
So right now, Sora is predicting "Hollywood style" content, with cuts, camera motions, etc... all much like what you'd expect to see in an edited film.
Nothing stops someone (including OpenAI) from training the same architecture with "real world captures".
Imagine telling a bunch of warehouse workers that for "safety" they all need to wear a GoPro-like action camera on their helmets that record everything inside the work area. Run that in a bunch of warehouses with varying sizes, content, and forklifts, and then pump all of that through this architecture to train it. Include the instructions given to the staff from the ERP system as well as the transcribed audio as the text prompt.
Ta-da.
You have yourself an AI that can control a robot using the same action camera as its vision input. It will be able to follow instructions from the ERP, listen to spoken instructions, and even respond with a natural voice. It'll even be able to handle scenarios such as spills, breaks, or other accidents... just like the humans in its training data did. This is basically what vehicle auto-pilots do, but on steroids.
Sure, the computer power required for this is outrageously expensive right now, but give it ten to twenty years and... no more manual labour.
The flip side of video or image generation is always video or image identification. If video gets really good, then an AI can have quite an accurate visual view into the world in real time.
how would you define AGI?
Thanks for adding the specific case. I think with testing, these sorts of limited-domain applications make sense.
It'll be much harder for more open-ended world problems, where the physics encountered may be rare enough in the dataset that the simulation breaks unexpectedly. For example, a glass smashing on the floor: the model doesn't simulate that causally, afaik.
As another comment points out that's Yann LeCun's idea of "Objective-Driven AI" introduced in [1] though not named that in the paper (LeCun has named it that way in talks and slides). LeCun has also said that this won't be achieved with generative models. So, either 1 out of 2 right, or both wrong, one way or another.
For me, I've been in AI long enough to remember many such breakthroughs that would lead to AGI before - from DeepBlue (actually) to CNNs, to Deep RL, to LLMs just now, etc. Either all those were not the breakthroughs people thought at the time, or it takes many more than an engineering breakthrough to get to AGI, otherwise it's hard to explain why the field keeps losing its mind about the Next Big Thing and then forgetting about it a few years later, when the Next Next Big Thing comes around.
But, enough with my cynicism. You think that idea can work? Try it out. In a simplified environment. Take some stupid grid world, a simplification of a text-based game like Nethack [2] and try to implement your idea, in-vitro, as it were. See how well it works. You could write a paper about it.
____________________
[1] https://openreview.net/pdf?id=BZ5a1r-kVsf
[2] Obviously don't start with Nethack itself because that's damn hard for "AI".