Has anyone come across a solution where the model can iterate (e.g., with prompts like "move the bicycle to the left side of the photo")? It feels like we're close.
I feel like we're close too, but for another reason.
Although I love SD and these video examples are great, it's a flawed method: the lighting is never correct and there are incoherent details just about everywhere. Any 3D artist or photographer can spot them immediately.
However, I'm willing to bet that we'll soon have something much better: you'll describe something and get a full 3D scene, with 3D models, light sources set up, etc.
The scene will then be sent to Blender, and with one click you'll get an actual render, with correct lighting.
Wanna move that bicycle? Move it in the 3D scene exactly where you want.
That is coming.
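To make the "move that bicycle" point concrete, here's a minimal sketch of why an editable scene representation beats a flat image. Everything here is hypothetical (the object names, coordinates, and `move` helper are made up for illustration); the idea is just that an edit becomes a coordinate change, after which a renderer like Blender recomputes lighting correctly.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    position: tuple  # (x, y, z) in scene units

# Hypothetical model-generated scene: objects plus a light source
scene = {
    "bicycle": SceneObject("bicycle", (4.0, 0.0, 0.0)),
    "sun": SceneObject("sun", (0.0, 10.0, 5.0)),
}

def move(obj_name, dx=0.0, dy=0.0, dz=0.0):
    """Translate an object; the renderer would then relight the scene."""
    x, y, z = scene[obj_name].position
    scene[obj_name].position = (x + dx, y + dy, z + dz)

# "Move the bicycle to the left": edit its coordinate, then re-render.
move("bicycle", dx=-6.0)
```

Contrast that with a pixel-space model, where the same request means regenerating the whole image and hoping the lighting stays coherent.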
And it's the same for audio: why generate a single audio file when models will soon be able to generate the separate tracks, with all the instruments and so on, letting you assemble the final audio file yourself?
That is coming too.
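The audio argument can be sketched the same way. This toy example (made-up stem names and sine-wave "instruments", not any real model's output) shows why per-track generation is more editable: the final file is just a sum of stems, so you can drop or tweak one without regenerating the rest.

```python
import numpy as np

SR = 8000  # sample rate (Hz), illustrative only
t = np.linspace(0, 1.0, SR, endpoint=False)

# Hypothetical per-instrument "stems" a model might emit separately
stems = {
    "bass": 0.4 * np.sin(2 * np.pi * 110 * t),
    "lead": 0.3 * np.sin(2 * np.pi * 440 * t),
}

def mix(stems):
    """Sum the stems into the final track; editing one stem changes
    the mix without touching the others."""
    return sum(stems.values())

full = mix(stems)
no_bass = mix({k: v for k, v in stems.items() if k != "bass"})
```

A single generated audio file gives you none of that: removing the bass line from a finished mix is a source-separation problem, not a one-liner.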
In the video towards the bottom of the page, there are two birds (blue jays), and in the background there are two identical buildings that look a lot like the CN Tower. The CN Tower is the main landmark of Toronto, whose baseball team happens to be the Blue Jays, and it sits near the main sportsball stadium downtown.
I vaguely understand how text-to-image works, so it makes sense that the vector space for "blue jays" would be near "toronto" or "cn tower". The improvements in scale and speed (image → video) are impressive, but given how incredibly capable the image-generation models are, they simultaneously feel crippled by their lack of editing / iteration ability.