Can someone convince me that 3D gaussian splatting isn't a dead end? It's an order of magnitude too slow to render and an order of magnitude too much data. It's like raster vs raytrace all over again. Raster will always be faster than raytracing, so even if raytracing gets 10x faster, so will raster.
I think generating traditional geometry and materials from gaussian point clouds is maybe interesting. But photogrammetry has already been a thing for quite a while. Trying to render a giant city in real time via splats doesn't feel like "the right thing".
It's definitely cool and fun and exciting. I'm just not sure that it will ever be useful in practice? Maybe! I'm definitely not an expert so my question is genuine.
Mesh based photogrammetry is a dead end. GS or radiance field representation is just getting started. Not just rendering but potentially a highly compact way to store large 3D scenes.
Is it? So far it seems like the storage size is massive and the detail is unacceptably low up close.
Is there a demo that will make me go “holy crap I can’t believe how well this scene compressed”?
Here is a paper if you are interested. https://arxiv.org/pdf/2311.13681.pdf
The key is not to compress but to leverage the property of neural radiance fields and optimize for entropy. I suspect NERF can yield more compact storage since it's volumetric.
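This is not the linked paper's actual method, but a toy numpy sketch of the underlying idea of optimizing for entropy: the coarser you quantize splat parameters, the lower the Shannon entropy of the resulting symbol stream, i.e. the fewer bits per parameter an entropy coder would need (the step sizes and Gaussian-distributed parameters here are made up for illustration):

```python
import numpy as np

def entropy_bits(symbols):
    """Shannon entropy (bits/symbol) of a discrete symbol stream."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
params = rng.normal(size=100_000)   # stand-in for splat parameters

fine = np.round(params / 0.01)      # fine quantization step
coarse = np.round(params / 0.5)     # coarse quantization step

# Coarser quantization -> fewer distinct symbols -> lower entropy,
# at the cost of reconstruction error. Compression methods trade
# these off, e.g. via a rate-distortion objective.
print(entropy_bits(fine), entropy_bits(coarse))
```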
Not sure what you mean by "unacceptably low up close". Most GS demos don't have LoD lol.
When the camera gets close the "texture" resolution is extremely low. Like, roughly 1/4 what I would expect. Maybe even 1/8. Aka it's very blurry.
Saying it's a dead end when the alternative has no concept of animation, nor any way for an artist to remix the asset? That just makes the comment seem naive.
It's not an order of magnitude slower. You can easily get 200-400 fps in Unreal or Unity at the moment.
100+FPS in browser? https://current-exhibition.com/laboratorio31/
900FPS? https://m-niemeyer.github.io/radsplat/
We have 3 decades worth of R&D in traditional engines, it'll take a while for this to catch up in terms of tooling and optimization but when you look where the papers come from (many from Apple and Meta), you see that this is the technology destined to power the MetaVerse/Spatial Compute era both companies are pushing towards.
The ability to move content into 3D environments at incredibly low production cost (an iPhone video) is going to murder a lot of R&D made in traditional methods.
I don't know the hardware involved, but that first link is most definitely not 100 FPS on all hardware. It's a slideshow on my current device.
Maybe not, but it's relatively smooth on my 3 year old phone, which is crazy impressive
Edit: I was in low power mode, it runs quite smoothly
Does anyone know how the first link is made?
You are in luck, the author has been sharing
https://medium.com/@heyulei/capture-images-for-gaussian-spla...
Photogrammetry struggles with certain types of materials (e.g. reflective surfaces). It's also very difficult to capture fine details (thin structures, hair). 3DGS is very good at that. And people are working on improving current shortcomings, including methods to extract meshes that we could use in traditional graphics pipelines.
3DGS is absolutely not good with non-Lambertian materials.
After testing it, I found it fails in very basic cases. And it is normal that it fails: non-Lambertian materials are not reconstructed correctly with SfM methods.
I don't understand the connection you're making between SfM (Structure from Motion) and surface shading.
I might be misunderstanding what you're trying to say. Could you elaborate?
You use SfM to find the initial point cloud. However, SfM is based on the hypothesis that the same point 'moves' linearly between any two views. This hypothesis is important because it allows you to match a point in two pictures and, given the distance between the two cameras, triangulate the point in space and therefore find its depth.
However, non-Lambertian points move non-linearly in viewing space (e.g. a specular highlight depends on the viewer's pose).
So their positions in space will automatically be wrong, and you'll get floaters.
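A minimal sketch of the two-view triangulation step described above, with toy camera matrices (not from any real pipeline). The solve only works because the synthetic point projects consistently in both views; a specular highlight that shifts between views would feed inconsistent projections into the same equations and come out at a wrong depth:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two 3x4 camera matrices.
    x1, x2 are the normalized image coordinates of the SAME 3D point."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]           # homogeneous -> Euclidean

# Two toy cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = X_true[:2] / X_true[2]                  # projection in camera 1
x2 = (X_true - [1, 0, 0])[:2] / X_true[2]    # projection in camera 2

print(triangulate(P1, P2, x1, x2))  # recovers ~[0.5, 0.2, 4.0]
```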
Gaussian 'splats' may have the potential to render non-Lambertian materials, for example via the spherical harmonics (though I don't think the viewer uses them, if I'm not mistaken). But capturing non-Lambertian points is very difficult and an open research problem.
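For what it's worth, a rough sketch of how view-dependent color via spherical harmonics works: each gaussian stores SH coefficients per color channel, and the renderer evaluates them against the viewing direction. Only degree 0 and 1 are shown here (the 3DGS reference implementation goes up to degree 3); the constants are the standard real SH basis values, but the coefficient values below are invented for illustration:

```python
import numpy as np

# Real SH basis constants for degrees 0 and 1.
C0 = 0.28209479177387814
C1 = 0.4886025119029199

def sh_color(coeffs, view_dir):
    """coeffs: (4, 3) array = 1 DC + 3 degree-1 SH coefficients, per RGB channel.
    view_dir: unit vector from the gaussian toward the camera."""
    x, y, z = view_dir
    basis = np.array([C0, -C1 * y, C1 * z, -C1 * x])
    return basis @ coeffs  # (3,) RGB, varies with viewing direction

coeffs = np.zeros((4, 3))
coeffs[0] = [1.0, 0.5, 0.2]   # DC term: base color
coeffs[3] = [0.8, 0.8, 0.8]   # variation along x: a crude "highlight"

front = sh_color(coeffs, np.array([0.0, 0.0, 1.0]))
side = sh_color(coeffs, np.array([1.0, 0.0, 0.0]))
# The same gaussian yields different colors from different directions,
# which is how splats fake glossy/specular appearance at render time.
```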
You have to ask about what it's a dead end for. It seems pretty cool for the moral equivalent of fully 3D photographs. That's a completely legitimate use case.
For 3D gaming engines? I struggle to see how the fundamental primitive can be made to sing and dance in the way that they demand. People will try, though. But from this perspective, gaussians strike me more as a final render format than a useful intermediate representation. If they are going to use gaussians there's going to have to be something else invented to make them practical to use for engines in the meantime, and there's still an awful lot of questions there.
For other uses? Who knows.
But the world is not all 3D gaming and visual special effects.
You are missing where this is coming from.
Many of the core papers for this came from Meta's VR team (codec avatars), Apple ML (Spatial Compute) and Nvidia - companies deeply invested in VR/Spatial compute. It's clear that they see it as a key technology to further their interests in the space, and they are getting plenty of free help:
After being open sourced in May last year, 79 papers were published on the topic overall.
It's more than 150 this year, more than one a day, advancing this "dead end" forward.
A small selection:
https://animatable-gaussians.github.io/ https://nvlabs.github.io/GAvatar/ https://research.nvidia.com/labs/toronto-ai/AlignYourGaussia... https://github.com/lkeab/gaussian-grouping
Goals aren't results. Maybe gaussian splatting will be the wave of the future and in 10 years it'll be the only graphics tech around.
In the meantime, if it isn't, it will hardly be the first promising new graphics technology to turn out to be completely unsuited for all the things people hoped for.
Most of what you linked corresponds to what I intuitively described: the gaussians are an output format rather than something useful directly. The last paper appears to go in the other direction, extracting information from the splats, but again doesn't operate on them directly. The actual work isn't being done in the gaussians themselves, and the interesting results are precisely in what is not being done through the splats... but pointing that out explicitly is not how you get funding nowadays. Two otherwise-identical proposals, one singing the praises of the buzzwords and the other phrased to be critical of them, will have very different outcomes.
How can it be a legitimate use case for a "3D photo"? Realistically how long does it take to capture the photos needed to construct the scene?
Hardware evolves with production in mind. If a method saves 10x the time/labour, even while using 50x more expensive compute/tools, industry will figure out ways to optimize and amortize the compute cost of that task over time, and it will eventually disseminate into consumer hardware.
Maybe. That implies that hardware evolution strictly benefits Bar and not Foo. But what has happened so far is that hardware advancements to accelerate NewThing also accelerate OldThing.
I think hardware evolution has to benefit Bar and Foo for production continuity anyways, OldThing still has to be supported until it becomes largely obsolete to both industry and consumer. In which case fringe users have to hold on to old hardware to keep processes going.
How is it too slow? You can easily render scenes at 60fps in a browser or on a mobile phone.
Heck, you can even train one from scratch in a minute on an iPhone [1].
This technique has been around for less than a year. It's only going to get better.
[1] https://www.youtube.com/watch?v=nk0f4FTcdmM
This technique has existed for more than 10 years, and real-time renderers have existed for a long time too.
That's pretty cool. It's not clear whether it's incorporating Lidar data or not, though. It's very impressive if not.
It's currently unparalleled when it comes to realism, as in realistic 3D reconstruction of the real world. Photogrammetry only really works for clean surface data, whereas gaussian splats work for semi-volumetric data such as fur, vegetation, particles, and rough surfaces, and also for glossy/specular surfaces and volumes with strong subsurface scattering properties, or generally materials that are strongly view-dependent.
This seems like impressive work. You mention glossy / specular. I wonder why nothing in the city (first video) is reflective, not even the glass box skyscrapers. I noticed there is something funky in the third video with the iron railway rails from about :28 to :35 seconds. They look ghostly and appear to come in and out. Overall these three videos are pretty devoid of shiny or reflective things.
Current photogrammetry, to my knowledge, requires much more data than NeRFs/gaussian splatting. So this could be a way to generate more data for the "dumb" photogrammetry algorithms to work with.
Right? I'm surprised I don't hear this connection more often. Is it perhaps because photogrammetry algorithms require sharp edges, which the splats don't offer?
And? It's always going to be even faster to not have lighting at all.
In regards to content production for virtual production, it is quicker to capture a scene and process the images into a cloud of 3D gaussians, but on the other hand it is harder to edit the scene after it's shot. Also, the light is already captured and baked in. The tools to edit scenes will probably rely a lot on AI, like delighting and changing the setting; right now there are just a few, and the process is more like using a knife to cut out parts and remove floaters. You can of course replay this with Unreal Engine, but in the long term you could run it in a browser. So in short, if you want to capture a place as it is, with all its tiny details, 3D gaussians are a quicker and cheaper way to afford this than modelling and texturing.
Yes, this has tons of potential. It's analogous (but different) to patented techniques used by Unreal Engine. Performance is not the focus of most research at the moment; there isn't even alignment on a unified format with compression yet. The potential for optimization is clear and straightforward to adapt to many devices, similar to point-cloud LOD, mesh culling, etc. Splat performance could be a temporary competitive advantage for viewers, but, as with video decompression and other 3D standards made available via open source, high-quality, high-fps splat viewing on most devices will likely become commonplace table stakes in a few years. The next question is what the applications are.
I'll be honest, I don't have a ton of technical insight into these, but anecdotally I found that KIRI Engine's Gaussian Splatting scans (versus its photogrammetry scans) were way more accurate and true to life, and required a lot less cleanup!
Try animating a photogrammetric model! How about one that changes its shape? You get awful geometry from photogrammetry…
In practice, the answer to "will this be useful" is yes! Subdivision surfaces coexist with NURBS for different applications.
Nothing comes close to this for realism, it's like looking at a photo.
Traditional photogrammetry really struggles with complicated scenes, and reflective or transparent surfaces.