
UE5 Nanite in WebGPU

soulofmischief
16 replies
23h46m

Is the demo using user agent strings to determine compatibility? That's not good, and feature compatibility should be determined on a case-by-case basis by simply attempting to detect/use the specific feature.

I am on Chromium, not Chrome, and use WebGPU all the time, but the demos tell me to use Chrome, which I cannot do ethically. Would love to try the demos out, this looks like a lot of hard work!
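
For what it's worth, a capability check needs no UA sniffing at all. Here is a minimal sketch (my own illustration, assuming WebGPU type definitions such as @webgpu/types are available) that simply tries to obtain an adapter and device:

  // Hypothetical capability probe: report WebGPU as usable only if an adapter
  // and device can actually be obtained, regardless of what the UA string says.
  async function detectWebGPU(): Promise<GPUDevice | null> {
    if (!("gpu" in navigator)) return null;        // API not exposed at all
    try {
      const adapter = await navigator.gpu.requestAdapter();
      if (!adapter) return null;                   // exposed, but no usable adapter
      return await adapter.requestDevice();        // may still fail on some setups
    } catch {
      return null;                                 // treat any failure as "unsupported"
    }
  }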

Twirrim
5 replies
22h32m

It's not working for me on Chrome under Linux, nor on Android, for what it's worth (though Firefox is what I use for practically all my browsing needs). Something really odd with their detection logic.

pjmlp
4 replies
21h50m

WebGPU is not supported on Linux, and it won't be for the foreseeable future.

On Android you need at least Android 12, with good enough Vulkan drivers that aren't blacklisted.

sva_
2 replies
21h6m

WebGPU is not supported on Linux, and it won't be for the foreseeable future.

A lot of it runs fine with a flag.

pjmlp
1 replies
12h11m

That isn't really something for production use.

Nathanael_M
0 replies
5h21m

Neither is this experiment.

littlestymaar
0 replies
4h36m

and it won't be for the foreseeable future.

Do you know what's blocking?

gpm
4 replies
22h49m

and use WebGPU all the time

I'm curious, what for?

soulofmischief
3 replies
21h52m

I've used it to build and/or run various machine learning models for text generation, speech recognition, image generation, depth estimation, etc. in the browser, in support of an agentic system I've been building out.

Lots of future possibilities as well once support is more ubiquitous!

password4321
2 replies
17h1m

Your ideas are intriguing to me and I wish to subscribe to your newsletter.

soulofmischief
1 replies
3h12m

I appreciate that, anything in particular catch your interest?

password4321
0 replies
56m

I am most interested in speech recognition including diarization.

robocat
0 replies
3h5m

feature compatibility should be determined on a case-by-case basis by simply attempting to detect/use the specific feature

That's a fine goal.

When I was writing my own component framework for browsers, detection was regularly impossible and I had to depend on browser sniffing. Modernizr's code has some very smart hacks (sometimes very dirty hacks) to detect features - it took a large amount of work for them to develop trustworthy detection code. And detection was usually via side effects.

My educated guess is that feature detection for Web3D is not simple. A quick Google search didn't turn up an obvious Web3D feature detection library.

Here's part of the detection code for :checked support in Modernizr:

  Modernizr.addTest('checked', function(){
   return Modernizr.testStyles('#modernizr input {width:100px} #modernizr :checked {width:200px;display:block}', function(elem, rule){

robin_reala
0 replies
23h28m

Don’t think so. I’m on a Firefox that has experimental WebGPU support enabled, and it fails with shader compilation errors rather than any message.

drusepth
0 replies
23h27m

If this is the case, I imagine it'd be pretty easy to spoof your UA and see the demo, even from Chromium.

bakugo
0 replies
22h10m

Is the demo using user agent strings to determine compatibility

I am on Chromium, not Chrome

Don't know about your build, but I'm using Ungoogled Chromium, and it has the exact same user-agent string as Google Chrome.

Have you enabled the WebGL permission for the site in site settings? I think it was disabled by default for me.

Const-me
0 replies
8h20m

It seems the demos are just broken. I'm getting this error:

WebGPU error [frame][validation]: Fill size (7160950) is not a multiple of 4 bytes. - While encoding [CommandEncoder "main-frame-cmd-buffer"].ClearBuffer([Buffer "rasterize-sw"], 0, 7160950).

Animats
16 replies
23h31m

Oh, nice. Third party implementations of Nanite playback.

Nanite is a very clever representation of graphics meshes. They're directed acyclic graphs rather than trees. Repetition is a link, not a copy. It's recursive; meshes can share submeshes, which in turn can share submeshes, all the way down. It's also set up for within-mesh level of detail support, so the submeshes drop out when they're small enough. So you can have repetitive content of very large size with a finite amount of data and fast rendering times. The insight is that there are only so many pixels on screen, so there's an upper bound on rendering work really needed.

There's a really good SIGGRAPH video on this from someone at Epic.

Current GPU designs are a mismatch for Nanite. Some new hardware operations are needed to do more of this in the GPU, where it belongs. Whether that will happen, with Nvidia distracted by the AI market, is a good question.

The scene needs a lot of instancing for this to pay off. Unreal Engine demos show such things as a hall of identical statues. If each statue was different, Nanite would help far less. So it works best for projects where a limited number of objects are reused to create large areas of content. That's the case for most AAA titles. Watch a video of Cyberpunk 2077, and look for railings and trash heaps. You'll see the same ones over and over in totally different contexts.

Making a Nanite mesh is complicated, with a lot of internal offsets for linking, and so far only Unreal Engine's editor does it. With playback now open source, someone will probably do that.

Those internal offsets in the format present an attack surface which probably can be exploited with carefully crafted bad content, like hostile Microsoft Word .doc files.

Jasper_
5 replies
23h8m

Repetition is a link, not a copy. It's recursive; meshes can share submeshes, which in turn can share submeshes, all the way down.

While it does construct a DAG to perform the graph cut, the final data set on disk is just a flat list of clusters for consideration, along with their cutoffs for inclusion/rejection. There seems to be a considerable misunderstanding of what the DAG is used for, and how it's constructed. It's constructed dynamically based on the vertex data, and doesn't have anything to do with how the artist constructed submeshes and things, nor does "repetition become a link".

The scene needs a lot of instancing for this to pay off. Unreal Engine demos show such things as a hall of identical statues. If each statue was different, Nanite would help far less.

What makes you say this? The graph cut is different for each instance of the object, so they can't use traditional instancing, and I don't even see how it could help.

Animats
4 replies
22h55m

It may not be based on what the mesh's creator considered repetition, but repetition is encoded within the mesh. Not sure if the mesh builder discovers some of the repetition itself.

Look at a terrain example:

https://www.youtube.com/watch?v=DKvA7NZRUcg

Jasper_
3 replies
22h31m

I'm not seeing what you claim to be seeing in that demo video. I see a per-triangle debug view, and a per-cluster debug view. None of that is showing repetition.

Animats
2 replies
21h44m

If there wasn't repetition, you'd need a really huge GPU for that scene at that level of detail.

jms55
0 replies
21h1m

Not necessarily. Nanite compresses meshes (including in-memory) _very_ heavily, and _also_ streams in only the visible mesh data.

In general, I wouldn't think of Nanite as "one thing". It's a combination of many, many different techniques that add up into some really good technology.

gmueckl
0 replies
21h4m

I don't want to estimate storage space right now, but meshes can be stored very efficiently. For example, I think UE uses an optimization where vertex positions are heavily quantized to just a few bits within the meshlet's bounding box. Index buffers can be constructed to share the same vertices across LOD levels. Shading normals can be quantized quite a bit before shading artifacts become noticeable - if you even need them anymore at that triangle density.

If your triangles are at or below the size of a texel, texture values could even be looked up offline and stored in the vertex attributes directly, rather than keeping the UV coordinates around, but that may not be a win.
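
To make the quantization idea concrete, here is a rough sketch (my own illustration with made-up bit counts, not UE's actual encoding) of snapping positions to a few bits inside a meshlet's bounding box:

  // Snap a position to `bits` bits per axis, relative to the meshlet's bounding box.
  interface Aabb { min: number[]; max: number[]; }

  function quantizePosition(p: number[], box: Aabb, bits: number): number[] {
    const maxVal = (1 << bits) - 1;                   // e.g. bits = 10 -> 0..1023
    return p.map((v, i) => {
      const extent = (box.max[i] - box.min[i]) || 1;  // guard against a flat box
      return Math.round(((v - box.min[i]) / extent) * maxVal);
    });
  }

  function dequantizePosition(q: number[], box: Aabb, bits: number): number[] {
    const maxVal = (1 << bits) - 1;
    return q.map((v, i) => box.min[i] + (v / maxVal) * (box.max[i] - box.min[i]));
  }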

pcwalton
1 replies
21h23m

Making a Nanite mesh is complicated, with a lot of internal offsets for linking, and so far only Unreal Engine's editor does it.

meshoptimizer [1] is an OSS implementation of meshlet generation, which is what most people think of when they think of "Nanite's algorithm". Bevy, mentioned in a sibling reply, uses meshoptimizer as the generation tool.

(Strictly speaking, "Nanite" is a brand name that encompasses a large collection of techniques, including meshlets, software rasterization, streaming geometry, etc. For clarity, when discussing these concepts outside of the context of the Unreal Engine specifically, I prefer to refer to individual techniques instead of the "Nanite" brand. They're really separate, even though they complement one another. For example, software rasterization can be profitably used without meshlets if your triangles are really small. Streaming geometry can be useful even if you aren't using meshlets. And so on.)

[1]: https://github.com/zeux/meshoptimizer

jms55
0 replies
20h59m

Small correction: meshoptimizer only does the grouping triangles -> meshlets part, and the mesh simplification. Actually building the DAG, grouping clusters together, etc is handled by Bevy code (I'm the author, happy to answer questions).

That said I do know zeux was interested in experimenting with Nanite-like DAGs directly in meshoptimizer, so maybe a future version of the library will have an end-to-end API.

DaoVeles
1 replies
18h34m

In a past life (the 2000s) I was doing some dev stuff on PS3, trying to figure out some decent uses for Cell's mass of compute while working around RSX's limited memory bandwidth, with the luxury of Blu-ray storage to burn through.

One such thing I got a fair way into was something like Nanite - I called it compressive meshing. It was the typical case of misguided engineering hubris at work.

The initial work looked promising, but the further into the problem I got, the more complicated the entire thing became. Having to construct the entire asset generation pipeline was just way beyond what I could manage in the time frame, at least anything that would look decent and not blow out the memory required.

I did manage to get something that vaguely resembled large-scale meshes being rendered at staggered levels of detail, but it ran SLOW and looked like rubbish unless you hammered the GPU to get sub-pixel accuracy. It was a fun experiment, but it was far too much for the hardware and too big of a task to take on as a single programmer.

When Epic showed off Nanite... wow, they did what I never could, in a fashion way beyond even my best vision! It is one of those technologies that, when it came along, really was a true solution rather than just hype. Yes, there are limits, as with anything on that scale, but it is one of the technical jewels of the modern graphics world. I have said that if Epic were a publicly traded company I would have considered putting in a sizable amount of money based on the Nanite tech alone.

runevault
0 replies
14h10m

Keep in mind, it took Epic a long time to get it sorted. I think I saw the primary creator say it took him a decade of research and work to come to the initial implementation of Nanite that shipped in Unreal.

vinkelhake
0 replies
22h0m

Nanite playback

That's not what this is though. It's an implementation of the techniques/technology used in Nanite. It doesn't load data from Unreal Engine's editor. One of the mentioned goals:

   Simplicity. We start with an OBJ file and everything is done
   in the app. No magic pre-processing steps, Blender exports, etc.
   You set the breakpoint at loadObjFile() and F10 your way till
   the first frame finishes.

ksec
0 replies
15h5m

Current GPU designs are a mismatch for Nanite. Some new hardware operations are needed to do more of this in the GPU, where it belongs. Whether that will happen, with Nvidia distracted by the AI market, is a good question.

Unreal 5 was only released in 2022, and we have been iterating on the Nanite idea since then. With Unreal 5.5 and more AAA gaming titles coming, we can take what we've learned and put it into hardware. Not to mention the lead time is 3-4 years down the road. Even if Nvidia had decided to make one in 2023, it would have been at least 2026 before we saw any GPU acceleration.

jiggawatts
0 replies
21h14m

I read through the papers and my impression was that the biggest gains were from quantised coordinates and dynamic LOD for small patches instead of the entire mesh.

The logic behind nanite as I understood it was to keep the mesh accuracy at roughly 1 pixel precision. So for example, a low detail mesh can be used with coordinates rounded to just 10 bits (or whatever) if the resulting error is only about half a pixel when perspective projected onto the screen.

I vaguely remember the quantisation pulling double duty: not only does it reduce the data storage size, it also helps the LOD generation because it snaps vertices to the same locations in space. The duplicates can then be eliminated.
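
As a back-of-envelope sketch (my own reconstruction of the idea, not Nanite's exact test), the roughly-one-pixel criterion just projects the simplification error to screen space and compares it against a pixel:

  // Screen-space footprint of a world-space error at a given distance, for a
  // perspective projection with vertical FOV `fovY` (radians) and `screenHeight` pixels.
  function projectedErrorPixels(errorWorld: number, distance: number,
                                fovY: number, screenHeight: number): number {
    const focalLengthPx = screenHeight / (2 * Math.tan(fovY / 2));
    return (errorWorld / distance) * focalLengthPx;
  }

  // A coarser cluster/LOD is acceptable once its error lands under about one pixel.
  function lodAcceptable(errorWorld: number, distance: number,
                         fovY: number, screenHeight: number): boolean {
    return projectedErrorPixels(errorWorld, distance, fovY, screenHeight) <= 1.0;
  }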

hyperthesis
0 replies
19h45m

This is like when Joel said git stores diffs.

SaintSeiya
11 replies
23h38m

Honest question: it is claimed that the software rasterizer is faster than the hardware one. Can someone explain to me why? Isn't the purpose of the GPU to accelerate rasterization itself? Unless it's a recent algorithm, or the "software rasterizer" is actually running on the GPU and not the CPU, I don't see how.

NotGMan
2 replies
23h30m

I'm a bit out of the GPU game, so this might be slightly wrong in some places: the issue is with small triangles, because you end up paying a huge cost. GPUs ALWAYS shade in 2x2 blocks of pixels, not 1x1 pixels.

So if you have a very small triangle (small as in how many pixels on the screen it covers) that covers 1 pixel you will still pay the price of a 2x2 block (4 pixels instead of 1), so you just wasted 300% of your performance.

Nanite auto-picks the best triangle to minimize this and probably many more perf metrics that I have no idea about.

So even if you do it in software, the point is that if you can get rid of that 2x2 block penalty as much as possible, you could be faster than the GPU doing 2x2 blocks in hardware, since pixel shaders can be very expensive.

This issue gets worse the larger the rendering resolution is.

Nanite then picks larger triangles instead of those tiny 1-pixel ones since those are too small to give any visual fidelity anyway.

Nanite is also not used for large triangles since those are more efficient to do in hardware.
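
A tiny back-of-envelope sketch of that 2x2 quad overhead (illustrative numbers only):

  // Pixels actually shaded vs. pixels visibly covered, given that the hardware
  // always launches whole 2x2 quads.
  function quadOverheadPercent(coveredPixels: number, quadsTouched: number): number {
    const shadedPixels = quadsTouched * 4;   // every touched quad shades 4 lanes
    return (shadedPixels / coveredPixels - 1) * 100;
  }

  console.log(quadOverheadPercent(1, 1));    // 1-pixel triangle: 300% extra work
  console.log(quadOverheadPercent(64, 20));  // larger triangle: ~25% extra work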

kllrnohj
1 replies
19h42m

So even if you do it in software, the point is that if you can get rid of that 2x2 block penalty as much as possible, you could be faster than the GPU doing 2x2 blocks in hardware, since pixel shaders can be very expensive.

Of course, the obvious problem with that is that if you don't have most of the screen covered in such small triangles, then you're paying a large cost for Nanite vs. traditional means.

01HNNWZ0MV43FF
0 replies
2h13m

Nanite has a heuristic to decide between pixel-sized compute shader rasterizing and fixed-function rasterizing. You can have screen-sized quads in Nanite and it's fine.

bob1029
1 replies
23h14m

I thought it was a software rasterizer running inside a fragment shader on the GPU, not actually on the CPU. I need to watch that video again to be sure, but I can't see how a CPU could handle that many triangles.

raphlinus
0 replies
21h34m

To be precise, this is running in a compute shader (rasterizeSwPass.wgsl.ts for the curious). You can think of that as running the GPU in a mode where it's a type of computer with some frustrating limitations, but also the ability to efficiently run thousands of threads in parallel.

This is in contrast to hardware rasterization, where dedicated hardware onboard the GPU decides which pixels are covered by a given triangle and assigns those pixels to a fragment shader, where the color (and potentially other things) is computed and finally written to the render target by a raster op (also a bit of specialized hardware).

The seminal paper on this is cudaraster [1], which implemented basic 3D rendering in CUDA (the CUDA of 13 years ago is roughly comparable in power to compute shaders today), and basically posed the question: how much does using the specialized rasterization hardware help, compared with just using compute? The answer is roughly 2x, though it depends a lot on the details.

And those details are important. One of the assumptions that hardware rasterization relies on for efficiency is that a triangle covers dozens of pixels. In Nanite, that assumption is not valid, in fact a great many triangles are approximately a single pixel, and then software/compute approaches actually start beating the hardware.

Nanite, like this project, thus actually uses a hybrid approach: rasterization for medium to large triangles, and compute for smaller ones. Both can share the same render target.
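
A rough sketch of that hybrid routing decision (my own simplification with a made-up threshold; in the real renderers this happens per cluster on the GPU):

  // Route clusters to the hardware rasterizer or to the compute ("software")
  // rasterizer based on their estimated screen-space size.
  interface Cluster { boundingRadius: number; distanceToCamera: number; }

  const SW_RASTER_MAX_PIXELS = 32;  // hypothetical cutoff; small clusters go to compute

  function projectedSizePx(c: Cluster, focalLengthPx: number): number {
    return (c.boundingRadius / c.distanceToCamera) * focalLengthPx;
  }

  function partitionClusters(clusters: Cluster[], focalLengthPx: number) {
    const hw: Cluster[] = [];
    const sw: Cluster[] = [];
    for (const c of clusters) {
      (projectedSizePx(c, focalLengthPx) > SW_RASTER_MAX_PIXELS ? hw : sw).push(c);
    }
    return { hw, sw };  // each list then feeds an indirect draw / compute dispatch
  }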

[1]: https://research.nvidia.com/publication/2011-08_high-perform...

TinkersW
1 replies
21h33m

A couple of reasons:

1. HW always does 2x2 blocks of pixels so it can have derivatives, even if you don't use them.

2. Accessing SV_PrimitiveID is surprisingly slow on Nvidia/AMD; by writing it out in the PS you will take a huge perf hit in HW. There are ways to work around this, but they aren't trivial and differ between vendors, and you have to be aware of the issue in the first place! I think some of the "software" > "hardware" raster results may come from this.

The HW shader in this demo looks wonky though: it should be writing out the visibility buffer, but instead it is writing out a vec4 with color data, so of course that is going to hurt perf. Way too many varyings are being passed down as well.

In a high-triangle-count HW rasterizer you want the visibility buffer PS to do as little compute as possible and write as little as possible, so it should only have 1 or 2 input varyings and simply write them out.

andybak
0 replies
11h0m

What's PS? Pixel shader? I'm guessing here.

Animats
1 replies
23h29m

The answer to that is in this hour-long SIGGRAPH video.[1] Some of the operations needed are not done well, or at all, by the GPU.

[1] https://www.youtube.com/watch?v=eviSykqSUUw

janito
0 replies
23h33m

I'm also curious. From what I could read in the repository's references, I think the problem is that the GPU is bad at rasterizing small triangles. Apparently each triangle in the fixed-function pipeline generates a batch of pixels to render (16 in one of the slides I saw), so if the triangle covers only one or two pixels, all the others in the batch are wasted. I speculate that the idea is to then detect these small triangles and draw them quickly using fewer pixel shader invocations (still on the GPU, but without using the graphics-specific fixed functions), but I'm honestly not sure I understand what's happening.

SaintSeiya
0 replies
23h23m

Thanks all, yes, it's starting to make sense now.

KronisLV
9 replies
20h48m

I wonder how other engines compare when it comes to LODs and similar systems.

Godot has automatic LOD which seems pretty cool for what it is: https://docs.godotengine.org/en/stable/tutorials/3d/mesh_lod...

Unity also has an LOD system, though despite how popular the engine is, you have to create LOD models manually: https://docs.unity3d.com/Manual/LevelOfDetail.html (unless you dig through the asset store and find a plugin)

I did see an interesting approach in a lesser known engine called NeoAxis: https://www.neoaxis.com/docs/html/NeoAxis_Levels.htm however that engine ran very poorly for me on my old RX580, although I haven't tried on my current A580.

As far as I can tell, Unreal is really quite far ahead of the competition when it comes to putting lots of things on the screen, except the downside of this is that artists will be tempted to include higher quality assets in their games, bloating the install sizes quite far.

kllrnohj
6 replies
19h51m

In theory Nanite is superior to precomputed LODs. In practice it's less clear cut, as the automatic results aren't going to be as good as artist-created LODs, and it's not entirely reasonable to expect them to be. Also, the performance cost is huge, as Nanite/virtual geometry is a poor fit for modern GPUs. IIRC peak fill rate is 1/4th or something like that, since GPU rasterization works on 2x2 quads, not per-pixel the way shaders do.

jsheard
5 replies
19h40m

Rasterizing very small triangles in hardware is indeed inefficient due to the 2x2 quad tax, but one of Nanite's tent-pole features is a software rasterizer which sidesteps that problem entirely. IIRC they said that for a screen entirely filled with triangles roughly the size of a pixel, their software raster ends up being about 3x faster than using the dedicated raster hardware.

kllrnohj
4 replies
18h15m

Yes but I'm talking the other way around. Nanite is 1/4th the performance for triangles that aren't 1-3 pixels in size, which is the majority of the time.

The main selling point of Nanite is really just to reduce artist costs by avoiding manual LODs. But a high quality automatic LOD at build time may (read: almost certainly does) strike a much better balance for both current and near future hardware

jms55
1 replies
16h41m

But a high quality automatic LOD at build time may (read: almost certainly does) strike a much better balance for both current and near future hardware

You can't have a manual LOD for a cliff where half is near the player and should be high resolution, and half is further away and can be low resolution. Nanite's hierarchical LODs are a huge improvement for this.

You're also underestimating the amount of time artists have to spend making and tweaking LODs, and how big of an impact skipping that is.

incrudible
0 replies
9h40m

This is assuming you have this "one big cliff mesh". This is the Nanite mindset: Just let the artists throw anything that comes out of their DCC at it. That is a great value proposition for studios, especially the ones that fail the marriage of art and engineering.

It's a bad value proposition for end-users. Nanite is much slower for the same image quality that a bespoke solution would offer, which is evident with several AAA titles that choose to use in-house tech over UE.

jsheard
0 replies
18h12m

Nanite batches up triangles above a certain size threshold and sends them to the hardware rasterizer instead, since it is faster to use it in those cases. This was all documented from very early on.

guitarlimeo
0 replies
11h53m

Nanite is actually good for AAA graphics on hardware stacks that don't do raytracing well, but if raytracing gets faster, Nanite becomes less useful, as it doesn't work well with raytracing. What Nanite is actually good at is providing LODs for only parts of a mesh, so that might see longer use in the industry.

https://threadreaderapp.com/thread/1809936882278469878.html

elabajaba
1 replies
18h8m

Intel Arc GPUs are terrible for Nanite rendering, since they lack hardware support for both indirect draws (widely used in GPU-driven renderers; Intel emulates them in software, which is slow) and 64-bit atomics, which are required for Nanite.

cubefox
0 replies
11h1m

That's interesting. Are there benchmarks on UE5 games which back this up?

jsheard
8 replies
1d

It's cool that it kind of works, but they had to make some nasty compromises to get around WebGPU's lack of 64-bit atomics. Hopefully that will be added as an optional extension at some point; hardware support is almost ubiquitous on desktop-class hardware at least (AMD and Nvidia have had it forever, but Apple has only had it since the M3).

throwaway17_17
5 replies
1d

What is the use case for atomics in the rasterizer? I can't figure out what the atomic operations do inside the rendering pipeline. I looked at the GitHub, but couldn't find where the atomics were hoped for.

jsheard
1 replies
23h54m

With traditional hardware rasterization there are specialized hardware blocks which handle atomically updating the framebuffer to whichever sample is currently the closest to the camera, and discarding anything behind that. Nanite does software rasterization instead, and one of their insights was figuring out a practical way to cram all of the data needed for each pixel into just 64 bits (depth in the high bits and everything else in the low bits) which allows them to do efficient depth sorting using min/max atomics from a compute shader instead. The 64 bits are crucial though, that's the absolute bare minimum useful amount of data per pixel so you really need 64 bit atomics. Nanite doesn't even try to work without them.

To kind of get it working with 32 bit atomics this demo is reducing depth to just 16 bits (not enough to avoid artifacts) and only encoding a normal vector into the other 16 bits, which is why the compute rasterized pixels are untextured. There just aren't enough bits to store any more material parameters or a primitive ID, the latter being how Nanite does it.
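
To make the packing concrete, here is a CPU-side sketch in TypeScript (an illustration of the idea, not the project's actual WGSL):

  // Pack depth into the high 32 bits and a payload (e.g. primitive ID) into the
  // low 32 bits. With reversed-Z, the numerically largest packed value is the
  // closest sample, so one atomic max per covered pixel does the depth test and
  // the write in a single step.
  function packVisibility(reversedZ01: number, primitiveId: number): bigint {
    const depthBits = BigInt(Math.round(reversedZ01 * 0xffffffff)) & 0xffffffffn;
    return (depthBits << 32n) | (BigInt(primitiveId >>> 0) & 0xffffffffn);
  }

  // CPU stand-in for the GPU's 64-bit atomicMax on the visibility buffer.
  function atomicMaxWrite(visBuffer: BigUint64Array, pixel: number, packed: bigint): void {
    if (packed > visBuffer[pixel]) visBuffer[pixel] = packed;
  }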

01HNNWZ0MV43FF
0 replies
2h17m

Software rasterization, in a GPGPU compute shader, in hardware.

hrydgard
1 replies
1d

Pack Z and 32-bit color together into a 64-bit integer, then do an atomic min (or max with reversed Z) to effectively do a Z-query and a write really, really fast.

jsheard
0 replies
23h29m

Nanite writes out the ID of the primitive at that pixel rather than the color, but otherwise yeah that's the idea. After rasterization is done a separate pass uses that ID to fetch the vertex data again and reconstruct all of the material parameters, which can be freely written out without atomics since there's exactly one thread per pixel at that point.
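
And a sketch of that follow-up pass (again just an illustration; fetchTriangle and shade are hypothetical stand-ins for the real vertex fetch and material evaluation): one thread per pixel unpacks the visibility value and shades without any atomics:

  // Unpack a visibility-buffer value written by the rasterizer: depth in the
  // high 32 bits, primitive ID in the low 32 bits.
  function unpackVisibility(packed: bigint): { depth: number; primitiveId: number } {
    return {
      depth: Number(packed >> 32n) / 0xffffffff,
      primitiveId: Number(packed & 0xffffffffn),
    };
  }

  // One thread per pixel: re-fetch the triangle and evaluate its material.
  function resolvePixel<T>(visBuffer: BigUint64Array, pixel: number,
                           fetchTriangle: (id: number) => T,
                           shade: (tri: T) => [number, number, number, number]) {
    const { primitiveId } = unpackVisibility(visBuffer[pixel]);
    return shade(fetchTriangle(primitiveId)); // result can be written without atomics
  }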

my123
1 replies
21h49m

Since the M2

theogravity
3 replies
23h18m

Using the latest Chrome on an M2 Max for the jinx demo:

  WebGPU error [frame][validation]: Fill size (7398781) is not a multiple of 4 bytes.
  - While encoding [CommandEncoder "main-frame-cmd-buffer"].ClearBuffer([Buffer "rasterize-sw"], 0, 7398781).

stephc_int13
0 replies
22h51m

I have the same error on Windows 11, GPU is an RTX 4090. Browser is Edge.

flockonus
0 replies
18h13m

If helpful to the author: on an M1, no errors, and I can see 15+ fps at all times.

daemonologist
0 replies
19h23m

Same (different number) on Chrome on Android (Pixel 7).

smartmic
3 replies
23h20m

Wow, I can't remember the last time I read a project summary with so much jargon - I literally didn't understand anything:

UE5's Nanite implementation using WebGPU. Includes the meshlet LOD hierarchy, software rasterizer and billboard impostors. Culling on both per-instance and per-meshlet basis.

bogwog
1 replies
22h17m

nicebyte
0 replies
21h41m

("software" means it runs on the CPU instead of GPU)

No, in this context it means that the rasterization algorithm is implemented in a compute kernel, rather than using the fixed hardware built into the GPU. So rasterization still happens on the GPU, just using programmable blocks.

goodcjw2
0 replies
23h13m

Guess this really shows how much domain-specific knowledge there is in computer graphics...

Yet still, this post is now ranked #1 on HN.

macawfish
2 replies
1d

The camera controls on my phone are very hard to get down

aaroninsf
0 replies
23h59m

Browser/touchpad also :)

eigenvalue
2 replies
22h31m

Whenever I see rendered scenes like this (i.e., lots of repetitive static geometry) I imagine that annoying guy’s voice going on about “unlimited detail” from that old vaporware video. I guess Nanite really did solve that problem for real, as opposed to whatever that old thing was using (I remember something about oct-trees or something).

HappMacDonald
1 replies
20h21m

I recall those claims being made by a company called "Euclidean", from Australia I think. Online rumors suggested they might have been using octtrees, but later Euclidean videos flatly denied that.

raphlinus
0 replies
19h53m

It's Euclideon. And it is octtrees. My interpretation after reading a fascinating Reddit thread [1] is that these denials were misdirection. There's definitely new interest in splatting techniques (Gaussian in particular), though they've long been an alternative to triangles in the 3D world. I think it'd be fun to experiment with implementing some of that using modern compute shaders.

[1]: https://www.reddit.com/r/VoxelGameDev/comments/1bz5vvy/a_sma...

devit
2 replies
23h26m

The name and description are very confusing, and arguably a trademark violation, since despite the claims it seems to be completely unrelated to the actual Nanite in UE5 - it's just an implementation of something similar by a person unaffiliated with UE5.

There is also Bevy's Virtual Geometry that provides similar functionality and is probably much more useful since it's written in Rust and integrated with a game engine: https://jms55.github.io/posts/2024-06-09-virtual-geometry-be...

KMnO4
1 replies
21h15m

I don’t think it’s really an issue. It’s clear from the readme that it’s an implementation.

If I made an “implementation of OpenAI’s GPT-3 in JS” you would understand that to mean I took the architecture from the whitepaper and reimplemented it.

cubefox
0 replies
16h36m

The technique is commonly called virtual geometry or virtualized geometry, or dynamic LOD in research papers. Really no need to reuse the name of a specific implementation.

TaylorAlexander
2 replies
23h56m

It says my iPhone 12 Pro Max doesn’t have WebGPU, but I enabled it in experimental features and another website[1] with WebGPU demos now works. Has anyone gotten this working on iPhone? Would be nice if the web app gave more info on what failed.

[1] https://webgpu.github.io/webgpu-samples/?sample=texturedCube

nox101
0 replies
14h0m

WebGPU support is not finished in Safari which is why it's still experimental.

KMnO4
0 replies
23h37m

I enabled WebGPU in Safari but I'm seeing a bunch of shader errors.

  WebGPU error [init][validation]: 6 errors generated while compiling the shader:
  50:22: unresolved call target 'pack4x8snorm'
  50:9: cannot bitcast from '⊥' to 'f32'
  54:10: unresolved call target 'unpack4x8snorm'
  59:22: unresolved call target 'pack4x8unorm'
  59:9: cannot bitcast from '⊥' to 'f32'
  63:9: unresolved call target 'unpack4x8unorm'

tech-no-logical
1 replies
21h16m

Getting the message

    No WebGPU available. Please use Chrome.

on Chrome (Version 129.0.6668.29 (Official Build) beta (64-bit)), under Windows.

Joel_Mckay
0 replies
12h59m

Look at the #enable-unsafe-webgpu flag in Chrome.

Turn it back off when done, as tools like NoScript only block WebGL tags.

Cheers =3

readyplayernull
1 replies
21h47m

Will virtual geometry be integrated into GPUs some day?

cubefox
0 replies
7h36m

Some aspects of it will likely receive hardware acceleration in the future. For example support for a standard virtual geometry mesh format, like this one proposed by AMD: https://gpuopen.com/download/publications/DGF.pdf

moffkalast
1 replies
23h37m

No WebGPU available. Please use Chrome.

Getting that on Chromium, lol.

desdenova
1 replies
8h4m

The examples don't work, though.

swiftcoder
0 replies
7h54m

They run fine for me on Chrome/Mac

astlouis44
1 replies
20h34m

Here's an actual implementation of UE5 in WebGPU, for anyone interested.

Just a disclaimer that it will only work in a WebGPU-enabled browser on Windows (Chrome, Edge, etc.); unfortunately Mac has issues for now. Also, there is no Nanite in this demo, but it will be possible in the future.

https://play.spacelancers.com/

mdaniel
0 replies
20h0m

I was curious what "issues" Mac has, and at least for me it didn't explode for any good reason: it puked trying to JSON.stringify() some capabilities object into localStorage, which is a pretty piss-poor reason to bomb out of loading a webpage, IMHO.

vladde
0 replies
12h41m

Got to love this line in the stated goals:

I could have built this with Vulkan and Rust. None would touch it.

replete
0 replies
23h10m

Intel Mac, Chrome and ungoogled chromium: index.web.ts:159 Uncaught (in promise) OperationError: Instance dropped in popErrorScope

pjmlp
0 replies
5h37m

On Windows it can't handle device loss.

  ID3D12Device::GetDeviceRemovedReason failed with   DXGI_ERROR_DEVICE_HUNG (0x887A0006)
   - While handling unexpected error type Internal when allowed errors are (Validation|DeviceLost).
      at CheckHRESULTImpl (..\..\third_party\dawn\src\dawn\native\d3d\D3DError.cpp:119)
      at CheckAndUpdateCompletedSerials (..\..\third_party\dawn\src\dawn\native\d3d12\QueueD3D12.cpp:179)
      at CheckPassedSerials (..\..\third_party\dawn\src\dawn\native\ExecutionQueue.cpp:48)
      at Tick (..\..\third_party\dawn\src\dawn\native\Device.cpp:1730)

  Backend messages:
   * Device removed reason: DXGI_ERROR_DEVICE_HUNG (0x887A0006)

nox101
0 replies
13h54m

This is amazing! It will be so great when Safari and Firefox finish their WebGPU implementations so it runs everywhere.

moralestapia
0 replies
1d

Outstanding work. Also, thanks for providing actual demos of the tech. I get 60-120 fps on my MBP, which is phenomenal given the number of triangles in the scene.

meindnoch
0 replies
3h10m

Can someone explain what Nanite is? The other day someone was saying it uses software rendering because the triangles are so small. Wtf?

mbforbes
0 replies
19h13m

Funny coincidence, I was just reading through an amazing thread on the three.js forum a couple of days ago about a web graphics implementation of virtual geometry (Nanite). WebGL, 2021: https://discourse.threejs.org/t/virtually-geometric/28420

It's closed source, but I found the discussion and description of the tradeoffs interesting.

jms55
0 replies
20h51m

It's been mentioned a couple of times in this thread, but Bevy also has an implementation of Nanite's ideas (sometimes called Virtual Geometry). I'm the author of that, happy to answer questions :)

As for this project, Scthe did a great job! I've been talking with them about several parts of the process, culminating in some improvements to Bevy's code based on their experience (https://github.com/bevyengine/bevy/pull/15023). Always happy to see more people working on this, Nanite has a ton of cool ideas.

jesse__
0 replies
20h40m

If you want to add this tech to the existing engine, I'm not a person you should be asking (I don't work in the industry).

Fucking .. bravo man.

hising
0 replies
20h48m

I would love to see this, but it won't work on Linux + Chrome even if WebGPU is enabled.

forrestthewoods
0 replies
21h56m

Note: this isn't actually UE5's Nanite in WebGPU. It's a totally independent implementation of the same idea as Nanite.

This technique is starting to appear in a variety of places. Nanite definitely made the idea famous, but Nanite is the name of a specific implementation, not the name of the technique.

cubefox
0 replies
17h7m

Here is the somewhat neglected original 2009 dissertation by Federico Ponchio, the guy who invented the dynamic mesh simplification algorithm on which Nanite is based, with lots of illustrations:

https://vcg.isti.cnr.it/~ponchio/download/ponchio_phd.pdf (107 pages!)