15
12bitfloat
242d

Sometimes I just don't know what to say anymore

I'm working on my engine and I really wanna push high triangle counts. I'm doing a pretty cool technique called visibility buffer rendering, and it's great because it kind of balances out some known causes of bad performance on GPUs (namely that pixels are always shaded in 2x2 quads, which is especially bad for small triangles).

So then I come across this post https://tellusim.com/compute-raster... which shows some fantastic results, and just for the fun of it I implement it. Like, not optimized or anything, just a quick and dirty toy demo to see what sort of performance I can get.

... I just don't know what to say. Using actual hardware-accelerated rasterization, which GPUs are literally designed to be good at, I render about 37 million triangles in 3.6 ms. Eh, fine but not great. Then I implement this guy's unoptimized(!) software rasterizer and I render the same scene in 0.5 ms?!

IT'S LITERALLY A COMPUTE SHADER. I rasterize the triangles manually IN SOFTWARE and write them out with 64-bit atomic image stores. HOW IS THIS FASTER THAN ACTUAL HARDWARE!???
AND BY LIKE AN ORDER OF MAGNITUDE AT THAT???
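
For anyone wondering how a depth test can work with nothing but a 64-bit atomic, here is a minimal sketch, emulated on the CPU in C++ rather than written as a shader. The assumptions are mine, not from the linked post: depth sits in the high 32 bits and a triangle ID in the low 32 bits, depth is reversed (nearer is larger) so an atomic max keeps the closest fragment, and the names `pack_fragment`, `atomic_max` and `VisibilityBuffer` are made up for illustration. The real demo may pack things differently.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Pack depth into the high 32 bits and a triangle ID into the low 32 bits.
// For non-negative floats the IEEE-754 bit pattern sorts the same way as the
// value, so comparing packed integers compares depth first.
inline std::uint64_t pack_fragment(float depth, std::uint32_t triangle_id) {
    std::uint32_t depth_bits;
    std::memcpy(&depth_bits, &depth, sizeof depth_bits);
    return (static_cast<std::uint64_t>(depth_bits) << 32) | triangle_id;
}

inline std::uint32_t unpack_triangle_id(std::uint64_t packed) {
    return static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
}

// CPU stand-in for a 64-bit atomic max on an image texel (CAS loop).
inline void atomic_max(std::atomic<std::uint64_t>& target, std::uint64_t value) {
    std::uint64_t current = target.load(std::memory_order_relaxed);
    while (current < value &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
        // On failure `current` is refreshed; retry only while we still win.
    }
}

// Hypothetical visibility buffer: one packed 64-bit value per pixel.
// A cleared pixel holds 0, which here also means "nothing written yet".
struct VisibilityBuffer {
    int width;
    int height;
    std::vector<std::atomic<std::uint64_t>> pixels;

    VisibilityBuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h) {
        for (auto& p : pixels) p.store(0, std::memory_order_relaxed);
    }

    // Depth test and write in a single atomic operation.
    void write(int x, int y, float depth, std::uint32_t triangle_id) {
        atomic_max(pixels[static_cast<std::size_t>(y) * width + x],
                   pack_fragment(depth, triangle_id));
    }

    std::uint32_t triangle_at(int x, int y) const {
        return unpack_triangle_id(
            pixels[static_cast<std::size_t>(y) * width + x].load(
                std::memory_order_relaxed));
    }
};
```

Writing reversed depths 0.25, 0.75 and 0.5 to the same pixel, in any order, leaves the triangle that was written with 0.75, because the depth bits dominate the comparison.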

Like, I even tried doing some optimizations like backface cone culling on the meshlets, but doing that makes it slower. HOW. I'm rendering 37 million triangles without ANY fancy tricks. No hi-z depth culling, which a GPU would normally do. No backface culling, which a GPU would normally do. Not even damn clipping of triangles. I render ALL of them ALL the time. At 0.5 ms.
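
For reference, a rough sketch of what a backface cone test on a meshlet usually looks like, in C++ for readability. This is a commonly used form of the test, not necessarily the one used in the engine above. The `Vec3` and `MeshletCone` types and the `is_backfacing` name are hypothetical, and the cone data (apex, axis, cutoff) is assumed to be precomputed per meshlet.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumes v is not the zero vector.
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// Hypothetical per-meshlet cone: `axis` is the unit-length average facing
// direction of the meshlet's triangles, `cutoff` is a cosine-like threshold
// that bounds how far individual normals spread around that axis.
struct MeshletCone {
    Vec3 apex;
    Vec3 axis;
    float cutoff;
};

// True when every triangle in the meshlet faces away from the camera,
// so the whole meshlet can be skipped.
inline bool is_backfacing(const MeshletCone& cone, Vec3 camera_position) {
    Vec3 view = normalize(sub(cone.apex, camera_position));
    return dot(view, cone.axis) >= cone.cutoff;
}
```

The test is one normalize and one dot product per meshlet, so it is cheap on its own. Whether it pays off depends on how many meshlets it actually rejects versus the extra data it has to load.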

Comments
  • 5
    Oh and the worst thing is, even if I remove the call to the rasterize function, at which point the shader has no externally visible side effects and thus should be optimized to basically a no-op, it still takes 0.25 ms... of 0.6 ms total... without doing literally anything

    Does Jensen Huang have a deal with the devil or something, sacrificing babies in order for his graphics cards to be able to go backwards in time and compute things before they even exist??
  • 1
    If this is generalized then games are going to get really interesting really quick. Is this method patented? Are we going to be soft locked for 20 years?
  • 1
    As for what to say: if you can use it, say thank you to them.
  • 6
    iirc GPUs now have more overhead cost to do simple functionality because they're optimized to do complicated functionality

    so basically if you do something simple it goes through the pipeline to do complex things and the two have the same performance

    this way they didn't have to put different pipelines in the GPU and could stuff more raw power into it for advanced games without wasting die space on basic old video game functionality... basically power over adaptability
  • 2
    @Demolishun No, it's basically what UE5's Nanite is doing. It's pretty crazy though, because "compute shader all the things" has been a meme for quite some time now, but holy shit, I didn't know that "don't use the literal built-in hardware at all" was also a thing
  • 4
    @jestdotty I feel like there's an xkcd about this... the more features you have, the slower it gets. And then some newcomer comes along and is insanely fast... until they have feature parity, at which point they are just as slow lol
  • 2
    @12bitfloat this is really exciting then! I hope this makes its way into Godot.

    This also reminds me of a dude that was doing some sort of voxel projection stuff years ago. I cannot pretend I understood what he was doing. But it was able to do high res stuff on shitty hardware in 3D. Not sure how fast it was. It just sorta disappeared. I figured the guy prob got hired by a hardware company.