The tech looks promising. Massive savings in VRAM (and potentially in disk space, too.) The best results occur at 1440p using TAA. When using DLSS and/or a higher resolution, you lose more performance on the 40-series. I will be testing this on the 5090 after it arrives.
Demo can be found here: github.com/NVIDIA-RTX/RTXNTC
Interesting, there is a performance hit, but the VRAM reduction is insane! I wonder how this would run on an RTX 4060.
The performance comparison very likely doesn't translate to actual games. A texture using less VRAM also means it uses less of the memory bus. But this test probably doesn't have any sort of VRAM bandwidth bottleneck, and its framerate is unrealistically high. So any fixed overhead in memory access is amplified by the high framerate, and the memory bandwidth bottlenecks lots of games run into are ignored. In full games it's very much possible that performance is better with neural textures than without.
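To illustrate the amplification effect described above, here is a rough back-of-envelope sketch; the 0.4 ms decode cost and the framerates are purely assumed numbers for illustration, not measurements:

```python
# A fixed per-frame cost looks enormous at tech-demo framerates and minor at
# game framerates. All figures below are assumptions for illustration only.

def fps_with_overhead(base_fps: float, overhead_ms: float) -> float:
    """Framerate after adding a fixed per-frame cost in milliseconds."""
    return 1000.0 / (1000.0 / base_fps + overhead_ms)

overhead_ms = 0.4  # hypothetical fixed neural-decode cost per frame

for base_fps in (1200, 240, 60):
    new_fps = fps_with_overhead(base_fps, overhead_ms)
    print(f"{base_fps:>5} fps -> {new_fps:6.1f} fps "
          f"({100 * (1 - new_fps / base_fps):4.1f}% loss)")

# The same 0.4 ms costs ~32% of the framerate at 1200 fps, ~9% at 240 fps,
# and ~2% at 60 fps -- which is why an unrealistically high demo framerate
# exaggerates the hit.
```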
Reminder: with the existence of CAMM2 RAM modules, it's literally possible for end users to add their own VRAM, and nothing is stopping any GPU maker from making it a reality.
Sure, the read/write speeds are slower than normal soldered VRAM, but it's still faster than pulling it from the motherboard's RAM.
I don't think that's how it works. If it were that simple we could have already fitted a SO-DIMM slot to a GPU, or a full-size slot given how big these 4-5-slot cards are now. GDDR is not the same as DDR, and it runs faster in part because it is soldered.
Is DLSS sharing the hardware and that's why it lowers performance? Would be interesting to see results from a more complex scene.
It looks that way. I am interested to see how it performs on Blackwell. I will be testing that soon, hopefully.
Yes, it's likely making use of the Tensor cores, so the bottleneck is being shifted from raster performance to AI-accelerator performance. Utterly ridiculous, and all to save adding an extra $20-50 of VRAM to cards, just so they can maintain their eroding stranglehold on AI. Nvidia needs to move away from their expensive monolithic dies (Blackwell is not a "true" chiplet architecture, it's two monolithic dies strapped together), or they might be in for a rude awakening once ZLUDA gets back off the ground. Seriously heartbreaking that AMD cancelled big UDNA this generation, because from the performance leaks of the 9070 XT it looks like it would have been a fairly fierce competitor.
So, less VRAM required for a card that isn't short on VRAM,
but you lose tons of FPS because everything else has to share the AI cores...
I think this tech might be more beneficial to low-VRAM cards...
Memory bandwidth can often be a bottleneck, so it wouldn't only be about saving VRAM allocation. I'm guessing you'd only save bandwidth in the sample mode, though.
So, what's the benefit? Can we play games with high VRAM requirements on low-VRAM cards?
How did you change the resolution?
Very interesting. Do you think it is enough to have int8 and fp8 hardware to support this kind of tech? Or does Nvidia have some other special hardware sauce to make it possible? I am obviously asking whether this is generalized enough that it can and will get implemented in future Vulkan and DirectX versions by more hardware vendors.
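For context, a very rough sketch of the sort of arithmetic a neural texture decoder boils down to (everything here is assumed for illustration: the layer sizes, weights, and scales are invented, and a real decoder runs the network inside a shader, not NumPy). The core requirement is cheap low-precision matrix-vector math per sampled texel; how the graphics APIs expose that is the open question:

```python
import numpy as np

def int8_layer(x_q: np.ndarray, w_q: np.ndarray, scale: float) -> np.ndarray:
    """int8 x int8 matrix-vector product, accumulated in int32, dequantized, ReLU."""
    acc = w_q.astype(np.int32) @ x_q.astype(np.int32)
    return np.maximum(acc.astype(np.float32) * scale, 0.0)

def requantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Round activations back to int8 so the next layer stays low precision."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
latent_q = rng.integers(-128, 127, size=16, dtype=np.int8)    # sampled latent features
w1_q = rng.integers(-128, 127, size=(32, 16), dtype=np.int8)  # hidden layer weights (toy)
w2_q = rng.integers(-128, 127, size=(4, 32), dtype=np.int8)   # output layer weights (toy)

hidden = int8_layer(latent_q, w1_q, scale=1e-4)
texel = int8_layer(requantize(hidden, scale=1e-2), w2_q, scale=1e-4)
print(texel)  # four decoded channel values for one texel (toy numbers)
```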
Keep in mind that typically your frame time will be much higher, so this overhead would be comparatively much smaller for something like 60 fps.
But also, typically you have much more than just 90 MB of textures in a game. The more neural textures you have, the more time the GPU needs to decompress them.
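As a toy illustration of that scaling (every figure below is assumed, not measured): if decode cost grows roughly with how many neural texels are actually sampled per frame, the overhead stays modest against a 60 fps frame budget until the sampled volume gets large:

```python
# Toy scaling estimate; the per-texel cost and sample counts are assumptions.
# The point is only that decode time grows with the amount of neural-texture
# data sampled, and should be judged against the frame budget.

decode_cost_per_mtexel_ms = 0.05        # hypothetical cost per million sampled texels
budget_ms = 1000.0 / 60                 # 60 fps frame budget

for mtexels_per_frame in (8, 32, 128):  # crude proxy for "more neural textures on screen"
    cost_ms = mtexels_per_frame * decode_cost_per_mtexel_ms
    print(f"{mtexels_per_frame:>4} Mtexels/frame -> {cost_ms:4.1f} ms "
          f"({100 * cost_ms / budget_ms:4.1f}% of a 60 fps frame)")
```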
Watch Nvidia use this to justify 8 GB on a 400-euro GPU. The way was paved by Apple with the "8 GB on a Mac is the same as 16 GB on a PC" scam.
This is dogshit; the "reference materials" could be reduced to 50-60 MB with minimal quality loss and without losing any performance. This is just another lazy AI optimization trick that will make games look even worse.