As a software guy who follows chip evolution at a macro level (new design + process node enabling better cores/tiles/units/clocks, new architecture enabling better caches, buses, and I/O, which together mean better IPC, bandwidth, latency, and throughput at a given budget of cost, watts, heat, and space), I've yet to find anything that gives a sense of Rubin's likely lift over the prior generation grounded in macro-but-concrete specs (cores, tiles, units, clocks, caches, buses, IPC, bandwidth, latency, throughput).
Edit: I found something a bit closer after scrolling down on a sub-link from the page you linked (https://developer.nvidia.com/blog/inside-the-nvidia-rubin-pl...).
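Something like the back-of-the-envelope sketch below is the kind of comparison I'm after. To be clear, every number in it is a hypothetical placeholder, not a real Rubin or Blackwell spec; it just shows the shape of the estimate once those macro numbers exist side by side.

```python
# Rough generational-lift estimator. All spec values here are hypothetical
# placeholders, NOT real Rubin/Blackwell numbers.

def lift(new, old):
    """Ratio of new-gen to old-gen for a single macro spec."""
    return new / old

# Hypothetical spec sheets: compute units, clock (GHz), HBM bandwidth (TB/s),
# interconnect bandwidth (TB/s), power (W).
prev_gen = {"units": 100, "clock": 1.8, "hbm_bw": 8.0,  "nvlink_bw": 1.8, "watts": 1000}
next_gen = {"units": 120, "clock": 2.0, "hbm_bw": 13.0, "nvlink_bw": 3.6, "watts": 1400}

# Crude throughput proxy: units * clock, capped by whichever of compute or
# memory bandwidth improves least (the likely bottleneck).
compute_lift = lift(next_gen["units"] * next_gen["clock"],
                    prev_gen["units"] * prev_gen["clock"])
bw_lift = lift(next_gen["hbm_bw"], prev_gen["hbm_bw"])
throughput_lift = min(compute_lift, bw_lift)

# Normalize by the power budget to get perf-per-watt, the number that matters
# at a fixed cost/heat/space envelope.
perf_per_watt_lift = throughput_lift / lift(next_gen["watts"], prev_gen["watts"])

print(f"compute lift:    {compute_lift:.2f}x")
print(f"bandwidth lift:  {bw_lift:.2f}x")
print(f"throughput lift: {throughput_lift:.2f}x (bottleneck-limited)")
print(f"perf/watt lift:  {perf_per_watt_lift:.2f}x")
```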
For context, my understanding is that companies have recently moved to extend their expected GPU depreciation schedules from 3 years to as high as 6, which has a huge impact on projected expenditures.
I wonder what the step was from the previous generation to the Blackwell platform. Is this one smaller, which might suggest the longer depreciation schedule is warranted, or larger?
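To make the expenditure impact concrete, here's a tiny straight-line depreciation sketch with made-up numbers (a $10B GPU fleet, no salvage value); it just shows how stretching the schedule from 3 to 6 years halves the annual expense hitting the income statement.

```python
# Straight-line depreciation with hypothetical numbers: a $10B GPU fleet,
# zero salvage value. Stretching the useful life from 3 to 6 years halves
# the annual depreciation expense, which flatters near-term earnings.

fleet_cost = 10_000_000_000  # hypothetical capex, USD

def annual_depreciation(cost, useful_life_years, salvage=0):
    """Straight-line depreciation: equal expense in each year of useful life."""
    return (cost - salvage) / useful_life_years

for years in (3, 6):
    expense = annual_depreciation(fleet_cost, years)
    print(f"{years}-year schedule: ${expense / 1e9:.2f}B depreciation per year")

# 3-year schedule: $3.33B per year
# 6-year schedule: $1.67B per year
```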
Technical details are available here: https://developer.nvidia.com/blog/inside-the-nvidia-rubin-pl...
So it's an all-NVidia solution - CPU, interconnects, AI GPUs.
Maybe it's caused by `codesign` tools? Like `codesign --extreme` which probably requires two signers to sign one thing?
I know that memory bandwidth tends to be a big limiting factor, but I'm trying to understand how this factors into its overall perf compared to Blackwell.
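A rough way to see how memory bandwidth caps overall perf is a roofline-style calculation. The numbers below are hypothetical, not published Rubin or Blackwell specs; the point is just that low-arithmetic-intensity workloads hit the bandwidth ceiling long before they hit peak compute.

```python
# Roofline-style sketch with hypothetical numbers (not real chip specs):
# attainable throughput is the lesser of peak compute and what memory
# bandwidth can feed at a given arithmetic intensity.

peak_flops = 2.0e15  # hypothetical peak compute, FLOP/s
mem_bw = 10.0e12     # hypothetical HBM bandwidth, bytes/s

def attainable(flops_per_byte):
    """Roofline: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bw * flops_per_byte)

# Low arithmetic intensity (e.g. memory-bound decode-style inference) never
# reaches peak compute; high intensity (big matmuls) does.
for ai in (10, 50, 200, 1000):  # FLOPs per byte moved
    t = attainable(ai)
    bound = "bandwidth-bound" if t < peak_flops else "compute-bound"
    print(f"AI={ai:5d} FLOP/B -> {t / 1e15:.2f} PFLOP/s ({bound})")
```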
Also: Tim Cook / Apple is noticeably absent.
Reading this line, I had a funny mental image of some NVidia PR newbie reflexively reaching out to Lisa Su for a supporting quote, and Lisa actually considering it for a few seconds. The AI bubble really has reached the level of "We must all hang together or we'll surely hang separately".