Nevertheless, when browsing the results, there are several things that must be kept in mind:
1. Much of the NVIDIA Vera CPU's advantage comes not from better CPU cores but from a much faster memory interface than the CPUs it has been compared against.
2. These benchmarks were specifically approved by NVIDIA as workloads where Vera is competitive; publishing other benchmarks was forbidden. There are plenty of other benchmarks, especially in scientific/technical computing, where Vera would not be competitive with x86 CPUs.
3. Today, AMD Zen 5 still beats Vera in most benchmarks of interest, even though it is already an ancient CPU core. Over its lifetime, NVIDIA Vera will compete with AMD Zen 6, which will launch in a few months and is expected to be much faster than Zen 5, partly thanks to a faster and wider memory interface. The same goes for Intel: while Granite Rapids looks rather pathetic today, the upcoming Diamond Rapids should be greatly improved, and it is the competitor that will overlap with Vera during its lifetime.
Also, does that mean that once the AI bubble pops, Nvidia will come to the consumer market with a powerful ARM gaming SoC?
Are we rewriting history? Nvidia had consumer ARM SoCs long ago, before Qualcomm pushed them out with patents, modems, etc. See: Tegra.
Early Tegra chips had quite a bit of Android market share (and Tegra is currently in the Nintendo Switch as well).
I feel like at some point there will just be big SoC packages with 128 GB of RAM and stacks of cores (each with its own "local" cache), where the 128 GB of "local" on-package HBM is just the 4th or 5th level of cache, and big server boards will carry four of those plus CXL elsewhere for "main" memory.
Things like the VAST stuff also blur the line between high-speed local storage and slower SAN or bulk commodity storage.
The old memory / storage hierarchies are getting mixed up (again).
Interesting times.
Not as fast as raw PCIe slots inside the chassis, but I don't think the ghost of Steve Jobs cares much. He'd tell you that if you've got no taste at all, you can put his beautiful machine into ugly junk like that from sonnettech and go do ugly things elsewhere.
I think Apple is happy taking the market it has and will leave the big-guns HPC market to Nvidia. The margins look great for Nvidia right now, but I suspect Nvidia's path will resemble DRAM's boom/bust cycle more than Apple's steady "premium tool" brand positioning.
HBM tends to be integrated onto the package (board, multi-chip module, die) because very tight signaling and wire-routing constraints make "modularity" impossible.
I remember back in the day you could get motherboards for your 286, 386, and sometimes even 486 with external L1 / L2 / L3 cache: you'd buy a bunch of static RAM DIPs to populate the sockets next to the CPU, then set a BIOS option or DIP switch to enable it. These days that's just not practical: there are too many wires connecting the cache to the dies and the cache-coherence logic, and the speed of light is too slow and electrical signaling too messy to put cache "external" to the die/chip/package, even if the packaging issues could be addressed.
HBM is similar: it's not practical to build a generic interconnect that works reliably enough to support field-replaceable memory modules the way DDR-style DIMMs do.
EDIT:
Apparently I'm totally wrong: these "SOCAMM2" modules have thousands of pads (like a CPU socket) and can in fact run with the same data bus width (1024 bits wide!) as "local" HBM. Very cool. Please ignore my out-of-date blatherings above. It's still not quite as fast as putting the HBM in the package, but it's way faster than a DDR-style setup.
There is already an option, at least from AMD, in the HEDT segment: Threadripper/Pro has 4/8 memory channels (although the bandwidth is not as high as on Apple chips).
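For a rough feel of why channel count and bus width matter, theoretical peak DRAM bandwidth is just channels × bus width in bytes × transfer rate. A minimal sketch, where the helper name `peak_bandwidth_gbs`, the 64-bit channel width, and the 5200 MT/s transfer rate are hypothetical, illustrative DDR5-style assumptions rather than the specs of any particular Threadripper SKU:

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Hypothetical helper: theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    peak = channels * (bus width in bytes) * (transfers per second)
    """
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Assumed, illustrative inputs only: 64-bit channels at 5200 MT/s,
# comparing a 4-channel and an 8-channel configuration.
for channels in (4, 8):
    print(f"{channels} channels: {peak_bandwidth_gbs(channels, 64, 5200):.1f} GB/s")
```

Whatever transfer rate you assume, doubling the channel count doubles the theoretical ceiling; sustained bandwidth in real workloads lands below that ceiling.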
The thing is, it doesn't have to do anything. It is busy getting bailed out, I guess.