Memory bandwidth is 273 GB/s, which is nowhere near a discrete GPU's, and it's a $4K machine. Personally, I'd rather have two GPUs and run a quantized model. I have two 32GB AMD R9700 cards, which cost $2,600. With quantized models I get roughly 120K tokens of context window, and tokens per second (TPS) is about 60% of what I see with the same model on my 4090 (which only has enough VRAM to load the weights plus about 6K tokens of context).
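Single-stream token generation is mostly memory-bandwidth bound, so a rough ceiling on decode speed is bandwidth divided by the bytes of weights read per token. Here's a minimal back-of-envelope sketch. It assumes a dense model where every weight is read once per generated token, and it ignores KV-cache reads, compute limits, and runtime overhead. The 70B / 4-bit model is a hypothetical example, not something I benchmarked:

```python
def decode_tps_ceiling(bandwidth_gb_s: float, params_billion: float,
                       bits_per_weight: float) -> float:
    """Upper bound on single-stream decode tokens/s for a dense model.

    Assumes each generated token reads every weight once from memory and
    nothing else (KV-cache traffic, compute, and overhead are ignored).
    """
    weight_gb = params_billion * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / weight_gb


# Hypothetical: a dense 70B model at 4 bits/weight is ~35 GB of weights.
print(f"{decode_tps_ceiling(273, 70, 4):.1f} tok/s")  # 273 / 35 = 7.8
```

Real numbers land below this ceiling. MoE models only read their active parameters per token, so they can beat a dense model of the same total size.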
Sure, I can't run a 100B+ model, but neither can a single GB10 unless you're going for essentially no context window. So do you buy a second $4K machine?
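Whether a big model "fits" comes down to weights plus KV cache. Below is a rough sketch, assuming standard attention with grouped-query KV heads and an FP16 KV cache, with GB meaning 10^9 bytes. The 128 GB total matches the GB10's unified memory, but the layer count, KV-head count, head dim, and 8 GB OS/runtime overhead are hypothetical placeholders, not the specs of any particular model. Plug in your own model's config:

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # One K and one V vector per KV head, per layer, for every token in context.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(mem_gb: float, params_billion: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       overhead_gb: float = 0.0, bytes_per_elem: int = 2) -> int:
    """Tokens of KV cache that fit after the weights and a fixed overhead are loaded."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    free_bytes = mem_gb * 1e9 - weight_bytes - overhead_gb * 1e9
    if free_bytes <= 0:
        return 0  # the weights alone don't fit
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return int(free_bytes // per_token)


# Hypothetical dense 100B model: 80 layers, 8 KV heads, head_dim 128, FP16 KV cache.
print(max_context_tokens(128, 100, 8, 80, 8, 128, overhead_gb=8))   # 8-bit weights
print(max_context_tokens(128, 100, 16, 80, 8, 128, overhead_gb=8))  # 16-bit weights: 0
```

The answer swings a lot with quantization level and KV-head count: at 16-bit weights the same hypothetical model doesn't fit at all, so run it with your actual model config before deciding.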
At least this thing is actually useful, and there are $3K variants available.