And the active parameters come from the experts: for each token, a router picks a few experts to run the forward pass (usually 2 to 4; I haven't read V4's papers). It's not the same experts every time.
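For intuition, here's a minimal sketch of top-k expert routing in PyTorch. It's generic, not DeepSeek's actual code; the expert count, layer sizes, and k are made-up illustration values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim, n_experts=16, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, dim)
        scores = self.gate(x)                      # router logits, (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1) # each token picks its top-k experts
        weights = F.softmax(weights, dim=-1)       # mixing weights over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():  # only selected experts run
                mask = idx[:, slot] == e              # tokens whose slot-th pick is e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE(dim=64, n_experts=16, k=2)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

Since only k of n_experts run per token, the active parameter count is a small fraction of the total, which is the whole point.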
OTOH, this being DeepSeek, I foresee a bunch of distilled FP8 V4 models that fit on a 5090 with tiny batches, at maybe 75 to 85% of V4's performance. And that might be good enough for many everyday tasks.
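A rough back-of-envelope for that hunch (the distill sizes below are hypothetical; the 5090 has 32 GB of VRAM):

```python
# FP8 weights take ~1 byte/param; whatever is left over has to cover
# KV cache and activations, hence "tiny batches".
VRAM_GB = 32                           # RTX 5090
for params_b in (14, 24, 30):          # hypothetical distill sizes, billions
    weights_gb = params_b * 1.0        # ~1 byte/param at FP8
    print(f"{params_b}B params: ~{weights_gb:.0f} GB weights, "
          f"~{VRAM_GB - weights_gb:.0f} GB left for KV cache/activations")
```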
Today is a good day for open models. Thank god for DeepSeek.
"Pro" $3.48 / 1M output tokens vs $4.40 for GLM 5.1 or $4.00 for Kimi K2.6
"Flash" is only $0.28 / 1M and seems quite competent
(EDIT: Note that if you hit the endpoint that opencode etc. hit on the DeepSeek API (the deepseek-chat / deepseek-reasoner models), what you appear to get is "flash".)
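For reference, the DeepSeek API is OpenAI-compatible, so tools like opencode hit it roughly like this; the model names are DeepSeek's documented aliases, and which tier they actually map to is the point of the edit above:

```python
# Sketch of an OpenAI-compatible call to the DeepSeek API.
# "deepseek-chat" / "deepseek-reasoner" are the documented model aliases;
# per the edit above, these appear to map to the "flash" tier.
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="https://api.deepseek.com")
resp = client.chat.completions.create(
    model="deepseek-chat",  # or "deepseek-reasoner"
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```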