3 points by ramshanker 4 hours ago | 3 comments
  • lifecodes 4 hours ago
    I guess we are reaching the point where “10T parameters” sounds more like a marketing number than a meaningful metric.

    Between MoE, aggressive quantization, and synthetic data pipelines, it’s getting harder to tell whether bigger models are actually better, or just more expensive to train.

    It would be more interesting to see capability per dollar or per watt, not parameter count.
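
    Purely as a sketch of what that comparison could look like (every number below is a made-up placeholder, not a real benchmark score, training cost, or power draw):

        # Hypothetical figures only, to illustrate a capability-per-dollar / per-watt
        # view instead of raw parameter count. Neither row reflects any real model.
        models = {
            "dense-10T":     (82.0, 500e6, 12000),  # (benchmark score, training cost USD, inference watts)
            "moe-1T-active": (80.5,  60e6,  1500),
        }
        for name, (score, cost_usd, watts) in models.items():
            print(f"{name:14s} score per $M: {score / (cost_usd / 1e6):7.3f}"
                  f"  score per kW: {score / (watts / 1e3):7.2f}")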

  • bfeynman 2 hours ago
    Aren't the leading labs currently chasing not pretraining and massive parameter counts, but enriched, deep fine-tuning and post-training for agentic tasks/coding? MoE combined with new post-training paradigms lets smaller models perform quite well, and they're much more pragmatic to scale inference with. Given that, this choice seems super odd, as the frontier labs stay neck and neck, and I don't even see Grok used in benchmarks because of how poorly it performs.
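
    Rough sketch of why inference scaling favors MoE: per-token compute tracks active parameters, not the headline total. All the figures below are assumptions, not taken from any real model:

        # Rough MoE arithmetic with assumed figures (not from any real model).
        total_params   = 10e12   # 10T headline number
        shared_frac    = 0.2     # assumed share of params outside the expert layers
        num_experts    = 64      # assumed
        active_experts = 4       # assumed top-k routing

        expert_params = total_params * (1 - shared_frac)
        active_params = total_params * shared_frac + expert_params * active_experts / num_experts
        flops_per_token = 2 * active_params   # ~2 FLOPs per active weight per token

        print(f"active params: {active_params / 1e12:.2f}T of {total_params / 1e12:.0f}T total")
        print(f"~{flops_per_token / 1e12:.1f} TFLOPs per generated token")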
  • ramshanker 4 hours ago
    This is the best publicly posted model-size figure we've had since top AI labs started treating model size as a trade secret. It should also guide the next generation of inference ASICs.
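
    Back-of-envelope weight-memory math (assuming "10T" means total stored parameters; the quantization levels are just examples) shows why a public size data point matters for ASIC designers:

        # Weight-storage footprint of a 10T-parameter model at a few common precisions.
        params = 10e12
        for precision, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
            print(f"{precision}: ~{params * bytes_per_param / 1e12:.0f} TB of weights")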