18 points by gmays 5 hours ago | 1 comment
  • embedding-shape 4 hours ago
    > The model, called VibeThinker-3B, scored 94.3 on AIME 2026 — the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world. That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters

    Overfitting. No need to argue about anything else, I think?

    The rest of the article seems to be echoing people's misunderstanding of pretty elementary stuff.

    • crote 14 minutes ago
      That's the obvious answer, yes. But if they are doing it, why should anyone assume the competition isn't doing it?

      If it is possible to cheat on the benchmarks used to judge AI performance, how can the general public be certain that any of the AI "innovation" is genuine? Is there real progress here worth the multi-billion-dollar investments, or are we seeing an industry-wide case of pulling a Theranos: faking the results and hoping to achieve real innovation before anyone finds out?