Why Weibo's tiny VibeThinker-3B has the AI world arguing over benchmarks again (venturebeat.com)

18 points by gmays 5 hours ago | 1 comment

embedding-shape 4 hours ago
> The model, called VibeThinker-3B, scored 94.3 on AIME 2026 — the American Invitational Mathematics Examination, one of the most demanding standardized math competitions in the world. That figure places it alongside DeepSeek V3.2, a model with 671 billion parameters
Overfitting, no need to argue about anything I think?
The rest of the article seems to be echoing people's misunderstanding of pretty elementary stuff.
- crote 14 minutes ago
  That's the obvious answer, yes. But if they are doing it, why should anyone assume the competition isn't doing it?
  If it is possible to cheat on the benchmarks used to judge AI performance, how can the general population be certain that any of the AI "innovation" is genuine? Is there true development here worth the many-billion-dollar investments, or are we seeing an industry-wide case of them doing a Theranos by faking the results and hoping they can do real innovation before anyone finds out?