Hacker News
Open model StepFun-3.5 is #1 on MathArena, an uncheatable math benchmark
(twitter.com)
1 point by diyer22 4 hours ago | 1 comment
falcor84 3 hours ago
How is it "uncheatable"? If you know the exact olympiad questions the model is being assessed on, what's stopping you from massaging it until it gets more of them right than the previous number 1?