Hacker News
VibeBench: Measuring 1k Engineers' Opinions of New Models
(vibebench.standardagents.ai)
6 points by jpschroeder 3 hours ago | 1 comment
mhi3 3 hours ago
"Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."
Is this true?
But I love this concept!
jpschroeder 3 hours ago
Oh, very true. Benchmaxxing itself is basically gaming them.