Hacker News
Aihumanbench
1 point by AIhumanbench 4 hours ago | 2 comments
AIhumanbench
4 hours ago
aihumanbench.com
rad-b
3 hours ago
Seems interesting, but testing myself only yields my own results. How would I compare them to a frontier model? That part seems to be missing.
Also, the tests seem to be heavily skewed in favor of what LLMs are good at.
AIhumanbench
4 hours ago
[flagged]