1 point by AIhumanbench 4 hours ago | 2 comments
  • AIhumanbench 4 hours ago
    aihumanbench.com
    • rad-b 3 hours ago
      Seems interesting, but testing myself only yields my own results. How would I compare them to a frontier model? That part seems to be missing.

      Also, the tests seem to be heavily skewed in favor of what LLMs are good at.

  • AIhumanbench 4 hours ago
    [flagged]