5 points by fesens 4 hours ago | 2 comments
  • fesens 4 hours ago
    Current benchmarks have ceilings, usually 100%. This benchmark aims to be long-lasting, highly correlated with the ability to solve real-world problems and follow complex instructions, and unbounded (meaning scores can always go higher).
  • fabiofachini92 4 hours ago
    Amazing!