4 points by gertlabs 4 hours ago | 1 comment
  • gertlabs 3 hours ago
    We've been working on a way to address the obvious problems with existing benchmarks by creating a single comprehensive benchmark that measures what technical people care about, while also getting as close as possible to an objective measurement of "core intelligence".

    Some demo games are shown on /spectate; they give you an idea of how we test models and why this would be difficult to benchmax. I think our benchmark is by far the best relative measurement of artificial intelligence out there. Feedback is welcome and usually acted upon quickly.