2 points by famouswaffles 4 hours ago | 1 comment
  • famouswaffles 4 hours ago
    An Open Code instance with Read, Grep, and Bash tools achieved human-level performance on the preview games.

    For the full benchmark, the ARC-AGI 3 paper confirms Opus 4.6 scored 97.1%.

    https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

    I was wondering why the scoring for 3 was so convoluted and I'm starting to see why. This is a solved benchmark in any way that matters.