8 points by sshroot 2 hours ago | 1 comment
  • CamperBob2 2 hours ago
    Without reading the .pdf, I tried the first game it gave me, at https://arcprize.org/tasks/ls20, and I couldn't begin to guess what I was supposed to do. Not sure what this benchmark is supposed to prove.

    Edit: Having messed around with it now (and read the .pdf), it seems like they've left behind their original principle of making tests that are easy for humans and hard for machines. I'm still not convinced that a model that's good at these sorts of puzzles is necessarily better at reasoning in the real world, but am open to being convinced otherwise.

    • szatkus an hour ago
      > Only environments that could be fully solved by at least two human participants (independently) were considered for inclusion in the public, semi-private and fully-private sets.

      Apparently those games are supposed to be hard.

    • WarmWash an hour ago
      The goal is to learn the rules, and then use them to win.

      If you mess around a little bit, you will figure it out. There are only a few rules.