Edit: Having messed around with it now (and read the .pdf), it seems like they've left behind their original principle of making tests that are easy for humans and hard for machines. I'm still not convinced that a model that's good at these sorts of puzzles is necessarily better at reasoning in the real world, but am open to being convinced otherwise.
Apparently those games supposed to be hard.
If you mess around a little bit, you will figure it out. There are only a few rules.