Every agent step takes both a visual snapshot and a memory read. The memory read gives more consistent tileset and location parsing, and it also exposes things like party state and status conditions that usually aren't visible on screen.
Shouldn't the goal be to compare it against a human player, who would need to go into the menus for that information?
That's probably fair, but I imagine that without memory access Claude would have been opening and searching the (completely unordered!) bag every 5 minutes to check for progress-critical items like the Poké Flute.
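To illustrate why the memory read makes that check cheap: once the bag is just a byte buffer, finding a key item is a single linear scan, and the bag's ordering doesn't matter at all. This is a minimal sketch only. The buffer layout (a count byte, then item ID/quantity pairs, terminated by 0xFF), the `POKE_FLUTE_ID` value, and the `parse_bag`/`has_item` helpers are all assumptions made for the example, not the harness's actual code.

```python
from dataclasses import dataclass

# Hypothetical item ID for illustration; the real value depends on the game's item table.
POKE_FLUTE_ID = 0x49


@dataclass
class BagItem:
    item_id: int
    quantity: int


def parse_bag(raw: bytes) -> list[BagItem]:
    """Parse a bag buffer assumed to be: count byte, then (id, qty) pairs, ended by 0xFF."""
    count = raw[0]
    items = []
    offset = 1
    for _ in range(count):
        item_id = raw[offset]
        if item_id == 0xFF:  # terminator byte: stop early if the count overshoots
            break
        items.append(BagItem(item_id, raw[offset + 1]))
        offset += 2
    return items


def has_item(raw: bytes, item_id: int) -> bool:
    """One pass over the parsed bag; works no matter how the items are ordered."""
    return any(item.item_id == item_id for item in parse_bag(raw))


# Usage: a made-up three-item bag with the flute in the middle slot.
bag = bytes([3, 0x04, 5, 0x49, 1, 0x14, 2, 0xFF])
print(has_item(bag, POKE_FLUTE_ID))
```

The human-equivalent version of this is scrolling the item menu and reading each entry, which is exactly the repeated chore the memory read skips.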