Finally, a way to settle the model wars that actually matters: Texas Hold'em. That 3D replay view is sick! ♠♦
I spent way too long watching the replay on Game 2a58900d. It’s wild to see the chain of thought mapped against the betting rounds. It really exposes when a model is hallucinating a strong hand versus actually calculating pot odds. This 'PokerBench' might actually become the standard for measuring agentic risk-taking.
yeah the 3d view is amazing