1 point by mingli_yuan 10 hours ago | 1 comment
  • mingli_yuan 8 hours ago
    Hi HN,

    I’ve been experimenting with a different kind of LLM benchmark, and wanted to share it here for feedback.

    IntentGrid is a language-only, turn-based competitive game designed to test strategic planning, spatial reasoning, and long-horizon decision making in large language models.

    Instead of puzzles or static tasks, models play a 40-turn adversarial game on a 13×13 grid. Each turn, they must:

    - analyze a dense board state,
    - reason about future congestion and forced combat,
    - express intent in natural language,
    - and output a strictly validated action plan.
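
    To give a rough idea of what "strictly validated" could mean in practice, here is a minimal sketch. The JSON layout, the field names (`intent`, `actions`, `action`, `target`) and the action names are hypothetical and chosen only for illustration; they are not IntentGrid's actual schema. Only the 13×13 board size comes from the description above.

```python
import json

GRID_SIZE = 13  # from the post: 13x13 board
ALLOWED_ACTIONS = {"move", "attack", "hold"}  # hypothetical action names


def validate_plan(raw: str) -> list:
    """Parse a model reply and reject anything that is not a well-formed plan.

    Assumed (hypothetical) format:
    {"intent": "<free text>", "actions": [{"action": "move", "target": [x, y]}]}
    """
    plan = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(plan, dict) or not isinstance(plan.get("intent"), str):
        raise ValueError("plan must be an object with a string 'intent'")

    actions = plan.get("actions")
    if not isinstance(actions, list):
        raise ValueError("'actions' must be a list")

    for item in actions:
        if not isinstance(item, dict):
            raise ValueError("each action must be an object")
        name = item.get("action")
        if not isinstance(name, str) or name not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {name!r}")
        target = item.get("target")
        if not isinstance(target, list) or len(target) != 2:
            raise ValueError("'target' must be an [x, y] pair")
        for coord in target:
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(coord, bool) or not isinstance(coord, int):
                raise ValueError("coordinates must be integers")
            if not 0 <= coord < GRID_SIZE:
                raise ValueError(f"coordinate {coord} is off the board")
    return actions


if __name__ == "__main__":
    reply = '{"intent": "hold the centre", "actions": [{"action": "move", "target": [6, 6]}]}'
    print(validate_plan(reply))
```

    The point of the sketch is that the free-text intent and the machine-checked action list travel together, and any reply that fails a check is rejected outright rather than repaired.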

    Because 80 units are spawned over 40 turns on a 169-cell board, the system guarantees saturation: combat is unavoidable, and passive survival fails. Timing, positioning, and coordination matter more than tactics alone.
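
    The arithmetic behind those numbers, as a quick sketch. It uses only the figures quoted above; the spawn rate is an average (the actual spawn schedule is not described here), and the density is an upper bound that assumes no unit is ever removed by combat.

```python
GRID_SIZE = 13     # from the post: 13x13 board
TURNS = 40         # from the post: 40-turn game
TOTAL_UNITS = 80   # from the post: 80 units spawned in total

cells = GRID_SIZE * GRID_SIZE        # 169 cells
spawn_rate = TOTAL_UNITS / TURNS     # average units spawned per turn
max_density = TOTAL_UNITS / cells    # occupancy if every unit survived

if __name__ == "__main__":
    print(f"cells: {cells}")
    print(f"average spawn rate: {spawn_rate} units/turn")
    print(f"upper bound on occupancy: {max_density:.1%}")
```

    So on average two units arrive per turn, and if nothing died nearly half the board would be occupied by the final turn.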

    A concrete match example (Kimi vs Gemini): https://intentgrid.org/match/25f2530d-c7e6-4553-b231-dff4a98...