1 pointby zhebrak4 hours ago1 comment
  • zhebrak4 hours ago
    Link: https://simulator.zhebrak.io?welcome

    You are the Compute Officer aboard a generation ship. Systems are failing, a signal arrives from deep space, and every mission is a real distributed ML problem — fix OOM errors, configure tensor parallelism, scale training across clusters, optimise inference throughput.

    The game runs on a first-principles physics engine: FLOPs, memory bandwidth, collective communication, pipeline bubbles. Calibrated against published runs from Meta, DeepSeek, and NVIDIA within 1-2% MFU.

    There's also a Learn mode with 60 tasks (from beginner to advanced) covering both training and inference, and a full simulator for exploration and planning, if you are not into the story. All client-side, no backend.

    GitHub: https://github.com/zhebrak/llm-cluster-simulator