58 points by at2005 9 hours ago | 6 comments
  • algo_trader 3 minutes ago
    Great write-up (and effort!! ;))

    what are your thoughts on MCTS for coding?

    This can (and arguably must) be paired with a smart execution harness to optimize rolling out and rolling back execution paths and system state.

    Does this change the calculus for optimal post-training?

  • natufunu 3 hours ago
    Great post! I wonder why MCTS is not more popular as a test-time compute harness. Did you compare the performance of MCTS (without distillation) against other methods (e.g. best-of-N) with the same compute budget?
  • supermdguy 5 hours ago
    > One might note that MCTS uses more inference compute on a per-sample basis than GRPO: of course it performs better

    This part confused me; it sounded like they were only doing the MCTS at train time, and then using GRPO to distill the MCTS policy into the model weights. So wouldn't the model still have the same inference cost?

    • at2005 3 hours ago
      Ah, I meant that MCTS uses more inference-time compute (over GRPO) to produce a training sample
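      The distinction in this exchange can be sketched with a toy example (not from the post; `reward`, `rollout`, and the greedy tree search below are hypothetical stand-ins for a model's sampling and a full MCTS): the search spends its rollout budget per *training* sample, while the distilled model spends one rollout per query.

```python
import random

random.seed(0)
DEPTH, BUDGET = 4, 16  # trajectory length and rollout budget (toy numbers)

def reward(path):
    # toy objective standing in for a verifier/reward model
    return sum(path)

def rollout():
    # one full sampled trajectory (stand-in for one model generation)
    return [random.randint(0, 9) for _ in range(DEPTH)]

def best_of_n(budget):
    # baseline: independent full rollouts, keep the best.
    # Cost = `budget` rollouts *per query* at inference time.
    return max((rollout() for _ in range(budget)), key=reward)

def tree_search(budget):
    # simplified stand-in for MCTS: extend the best prefix level by
    # level (real MCTS adds UCB selection and value backpropagation).
    # Used at *train* time, cost = `budget` per training sample; the
    # distilled policy then answers with a single rollout.
    path, per_level = [], budget // DEPTH
    for _ in range(DEPTH):
        children = [path + [random.randint(0, 9)] for _ in range(per_level)]
        path = max(children, key=reward)
    return path

train_sample = tree_search(BUDGET)   # expensive, train time only
baseline = best_of_n(BUDGET)         # same budget, but paid at every query
```

      Under this framing, the extra compute MCTS uses is amortized into the weights by distillation, so the comparison with best-of-N is really train-time budget vs. recurring inference-time budget.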