1 point by lebovic 2 hours ago | 1 comment
  • lebovic 2 hours ago
    GLM 5.1 is surprisingly capable. Anecdotally, I couldn't notice a difference until ~120K tokens.

    Qwen 3.6 35B A3B also exceeded my expectations. It performs well, whereas the previous generation couldn't even use the testing harness.

    (TBD on Kimi K2.6; the eval is still running.)