15 points by tosh 9 days ago 2 comments

2001zhaozhao 8 days ago
It would be interesting to see if they have an updated version of a model that employs this training technique. According to the paper it scored well on release (65.8% on SWE bench), but by now it no longer scores competitively against the latest generation open coding models (e.g. Devstral Small 2).
I wonder whether other labs have implemented something similar to this approach. Perhaps code world modeling isn't actually necessary (relative to other simpler techniques) to achieve the kind of deep environment understanding that the paper touts as being important to improve agentic coding performance.
- general_reveal 8 days ago
  Serious question. How do we know these bench suites are any good?
chid 8 days ago
Given the high bar of entry (a GPU setup with 160GB of VRAM), is there anything practical one can use this for?
- omneity 8 days ago
  Since the model is 32B, it could run in <20GB VRAM with Q4 quantization (minimal loss of quality), or in 80GB unquantized at full fidelity. The quoted 160GB is for their recommended evaluation settings.
  There are a few pre-quantized options[0], or you can quantize it yourself with llama.cpp[1]. You can run the resulting GGUF with llama.cpp's `llama-cli` or `llama-server`, with LM Studio, or with Ollama.
  0: https://huggingface.co/models?search=cwm%20q4%20gguf
  1: https://huggingface.co/spaces/ggml-org/gguf-my-repo
  - chid 8 days ago
    I see, that's still a fair bit more VRAM than I have access to. Thanks for sharing that information.
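
The VRAM figures omneity quotes can be sanity-checked with weight-only arithmetic: parameter count times bits per weight, divided by 8. The sketch below is a rough estimate under stated assumptions, not a measurement. The helper name `weight_memory_gb` is made up for illustration, the bit widths are nominal (real Q4 GGUF variants store somewhat more than 4 bits per weight because of scaling metadata), and KV cache, activations, and runtime overhead are ignored, which is presumably where the gap between 64GB of bf16 weights and the quoted 80GB comes from.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real usage
    will be higher than this number.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal bit widths; these are assumptions for illustration only.
    for label, bits in [("16-bit (unquantized)", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"32B model, {label}: ~{weight_memory_gb(32, bits):.0f} GB of weights")
```

At a nominal 4 bits per weight, a 32B model needs about 16GB for weights alone, which is consistent with the "<20GB with Q4" figure once some overhead is added.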