8 points by davidvgilmore 3 hours ago | 2 comments
  • camomileandmilk 3 hours ago
    Can you elaborate on this "Sonnet and Haiku are almost always less capability-per-dollar than open models"?
    • davidvgilmore 3 hours ago
      Yes. In short, open models like DeepSeek, MiMo, Kimi, and GLM tend to complete tasks with fewer tokens, and they cost less per token, than both Sonnet and Haiku. That makes them more cost-efficient, which is what we mean by higher "capability-per-dollar" than Sonnet or Haiku.

      Much of Claude Code's internal model routing ends up delegating tasks to Sonnet or Haiku, so by intercepting those calls and using open models instead, we often see better performance at a better price.
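
      To make the "capability-per-dollar" arithmetic concrete, here is a rough sketch. The token counts and prices are made-up placeholders, not measurements or real price lists. The only point is that cost per task is tokens used times price per token, so a model that needs fewer tokens at a lower price wins on both factors.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    tokens_per_task: int           # hypothetical average tokens to finish one task
    usd_per_million_tokens: float  # hypothetical blended price per million tokens


def cost_per_task(model: ModelProfile) -> float:
    """Dollar cost of one task: tokens used times price per token."""
    return model.tokens_per_task * model.usd_per_million_tokens / 1_000_000


def tasks_per_dollar(model: ModelProfile) -> float:
    """A crude 'capability-per-dollar' proxy: completed tasks per dollar spent."""
    return 1.0 / cost_per_task(model)


# Placeholder profiles for illustration only; these are NOT real figures.
closed_model = ModelProfile("closed-model (placeholder)", 40_000, 3.00)
open_model = ModelProfile("open-model (placeholder)", 30_000, 1.00)

if __name__ == "__main__":
    for m in (closed_model, open_model):
        print(f"{m.name}: ${cost_per_task(m):.3f} per task, "
              f"{tasks_per_dollar(m):.1f} tasks per dollar")
```

      With these placeholder numbers, the open model uses 25% fewer tokens at a third of the price, so each task costs a quarter as much. This proxy also assumes both models complete the task equally well, which has to be checked separately.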

      • camomileandmilk 2 hours ago
        Yeah, I get you now. But those are all Chinese-hosted, right? I don't think my company will let us use them.
        • davidvgilmore 2 hours ago
          Many of them are produced by Chinese labs. Some, like Nemotron, are U.S.-made. And we support inference providers both in the U.S. and overseas.

          If geography is important, we can restrict the regions where inference takes place. And if you don't want to use Chinese-trained models, you can use others such as Mistral, Nemotron, Google's, or OpenAI's.

  • oypass 3 hours ago
    How is this different from OpenRouter?
    • davidvgilmore 3 hours ago
      Four ways: (1) We are built specifically for Claude Code model routing. (2) We route at a subagent/subtask level. (3) We support on-device routing. (4) We have a built-in ML router trained specifically to route Claude Code subagent tasks. Its use is optional.
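
      For illustration only, a minimal sketch of what routing at the subtask level can look like. Every name here (Subtask, REMOTE_ROUTES, LOCAL_MODEL, pick_model) and every model label is hypothetical, and this is not the product's actual router or API. It uses a static lookup table rather than a trained ML router, just to show the shape of the decision.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str
    requested_model: str          # tier the agent originally asked for
    fits_on_device: bool = False  # hypothetical flag: small enough to run locally


# Hypothetical mapping from the requested tier to a replacement model.
REMOTE_ROUTES = {
    "sonnet": "open-model-large",
    "haiku": "open-model-small",
}
LOCAL_MODEL = "local-small-model"  # hypothetical on-device model label


def pick_model(task: Subtask, on_device_available: bool) -> str:
    """Choose a model for a single subtask instead of for the whole session."""
    # Prefer the local model when the hardware allows it and the task is suitable.
    if on_device_available and task.fits_on_device:
        return LOCAL_MODEL
    # Otherwise swap in the mapped model; unknown tiers pass through unchanged.
    return REMOTE_ROUTES.get(task.requested_model, task.requested_model)


if __name__ == "__main__":
    tasks = [
        Subtask("summarize a file", "haiku", fits_on_device=True),
        Subtask("refactor a module", "sonnet"),
        Subtask("plan the overall change", "opus"),
    ]
    for t in tasks:
        print(f"{t.description}: {pick_model(t, on_device_available=True)}")
```

      A trained router would replace the lookup table with a learned decision, but the per-subtask granularity would be the same.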
      • oypass 2 hours ago
        What are the benefits of on-device routing? How do you decide whether a task can be run on-device?
        • davidvgilmore 2 hours ago
          For those who have capable enough hardware, it's effectively free to run subtasks on-device (just the marginal cost of the additional electricity).

          With Google's most recent 12B-parameter Gemma model, even Mac users with just 16 GB of unified memory can offload some tasks to run on-device.
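
          A back-of-envelope check on why a 12B-parameter model can fit in 16 GB. The 4-bit quantization and the 6 GB reserved for the OS, other apps, and inference overhead are assumptions for illustration, not stated specs.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(params_billions: float, bits_per_weight: int,
                   total_memory_gb: float, reserved_gb: float) -> bool:
    """True if the weights plus an assumed reserve fit in total memory."""
    needed = weight_memory_gb(params_billions, bits_per_weight) + reserved_gb
    return needed <= total_memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(12, bits)
        ok = fits_in_memory(12, bits, total_memory_gb=16, reserved_gb=6)
        print(f"12B at {bits}-bit: {size:.0f} GB of weights, fits in 16 GB: {ok}")
```

          Under these assumptions the weights take about 24 GB at 16-bit, 12 GB at 8-bit, and 6 GB at 4-bit, so only the 4-bit version leaves room for everything else on a 16 GB machine.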