Much of Claude Code's internal model routing ends up delegating tasks to Sonnet or Haiku, so by intercepting those calls and using open models instead, we often see better performance at a better price.
If geography is important, we can restrict which geos inference takes place in. And if you don't want to use Chinese-trained models, you can use others like Mistral, Neomotron, Google's, or OpenAI's.
With Google's most recent 12b param Gemma model, even Mac users with just 16gb of unified memory can offload some tasks on-device.