The warm pool pattern for Kata VMs makes a huge difference for UX. Cold-starting a microVM on every request would kill the conversational flow. Curious how many warm VMs you typically keep ready and what the memory overhead looks like per idle VM?
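For anyone who hasn't built one of these: the core of a warm pool is just a bounded queue plus a background replenisher. A toy Python sketch, where `boot_vm`, `POOL_SIZE`, and the cold-boot fallback are all placeholders rather than how Kata actually wires this up:

```python
import queue
import threading
import time

POOL_SIZE = 4  # hypothetical: how many VMs to keep warm


def boot_vm():
    """Stand-in for a real microVM boot (e.g. containerd + the kata runtime)."""
    time.sleep(2)  # simulate a multi-second cold boot
    return {"id": time.monotonic_ns(), "booted_at": time.time()}


warm_pool = queue.Queue(maxsize=POOL_SIZE)


def replenisher():
    # Keep the pool topped up in the background; put() blocks while it's full,
    # so one extra VM sits booted and waiting to slot in.
    while True:
        warm_pool.put(boot_vm())


threading.Thread(target=replenisher, daemon=True).start()


def acquire_vm(timeout=0.1):
    """Grab a warm VM near-instantly; fall back to a cold boot if drained."""
    try:
        return warm_pool.get(timeout=timeout)
    except queue.Empty:
        return boot_vm()  # burst traffic exceeded pool capacity
```

The interesting tuning knob is exactly what you're asking about: `POOL_SIZE` trades idle memory against the odds of falling through to a cold boot under burst load.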
One observation from the SDK comparison: harness quality seems to matter as much as model quality. OpenCode with local models struggles not because the models can't make tool calls, but because smaller context windows make the harness's prompt engineering fall apart. Wonder if there's room for a "lightweight harness" optimized for local inference, with aggressive context management.
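Roughly what I'd imagine the "aggressive context management" core looking like: keep the system prompt and the newest turns, and degrade bulky tool output first, since it's usually the largest and least load-bearing part of the transcript. A hand-wavy Python sketch with a crude chars/4 token estimate and an invented message shape, so treat it as illustration only:

```python
def estimate_tokens(text):
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4


def trim_context(messages, budget=4096):
    """Fit messages into a token budget, newest turns first.

    `messages` is a list of {"role": ..., "content": ...} dicts, oldest first.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)

    kept = []
    # Walk newest-to-oldest so recent turns survive the cut.
    for m in reversed(rest):
        cost = estimate_tokens(m["content"])
        if used + cost > budget and m["role"] == "tool":
            # Replace bulky tool output with a stub instead of dropping the turn.
            m = {**m, "content": "[tool output elided to fit context]"}
            cost = estimate_tokens(m["content"])
        if used + cost > budget:
            break  # this turn and everything older falls off
        kept.append(m)
        used += cost

    return system + list(reversed(kept))
```

A 4k budget like this is closer to what small local models actually have to work with, versus the 100k+ windows the big harnesses seem to assume.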