This is how every LLM API has worked for years: the API is a stateless token machine, and the prompts and turns are managed by the client application. If anything, it's interesting how standard it is; no inside baseball, they just use the normal public API.
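To make "stateless" concrete: the client resends the entire message history on every request, and the server just continues from those tokens. A minimal sketch below, assuming an OpenAI-compatible /v1/chat/completions endpoint; the URL and model name are placeholders.

    import requests

    # Any OpenAI-compatible endpoint; URL and model name are placeholders.
    API_URL = "https://api.example.com/v1/chat/completions"
    MODEL = "some-model"

    # The client owns the conversation state. The server only ever sees
    # what is in each request; there is no session on its side.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    def send(user_text: str) -> str:
        messages.append({"role": "user", "content": user_text})
        resp = requests.post(API_URL, json={"model": MODEL, "messages": messages})
        resp.raise_for_status()
        reply = resp.json()["choices"][0]["message"]["content"]
        # Append the assistant turn so the next request carries the full history.
        messages.append({"role": "assistant", "content": reply})
        return reply

    print(send("What is a stateless API?"))
    print(send("Summarize that in one sentence."))  # only works because we resent the history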
I use both Claude Code and Xcode with a local LLM (served through LM Studio), and I noticed they both have system prompts that make them work like magic.
If anyone reading this is interested in setting up Claude Code to run offline, these are the instructions I followed:
https://medium.com/@luongnv89/setting-up-claude-code-locally...
My personal preference is Qwen3-Next-80B at 4-bit quantization, which takes roughly 45 GB of RAM.
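If you just want to poke at the local server before wiring up Claude Code, LM Studio exposes an OpenAI-compatible API (by default on localhost:1234), so the standard openai Python client works against it. A sketch under those assumptions; the model id below is a placeholder and must match whatever you've actually loaded in LM Studio.

    from openai import OpenAI

    # LM Studio's local server speaks the OpenAI API; port 1234 is its default.
    # The api_key is ignored locally, but the client requires some value.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="qwen3-next-80b",  # placeholder: use the id shown in LM Studio
        messages=[{"role": "user", "content": "Hello from a local model"}],
    )
    print(resp.choices[0].message.content)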