I've been trying deepseek-v4-flash in OpenCode (via OpenRouter) and I'm blown away. It's no Opus, obviously, but it had zero issues with any regular coding task I threw at it. v4-flash is remarkably "good enough" for what I needed. The whole evening of coding cost me $0.52 in API credits.
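For anyone curious what that setup looks like outside of OpenCode, here's a rough sketch of hitting the model through OpenRouter's OpenAI-compatible endpoint. The exact model slug (`deepseek/deepseek-v4-flash`) is my guess; check OpenRouter's model list for the real ID.

```python
# Sketch: calling deepseek-v4-flash via OpenRouter's OpenAI-compatible API.
# Assumes the "deepseek/deepseek-v4-flash" slug -- verify it on openrouter.ai/models.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",  # assumed slug; substitute the listed model ID
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."}
    ],
)
print(resp.choices[0].message.content)
```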
How does it compare to popular local inference engines, e.g. Ollama, LM Studio, or a hand-rolled llama.cpp setup? I saw a brief benchmark in the README but wasn't sure if there was more.