Agent scaffold comparison. We additionally evaluate OpenCode, an open-source scaffold that supports multiple model providers. Native CLI scaffolds generally outperform OpenCode when using the same underlying model. GPT-5.1 Codex Max achieves 20.2% on Codex CLI but only 7.7% on OpenCode. Similarly, Gemini 3 Pro scores 18.3% on Gemini CLI versus 14.9% on OpenCode. The one exception is Claude Opus 4.5, which scores 17.1% on Claude Code and 17.3% on OpenCode: effectively equivalent, and the only case where the open-source scaffold matches or slightly exceeds the native one.
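
To make the size of each gap explicit, the comparison can be written as a short calculation. The sketch below uses only the six scores quoted in this paragraph; the dictionary layout and the helper name `native_minus_opencode` are illustrative assumptions, not part of the evaluation harness.

```python
# Scores in percent, as quoted in the paragraph: (native CLI scaffold, OpenCode).
scores = {
    "GPT-5.1 Codex Max": (20.2, 7.7),   # Codex CLI vs. OpenCode
    "Gemini 3 Pro": (18.3, 14.9),       # Gemini CLI vs. OpenCode
    "Claude Opus 4.5": (17.1, 17.3),    # Claude Code vs. OpenCode
}


def native_minus_opencode(table):
    """Hypothetical helper: native-minus-OpenCode gap in percentage points."""
    return {
        model: round(native - opencode, 1)
        for model, (native, opencode) in table.items()
    }


gaps = native_minus_opencode(scores)
for model, gap in gaps.items():
    # A positive gap means the native scaffold scored higher.
    print(f"{model}: {gap:+.1f} pp")
```

Under these figures the native scaffold leads by 12.5 percentage points for GPT-5.1 Codex Max and by 3.4 for Gemini 3 Pro, while Claude Opus 4.5 shows a gap of 0.2 points in favor of OpenCode.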