I built safari-mcp a few weeks ago — a macOS-native Safari automation MCP server (no Chrome, no headless, keeps Safari logins). 84 tools via the Model Context Protocol, used directly by Claude Code, Cursor, Cline, etc.
When I saw HKUDS/CLI-Anything (29k stars, auto-wraps open-source software as agent-native CLIs), I wondered if wrapping safari-mcp as a CLI was actually a good idea — so I benchmarked it before shipping.
The numbers, measured live against real Safari:
Per-call latency (10x list_tabs, warm cache):
MCP (persistent stdio session): 119ms median
CLI (subprocess per call): 3,023ms median
MCP is 25.3x faster.
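The measurement shape is simple to reproduce. Here's a minimal sketch of the methodology: median wall-clock time over 10 calls, fresh process per call vs. no spawn. The stand-ins are mine, not the real harness — `python -c pass` plays the per-call `npx` spawn, a no-op plays the persistent stdio session; the actual numbers above were measured against live Safari.

```python
import statistics
import subprocess
import sys
import time

def median_ms(call, runs=10):
    """Median wall-clock latency of call() over `runs` invocations, in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Cold path: a fresh process per call (stand-in for the `npx` spawn).
cold = median_ms(lambda: subprocess.run([sys.executable, "-c", "pass"], check=True))
# Warm path: persistent session pays no spawn cost per call.
warm = median_ms(lambda: None)
print(f"spawn-per-call: {cold:.1f}ms  persistent: {warm:.4f}ms")
```

Even this toy version shows the gap is dominated by process startup, not by the work inside the call.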
5-op reactive workflow:
MCP: 2.7s
CLI sequential: 15.3s
CLI shell pipeline: 15.2s
MCP is 5.6x faster (pipelining does NOT amortize the npx spawn cost).
Token overhead per API call (real tools.json, cl100k_base tokenizer):
MCP (84 tool definitions): 7,986 tokens
CLI (just `bash` tool def): 95 tokens
CLI uses 84x fewer per-call tokens.
Accuracy: byte-identical output (both paths hit the same safari-mcp).
So for Claude Code / Cursor / Cline users, MCP is the right answer — 25x lower latency per call. I say this up front in the harness's README and SKILL.md. The CLI exists for a different audience:
- Agents that don't speak MCP (Codex CLI, GitHub Copilot CLI, older frameworks, bash scripts)
- CI / cron — subprocess-friendly, jq-pipeable JSON output
- Long Opus sessions where tool-def tokens dominate cost. At $15/MTok input, sending 7,986 tokens of tool definitions on every API call adds up. 100-turn session: ~$12 in tool-def overhead for MCP vs ~$0.22 for CLI. Prompt caching narrows the gap to ~10x, but it's still real money at scale.
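The cost arithmetic is back-of-envelope, assuming one API call per turn (a simplification — real agent turns often make several calls). Note the CLI number below counts only the 95-token tool definition, which is why it lands a bit under the ~$0.22 session figure:

```python
# Tool-definition overhead for a 100-turn session at Opus input pricing.
PRICE_PER_MTOK = 15.00      # $/million input tokens
TURNS = 100                 # assumption: one API call per turn

MCP_TOOLDEF_TOKENS = 7_986  # 84 tool definitions, cl100k_base count
CLI_TOOLDEF_TOKENS = 95     # single `bash` tool definition only

def session_cost(tokens_per_call: int) -> float:
    return tokens_per_call * TURNS * PRICE_PER_MTOK / 1_000_000

mcp = session_cost(MCP_TOOLDEF_TOKENS)   # ~ $11.98
cli = session_cost(CLI_TOOLDEF_TOKENS)   # ~ $0.14 (tool def alone)
print(f"MCP: ${mcp:.2f}  CLI: ${cli:.2f}  ratio: {mcp / cli:.0f}x")
```

With prompt caching the effective MCP figure drops, which is where the ~10x residual gap comes from.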
The harness is schema-driven: an offline parser reads safari-mcp's Zod definitions, emits a JSON bundle, and at import time safari_cli.py generates 84 Click commands from it — zero manual mapping, parity tests pin the result. The parser went through 5 review rounds before I caught everything, including a sneaky nested-schema bug where .describe() was picked from the inner field instead of the outer.
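The import-time generation step looks roughly like this. This is a sketch, not the real `safari_cli.py` — the bundle shape, field names, and `build_cli` helper are all made up for illustration; the real harness forwards calls to safari-mcp instead of echoing:

```python
import json
import click

# Hypothetical bundle shape (NOT safari-mcp's actual format):
# {"tools": [{"name", "description", "params": [{"name", "type",
#                                                "description", "required"}]}]}
TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}

def build_cli(bundle: dict) -> click.Group:
    """Generate one Click subcommand per tool schema in the bundle."""
    cli = click.Group("safari")
    for tool in bundle["tools"]:
        options = [
            click.Option(
                [f"--{p['name'].replace('_', '-')}"],
                type=TYPE_MAP.get(p.get("type"), str),
                required=p.get("required", False),
                help=p.get("description", ""),
            )
            for p in tool.get("params", [])
        ]

        def make_callback(name):
            def callback(**kwargs):
                # Real harness would dispatch to safari-mcp; here we just echo.
                click.echo(json.dumps({"tool": name, "args": kwargs}))
            return callback

        cli.add_command(click.Command(
            tool["name"], params=options,
            callback=make_callback(tool["name"]),
            help=tool.get("description", ""),
        ))
    return cli

bundle = {"tools": [{"name": "list_tabs", "description": "List open tabs",
                     "params": [{"name": "window_id", "type": "integer",
                                 "description": "Limit to one window",
                                 "required": False}]}]}
cli = build_cli(bundle)
```

The parity tests then only need to assert that the generated command set matches the tool list in the bundle, so no hand-written mapping can drift.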
Happy to answer questions about the architecture, the benchmark methodology, or why it took 5 review rounds to find all the bugs.
Full writeup with methodology and the bug post-mortems: https://dev.to/achiya-automation/mcp-vs-cli-for-browser-auto...
safari-mcp repo: https://github.com/achiya-automation/safari-mcp