1 point by Achiyacohen | 5 hours ago | 1 comment
  • Achiyacohen 5 hours ago
    Author here. Some context that didn't fit in the title.

    I built safari-mcp a few weeks ago — a macOS-native Safari automation MCP server (no Chrome, no headless, keeps Safari logins). It exposes 84 tools via the Model Context Protocol and is used directly by Claude Code, Cursor, Cline, etc.

    When I saw HKUDS/CLI-Anything (29k stars, auto-wraps open-source software as agent-native CLIs), I wondered if wrapping safari-mcp as a CLI was actually a good idea — so I benchmarked it before shipping.

    The numbers, measured live against real Safari:

      Per-call latency (10x list_tabs, warm cache):
        MCP (persistent stdio session):   119ms median
        CLI (subprocess per call):      3,023ms median
        MCP is 25.3x faster.
    
      5-op reactive workflow:
        MCP:                  2.7s
        CLI sequential:      15.3s
        CLI shell pipeline:  15.2s
        MCP 5.6x faster (pipelining does NOT amortize npx spawn).
    
      Token overhead per API call (real tools.json, cl100k_base tokenizer):
        MCP (84 tool definitions):  7,986 tokens
        CLI (just `bash` tool def):    95 tokens
        CLI 84x fewer per-call tokens.
    
      Accuracy: byte-identical output (both paths hit the same safari-mcp).
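
    The gap above is mostly process-spawn cost, not the browser work itself. Here is a toy illustration of that effect — not the original harness: it spawns a bare Python subprocess, whereas the real CLI path pays npx startup, which is slower still:

```python
# Toy illustration of per-call spawn overhead (NOT the original benchmark:
# it spawns a Python subprocess; the real CLI path pays npx startup).
import statistics
import subprocess
import sys
import time

def median_ms(fn, runs=10):
    """Median wall-clock time of fn() over `runs` calls, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Persistent-session analogue: the server is already up, so a call is cheap.
in_process = median_ms(lambda: sum(range(1000)))

# Subprocess-per-call analogue: every call pays full interpreter startup.
per_call = median_ms(
    lambda: subprocess.run([sys.executable, "-c", "pass"], check=True)
)

print(f"in-process {in_process:.2f} ms vs subprocess {per_call:.2f} ms")
```

    Pipelining the CLI calls in a shell doesn't help (the 15.2s number above) because each stage still spawns its own process.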
    
    So for Claude Code / Cursor / Cline users, MCP is the right answer — 25x lower latency per call. I say this up front in the harness's README and SKILL.md.

    The CLI exists for a different audience:

    - Agents that don't speak MCP (Codex CLI, GitHub Copilot CLI, older frameworks, bash scripts)
    - CI / cron: subprocess-friendly, jq-pipeable JSON output
    - Long Opus sessions where tool-def tokens dominate cost. At $15/MTok input, sending 7,986 tokens of tool definitions on every API call adds up: a 100-turn session carries ~$12 in tool-def overhead for MCP vs ~$0.22 for CLI. Prompt caching narrows the gap to roughly 10x, but it's still real money at scale.
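
    The cost arithmetic in the last bullet is easy to check. A sketch using the post's numbers and the $15/MTok Opus input price — note the bare 95-token CLI definition works out to ~$0.14, so the ~$0.22 figure presumably counts slightly more than the tool definition alone:

```python
# Back-of-envelope for the per-session tool-definition overhead.
MTOK_PRICE = 15.00        # $ per million input tokens (Opus, from the post)
TURNS = 100               # tool defs are resent on every API call

def tooldef_cost(tokens_per_call):
    """Dollars spent resending one tool-definition payload for TURNS calls."""
    return tokens_per_call * TURNS * MTOK_PRICE / 1_000_000

mcp_cost = tooldef_cost(7_986)  # all 84 MCP tool definitions
cli_cost = tooldef_cost(95)     # a single `bash` tool definition

print(f"MCP: ${mcp_cost:.2f}, CLI: ${cli_cost:.2f}")  # MCP: $11.98, CLI: $0.14
```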

    The harness is schema-driven: an offline parser reads safari-mcp's Zod definitions, emits a JSON bundle, and at import time safari_cli.py generates 84 Click commands from it — zero manual mapping, parity tests pin the result. The parser went through 5 review rounds before I caught everything, including a sneaky nested-schema bug where .describe() was picked from the inner field instead of the outer.
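
    The nested-schema bug is easiest to see with a toy repro. This is a hypothetical sketch, not the real parser: it assumes the Zod definitions serialize to JSON-schema-shaped dicts, and the function names are made up:

```python
# Hypothetical JSON-schema-shaped serialization of one Zod tool definition.
SCHEMA = {
    "type": "object",
    "description": "Open a URL in a new Safari tab",  # outer .describe()
    "properties": {
        "url": {"type": "string", "description": "The URL to open"},  # inner
    },
}

def describe_buggy(schema):
    """Recurses into fields first, so an inner description shadows the outer."""
    for field in schema.get("properties", {}).values():
        inner = describe_buggy(field)
        if inner is not None:
            return inner
    return schema.get("description")

def describe_fixed(schema):
    """Only the node's own description counts; never borrow from children."""
    return schema.get("description")

assert describe_buggy(SCHEMA) == "The URL to open"                # wrong field
assert describe_fixed(SCHEMA) == "Open a URL in a new Safari tab"
```

    A parity test over all 84 generated commands is what pins this class of bug down in the real harness.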

    Happy to answer questions about the architecture, the benchmark methodology, or why it took 5 review rounds to find all the bugs.

    Full writeup with methodology and the bug post-mortems: https://dev.to/achiya-automation/mcp-vs-cli-for-browser-auto...

    safari-mcp repo: https://github.com/achiya-automation/safari-mcp