5 points by sampleSal | 7 hours ago | 1 comment
  • sampleSal | 7 hours ago
    We're building AI agents on OpenClaw and are burning $1,100/week on Anthropic API calls.

    No idea whether our prompting strategy is inefficient or whether everyone else is paying this much.

    Built a quick benchmarking tool: https://local001.com/tokens

    Submit your weekly spend + provider + use case → see your percentile + comparisons.

    The dataset is early — it gets more useful the more people submit. But here's why I built this:

    We're spending $1,100/week on Anthropic for a mix of coding agents and personal assistant tasks. I have no idea if that's normal or insane. Specifically:

    Are we overspending by use case? Our coding agent burns ~$700/week and the assistant tasks another ~$400/week. But I don't know what "good" looks like. Is $700/week for an agentic coding workflow competitive? Are teams doing similar work at $200? $2,000? There's zero public data on this.

    Are we overspending on Anthropic? We're all-in on Claude right now. For coding tasks, maybe that's the right call. But for assistant/chat workflows, should we be routing half of that traffic to GPT-4o or Gemini and cutting costs 60%? I genuinely don't know, and I haven't seen anyone publish real cost comparisons by task type, not just benchmark scores.

    That's what this tool is for. Submit your weekly spend, provider, and use case → see where you land. If 50 teams submit data, we'll finally have a real answer to "is Anthropic worth the premium for X?"
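
    For reference, here's roughly the shape of a submission and one way the percentile comparison could work. The field names and ranking logic below are an illustrative sketch, not the exact implementation:

      // Illustrative data shape; the real schema may differ.
      interface SpendSubmission {
        weeklySpendUsd: number; // e.g. 1100
        provider: string;       // "anthropic", "openai", "google", ...
        useCase: string;        // "coding-agent", "assistant", ...
      }

      // Percentile = share of teams with the same use case spending less than you.
      function spendPercentile(mine: SpendSubmission, all: SpendSubmission[]): number {
        const peers = all.filter((s) => s.useCase === mine.useCase);
        if (peers.length === 0) return NaN;
        const below = peers.filter((s) => s.weeklySpendUsd < mine.weeklySpendUsd).length;
        return (100 * below) / peers.length;
      }

      // Toy dataset with made-up numbers: where does a $700/week coding agent land?
      const sample: SpendSubmission[] = [
        { weeklySpendUsd: 150, provider: "openai", useCase: "coding-agent" },
        { weeklySpendUsd: 450, provider: "anthropic", useCase: "coding-agent" },
        { weeklySpendUsd: 1200, provider: "anthropic", useCase: "coding-agent" },
      ];
      const me: SpendSubmission = { weeklySpendUsd: 700, provider: "anthropic", useCase: "coding-agent" };
      console.log(spendPercentile(me, sample)); // ~66.7: we outspend two thirds of comparable teams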

    Open questions:

    Should we track tokens/$ instead of just $?

    Should we separate reasoning models (o1 and similar) from base models?

    How do you benchmark "efficiency" vs raw spend? (Rough sketch of both metrics below.)
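
    Here's how I picture those two metrics; the field names and numbers are made up, just to show the normalization:

      // tokens/$ needs token counts, which would come from the provider's
      // usage dashboard or API logs; everything here is a placeholder.
      interface WeeklyUsage {
        spendUsd: number;       // total API bill for the week
        totalTokens: number;    // input + output tokens
        tasksCompleted: number; // e.g. merged PRs, resolved tickets, chats handled
      }

      const tokensPerDollar = (u: WeeklyUsage) => u.totalTokens / u.spendUsd;

      // Raw spend ignores how much work the tokens bought; cost per completed
      // task is one crude "efficiency" proxy that stays comparable across providers.
      const costPerTask = (u: WeeklyUsage) => u.spendUsd / u.tasksCompleted;

      // Made-up example: $700/week, 150M tokens, 120 coding tasks shipped.
      const coding: WeeklyUsage = { spendUsd: 700, totalTokens: 150_000_000, tasksCompleted: 120 };
      console.log(tokensPerDollar(coding)); // ~214,000 tokens per dollar
      console.log(costPerTask(coding));     // ~$5.83 per completed task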

    Built with Next.js + Cloudflare Workers + D1. Submissions are anonymous (just hashed IPs).
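
    If it helps, here's roughly what that write path looks like in a Worker. The table and column names are illustrative, not the actual schema (D1Database comes from @cloudflare/workers-types):

      export interface Env {
        DB: D1Database; // D1 binding configured in wrangler.toml
      }

      export default {
        async fetch(request: Request, env: Env): Promise<Response> {
          if (request.method !== "POST") {
            return new Response("Method not allowed", { status: 405 });
          }

          const body = (await request.json()) as {
            weeklySpendUsd: number;
            provider: string;
            useCase: string;
          };

          // Hash the connecting IP so submissions stay anonymous while repeat
          // submissions from the same address can still be deduplicated.
          const ip = request.headers.get("CF-Connecting-IP") ?? "unknown";
          const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(ip));
          const ipHash = [...new Uint8Array(digest)]
            .map((b) => b.toString(16).padStart(2, "0"))
            .join("");

          await env.DB
            .prepare(
              "INSERT INTO submissions (ip_hash, weekly_spend_usd, provider, use_case) VALUES (?1, ?2, ?3, ?4)"
            )
            .bind(ipHash, body.weeklySpendUsd, body.provider, body.useCase)
            .run();

          return Response.json({ ok: true });
        },
      };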

    Long-term goal: use this data to negotiate bulk API rates with Anthropic/OpenAI/Google.

    How would you improve this?

    https://local001.com/tokens