  • robinbanner 7 hours ago
    Backstory: I was building a customer support AI for a client last year. We started with Claude Opus for everything because it worked great. The bill was $250/month for maybe 10K conversations.

    Then I looked at the actual queries. 70% were things like "what are your hours?" and "how do I return something?" — questions where a $0.80/M-token model gives the same answer as a $15/M-token model. But about 5% were genuinely complex (multi-step troubleshooting, product comparisons requiring reasoning) where Opus was noticeably better.

    I started manually routing: simple patterns to a cheap model, everything else to Opus. The bill dropped to $40/month with no quality complaints from users. But maintaining the routing logic across projects got tedious — every new app needed the same classification + model selection + failover logic.
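    For flavor, the hand-rolled version was roughly this shape. This is a minimal sketch, not the production code: the pattern list, the model IDs, and the callModel() helper (a plain fetch to OpenRouter's chat completions endpoint) are all illustrative.

        // Illustrative sketch of the original hand-rolled routing.
        const SIMPLE_PATTERNS: RegExp[] = [
          /\b(hours|open|closed)\b/i,       // "what are your hours?"
          /\b(return|refund|exchange)\b/i,  // "how do I return something?"
          /^(hi|hello|hey|thanks)\b/i,      // greetings
        ];

        async function callModel(model: string, query: string): Promise<string> {
          // OpenRouter exposes an OpenAI-compatible chat completions API.
          const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
            method: "POST",
            headers: {
              Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
              "Content-Type": "application/json",
            },
            body: JSON.stringify({ model, messages: [{ role: "user", content: query }] }),
          });
          if (!res.ok) throw new Error(`upstream error ${res.status}`);
          const data: any = await res.json();
          return data.choices[0].message.content;
        }

        async function route(query: string): Promise<string> {
          const isSimple = SIMPLE_PATTERNS.some((p) => p.test(query));
          const primary  = isSimple ? "anthropic/claude-3-haiku" : "anthropic/claude-3-opus";
          const fallback = isSimple ? "anthropic/claude-3-opus"  : "anthropic/claude-3-haiku";
          try {
            return await callModel(primary, query);
          } catch {
            return await callModel(fallback, query); // naive failover to the other tier
          }
        }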

    So I built Komilion to package it up. The classification runs in two stages (rough sketch after the list):

    1. A regex fast path catches ~60% of requests instantly (greetings, FAQ patterns, simple classification tasks). Zero API calls, under 5ms.

    2. For the rest, a lightweight LLM classifier determines task type and complexity, then matches against a routing table built from LMArena and Artificial Analysis benchmark data.
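    Sketched out, the two stages look something like this. It's a sketch under assumptions: the fast-path patterns, the classifier prompt, the model names, and the routing-table entries are placeholders rather than the real values, and callModel() is the OpenRouter helper from the earlier sketch.

        type Route = { model: string; reason: string };

        // Stage 1: regex fast path. No API call for roughly 60% of traffic.
        const FAST_PATHS: Array<{ pattern: RegExp; route: Route }> = [
          { pattern: /^(hi|hello|hey)\b/i,                  route: { model: "cheap-chat-model", reason: "greeting" } },
          { pattern: /\b(hours|return policy|shipping)\b/i, route: { model: "cheap-chat-model", reason: "faq" } },
        ];

        // Routing table keyed by task type and complexity, built offline from
        // benchmark data (LMArena / Artificial Analysis). Entries are placeholders.
        const ROUTING_TABLE: Record<string, Record<"low" | "high", string>> = {
          troubleshooting: { low: "cheap-chat-model", high: "strong-reasoning-model" },
          comparison:      { low: "cheap-chat-model", high: "strong-reasoning-model" },
          general:         { low: "cheap-chat-model", high: "generalist-model" },
        };

        // Stage 2: a lightweight LLM classifier returns task type + complexity as JSON.
        async function classify(query: string): Promise<{ task: string; complexity: "low" | "high" }> {
          const raw = await callModel(
            "lightweight-classifier-model",
            `Classify this support query. Reply only with JSON {"task": string, "complexity": "low" | "high"}.\n\n${query}`
          );
          return JSON.parse(raw);
        }

        async function pickModel(query: string): Promise<Route> {
          const hit = FAST_PATHS.find((f) => f.pattern.test(query));
          if (hit) return hit.route; // fast path: never reaches stage 2

          const { task, complexity } = await classify(query);
          const table = ROUTING_TABLE[task] ?? ROUTING_TABLE.general;
          return { model: table[complexity], reason: `${task}/${complexity}` };
        }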

    What surprised me in the benchmark data: complex tasks through the router actually produced MORE detailed output than any single pinned model (6,614 chars avg vs 3,573 for Opus). The router selects specialized models per task type rather than using a generalist model for everything.

    Stack: Next.js on Vercel, Neon PostgreSQL, OpenRouter upstream. Total hosting cost ~$20/month. It's a solo project.

    The thing I'd do differently: I should have started with the benchmark data instead of building the product first. The numbers make the case better than any feature list.

    Happy to answer technical questions about the routing logic, benchmark methodology, or anything else.