4 points by Barathkanna 4 hours ago | 4 comments
  • thiago_fm 10 minutes ago
    Just add hard upper limits, plus instrumentation so you can track usage and re-evaluate those limits accordingly.

    This takes a couple of hours at most.

  • sriramgonella 3 hours ago
    Local models are better for controlling costs; commercial models are expensive and give you no control over spend. That said, the training setup for local models needs to be architected well if you want to train them continuously.
    • thiago_fm 7 minutes ago
      That isn't true; if you run local models you'll also need to spend on operations.

      Maybe focus first on providing value and later you can optimize this setup.

  • Lazy_Player82 3 hours ago
    Honestly, if you're designing your agent workflows properly with hard limits on retries and tool calls, the variance shouldn't be that wild. Most of the unpredictability comes from not having those guardrails in place early on. A few weeks of real production data usually shows the average cost is more stable than you'd expect.
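    A minimal sketch of what those guardrails can look like, as per-request caps on retries, tool calls, and tokens. All names and limits here are hypothetical defaults for illustration, not taken from any particular framework:

```python
class BudgetExceeded(Exception):
    """Raised when a request blows past one of its hard limits."""


class AgentGuardrails:
    """Per-request hard caps on retries, tool calls, and tokens.

    The agent loop calls the record_* methods as it works; any cap
    breach raises immediately instead of letting costs run away.
    """

    def __init__(self, max_retries=2, max_tool_calls=10, max_tokens=50_000):
        self.max_retries = max_retries
        self.max_tool_calls = max_tool_calls
        self.max_tokens = max_tokens
        self.retries = 0
        self.tool_calls = 0
        self.tokens = 0

    def record_tool_call(self):
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool calls exceeded {self.max_tool_calls}")

    def record_retry(self):
        self.retries += 1
        if self.retries > self.max_retries:
            raise BudgetExceeded(f"retries exceeded {self.max_retries}")

    def record_tokens(self, n):
        self.tokens += n
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"tokens exceeded {self.max_tokens}")
```

    Logging each counter per request is also the instrumentation you need to see, after a few weeks of production data, how stable the average cost actually is.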
    • Barathkanna 3 hours ago
      True, but for early stage builders it’s harder to design those guardrails upfront. A lot of the time you only discover the retry patterns and cost spikes once real users start hitting the system.
      • Lazy_Player82 3 hours ago
        Fair point. And honestly, with more non-technical builders shipping agent-based products these days, that's probably where a service like this makes the most sense – for people who don't yet have the experience to know what guardrails to put in place.
        • Barathkanna 3 hours ago
          Exactly. That’s actually why we started building Oxlo.ai. Early stage builders usually just want to experiment without worrying too much about token cost spikes.
  • clearloop 4 hours ago
    imo switching to local models could be an option
    • Barathkanna 3 hours ago
      Local models solve the marginal cost problem, but they move the complexity into infrastructure and throughput planning instead.
      • clearloop an hour ago
        Makes sense, it really depends on the use case. I'm building my own version of claw openwalrus, with local LLMs as the first goal. I think I'll use local models for daily tasks that lean heavily on tool calling, but for coding or research I'll keep using remote models.

        This topic actually inspires me to add a built-in gas meter for tokens.
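        One way such a meter could work, borrowing the EVM's gas idea but denominated in tokens: each model call burns from a fixed budget and the run halts when the budget is exhausted. This is just a sketch of the concept; the class and method names are made up:

```python
class OutOfGas(Exception):
    """Raised when a run tries to spend more tokens than its budget allows."""


class TokenGasMeter:
    """Gas-meter-style token budget: every model call burns
    prompt + completion tokens from a fixed allowance, and the
    run stops cleanly once the allowance is gone."""

    def __init__(self, budget_tokens):
        self.remaining = budget_tokens

    def burn(self, prompt_tokens, completion_tokens):
        cost = prompt_tokens + completion_tokens
        if cost > self.remaining:
            raise OutOfGas(f"need {cost} tokens, only {self.remaining} left")
        self.remaining -= cost
        return self.remaining


meter = TokenGasMeter(budget_tokens=10_000)
meter.burn(1_200, 300)  # 8_500 tokens left
```

        Checking the cost *before* spending (rather than after) means a run fails fast instead of overdrawing its budget by one large call.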