4 points by ostefani 4 hours ago | 6 comments
  • Intelligent_Fox 29 minutes ago
    For most teams the answer depends less on the technical tradeoffs (latency, cost, privacy) and more on who will actually be operating and governing it.

    If you have non-technical stakeholders managing the deployment or making decisions about AI use, API providers give you more guardrails and less operational surface area. Local LLMs require someone who understands the stack.

    That said, the AI literacy gap on the business/governance side is real: teams where managers don't have AI foundations tend to get stuck on escalations. Programs like IAIDL (https://iaidl.org) address that side of the equation for non-technical colleagues, and they cut down the "what is it actually doing" conversation overhead considerably.

  • ok_computer_ 3 hours ago
    Gemma 4 dropped two days ago and it's a pretty direct answer to this question. Google DeepMind built it explicitly for local deployment, the 26B MoE activates only 3.8B parameters during inference (so it runs at roughly 4B cost while hitting near-31B benchmark quality), and the smaller E4B variant runs fully offline on an 8GB laptop. The 31B Dense currently ranks third among all open models on the Arena AI leaderboard. The quality-per-parameter gap between local and cloud is closing faster than most people expected.

    That said, "worth it" still depends heavily on your hardware. A 4070 Ti gets you a very different answer than a 3060.

    Disclosure: I'm building localllm-advisor.com, free and client-side, which also helps answer these types of questions. It shows which models fit your GPU with quantization options and estimated tok/s, or which GPU you'd need to run a specific model. Relevant to the question so I'm mentioning it, but take it for what it is.
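    A rough sketch of the kind of "does it fit" estimate such a tool makes: weights are roughly parameter count times bytes per parameter for the chosen quantization, plus some runtime/KV-cache overhead. The bytes-per-parameter and overhead figures below are illustrative assumptions, not measurements from localllm-advisor.com.

```python
# Approximate bytes per parameter by quantization level.
# Ignores quantization block metadata, which adds a small constant factor.
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

def fits_in_vram(active_params_b: float, quant: str, vram_gb: float,
                 overhead_gb: float = 1.5) -> bool:
    """Very rough check: model weights plus a fixed KV-cache/runtime overhead."""
    weights_gb = active_params_b * BYTES_PER_PARAM[quant]
    return weights_gb + overhead_gb <= vram_gb

# e.g. a 26B dense model at 4-bit on a 16 GB card (~13 GB of weights): fits.
print(fits_in_vram(26, "q4", 16))    # True
# The same model at fp16 (~52 GB of weights): does not fit.
print(fits_in_vram(26, "fp16", 16))  # False
```

    Note this is why the MoE framing above matters: for memory you still pay for all the parameters stored, but per-token compute (and hence tok/s) tracks the active parameter count.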

  • politelemon 4 hours ago
    Depends on what you're using it for. A small model could be viable as long as you're willing to absorb the maintenance overhead of running and deploying your own inference. A simple API would be much more cost-effective, especially if there are scaling requirements and time constraints.
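    The cost side of that tradeoff is easy to put numbers on. A minimal back-of-envelope sketch, where every price and power figure is a made-up placeholder (plug in your provider's actual per-token pricing and your own hardware costs):

```python
def api_monthly_cost(tokens_per_month: float, usd_per_million_tokens: float) -> float:
    """Pay-per-token API cost for a month."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

def selfhost_monthly_cost(gpu_price_usd: float, amortize_months: int,
                          power_watts: float, usd_per_kwh: float,
                          hours_on: float = 730) -> float:
    """Amortized hardware plus electricity; ignores ops/maintenance time."""
    hardware = gpu_price_usd / amortize_months
    power = power_watts / 1000 * hours_on * usd_per_kwh
    return hardware + power

# 50M tokens/month at a hypothetical $0.50 per 1M tokens:
print(round(api_monthly_cost(50_000_000, 0.50), 2))          # 25.0
# A $1600 GPU amortized over 24 months, 250 W always on at $0.15/kWh:
print(round(selfhost_monthly_cost(1600, 24, 250, 0.15), 2))  # 94.04
```

    The number this sketch deliberately leaves out is engineer time for deployment and maintenance, which for a small team usually dominates both lines.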
    • ostefani 3 hours ago
      It's for a support chatbot. I see a lot of open-source models but I'm not sure it's worth it. Wouldn't a reply via an LLM API be better?
  • JaceDev 4 hours ago
    tbh small lms are better if u
    • ostefani 3 hours ago
      But then you need to host it yourself? And won't a small model give worse results?