2 points by Ms-J 6 hours ago | 3 comments
  • Archerlm 6 hours ago
    From what I know, Ollama works offline, but with only 8 GB of RAM and no GPU, latency would be severely limiting. The closest local option I can suggest is LM Studio (no cloud dependencies, its own model registry, etc.). If that doesn't work out for you, then GPT4All, but its models are quite small.
    • Ms-J 4 hours ago
      I'm looking for an agent, but thanks.

      I forget the exact issue with GPT4All; the reasons some of these tools weren't suitable for me have blended together.

      • Archerlm 3 hours ago
        My bad, I misread your post. If GPT4All didn't work out, go with Aider. It's a CLI tool and doesn't have a UI trying to proxy requests to a dev's server. You just point it at your local model (via Ollama or vLLM) and it stays in its lane. Since it's Python-based, you can grep the source code to confirm there are no hidden update pings. If that's not for you and you need the IDE experience, pick Continue. It's the only one that handles air-gapped setups properly: you can manually install the .vsix file and kill all telemetry in the config.json. Unlike OpenCode, it doesn't try to be anything more; it's just a bridge between your code and your model server. OpenCode failed because it's basically "cloud-first" pretending to be local. Aider and Continue are actually built for what you want.
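        To sketch what "point it at your local model" looks like for Aider: it can read its settings from a YAML config file, and it honors the OLLAMA_API_BASE environment variable to find a non-default server. A minimal sketch; the model name here is just an example, substitute whatever `ollama list` shows on your machine:

        ```yaml
        # ~/.aider.conf.yml -- minimal local-only sketch (model name is an example)
        # The "ollama/" prefix routes requests to a local Ollama server
        # instead of any hosted API, so no cloud keys are needed.
        model: ollama/llama3
        ```

        If your Ollama server isn't on the default port, setting OLLAMA_API_BASE (e.g. http://localhost:11434) points Aider at it.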
        • Ms-J 2 hours ago
          This was such a useful post, thank you, have an upvote! I forgot about Aider.

          I was looking into it but got distracted with other work. Do you know if it has any update checks or telemetry? I will check the source, but I could miss something, so I definitely want to ask people who have used it.

          I think I also looked into Continue very briefly. I'm glad you put those notes about it being more the IDE experience. Also, for this one, does it come with instructions on a GitHub page or somewhere on how to kill all spying/telemetry?

          Thanks again!
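          For what it's worth, Continue's telemetry switch lives in its JSON config file. A minimal sketch, assuming the config.json format from Continue's docs; the `allowAnonymousTelemetry` flag is the documented opt-out, and the model entry is just an example of a local Ollama hookup:

          ```json
          {
            "allowAnonymousTelemetry": false,
            "models": [
              {
                "title": "Local Ollama",
                "provider": "ollama",
                "model": "llama3"
              }
            ]
          }
          ```

          Newer Continue versions may use a different config layout, so it's worth checking the current docs rather than trusting this snippet verbatim.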

  • josefcub 6 hours ago
    Try charmbracelet's crush, found here:

    https://github.com/charmbracelet/crush

    Crush is pretty new, but getting better all the time. It's written in Go, so no Node hijinks to get it working. It works fine with my Ollama or llama-server localhost endpoints, and I've used it to build a couple of internal projects without any issues.

    It does have internal telemetry and such (including updating its list of external models it can use) that can be turned off in the crush.json configuration file.

    If you're on a Mac, you can install via homebrew or use the more traditional route via Github.

    • Ms-J 4 hours ago
      Thanks for the recommendation. I took a look, and maybe you can answer a few questions I couldn't find a clear answer to with some quick searching.

      Regarding local models, can it use them? I found this discussion:

      https://github.com/charmbracelet/crush/discussions/775

      I didn't appreciate the maintainer's attitude in converting it into a discussion and ignoring the issue even to this day.

      "It does have internal telemetry and such (including updating its list of external models it can use) that can be turned off in the crush.json configuration file."

      Is there a page or guide which explains the telemetry and any internet connected settings?

      Forgot to add, I use Linux.

      • josefcub 3 hours ago
        My google-fu is failing me at the moment to cite sources, but here's an example ~/.config/crush/crush.json file (based on my own) showing the options to disable telemetry and provider auto-updates, plus the connection info for a localhost model on an OpenAI-compatible endpoint:

        {
          "$schema": "https://charm.land/crush.json",
          "options": {
            "disable_provider_auto_update": true,
            "disable_metrics": true
          },
          "providers": {
            "ollama": {
              "name": "Local Models",
              "base_url": "http://localhost:11434/v1",
              "api_key": "nunya",
              "type": "openai-compat",
              "models": [
                {
                  "name": "Qwen 3.5 Local",
                  "id": "qwen-3.5-35b-planning",
                  "cost_per_1m_in": 0.01,
                  "cost_per_1m_out": 0.01,
                  "context_window": 131072,
                  "think": true,
                  "default_max_tokens": 5120,
                  "supports_attachments": true
                }
              ]
            }
          }
        }

        ...or not, thanks to formatting. I can't even search for help formatting this text box, because of HN's nature haha

        • Ms-J 2 hours ago
          That helps a lot being able to see an example, thanks!

          I don't know why all of these tools make it so hard to find the info to disable the telemetry/spying; it's not just this one.

          Regarding the formatting, I have no idea haha, but there is a small "help" button on the bottom right next to the comment box. Yes yes, I'm sure it won't help much.

          Alternatively, asking an LLM might help. The other day it was able to link me to an exchange between a user and a mod about the posting cooldown period; I learned it can be disabled per account.

  • verdverm 4 hours ago
    I have a custom agent setup and just added Ollama support. The main issue with local is model quality; it's just not there for the most part. I have had some good results using gemma4 31b on CPU, but the latency is really long. I plan to use it for higher-level automation that could run for a day or two without me needing to look at it (what would take ~minutes to hours with Gemini Flash, for example).
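    For anyone wiring up something similar: "added Ollama support" is usually a thin wrapper over Ollama's OpenAI-compatible HTTP endpoint. A minimal sketch of that wiring; the URL, model name, and helper names here are illustrative assumptions, not verdverm's actual code:

    ```python
    # Hedged sketch: calling a local Ollama server through its
    # OpenAI-compatible chat endpoint, stdlib only (no cloud SDKs).
    import json
    from urllib import request

    OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # assumed default port

    def build_payload(model: str, prompt: str, max_tokens: int = 512) -> dict:
        """Build an OpenAI-style chat payload for a local model."""
        return {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }

    def ask_local(model: str, prompt: str) -> str:
        """POST the payload to the local endpoint; requires Ollama running."""
        req = request.Request(
            OLLAMA_URL,
            data=json.dumps(build_payload(model, prompt)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    ```

    Because the endpoint speaks the OpenAI wire format, the same payload builder works against llama-server or vLLM by swapping the URL, which is what makes the "run it for a day or two unattended" automation cheap to retarget.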