This led me down a rabbit hole: just how insidious is local-first branding when it's wrapped around cloud models?
Ollama built its reputation on a simple promise: run large language models locally, keep your data on your machine. It's a great tool. I use it. Millions of developers use it. The whole brand is "local-first AI." Then, quietly, Ollama shipped cloud models.
If you visit ollama.com/library/glm-5:cloud, you'll find GLM-5: a 744-billion-parameter model built by Z.ai (formerly Zhipu AI), a Chinese AI lab. The :cloud tag means that when you run it, your prompts leave your machine and are processed on remote GPUs somewhere. The command looks the same. The API is the same. Your terminal doesn't scream "WARNING: YOUR DATA IS LEAVING YOUR COMPUTER." It just works.
I started asking basic questions about this. I couldn't find answers to any of them.
The UX problem is the real danger
Here's what makes this especially concerning: the developer experience is designed to make local and cloud feel identical.
# This runs locally on your machine
ollama run llama3:8b

# This sends your prompt to unknown infrastructure
ollama run glm-5:cloud
Same CLI. Same API endpoint (localhost:11434). Same libraries. Same everything, except that one keeps your data on your machine and the other sends it somewhere you can't verify.
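To make the point concrete, here's a rough sketch using Ollama's standard /api/generate endpoint on the default port. The prompt text is invented for illustration; the only difference between the two requests is the model string, and nothing in the request shape tells you that the second one leaves your machine.

# Local inference: request and response stay on localhost
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Summarize this contract...",
  "stream": false
}'

# Cloud inference: same endpoint, same payload shape, but the prompt
# is forwarded to remote infrastructure you cannot inspect
curl http://localhost:11434/api/generate -d '{
  "model": "glm-5:cloud",
  "prompt": "Summarize this contract...",
  "stream": false
}'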
When you run ollama ls, cloud models show up alongside local models. The only visual difference is a - where the file size would be, and the :cloud tag in the name. There's no warning, no confirmation prompt, no "you are about to send data externally."
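If you want to audit a machine quickly, a crude filter on the model list is enough, assuming (as described above) that the :cloud suffix in the tag is the only marker:

# Show only models whose tag marks them as cloud-hosted
ollama ls | grep ':cloud'

# Or warn loudly, e.g. from a shell profile or CI step
if ollama ls | grep -q ':cloud'; then
  echo "WARNING: cloud models are installed on this machine" >&2
fi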
If you're using Ollama for local inference, nothing has changed. The local tool is still solid. But if you or anyone on your team is using :cloud models, you should be asking:
Do you know where your prompts are going? Not "Ollama's cloud," but the actual datacenter, provider, and jurisdiction.
Are you sending PII through cloud models? Resumes, medical records, financial data, customer information: any of this flowing through an unaudited cloud endpoint is a compliance risk.
Do you have controls to prevent accidental cloud usage? Ollama offers a local-only mode (OLLAMA_NOCLOUD=1), but it's opt-in; the default allows cloud. A rough guardrail sketch follows this list.
What's your fallback if Ollama's cloud gets compromised? With no SOC 2, no disclosed architecture, and a 21-person team, the blast radius of a breach could be significant.
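Here's a minimal sketch of that guardrail. It assumes OLLAMA_NOCLOUD behaves as described above, and the wrapper function is my own illustration, not an Ollama feature:

# Opt into local-only mode for every new shell session
echo 'export OLLAMA_NOCLOUD=1' >> ~/.bashrc

# Belt and suspenders: a shell wrapper that refuses :cloud tags outright.
# The function and its check are illustrative, not part of Ollama.
ollama() {
  for arg in "$@"; do
    case "$arg" in
      *:cloud)
        echo "Blocked: '$arg' is a cloud model; prompts would leave this machine." >&2
        return 1
        ;;
    esac
  done
  command ollama "$@"
}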
The local-first branding becomes a Trojan horse for cloud inference. Most developers won't read the changelog. Most won't notice the :cloud tag. Most won't ask where the compute is.
If you're running cloud inference for millions of developers, disclose where it runs. Name the datacenter provider, the jurisdiction, and whether inference stays on hardware you control or gets proxied to model providers. Get a third-party audit and publish it. And make cloud opt-in. Require explicit confirmation before a prompt leaves the user's machine, not a tag they might not notice.
Until then, treat :cloud the way you'd treat any unaudited third-party API: assume your data is being logged, and don't send anything you wouldn't post publicly.