- AI assistants for smaller practices without an enterprise EHR. Epic currently integrates third-party AI assistants, but those are of course cloud services aimed at contracts with large hospital systems. They're a great step forward, but doctors still find them lacking in actual usefulness.
- Consumer/patient-facing products to help people synthesize all of their health information and understand what their healthcare providers are doing. Think of an on-device assistant that can connect with something like https://www.fastenhealth.com/ to build a local RAG index over their health history (rough sketch below).
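As a purely illustrative sketch of that idea, borrowing the YAML style from the example further down the thread (the project name, model choice, and the assumption that Fasten Health records are exported to a local folder are mine, not a real integration):

    name: my_health_history
    runtimes:
      # small model running entirely on the user's device (model choice is illustrative)
      - chat: {model: "llama3:8b", provider: "ollama"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
      # hypothetical: index records the user exported locally from Fasten Health
      sources:
        - path: "~/fasten_export/"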
Overall, users can feel more confident they know where their PHI is, and it potentially becomes easier for smaller companies/start-ups to get into the healthcare space without having to move or store people's PHI.
You could make the same argument for Kubernetes. If you have the cash and the team, why not build it yourself? Most don't have the expertise or the time to find/train the people who do.
People want AI that works out of the box on day one. Not day 100.
Yeah, the beachhead will be our biggest issue - where to find our first hard-core users. I was thinking legal (they have a need for AI, but data cannot leave their servers), healthcare (same as legal, but more regulations), and government (not right now, but they normally have deep pockets).
What do you think is a good starting place?
An idea might be to go after a vertical sooner rather than later. The only thing better than an interested lawyer, for example, would be a selection of curated templates and prompts designed by people in that industry. So you get orchestration plus industry-specific, aligned verticals. That's a much easier sell than a general-purpose platform. But then you're competing with the other vertically integrated offerings.
Maybe there are other differentiators? If this is like bedrock for your network, maybe the angle is private models where you want them. Others are doing that though, so there's pretty active competition there as well.
The more technical and general the audience, the more you're going to have to talk them out of just rolling Open WebUI themselves.
We’re starting with regulated enterprises (defense, healthcare, legal, fintech) where control and compliance actually matter. The same YAML-defined system can run in AWS, in a hospital, or fully air-gapped — no lock-in, no data leaving.
We’re building a few sample recipes to show it in action:
- Legal: doc analysis + precedent search with local vector DBs
- Healthcare: privacy-preserving RAG over clinical notes (rough sketch after this list)
- Industrial/Defense: offline sensor or alert agents that sync later
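For the healthcare one, a privacy-preserving RAG config could look roughly like this; the project name, model choice, and the sources block are illustrative placeholders rather than a finalized schema:

    name: clinical_notes_rag
    runtimes:
      # single on-prem model, no hosted fallback, so PHI never leaves the network
      - answer: {model: "qwen2.5:14b", provider: "ollama"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
      # hypothetical ingestion target: access-controlled clinical notes
      sources:
        - path: "/data/clinical_notes/"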
Partnering with MSPs and modernization firms seems like the obvious path — they already have the relationships and budgets, and LlamaFarm gives them something repeatable to deploy.
Still figuring that out though — what’s the best way to actually get into the MSP funnel?
You can pull down the repo and run a few easy commands to get inference up and running.
Any GPU that isn't being used at 80% capacity needs to be put to work; we have plenty of work for it to do. (A lot of industries cannot lease their GPUs to the public due to regulatory issues.)
> Instead of one brittle giant, we orchestrate a Mixture of Experts…
“Mixture of experts” is a specific term of art that describes an architectural detail of a type of transformer model. It definitely does not mean using smaller specialized models for individual tasks. Experts in an MoE model are actually routed to on a per-token basis, not on a per-task or per-generation basis.
I know it’s tempting to co-opt this term because it would fit nicely with what you’re trying to do, but it just adds confusion.
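For reference, one common sparse-MoE formulation makes the per-token routing explicit. For a single token's hidden state x, gate weights W_g, and experts E_i (generic notation from the literature, not tied to any particular model):

    y = \sum_{i \in \mathrm{TopK}(W_g x)} \mathrm{softmax}(W_g x)_i \, E_i(x)

Because the gate depends on x, every token in a single generation can be sent to a different subset of experts, which is not the same thing as routing whole tasks to separate small models.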
Our bet is that the timing’s finally right: local inference, smaller and more powerful open models (Qwen, Granite, DeepSeek), and enterprise appetite for control have all converged. We’re working with large enterprises (especially in regulated industries) where innovation teams need to build and run AI systems internally, across mixed or disconnected environments.
That’s the wedge: not another SaaS, but a reproducible, ownable AI layer that can actually move between cloud, edge, and air-gapped environments. Just reach out, no intro needed: robert @ llamafarm.dev
We also have plans for eval features in the product so that users can measure the quality of changes over time, whether to their own project configs or actual LlamaFarm updates.
Yes, all that's a bit hand-wavy, I know. :-) But we do recognize the problem and have real ideas on solutions. But execution is everything. ;-)
How did RAG degrade when it went to prod? Do you mean your prod server had throughput issues?
Where are you on Vulkan support? Hard to find good stacks to use with all this great Intel and non-ROCm AMD hardware. Might be a good angle too, rather than chasing the usual Nvidia money train.
We now support AMD, Intel, CPU, and CUDA/NVIDIA.
Hit me up if you want a walk-through - this is in dev right now (you have to pull down the repo to run it), but we'll ship it as part of our next release.
Then run lf init to get the project started!
Production-ready enterprise AI requires solving model management, RAG pipelines, model fine-tuning, prompt engineering, failover, cost optimization, and deployment orchestration. You can’t just be good at one or two of these; you have to be great at all of them or your project won't succeed. And so LlamaFarm was born!
Monetization-wise: we’re open source and free forever, with revenue coming from enterprise support, managed deployments, and compliance packages. Basically, companies pay for confidence, not code.
We're still Rownd (https://rownd.com), but we see the writing on the wall. SaaS software that solves "hard to code" problems is going the way of the dodo.
What used to take a few weeks and was hard to maintain can now be done with Codex in the background. We're still bringing in decent revenue and have no plans to sunset it; we're just not investing in it.
We all have IBM backgrounds - not sexy, but we're good at running complex software in customer datacenters and in their clouds. AI is going to have to run locally for regulated industries to extract its full value.
We are using a services + support model, likely going vertical (legal, healthcare, and we had some good momentum in the US Gov until 1 October :)).
LlamaFarm provides an abstraction over most (eventually all) of those pieces. Something that should work out of the box wherever you deploy it but with various knobs to customize as needed (we're working on an agent to help you with this as well).
In your example (alarm monitoring), I think right now you'd still need to write the agent, but you could use LlamaFarm to deploy an LLM that relied on increasingly accurate examples in RAG and very easily adjust your system prompt.
You can wire things together yourself (LangChain, bash, Ollama, etc.), but LlamaFarm tries to make that repeatable and portable. It’s declarative orchestration for AI systems — you describe what you want (models, RAG, agents, vector DBs) in YAML, and it runs the same way anywhere: laptop, cloud, or fully air-gapped edge.
So instead of gluing frameworks and breaking them every update, you can do something like:
    name: home_guarde
    runtimes:
      - detect_motion: {model: "phi-3", provider: "lemonade"}
      - alert: {model: "gpt-5", fallback: "llama3:8b"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
…and it just runs — same config, same behavior, whether you’re doing local RAG or home monitoring. The goal isn’t to replace the DIY route, just to make it composable and reproducible.
Where LlamaIndex gives you powerful RAG primitives, we give you the full production system - the model failover when OpenAI is down, the strategy system that adapts from development to production, the deployment configs for Kubernetes. We handle all the boring stuff that turns a RAG prototype into a system that actually runs in production. One YAML config, one CLI command, and you have everything from local development to cloud deployment. :)
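As a small illustration of the failover piece, reusing the config style from the earlier example (the project and model names here are placeholders, not a recommendation):

    name: support_assistant
    runtimes:
      # hosted model first; if the provider is down, requests fall back to a local model
      - answer: {model: "gpt-5", fallback: "llama3:8b"}

The same file is what you'd hand to the CLI whether you're running on a laptop or deploying to Kubernetes.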
I'm basically imagining a vast.ai-type deployment of an on-prem GPT; assuming that most infra is consumer GPUs on consumer devices, the idea is to run the "company cluster" as the combined compute of the company's machines.
Maybe a better descriptor is "self-sovereign AI"? Or "self-hosted AI"?
https://llm-d.ai/blog/intelligent-inference-scheduling-with-...
The support + service model has been proven by large and small companies alike; it's also one that will survive the coming AI contraction.
build agents. please.
Our goal is to target large enterprises with services and support, leveraging deep channel partnerships. We do ship agents as part of the project, and they will be a critical part of the services model in the future.
Agents will come and go (and probably run into the same orchestration headaches), but someone still has to build the reliable, open foundation they’ll stand on.
I, for one, am glad that not everyone shares your ethos. Conversely, I'm also glad that there are people out there "building agents". Diversity is a good thing. Encouraging everyone to only do one thing is a bad thing.
We're building a general-purpose compiler for Python. Once the code is compiled, developers can deploy it across Android, iOS, Linux, macOS, Web (Wasm), and Windows in as little as two lines of code.
Congrats on the launch!
Read more:
- https://blog.codingconfessions.com/p/compiling-python-to-run...
- https://docs.muna.ai/predictors/ai#inference-backends