- AI assistants for smaller practices without an enterprise EHR. Epic currently integrates third-party AI assistants, but those are of course cloud services aimed at contracts with large hospital systems. They're a great step forward, but doctors still find them lacking in actual usefulness.
- Consumer/patient-facing products to help people synthesize all of their health information and understand what their healthcare providers are doing. Think of an on-device assistant that can connect with something like https://www.fastenhealth.com/ to build a local RAG index over their health history (rough sketch below).
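As a purely illustrative sketch of that idea, borrowing the YAML style from the example further down the thread (the project name, model choice, and the assumption that Fasten Health records are exported to a local folder are mine, not a real integration):

    name: my_health_history
    runtimes:
      # small model running entirely on the user's device (model choice is illustrative)
      - chat: {model: "llama3:8b", provider: "ollama"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
      # hypothetical: index records the user exported locally from Fasten Health
      sources:
        - path: "~/fasten_export/"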
Overall, users can feel more confident they know where their PHI is, and it potentially becomes easier for smaller companies/start-ups to get into the healthcare space without having to move or store people's PHI.
You could make the same argument for Kubernetes. If you have the cash and the team, why not build it yourself? Most don't have the expertise or the time to find/train the people who do.
People want AI that works out of the box on day one. Not day 100.
Yeah, the beachhead will be our biggest issue - where to find our first hard-core users. I was thinking legal (they have a need for AI, but data cannot leave their servers), healthcare (same as legal, but more regulations), and government (not right now, but they normally have deep pockets).
What do you think is a good starting place?
An idea might be to go after a vertical sooner rather than later. The only thing better than an interested lawyer, for example, would be a selection of curated templates and prompts designed by people in that industry. So you get orchestration plus industry-specific, aligned verticals. That's a much easier sell than a general-purpose platform. But then you're competing with the other vertically integrated offerings.
Maybe there are other differentiators? If this is like bedrock for your network, maybe the angle is private models where you want them. Others are doing that though, so there's pretty active competition there as well.
The more technical and general the audience, the more you're going to have to talk them out of just rolling Open WebUI themselves.
We’re starting with regulated enterprises (defense, healthcare, legal, fintech) where control and compliance actually matter. The same YAML-defined system can run in AWS, in a hospital, or fully air-gapped — no lock-in, no data leaving.
We’re building a few sample recipes to show it in action:
- Legal: doc analysis + precedent search with local vector DBs
- Healthcare: privacy-preserving RAG over clinical notes (rough sketch after this list)
- Industrial/Defense: offline sensor or alert agents that sync later
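For the healthcare one, a privacy-preserving RAG config could look roughly like this; the project name, model choice, and the sources block are illustrative placeholders rather than a finalized schema:

    name: clinical_notes_rag
    runtimes:
      # single on-prem model, no hosted fallback, so PHI never leaves the network
      - answer: {model: "qwen2.5:14b", provider: "ollama"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
      # hypothetical ingestion target: access-controlled clinical notes
      sources:
        - path: "/data/clinical_notes/"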
Partnering with MSPs and modernization firms seems like the obvious path — they already have the relationships and budgets, and LlamaFarm gives them something repeatable to deploy.
Still figuring that out though — what’s the best way to actually get into the MSP funnel?
You can pull down the repo and run a few easy commands to get inference up and running.
Any GPU that isn't being used at 80% capacity needs to be put to work; we have plenty of work for it to do. (A lot of industries cannot lease their GPUs to the public due to regulatory issues.)
> Instead of one brittle giant, we orchestrate a Mixture of Experts…
“Mixture of experts” is a specific term of art that describes an architectural detail of a type of transformer model. It definitely does not mean using smaller specialized models for individual tasks. Experts in an MoE model are actually routed to on a per-token basis, not on a per-task or per-generation basis.
I know it’s tempting to co-opt this term because it would fit nicely with what you’re trying to do, but it just adds confusion.
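For reference, one common sparse-MoE formulation makes the per-token routing explicit. For a single token's hidden state x, gate weights W_g, and experts E_i (generic notation from the literature, not tied to any particular model):

    y = \sum_{i \in \mathrm{TopK}(W_g x)} \mathrm{softmax}(W_g x)_i \, E_i(x)

Because the gate depends on x, every token in a single generation can be sent to a different subset of experts, which is not the same thing as routing whole tasks to separate small models.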
Our bet is that the timing’s finally right: local inference, smaller and more powerful open models (Qwen, Granite, DeepSeek), and enterprise appetite for control have all converged. We’re working with large enterprises (especially in regulated industries) where innovation teams need to build and run AI systems internally, across mixed or disconnected environments.
That’s the wedge: not another SaaS, but a reproducible, ownable AI layer that can actually move between cloud, edge, and air-gapped environments. Just reach out, no intro needed: robert @ llamafarm.dev
We also have plans for eval features in the product so that users can measure the quality of changes over time, whether to their own project configs or actual LlamaFarm updates.
Yes, all that's a bit hand-wavy, I know. :-) But we do recognize the problem and have real ideas on solutions. But execution is everything. ;-)
How did RAG degrade when it went to prod? Do you mean your prod server had throughput issues?
Where are you on Vulkan support? Hard to find good stacks to use with all this great Intel and non-ROCm AMD hardware. Might be a good angle too, rather than chasing the usual Nvidia money train.
We now support AMD, Intel, CPU, and CUDA/NVIDIA.
Hit me up if you want a walk-through - this is in dev right now (you have to pull down the repo to run it), but we'll ship it as part of our next release.
Then run lf init to get the project started!
Production-ready enterprise AI requires solving model management, RAG pipelines, model fine-tuning, prompt engineering, failover, cost optimization, and deployment orchestration. You can’t just be good at one or two of these; you have to be great at all of them or your project won't succeed. And so LlamaFarm was born!
Monetization-wise: we’re open source and free forever, with revenue coming from enterprise support, managed deployments, and compliance packages. Basically, companies pay for confidence, not code.
We're still Rownd (https://rownd.com), but we see the writing on the wall. SaaS software that solves "hard to code" problems is going the way of the dodo.
What used to take a few weeks and was hard to maintain can now be done with Codex in the background. We're still bringing in decent revenue and have no plans to sunset it; we're just not investing in it.
We all have IBM backgrounds - not sexy, but we're good at running complex software in customer datacenters and in their clouds. AI is going to have to run locally for regulated industries to extract its full value.
We are using a services + support model, likely going vertical (legal, healthcare, and we had some good momentum in the US Gov until 1 October :)).
LlamaFarm provides an abstraction over most (eventually all) of those pieces. Something that should work out of the box wherever you deploy it but with various knobs to customize as needed (we're working on an agent to help you with this as well).
In your example (alarm monitoring), I think right now you'd still need to write the agent, but you could use LlamaFarm to deploy an LLM that relied on increasingly accurate examples in RAG and very easily adjust your system prompt.
You can wire things together yourself (LangChain, bash, Ollama, etc.), but LlamaFarm tries to make that repeatable and portable. It’s declarative orchestration for AI systems — you describe what you want (models, RAG, agents, vector DBs) in YAML, and it runs the same way anywhere: laptop, cloud, or fully air-gapped edge.
So instead of gluing frameworks and breaking them every update, you can do something like:
    name: home_guarde
    runtimes:
      - detect_motion: {model: "phi-3", provider: "lemonade"}
      - alert: {model: "gpt-5", fallback: "llama3:8b"}
    rag:
      embedder: "nomic-embed-text"
      database: chromaDB
…and it just runs — same config, same behavior, whether you’re doing local RAG or home monitoring. The goal isn’t to replace the DIY route, just to make it composable and reproducible.
Where LlamaIndex gives you powerful RAG primitives, we give you the full production system - the model failover when OpenAI is down, the strategy system that adapts from development to production, the deployment configs for Kubernetes. We handle all the boring stuff that turns a RAG prototype into a system that actually runs in production. One YAML config, one CLI command, and you have everything from local development to cloud deployment. :)
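As a small illustration of the failover piece, reusing the config style from the earlier example (the project and model names here are placeholders, not a recommendation):

    name: support_assistant
    runtimes:
      # hosted model first; if the provider is down, requests fall back to a local model
      - answer: {model: "gpt-5", fallback: "llama3:8b"}

The same file is what you'd hand to the CLI whether you're running on a laptop or deploying to Kubernetes.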
I'm basically imagining a vast.ai-type deployment of an on-prem GPT; assuming that most infra is consumer GPUs on consumer devices, the idea is to run the "company cluster" as the combined compute of the company's machines.
Maybe a better descriptor is "self-sovereign AI"? Or "self-hosted AI"?
https://llm-d.ai/blog/intelligent-inference-scheduling-with-...
The support + service model has been proven by large and small companies alike; it's also one that will survive the coming AI contraction.
build agents. please.
Our goal is to target large enterprises with services and support, leveraging deep channel partnerships. We do ship agents as part of the project, and they will be a critical part of the services model in the future.
Agents will come and go (and probably run into the same orchestration headaches), but someone still has to build the reliable, open foundation they’ll stand on.
I, for one, am glad that not everyone shares your ethos. Conversely, I'm also glad that there are people out there "building agents". Diversity is a good thing. Encouraging everyone to only do one thing is a bad thing.
We're building a general-purpose compiler for Python. Once the code is compiled, developers can deploy it across Android, iOS, Linux, macOS, Web (Wasm), and Windows in as little as two lines of code.
Congrats on the launch!
Read more:
- https://blog.codingconfessions.com/p/compiling-python-to-run...
- https://docs.muna.ai/predictors/ai#inference-backends