Open-Weight Models Don't Need to Win (twitter.com)

5 points by kumama 2 hours ago | 4 comments

firebear an hour ago
> open-weight models let the people who do have the domain expertise build on top of a capable foundation. this doesn't require open models to win the frontier race. it requires them being close enough.
This makes sense to me. Access to top-class open-weight models allows the community to do fine-tunes on low-resource languages such as Kinyarwanda or Luganda, which closed labs may not have the time or expertise for.
Multiple ground-up initiatives here in Rwanda would not be possible without models such as Qwen or NLLB.
- kumama 28 minutes ago
  whats nllb?
angelpan 28 minutes ago
Curious how much of the code progress story is post-training investment vs. code being uniquely suited to RL with verifiable rewards.
seahyinghang8 2 hours ago
i really hope that's the case
- kumama 2 hours ago
  yeah :)
sowbug an hour ago
Reprinted with proper capitalization:
The general sentiment on open-weight models is something like this: they can approach the frontier, but will always lag a little behind, because reaching or surpassing the frontier needs large pools of capital, compute & data access that only the big labs can put together. There’s a good chance this is right. But the question remains - does it actually matter? I, for one, don’t think it does. The reasons behind this have interesting implications for the longer-term market structure of AI models.
The weights aren’t the moat
A recent thought experiment I ran with my team over lunch: if OpenAI and Anthropic “open-weighted” their models, what do they actually lose? My take: less than most believe. Weights alone aren’t a moat, on either the consumer or the enterprise side.
On the consumer side, both labs have built brands with loyalty. The average consumer isn’t running quantitative benchmarks to see which is better; they use whatever they feel has the best vibes. Case in point is all the folks who clamored for GPT-4o even after much newer, “better” versions were released. On the enterprise side, the fact that both companies are starting PE-like deployment companies & FDEs tells you something.
Enterprises need more than a model - they need folks to figure out integration, evaluation & operationalization. Just having access to weights doesn’t help much there.
What’s holding open-weights back?
The above prose is great and all - but it hasn’t empirically played out yet. Open-weights model usage has gone up in recent months but is still far, far behind frontier models. What’s missing?
The core misunderstanding here is treating open-weight models as one-to-one replacements for closed models. API models are products. Open-weight models are toolkits. And right now, the toolkits are missing most of their tools. There’s historical precedent for this in Linux - Linux didn’t win by being better than Windows out of the box. It won by allowing customization in ways Windows didn’t. Devs could do whatever they wanted with it. That allowed a real community & ecosystem to be built around it: tooling, package managers, etc. Linux won because of the stuff built around it, not because of its core kernel.
Open-weight models today are still at the kernel stage. Weights are there & the customization possibilities are endless (finetuning, quantization & deployment in ways APIs won’t allow). But the surrounding ecosystem is still nascent. If you want to take an open model and make it excellent at your specific use case, you need post-training infra: data creation tools, IDEs, GPU orchestration & inference optimization. The stack that sits between “here’s the weights” and “here’s a model that does what I need” is still immature. Building it out is, I think, the single highest-leverage thing anyone can do for the open-model ecosystem right now.
Why customization matters
One area where model capability has improved rapidly is code. That’s not random - it’s where labs have spent tremendous post-training effort. It’s an example of what happens when you take a capable base set of weights & invest seriously in making it excellent at a specific domain.
Now imagine if every vertical progressed at that pace - medical reasoning, legal analysis, scientific workflows, industrial applications. Progress there has been good, but it’s been slower and a little less deliberate. A lot of that is because those verticals haven’t had the level of post-training investment that code has.
This requires sustained, domain-specific effort, and it's also what open weights can uniquely enable: lots of specialized models, each more intelligent for its specific domain than any general-purpose frontier model. The applications of intelligence are infinite, and the closed labs will never staff enough teams to do deep post-training for oncology, contract law, materials science, and agricultural planning. They don't have the domain expertise, and they don't have the incentive: the markets are too fragmented, the customization too granular.
Open-weight models let the people who do have the domain expertise build on top of a capable foundation. This doesn't require open models to win the frontier race. It requires them being close enough.
The “close-enough” assumption
This then raises the question - can frontier labs run so far ahead that even “close enough” becomes hard?
It’s a serious concern - because the capital concentration in closed labs has no historical precedent. The combined dollars raised by OAI, Anthropic and a handful of others far exceed what everyone else has raised. Anthropic and OpenAI’s share of AI startup revenue was recently reported to be 89% (https://www.theinformation.com/articles/anthropic-openais-sh...).
The interesting thing though is that a lot of the inputs to model building don’t compound. Talent has been remarkably fluid across the labs, and people carry the tricks of the trade with them. Data too isn’t a cornered resource - there are tons of data vendors & synthetic data pipelines are improving rapidly.
Compute concentration is the most serious concern. But the nature of being closed also structurally demands a lot more compute. Closed labs internalize everything - including inference. If you’re the only one who can serve the model, you have to provision compute for every use case and customer. Growth comes with a proportional capital burden. Open models don’t have this problem - they can be deployed anywhere by anyone. The inference burden is shared across the ecosystem. This way customization & inference can grow without a proportional capital burden.
None of this means the closed labs won't be ahead. But being ahead is not the same as a runaway. And for the specialization thesis to work, open models just need to stay within striking distance, as they have for the last 2 years.
Why this matters to me
This essay probably reads as me really wanting open-weights models to succeed.
I started working on what was then called NLP as a fifteen-year-old in Singapore, far, far from Silicon Valley, because I thought it was the coolest thing ever. It was only possible because there was a robust culture of open research on the cutting edge, driven by both public institutions and private companies like Google who published their work freely. Open-weight models are the continuation of that tradition. I want that door to stay open for whoever's fifteen and curious right now.
- kumama an hour ago
  hahahhah someone's punching back against my war on capitalization :)