Ask HN: Is the next big thing locally running coding agents?

1 point by baigy 2 hours ago | 5 comments

giwook 2 hours ago
This seems like an obvious progression imo, though I think it's very much subject to change. Open weight models will get better, and memory prices will return to normal in a couple of years (hopefully).
That being said I think an unpredictable variable here is how the companies building frontier models respond to what should be a noticeable inflection point in consumers turning towards locally hosted open weight models.
There is also a significant amount of compute that is being built out as we speak that should in theory reduce costs for providers of frontier models but that's a whole other can of worms.
Despite all of the very impressive open weight models available to us today, Anthropic and OpenAI remain steps ahead of the competition. Most of the best and brightest minds in AI are working at frontier labs. It's not hard to foresee these labs maintaining their edge given the amount of expertise and brainpower they've assembled.
Assuming frontier models continue to maintain their edge, even if only on a subset of tasks (e.g. reasoning, judgment, planning), I see a convergence towards a hybrid workflow where both frontier and local models are used for specific tasks. For example, Claude for reasoning, planning, and judgment, with intelligent routing to cheap/free models tuned for certain tasks.
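A minimal sketch of what such routing could look like, assuming a made-up rule that sends only the task kinds named above (reasoning, planning, judgment) to a frontier model and everything else to a local one. The `Route` type, the `route_task` function, and the `"frontier"`/`"local"` labels are hypothetical names for illustration, not any real product's API.

```python
from dataclasses import dataclass

# Task kinds taken from the comment above; treating them as the full
# "needs a frontier model" set is an assumption made for this sketch.
FRONTIER_TASKS = {"reasoning", "planning", "judgment"}


@dataclass(frozen=True)
class Route:
    backend: str  # "frontier" or "local" (illustrative labels, not endpoints)
    reason: str


def route_task(task_kind: str) -> Route:
    """Pick a backend for a task based on its (normalized) kind."""
    kind = task_kind.strip().lower()
    if kind in FRONTIER_TASKS:
        return Route("frontier", f"{kind} is assumed to need the stronger model")
    return Route("local", f"{kind} is assumed cheap enough for a local model")


if __name__ == "__main__":
    for kind in ("Planning", "rename variable", "judgment"):
        route = route_task(kind)
        print(f"{kind!r} -> {route.backend} ({route.reason})")
```

A real router would presumably classify tasks with something smarter than a fixed set lookup; the point here is only the shape of the split.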
- baigy 2 hours ago
  Good points.
  I feel where it all loses its legs is the fact that most coding work is of intermediate complexity. You won't need super intelligence to code/maintain your CRM or what have you. Specialized firms may pay the premiums Anthropic/OpenAI expect, but the vast majority of enterprises won't need to for the vast majority of their use cases.
jonahbenton 2 hours ago
There are many markets. Qwen 3.6 27b at a high enough quant is good enough for many use cases. But enterprise-consumed tokens come with legal/data protection agreements. Enterprises have only just gotten comfortable with BYOD; there is no equivalent set of practices and protections for local LLMs (BYOLLM). So some enterprises are getting back into on-prem GPU capacity.
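For a rough sense of what "a high enough quant" means in memory terms, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and the per-block scale overhead that real quantization formats add, so actual usage will be higher. The function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Assumption: every weight is stored at exactly bits_per_weight bits.
    """
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # 27B is the parameter count mentioned in the comment above.
    for bits in (16, 8, 4):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.1f} GB")
```

Under these assumptions a 27B model needs about 54 GB at 16-bit, 27 GB at 8-bit, and 13.5 GB at 4-bit for the weights alone, before any context window is accounted for.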
- baigy 2 hours ago
  On-prem GPU capacity, or decent enough devices for the core engineering team, lends itself pretty nicely to local LLMs too. And you own the whole stack this way. Why pay premiums to Anthropic and fuel its trillion dollar valuation?
verdverm 39 minutes ago
Already there, friend! I just posted a Show HN on using opencode + qwen36moe output to modernize my old PhD research. Surreal experience.
damnitbuilds 2 hours ago
I got Qwen 3.6 running locally on 12GB VRAM.
It went:
```
  AI: "I see you are building a Django project. How can I help?"

  Me: "When I click on the Reload button, it does not set the reload option correctly. Fix this"

     <10 minutes>

  AI: "I see you are building a Django project. How can I help?"
```
Needs more tweaking of the context window, I think.
Seriously, I agree that this is the future, when OpenAI et al have gone bust.
- giwook 2 hours ago
  I think this is the key issue with running locally hosted models.
  Yes, technically you can run them on 12GB VRAM.
  But should you?
  Realistically, 64GB seems to be the current threshold for getting meaningful work done while also maintaining a large enough context window.
  - baigy 2 hours ago
    This will drop further as intelligence density increases.
    - giwook an hour ago
      It should, which is why I said it is the current threshold.
- baigy 2 hours ago
  I think it's a huge bubble about to pop. I get that enterprises are like elephants, slow to move, locked into agreements.
  But I think free is going to be infinitely better than paying Anthropic more money than you used to spend on your human payroll. The big pop is coming.