89 pointsby amarble7 hours ago16 comments
  • julianlam3 hours ago
    I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

    I know LLMs move at the speed of light (especially these past few quarters), but if Opus and GPT "a few months ago" were really like open weight models, then there's really no reason to not switch, especially for those who were using these models a few months ago.

    Your codebase didn't change, so use the open weight model. Don't move the goalposts.

    • kgeist2 hours ago
      Every new proprietary model is "groundbreaking" and "look, it just solved task X that no other model could solve," only to be referred to as "that crappy previous-generation model" a month later.

      So yeah, I'm totally fine using Kimi-2.7, GLM-5.2 or Deepseek-v4. I think we've already hit the ceiling and most improvements now seem to be from harness improvements and slightly better RL to improve reasoning/tool calling.

      • jbverschooran hour ago
        Not only that, but to me it seems that after a week the intelligence is being downscaled or routed. Maybe because of lack of capacity
      • realusername23 minutes ago
        There's also a lot of benchmark trickery going on, it's becoming harder to see how the latest models really improved.

        The top models also seem to have inconsistent performance depending on the time of day and how far we are from the next release.

      • 4fffs2 hours ago
        Correct. Anything else is pure marketing and you have fallen for it.
      • matheusmoreira39 minutes ago
        There's at least the possibility that they intentionally degrade the models as time passes. We can't really verify that we're getting what we're paying for all of the time. All the more reason to invest in local inference.
        • taytus21 minutes ago
          At current prices, and considering these OS Models' performance, investing in local inference sounds like a bad idea.
          • matheusmoreira16 minutes ago
            Current prices are insane but at this point I'm starting to feel like it's an existential issue. I'm not a US citizen. At any point the USA could come up with some arbitrary export controls. Not having a computer capable of running at least Qwen is starting to actually seem risky to me.

            At least it's going to be usable as a very high end gaming PC.

          • jrm411 minutes ago
            At current "proprietary inference company behavior," investing in local inference sounds like the exceedingly far more rational option.

            Long term predictability ought to far outweigh a few more cycles of performance.

    • Aurornis34 minutes ago
      > I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models

      I experiment a lot with the open models and I’m getting tired of this trope. I’m not yet convinced that even the best open weight models are equal to Opus from “a few months” ago.

      I know what the benchmarks say. I had higher hopes. My real experience just doesn’t match the benchmarks.

      I also do a lot of work that even Opus 4.8 struggles with. When even the cutting edge LLMs aren’t all the way there yet, my motivation to switch to something even further behind just isn’t there.

    • itwaswatson4 minutes ago
      We have a provider with Deepseek V4 flash at our work. It can handle 95% of the "actually functional" workload at a tenth of the cost. I still pull up beefier ones sometimes, but that's after some consideration.

      The moat is so flat, it only gives +1 food and +1 production. +1 gold with a road.

    • dwoosleyan hour ago
      The only reason I'm on HN right now reading this post is because the Anthropic's API is down... so there's another point for self hosted.
    • Gigachadan hour ago
      The reason for me is work pays for Github Copilot which doesn't have these open modals.
    • taormina2 hours ago
      For that matter, the new models are shit. If I’m using Opus 4.6 anyway to get anything actually done, then great, we’re actually entirely caught up then.
    • TacticalCoder2 hours ago
      > I think it's interesting that people write off open weight models because they're "a few months behind" proprietary models.

      The really interesting thing is that it's typically those very same accounts who were explaining, a few months ago, that thanks to their commercial model they were gaining so much time and producing so much fantastic code.

      A few months passes and suddenly the open-source model have caught up with the models that were gaining them so much time and that produced amazing code (in production everywhere for sure btw) but... It's impossible to work with these models.

      Rinse and repeat.

      The current models, according to them, are basically AGI and they can go fishing while paid subscriptions solve the world's problems.

      But when it six months there shall be new closed, pricey, models and when the open ones shall have reach the level of Fable, we'll hear how it's impossible to work in late 2026 on a model that is "only at the level of Fable".

      These people should have been snake-oil salesmen (and it could be what they actually are).

      • nemomarxan hour ago
        My most charitable interpretation that there's some honeymoon effect for each release, and people genuinely feel very productive and useful for 2-3 months. By the time the next big model release happens they've seen some issues or run into something that makes them feel like the new model will fix all that and improve their flow so much, etc.

        Not unusual in the tech space, but this has been basically constantly happening for two years now? I can't imagine the improvements are more than incremental at this point.

    • tonfreedan hour ago
      Even just one of the smaller models is good enough for the grunt work I use them for 90% of the time. Currently doing most of my home hobby projects with OpenCode Go and Qwen 3.7 Plus, it's not great at diagnosing issues in the code, but if I can clearly articulate a test suite or boilerplate refactoring it works fine.
  • bnj21 minutes ago
    I’ve been wanting to get better acquainted with local inference but I don’t have the hardware, which has made me think about something I haven’t seen discussed, which is local collaboratives. The economics makes it seem like a group of people joining together to run good hardware and an open model might make sense, but I haven’t seen anything like this mentioned. Have I been missing it?

    I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.

    • blackoil14 minutes ago
      Open models hosted in Cloud???
  • pkulakan hour ago
    Sure. But OpenAI is the same price. Why would I pay $18/month for z.ai when OpenAI is $20/month?
    • CJeffersonan hour ago
      One big advantage I’ve found — people get attached to models (including me). With open models if you find one that works perfectly for you but the next version doesn’t, you can run the old one forever (or someone will for you)
      • itake4 minutes ago
        But… the models will fall behind. As libraries and languages and tool calling updates or the world knowledge changes, the models decay.

        Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.

      • taytus19 minutes ago
        This is a good point I never thought of. I appreciate it.
  • Aurornis37 minutes ago
    The headline says one thing, then the article text says this:

    > I’m hoping it’s going to be minimal.

    I have multiple subscriptions and I pay per token to try out different LLM providers through OpenRouter. I also run open weight models locally.

    I just can’t agree yet. The models from Anthropic and OpenAI really are that much better than anything else. The open weight models must be universally benchmaxxed across the board because my real world experience with them is very different than what the benchmarks imply. I get downvoted a lot for speaking about my experience because I don’t think it’s the reality that people want to hear right now, but it’s true for complex work.

    I do think there are a lot of easier tasks that can be handled appropriately by the open weight models in the hands of a skilled operator. If an entire job is simple enough that you wouldn’t hesitate to hand it off to a junior with a little supervision then any model will do. However for a lot of the work I do, even Opus 4.8 on Max requires a lot of attention and extra steering and review to keep it on track. Fable did, too, though to a lesser degree. When I try to use the big open weight models (hosted, because they’re not running at reasonable speeds locally at a quantization I can tolerate) it feels like I spend more time waiting while they burn tokens for output that I probably have to reject anyway, at least for the bigger tasks. I wish they were there, but that’s not the case yet.

  • radhitya2 hours ago
    Have you read about Opencode Go? They are great provider for open model, like GLM 5.2, Deepseek v4 Pro, Kimi 2.7 Code. You should give it shot to them :-)
  • linzhangrun2 hours ago
    Open source models are still not good enough for now, but with the current speed of one new SOTA every two months, by this time next year we will definitely have cheap open source models at least as good as Fable :)
  • mdale3 hours ago
    I think the frontier will command premium for sometime just as slight better software developers were 10x's vs their peers as their architecture & development strategies and code approach compounded quickly. One less error per block of work compounds quickly.

    Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development

    • byzantinegenean hour ago
      10x developers were not slightly better than their peers, they were vastly superior and faster. OTOH, the lead of frontier llms is diminishing as training is getting diminishing returns.

      Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.

    • 4fffs2 hours ago
      Youre clutching at straws.

      Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.

  • cpillan hour ago
    I think once the hardware process comes down and these mini DGXs become cheaper, and by then open models still be smaller and better, there is going to be less and less reason to use the providers. CEOs are already complaining that they are costing too much. There are also large organisations like Banks which can't use external services and are already looking at internal housing. it's a good thing so the big AI companies just went IPO as once the self hosting trend kicks in they are going bust.
  • DANmode3 hours ago
    But, what model are you using?

    and what hardware are you using?

    • 0gs3 hours ago
      yeah, on a 96GB Mac Studio and Gemma+Qwen, it's definitely fully doable. fully doable but not really for coding on 16GB. but svelter models and cheaper (eventually) hardware are coming!
      • fluidcruftan hour ago
        If you don't have that hardware thr math of buying a depreciating computer is challenging if you are satisfied with the $100/month plans ($1200/year). A 96GB Mac Studio is ~$4k. I think if you have the hardware already as a sunk cost then yes it makes sense. But I'm not sure it is worth spending $4k for today's hardware vs waiting for newer hardware in a few years.
      • nezuzen3 hours ago
        "cheaper (eventually) hardware" Best case 2-3 years from now. Otherwise it will take a major global recession to get us anywhere near last year's prices.
      • marcus_holmesan hour ago
        Macs are expensive hardware, but I'm always seeing people running LLMs on them. Is anyone running on cheaper generic hardware and Linux?
        • brucehoult23 minutes ago
          A Mac is cheaper than a high end GPU with the same amount of RAM.
      • Gigachadan hour ago
        I suspect hosted and local will converge when hardware prices come down and API prices go up. The massive rate of datacenter build out will be unsustainable. Right now the hosted models are massively cheaper than buying the hardware and running it yourself which signals that hosted is very subsidized.
  • PcChip3 hours ago
    Is it just me or is half the article missing?

    I enjoyed the first part though

  • blindriver9 minutes ago
    As someone that has pretty powerful desktop that I've been using with local open weight models, people are far exaggerating the quality of them. Some of them are now useful. They don't compare yet to the online models of ChatGPT, Claude, Gemini, etc. They are still about 18 months behind. I have accomplished useful work with them, like image classification on Gemma4, but they are much much slower, much much more expensive and they don't scale at all.

    A $10,000 RTX 6000 Blackwell card will pay for 500 months of Claude or Codex, which is 40 years worth of compute. Obviously they are going to raise their prices, my prediction being to $200-500/month, but that still makes them at least years of compute and they scale very well with more traffic. Single GPUs do not, they are pegged at 100% and good luck getting it to answer multiple queries at the same time.

  • aussieguy12342 hours ago
    >There was a time not too long ago when using Linux entailed some professional risk1. First there was compatibility: you may not have been able to render a Word document or PowerPoint correctly, and you might have had to trust Open Office’s export capability to render docs the way you wanted

    For a while during this era, I used to port my laptops windows installation into a virtual machine that can run on Linux. It took a bit of hacking away but I could usually do it in a day or two. Then its all Linux with the windows vm being used for the microsoft stuff.

  • causality0an hour ago
    I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

    I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

    • JSR_FDEDan hour ago
      Just go to kimi.com and try for yourself (not affiliated, but happy user).

      First time I did this I realized in 5 seconds that the big players weren’t going to be carving up the market between them.

    • wiljan hour ago
      > I know open models have gotten quite good in many tasks such as coding or composition, but are there any that can access the internet and retrieve data like ChatGPT, Claude, etc can?

      The things you describe are just tool calling, they're a feature of whatever harness you use. Use OpenCode, pi.dev, or maki.sh with any of the open models.

      > I do have to admit I have recently begun wishing I could pay five dollars a month for a "just answer the fucking question" plan that would give me results without the guardrails and without the constant simpering and ego-stroking. I keep finding myself going a quick evaluation of "is it faster for me to skim search results myself or to construct an elaborate narrative to make an AI give me a real answer".

      You can do most of this with some system prompts added to whatever agent you're using. You can do it from the settings on the claude/chatgpt websites too. (minus the no-guardrails thing)

    • linzhangrunan hour ago
      You can let the AI solve it itself, and then it will provide two solutions: implement a local search service (easily blocked), or purchase a Web Search API service
  • cws_ai_buddyan hour ago
    [flagged]
  • c_chenfengan hour ago
    [dead]
  • codelong8882 hours ago
    [dead]