I wonder if managers will be as excited about AI when the prices go up.
I suspect the API prices are already served with profitable unit economics. The SOTA API prices are much higher than the costs for other providers to run very large open weight models.
The monthly subscription plans were being offered at a discount to generate interest in these models.
We're not entering a period of billing AI at cost. We're entering a period of exploring how high the prices can go before losing too many customers.
Products and services aren't sold at cost. They're sold at the price the market will bear. It takes some experimentation to find that equilibrium point where you make more profit per customer but don't lose too many customers.
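That equilibrium search can be sketched as a toy model (every number here is invented purely for illustration):

```python
# Toy linear demand: higher price -> fewer customers (numbers invented)
def customers(price, base=1000, sensitivity=20):
    return max(0, base - sensitivity * price)

def profit(price, unit_cost=10):
    return (price - unit_cost) * customers(price)

# Sweep candidate prices to find the profit-maximizing point
best_price = max(range(10, 50), key=profit)
print(best_price, profit(best_price))  # 30 8000
```

The point isn't the specific numbers; it's that the optimum sits well above unit cost, and a provider only finds it by raising prices and watching churn.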
There is absolutely no evidence to support this.
Obviously this is an extremely rough calculation. I can even be off by a factor of 10 and it's still a pretty good return.
Training is akin to the cost of building the software/product. Inference is selling the product.
In my opinion, it’s a profitable kind of service. They probably don’t pay the public prices for the cloud GPUs though.
Or, as I would say if I were Bugs Bunny, “Duck Season”
Analysts like SemiAnalysis have done a lot of modeling and estimation on the topic.
But two can play this game: There is absolutely no evidence to support that API prices do not have profitable unit economics.
Typically the burden of proof is on the one making the claim.
They have some of the best publicly available analysis on these topics. The full details and numbers are hidden behind the institutional accounts which are priced for investors (not something you sign up for personally) but they're generous with what they send out in their newsletter.
If you're not familiar with resources like this, I could understand how you'd assume that the providers are hemorrhaging money on inference costs, because that is the story that gets parroted around spaces like Hacker News.
You could ignore all of that, though, and go check OpenRouter to see how much providers are charging for high-parameter-count models. They're not entirely at the level of the SOTA models, but the biggest open-weight models are not that far behind in capability either. They're being sold an order of magnitude cheaper than what you pay for the APIs from the major players. We don't know exactly how big the major models are, but from the leaks we do have, it's unlikely they're more than 10X more compute-intensive.
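One way to put numbers on that comparison (the per-token prices here are placeholders for illustration, not live OpenRouter quotes):

```python
# Hypothetical $/M-token input prices (illustrative, not real quotes)
open_weight_price = 0.60   # large open-weight model via an aggregator
sota_api_price = 15.00     # frontier model from a major provider

price_ratio = sota_api_price / open_weight_price  # ~25x price gap
compute_ratio = 10  # generous assumed upper bound on extra compute

# If the frontier model costs at most 10x more to run but sells for
# ~25x more, the implied markup over the open-weight baseline is:
implied_markup = price_ratio / compute_ratio
print(round(implied_markup, 2))
```

Even under the most generous compute assumption, the price gap leaves room for margin.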
The cost of AI inference has been a heavily analyzed topic. I trust the professional analysts much more than the casual Hacker News commenter claiming they’re losing money per token, because they’re repeating what they heard some other Hacker News commenter say.
nobody needs to prove their suspicions
I see it as no different from the previous generation of consumer startups burning money - as Derek Thompson wrote,
> ...if you woke up on a Casper mattress, worked out with a Peloton, Ubered to a WeWork, ordered on DoorDash for lunch, took a Lyft home, and ordered dinner through Postmates only to realize your partner had already started on a Blue Apron meal, your household had, in one day, interacted with eight unprofitable companies that collectively lost about $15 billion in one year.
The conversation around AI being cheap now started when ChatGPT launched in late 2022.
> I wonder if managers will be as excited about AI when the prices go up.
Companies are willing to pay the API pricing. Engineering time is very expensive, AI coding agents have actually worked since December, and they're finally showing measurable productivity gains. It's a good deal to make (obviously, with caveats: you need to make sure your tokens are going toward productive tasks that will actually grow revenue), and anyone who penny-pinches is making a strategic mistake.
I've always wondered about this statement. We're generally salaried, and there are so many variables that affect how I spend my "time". None of us are machines that can do X work per day while our managers slice it up as they see fit. Pull a dev off a project they love and throw them onto something they hate, and suddenly X is diminished greatly.
I would almost predict that reshaping our workflow into "prompt, wait, approve changes" results in losses, because it is such a mentally tiring workflow and drills into our brains the desire for the LLM to "just fix it". It is the next level of just moving tickets to completed all day.
I don't think it is. At some point they have to make money, and they can't do that if the token cost doesn't include ALL the costs. Someone has to pay for that at some point. And someone has to pay for the subsidized subscribers. So no, API token prices don't reflect the real price. They are still subsidized, just in a different way.
It is? If another company comes out with a better model tomorrow and offers it at the same price Anthropic charges for Opus, they’re going to lose customers fast. They have to keep training to keep selling inference.
Most businesses factor in the cost of making their product into the product’s P&L.
Lastly, they'll realize, like every good capitalist, that there's more profit in exclusivity and cutting out customers.
If your company has Figma, Github, and Cursor and they're using the same models you are, your monthly costs with them increase as well. You're exposed N times to the foundation model price increases, where N is the number of times software you directly or indirectly use talks to a frontier model.
Citation needed. Anthropic does not have public books
If their CEO was just flapping his mouth without any other comparable baseline, it'd probably be different. But as the GP points out, open-weight model providers are charging comparable rates and very likely have positive profit margins. That would imply that with API pricing tokens are sold at above cost.
That cost may well be "inference only", so excludes everything apart from hardware and power. Whether that's enough to cover the enormous training costs and other overheads is a different question.
He has access to the real numbers and faces legal risk from lying publicly. It does him no good to lie about this.
That is no longer a helpful tool... it costs like ~15% of an actual dev.
Even if it is helping, is it actually... making things better or building anything truly important? The issue seems way too nuanced to spend $2k/mo. Not to mention the entire tech industry floats on hype and imaginary goalposts, so now what? Devs can hurtle towards those faster and more mindlessly?
The full cost of each employee is more than their salary. The common estimate is 1.4X their salary due to all of the employer-paid taxes, benefits, and other things.
So even $2K/month of token costs would only be around 10% of the cost of a mid-range developer cost.
It doesn't have to increase productivity much to justify the cost.
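The arithmetic behind that roughly-10% figure, with assumed illustrative numbers (not real payroll data):

```python
# Back-of-envelope: token spend vs a developer's fully loaded cost.
# All figures below are illustrative assumptions.
salary = 170_000           # assumed mid-range developer base salary, $/year
overhead_multiplier = 1.4  # employer taxes, benefits, equipment, etc.
loaded_cost = salary * overhead_multiplier  # ~ $238K/year

token_spend = 2_000 * 12   # $2K/month of API tokens, annualized

fraction = token_spend / loaded_cost
print(f"{fraction:.1%}")   # roughly 10% of the loaded cost
```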
Another challenge for US tech companies is that - if you'll forgive the bluntness - their "brand" is now toxic in most of the world. Almost everyone is trying to distance themselves from US tech as fast as they can. Governments and big businesses are starting to invest seriously in alternative solutions and local resources. It will happen over time but I don't see much the US tech companies or the US government can do to stop the train now the wheels are turning.
So there's a serious risk for US tech companies now of a double whammy where their already relatively high R&D costs increase even further and yet they're also facing much stronger competition in international markets or maybe even excluded from some of those markets entirely.
If we also reach the seemingly inevitable point where "capable enough" LLMs can run locally, or at least as a private resource provided internally by large organisations, there is very little moat left. That threatens not only US Big Tech, whose stocks have been heavily driven by expected returns from AI, but the whole US tech industry that is banking on productivity gains from that AI tech. They also won't be able to capture most of the global supply of components like GPUs/RAM/SSDs, because it won't be cost-effective any more. And that is one of the few practical moats they have built (however accidentally) that would be a significant barrier to direct competitors setting up shop in places like Europe and Asia.
It's going to be interesting to see how US tech companies respond to these effects over the next 5-10 years. The giants are all aboard the AI train and can't back down now so there will probably be some casualties there if - as again seems inevitable - the bubble bursts at some point. But then there's a very long tail of still very successful US tech companies that might be paying US salaries and using AI-based tools but aren't themselves focussed on developing or providing those AI-based tools and they're the ones who are going to need to find new ways to compete effectively within that kind of time frame.
What they don't like is paying money for the work, that's all that matters to them.
Thus, your compute is significantly more expensive than AI's. Thankfully your taste is also part of your package deal, and that is where you deliver the real value over an LLM.
sometimes people just have different belief systems than you, and that’s actually okay
I have used the following on a 32G MacMini to help write useful code:
ollama run qwen3.6:27b-coding-nvfp4
The problem is that running local models (except for engineering tasks like data munging) is slow. With the above setup I set up a task (asking for no user verification) and go for a walk to wait for results that my Gemini Ultra plan would produce in 10 seconds.
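To make that speed gap concrete, here's a rough comparison using assumed throughput numbers (illustrative, not benchmarks of any specific model or service):

```python
# Assumed generation throughputs (illustrative):
local_tps = 12     # tokens/sec for a ~27B model on a 32GB Mac mini
hosted_tps = 150   # tokens/sec for a hosted frontier API
response_tokens = 4_000  # a typical multi-file coding response

local_minutes = response_tokens / local_tps / 60
hosted_seconds = response_tokens / hosted_tps
print(f"local: {local_minutes:.1f} min, hosted: {hosted_seconds:.0f} s")
```

Under these assumptions a response that a hosted API returns in under half a minute takes several minutes locally, which matches the "go for a walk" experience.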
IMO the programming world is far too myopic about / insistent on using laptops, especially macbooks. Just because a crappy deal exists doesn't mean everyone is forced to take it. Local AI is a high performance computing problem and laptops are fundamentally a crappy form factor for it; buy an efficient desktop computer and be surprised at what's possible even with today's crazy prices.
We all know, and have known for a long time, that the AI labs selling dollars for a nickel are going to pull that rug, and up that price, at some point.
Copilot, though, has been consistently the weakest mainstream AI coding offering. Inferior to Cursor or Windsurf at editor completions, inferior to Codex, Claude, OpenCode, blah blah blah, at agentic coding and also the old-school chat-style...
And now, it's no longer cheap AND now sucks even more than it has all along — the new $39/month plan is not only worse than all its competitors, but worse than its own $10 plan was a month ago — by a lot.
The thing is, you can't jack the price up unless you're good enough — at least on some axis, to some customer segment — to jack the price. And when you're not good enough, and you have vastly superior competitors who are not doing that yet... you're just forfeiting the game.
Which I agree Copilot should do — it's the Windows Phone of AI coding assistants, after all — but it still seems weird to me to just commit humiliating suicide rather than trying to make some deal with one of those superior competitors.
Instead of just jumping into a dumpster and lighting yourself on fire.
Even before yesterday, I assumed they made money via the gym model. I'd have months where I'm too busy to use my Copilot subscription in any meaningful way.
Canceling and restarting is too much of a hassle.
But with the pricing update I'd probably use up the $10 plan within 3 days.
I don't know if anything else is integrated so well into GitHub though. I might keep the $10 plan just for the occasional GitHub AI PR.
Just today, when I wasn't being especially chatty with GHCP, I used about 12 requests to get a few thousand lines of changes across 3 projects I'm juggling. The last Copilot instance I closed had, in 3 hours, burned 38M input tokens, 28M cache tokens, and about 400K output tokens on GPT5.4, high. That's something like $135 in half a day, for 1 of 3 instances. No crazy tool use, just lots of docs and unorganized code. GHCP charged about 70 cents for that on the old plan.
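As a sanity check on that gap, here is the same session priced at hypothetical per-million-token rates (the actual rates for that model aren't stated in this thread):

```python
# Hypothetical API rates, $/M tokens (illustrative, not published pricing)
input_rate, cache_rate, output_rate = 3.00, 0.30, 15.00

# Token counts from the session described above
input_tok, cache_tok, output_tok = 38e6, 28e6, 0.4e6

cost = (input_tok * input_rate
        + cache_tok * cache_rate
        + output_tok * output_rate) / 1e6
print(f"${cost:.0f}")  # same ballpark as the ~$135 cited above
```

Whatever the exact rates, uncached input tokens dominate the bill, which is why agentic sessions over large unorganized codebases get expensive fast.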
It seems everything Microsoft does is like this nowadays. They just can't seem to win anymore.
I've seen some projects that use it, and you open the PR list to be greeted by every PR having 3-20 comments, but when you go to the actual PR, there's no one there except the contributor and a bunch of Copilot feedback.
It gives a false message that the PR is resonating with folks and has real activity. I wonder if GitHub did this on purpose to make engagement seem higher than it really is.
I want to know how many real humans read my post, commented, shared etc.
Clankers can keep their own counts.
It's a bummer because it's hitting a lot of users, and even valid users who don't communicate well are getting hit hard with skeptical responses.
Previously a quick scan of comment history would make it obvious you're looking at an LLM, now you're stuck arguing over a one off comment where they can get away with benefit of the doubt.
[0] https://github.blog/news-insights/company-news/bringing-more...
It seems that for "actions", the trailing twelve months availability is 98.67%.
Trailing 3 months is even worse :/
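For scale, that trailing-twelve-month number works out to a lot of wall-clock downtime:

```python
# Convert 98.67% availability into downtime per year
availability = 0.9867
hours_per_year = 365 * 24          # 8760
downtime_hours = (1 - availability) * hours_per_year
print(round(downtime_hours, 1))    # ~116.5 hours, nearly 5 full days
```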
My org noticed the incident at 12:19p ET, Github pushed their first update at 12:38p, and pushed that it was mitigated at 5:48p.
[1] https://securitylab.github.com/resources/github-actions-prev...
I have always found it as a pretty nice to have feature if I am already using GitHub. It’s far from perfect or robust but I can get a lot of use out of it with low to no friction.
"Give $provider a break, they have such crazy scale that they can't possibly hope to have great uptime"
... yet it very rapidly gets lower uptime than a service running on a desktop in the corner of the office with some backups that get restored somewhere else.
Most sysadmins will tell you tales of laptops with a decade of uptime hosting simple services that nobody cared about (IRC, ticket software) with no downtime, not even an hour, and people only discovered that fact when they decided it was too slow and it was time to migrate. These services have become less reliable than that, while servers themselves have only gotten more reliable in that time.
(yes, I'm aware of the security liability of decades old software running, even if it's not accessible by everyone)
There's a weird doublethink going on.
It is time to set up local models. It is cheaper, and you already have a computer. Why keep it idle and pay someone else for their CPU?
Once it is cheaper, there will be more demand so it will no longer be cheaper. Buying now gets current prices (though demand is still fairly high).
There are more and more independent AI inference providers, without VC backing, serving open-weight models on a roughly cost-plus basis. That suggests subsidies are not a significant factor in AI inference pricing.
honestly really surprised they haven't gone full ham on AI data centers.
E.g. a well-designed deployment (infrastructure-as-code) repository doesn’t need a frontier model to be understood well enough to create a new job / service using sibling jobs / services as templates.
And this already saves me dozens of minutes per week, although it’s not a 2x multiplier in my efficiency.
If there are 100 companies you can choose from, yes.
If there are 3 oligarchs that own all options, not.
Capitalism only works when there is competition between many players. When you get fewer than a dozen players, prices are too easy to increase to maximize profit. They do not even need to talk to each other; not starting a price war is the only logical strategy, and they follow it. That is why big tech is so problematic.
In the past, these kinds of companies were highly regulated. Phone companies were not allowed to wiretap calls, prices were limited by law, etc. Internet providers had the same regulations applied to them. But service providers now run amok without oversight, abusing their position and hurting the rest of the economy. Do not expect lower prices in such an unregulated environment.
They never really stay tight for long: the various states are far too busy flooding the world with endless money printing, kicking the can of public debt ever further down the road.
Covid financial crash? We went to new highs. The 2022 tech flash crash (Meta and Netflix did -75%, for example)? We then went to new highs.
The only way out for governments that perpetually spend far more than they bring in from taxpayers is to devalue the currency.
So "financial markets getting tighter": probably won't last.
So you'll probably never see government customers allow that and neither will a lot of commercial customers.
I don't see the risk. If your code is easily AI-generated, you don't have a moat anyway. And a Chinese competitor probably won't have as easy a time as a US one if you operate in the US.
Further, at a lot of companies, the risk has to be acceptable to shareholders and auditors. Perceived risk is often a more powerful motivator than actual risk.
lmao tell that to the artists, authors and foss contributors whose work has been cloned into the llm oracle
I mean, "Copyright Infringement" famously does not translate to Mandarin; but we have Amazon ripping off best sellers in their marketplace pretty brazenly and Apple "sherlocking" applications -- that's even where the term comes from.
The models themselves are trained on a corpus of material that was obtained with dubious legality... though I suppose some argument could be made that they're forced to bend the models because of that.
I'd be more wary of these models' terms and conditions granting a license to themselves for everything they come into contact with: nobody is reading these licenses, it seems. Copilot's old terms only allowed for "being inspired" by the output, even though they themselves produce an IDE of sorts that lets you build complete projects from suggestions: directly breaking their own T&Cs.
Also, how long do you think OpenAI, Microsoft, Google, Anthropic, etc. could delay a lawsuit while you pay hundreds of thousands in legal retainers? 5 years? 10?
From the perspective of someone currently living in the EU... I'd say that's pretty much a wash (or even slightly tilted in China's favour) for folks outside the US.
Maybe the government should nationalize GitHub at this point, as it is absolutely critical US infrastructure and MSFT has shown itself to be a terrible steward for the public.
[1] Ed Zitron speculates the actual prices with token based billing for heavy users will be something like 10x the subscription price, but this seems high.
Although I would also point out that OpenAI recently tripled the amount of Codex inference you get per month for £200 (and to head off the suggestion, this is distinct from their current 2x promotion on £100/month plans)
Neither of those is how much it actually costs the company selling the service. And I have a feeling they are running at a loss here, so the play is "get everyone possible using LLMs, then jack up the pricing".
Inference is cheap but training is quite expensive. Plus all the money they've invested and keep investing on hardware, data centers, etc. And evidently they also need to make a profit at some point.
Maybe from the perspective of traditional, turn-based chat. But when you start having developers command an army of agents that work around the clock, those cheap tokens start adding up fast...
I think the margins have to be a lot higher than that in order to give investors the return they're expecting, to continue the never-ending training treadmill, and to build more and more datacenters to accommodate people basically DDOS'ing the GPUs in order to run their workloads.
Yes, in theory what you said makes sense. But the tightrope these companies have to walk is that the per-token costs still have to be low enough that developers and companies don't just say "ehhh I guess we can still do all this work the old-fashioned way" but ALSO high enough to cover the massive expenses AND astronomical returns everyone's expecting.
If prices go up, I suspect a bunch of folks will jump to cheaper, less capable models instead of eating the added cost. The whole value proposition of AI in enterprise is around cost-cutting, so that mentality is likely to persist when choosing which model to pay for.
> Last month, we shared how GitHub Copilot code review runs on […] GitHub Actions using GitHub-hosted runners.
They say that they’re now billing against their actual costs. But also: enterprise accounts already have budget assigned to GitHub Actions, and this lets them start billing right away without having to actually get (or allow) businesses to evaluate the return of having Copilot do code reviews.
So seems like it's a mix of immediate incentives and long term architecture. I don't like it, though. If I were an enterprise my first response would be to turn it off.
Hang on, I read this as: Copilot reviews will bill both Actions minutes and AI credits. Did I miss something?
Though actually, the more I think about it, the more this change makes sense. With the AI running on GitHub's side, it does feel pretty equivalent to CI minutes. I would hope that the number of minutes they bill for is minimal, though, since the vast majority of that time will be I/O waiting on the agent to return.
Done that way, it obfuscates the cost of the code review, and I think that's on purpose.
Weird that Anthropic decided to build a Claude Code Routines toilet.
Do they, though? I don't know a single person who uses GitHub who actually likes it. It's far more often something like "it's fine, but I miss (GitLab|Gerrit)" or "I stopped using it for personal stuff and moved to (Codeberg|GitLab)."
The brand recognition among non-technical folks is really the strongest selling point in my eyes. And that's irrelevant to ~95% of software development.
If you want to make your repos public, you could use cgit and the like.
Between the 27x model costs, this, CVE exploits, and downtime, their platform is starting to feel like a questionable decision.
Stopped my recurring subscription at the end of last year, when it started spinning up Actions for reviews, which as a side effect roughly doubled the time to do a review. Before that, I would open a PR, wait at most a minute or two, and the review was already done.
I’m blowing through my 1000 mins in days.
Thinking to either pool some free tiers or figure something out with spot instances.
Also is it just me or is CI/CD tooling still sort of rough all around.
Hetzner has cheap VPS that I host my CI on. It costs like $10/month.
Pick the cheapest region, since CI runners location doesn’t matter much.
But I think the issue is that my situation (solo dev, mono repo) is just not right for a dedicated instance.
With only 1-2 runners, the pipeline is slow (low parallelism) and resource constrained. And at least 50% of the time it's idle (I'm not working/sleeping).
I guess what I'm really looking for is for some kind of aggressive autoscaling, and aggressive caching.
I tried a couple of things (GHA, Dagger + Hetzner, Buildkite).
And I'm just not too sure there's going to be any out-of-the-box solution, since my priority is essentially to minimize cost and maximize efficiency. Not really a great customer for any provider.
I'm tempted to just get an agent to build something out quickly with Cloudflare Workers + spot instances.
I also have some other nice to have requirements:
- ts/code over config
- locally runnable and testable
- preferably no lock in
- repeatable/reproducible
A buddy of mine runs his whole CI/CD setup off an old gaming desktop. They use Tailscale to connect it to their hosted infrastructure and registered it as a GitHub Actions runner.
For a solo dev this might be the way to go.
My wife uses my old gaming desktop for her ux design work as well.
And I was thinking of using the gpu to run some tts models.
Now to just figure out a way to run it all on windows and have it auto start when she logs in.
So what? It's $10 a month. Why do you need to chase 100% utilization?
And you can use that to host your website, a game server, maybe some other projects...
https://docs.drone.io/server/provider/github/
Very easy to stand up, does just fine. Definitely doesn't have the "library" of prebuilt actions that GHA does, but for the most part... I consider that a plus.
Otherwise it's very similar in concept - define actions in a yaml file, run commands on an image, webhook integration with most repo providers.
I run it on some old hardware locally (k3s cluster on old machines) and it outperforms the 1000 minutes from GHA easily, and costs basically nothing but some maintenance and time.
I've been keeping my eyes open for something new in this space since Harness bought it, though - so if other folks have recommendations I'd be interested in alternatives.
Best decision we ever made
There’s a difference between criticizing a company and just swapping names for insults. The former can be useful, the latter just turns the discussion into noise. If you’ve got a point about Copilot or the review feature, make it. Otherwise it’s hard to see what anyone is supposed to take away from “ShitHub” other than childish shit-posting.