"Service X is down" is not "news". Kinda feels like points-farming.
Anyone caring that X works is gonna know it's down - they can't work! And probably why they are at HN ;)
HN as an "is service X down?" detector is less reliable than trying the actual service :)
If HN is gonna keep letting people post "X is down" no worries. But it seems worth a flag.
And it's not like open models are cheap to run even as alternatives. For example, with my $100/mo subscription for Claude Code, I often burn more than $100 worth of tokens in a day, several times a week. But if I were to use the GLM API, it would be about $300.
GPT 5.5 uses a third of the tokens Opus 4.8 does for the same task and scores higher. GLM 5.2 was worse in quality but used half the tokens - 5.3 hasn't been tested yet but will be higher.
Not to mention the way the proprietary models patronize you if they think you’re up to no good. The other day I was trying to use Claude to transcribe a song on YouTube to sheet music; the purchase link for the sheet music was broken and it had not been available for purchase for years. Of course I first had to prove to Claude that I was not just being cheap and trying to get around paying the $5 for the download, which I would have had no problem with. “I am going to be up front with you: my research raises some red flags with your initial premise that the sheet music is no longer available…”
I think in the next 5-10 years inference costs will come down substantially and the open source models will get so advanced that only someone in an extremely niche, cutting edge field will need a frontier model. Everyone else will have a dedicated box somewhere on their network that runs their LLM of choice. No tokens, none of your data getting sent to a third party, no arguing with it over whether or not a link is actually broken or if you’re just trying to be cheap.
I think OpenAI and Anthropic realize this, which is the reason they’re in a rush to go public.
I would give GLM a try. I'm shocked at how well it's been able to handle some things I've thrown at it.
These lead to a small accumulation of sampling errors, which makes it all but inevitable that open source models will shit the bed by the 200K token mark or even sooner.
If you set your opencode to use a good sampling algorithm, such as min_p or top-n sigma (llama.cpp supports both), you'll find that, at least for long-running tasks, your model gets a lot better.
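To put rough numbers on "accumulation", here is a back-of-the-envelope sketch. It assumes every sampled token independently has some small chance of being a bad pick from the tail. Both the independence and the error rate `EPS` are my assumptions for illustration, not measurements from any model.

```python
def prob_at_least_one_error(per_token_error, num_tokens):
    """Chance of at least one bad sample in num_tokens draws.

    Assumes each draw is independent, which is a simplification.
    """
    return 1.0 - (1.0 - per_token_error) ** num_tokens


# Hypothetical per-token error rate, chosen only for illustration.
EPS = 1e-5

for n in (1_000, 20_000, 200_000):
    print(n, round(prob_at_least_one_error(EPS, n), 3))
```

Even a tiny per-token rate compounds over a long generation, which is the intuition behind trimming the tail before sampling.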
It won't make GLM as good as Opus 4.8, but it will stop the feeling of "brain damage" from running open source models at the edge of their context windows.
And yes, there is an upcoming (hopefully NeurIPS) paper titled "Long Context Generation is a Sampling Problem" for more details about this. Give it two months and it'll be on arXiv one way or another.
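For anyone wondering what min_p and top-n sigma actually do, here is a minimal pure-Python sketch as I understand the two techniques. It is not llama.cpp's implementation; the function names, the toy logits, and the default thresholds are all made up for illustration. min_p drops every token whose probability is below `min_p` times the top token's probability, and top-n sigma keeps only tokens whose logit is within `n` standard deviations of the highest logit.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def top_n_sigma_mask(logits, n=1.0):
    """Indices whose logit is within n std devs of the max logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= cutoff]


def sample(probs, rng=random):
    """Draw one token index from a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy logits: two plausible tokens followed by a tail of junk.
logits = [5.0, 4.5, 0.0, -1.0, -2.0, -3.0]
filtered = min_p_filter(softmax(logits), min_p=0.1)
kept_by_sigma = top_n_sigma_mask(logits, n=1.0)
next_token = sample(filtered)
```

With these toy numbers both filters keep only the first two tokens, so the junk tail can never be sampled no matter how long the generation runs.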