More than that, it's an extremely large and complex TypeScript code base — probably larger and more complex than it needs to be — and (partly as a result) it's fairly resource inefficient (often uses 1GB of RAM or more. For a TUI).
On top of that, at least I personally find the TUI to be overbearing and a little bit buggy, and the agent to be so full of features that I don't really need — also mildly buggy — that it sort of becomes hard to use and remember how everything is supposed to work and interact.
That's (one of the reasons) why I'm favoring Codex over Claude Code.
Claude Code is an... Electron app (for a TUI? WTH?) and Codex is Rust. The difference is tangible: the former feels sluggish and does some odd redrawing when the terminal size changes, while the latter definitely feels more snappy to me (leaving aside that GPT's responses also seem more concise). At some point, I had both chewing concurrently on the same machine and same project, and Claude Code was using multiple GBs of RAM and 100% CPU whereas Codex was happy with 80 MB and 6%.
Performance _is_ a feature and I'm afraid the amounts of code AI produces without supervision lead to an amount of bloat we haven't seen before...
The redraw glitches you’re referring to are actually signs of what I consider to be a pretty major feature, a reason to use `claude` instead of `codex` or `opencode`: `claude` doesn’t use the alternate screen, whereas the other two do. Meaning that it uses the standard screen buffer, meaning that your chat history is in the terminal (or multiplexer) scrollback. I much prefer that, and I totally get why they’ve put so much effort into getting it to work well.
In that context handling SIGWINCH has some issues and trickiness. Well worth the tradeoff, imo.
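For readers unfamiliar with the distinction, here is a minimal Python sketch of the two mechanisms being discussed. The escape sequences are the standard xterm private mode 1049 codes. The function names are made up for illustration, and nothing here claims to show how `claude`, `codex`, or `opencode` are actually implemented.

```python
import shutil
import signal

# xterm private mode 1049: switch to / from the alternate screen buffer.
ENTER_ALT_SCREEN = "\x1b[?1049h"
LEAVE_ALT_SCREEN = "\x1b[?1049l"


def wrap_alt_screen(body: str) -> str:
    """Return body bracketed so it is drawn on the alternate screen.

    Anything drawn this way is gone from the terminal once the program
    leaves the alternate screen, so it never reaches the scrollback.
    """
    return f"{ENTER_ALT_SCREEN}{body}{LEAVE_ALT_SCREEN}"


def current_size():
    """Return (columns, rows) of the terminal, falling back to 80x24."""
    size = shutil.get_terminal_size(fallback=(80, 24))
    return size.columns, size.lines


def install_resize_handler(on_resize) -> bool:
    """Call on_resize(cols, rows) whenever the terminal is resized.

    Returns False on platforms without SIGWINCH (for example Windows).
    Must be called from the main thread.
    """
    if not hasattr(signal, "SIGWINCH"):
        return False

    def handler(signum, frame):
        on_resize(*current_size())

    signal.signal(signal.SIGWINCH, handler)
    return True
```

A program on the alternate screen can simply repaint everything when the resize callback fires. One that stays on the main screen has already handed its earlier output to the terminal's scrollback, which is presumably where the redraw trickiness comes from.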
The difference in feel between Codex and Claude Code is obvious.
The whole thing is vibed anyway, I'm sure they could get it done in a week or two for their quality standards.
What would make Go more "accessible to contributors" than Rust?
You need to set an explicit "small model" in OpenCode to disable that.
Have fun on windows - automatic no from me. https://github.com/anomalyco/opencode/issues?q=is%3Aissue%20...
this is what i notice with openclaw as well. there have been releases where they break production features. unfortunately this is what happens when code becomes a commodity: everyone thinks that shipping fast is the moat, but it comes at the expense of quality, since they know a fix can be implemented quickly in the next release.
I’m sure we’ll all learn a lot from these early days of agentic coding.
So far what I am learning (from watching all of this) is that our constant claims that quality and security matter seem to not be true on average. Depressingly.
But as agents move from prototypes to production, the calculus changes. Production systems need:
- Memory continuity across sessions
- Predictable behavior across updates
- Security boundaries that don't leak
The tools that prioritize these will win the enterprise market. The ones that don't will stay in the prototype/hobbyist space.
We're still in the "move fast" phase, but the "break things" part is starting to hurt real users. The pendulum will swing back.
Only for the non-pro users. After all, those users were happy to use excel to write the programs.
What we're seeing now is that more and more developers find they are happy with even less determinism than the Excel process.
Maybe they're right; maybe software doesn't need any coherence, stability, security or even correctness. Maybe the class of software they produce doesn't need those things.
I, unfortunately, am unable to adopt this view.
I'm 13 years into this industry, this is the first I'm hearing of this.
Also most of the long running enterprise projects I’ve seen - there was one that had been around for like 10 years and like about 75% of the devs I hadn’t even heard of and none of the original ones were in the project at all.
The thing had no less than three auditing mechanisms, three ways of interacting with the database, mixed naming conventions, like two validation mechanisms none of which were what Spring recommended and also configurations versioned for app servers that weren’t even in use.
This was all before AI, it’s not like you need it for projects to turn into slop and AI slop isn’t that much different from human slop (none of them gave a shit about ADRs or proper docs on why things are done a certain way, though Wiki had some fossilized meeting notes with nothing actually useful) except that AI can produce this stuff more quickly.
When encountered, I just relied on writing tests and reworking the older slop with something newer (with better AI models and tooling) and the overall quality improved.
All code is not fungible, "irreverent code that kinda looks okay at first glance" might be a commodity, but well-tested, well-designed and well-understood code is what's valuable.
Code today can be as verbose and ugly as ever, because from here on out, fewer people are going to read it, understand and care about it.
What's valuable, and you know this I think, is how much money your software will sell for, not how fine and polished your code is.
Code was a liability. Today it's a liability that cost much much less.
How much value are you going to be able to extract over its lifetime once your customers want to see some additional features or improvements?
How much expensive maintenance burden are you incurring once any change (human or LLM generated) is likely to introduce bugs you have no better way of identifying than shipping to your paying customers?
Maybe LLM+tooling is going to get there with producing a comprehensible and well tested system but my anecdotal experience is not promising. I find that AI is great until you hit its limit on a topic and then it will merrily generate tokens in a loop suggesting the same won't-work-fix forever.
The whole thing reminds me a bit of the many RAD tools that were supposed to 'solve' programming. While it was easy to start and produce something with those tools, at some point you started spending way too much time working around the limitations and wished you started from scratch without it.
[1] https://museumoffailure.com/exhibition/wonka-chocolate-exper...
There are limits to what even AI can do to code, within practical time-limits. Using AI also costs money. So the easier it is to maintain and evolve a piece of software, the cheaper it will be for the owners of that application.
Code that has not been thoroughly tested is a greater liability, not a lesser one, the faster you can write it.
I expect that from something guiding the market, but there have been times where stuff changes, and it isn't even clear if it is a bug or a permanent decision. I suspect they don't even know.
I would (incorrectly) assume that a product like this would be heavily tested via AI - why not? AI should be writing all the code, so why would the humans not invest in and require extreme levels of testing since AI is really good at that?
Like Rails/DHH was one phase, Git/GitHub another.
And right now it's kinda Claude Code. But they're so obviously really bad at development that it feels like an MLM scam.
I'm just describing the feeling I'm getting, perhaps badly. I use Claude, I recommended Claude for the company I worked at. But by god they're bloody awful at development.
It feels like the point where someone else steps in with a rock solid, dependable, competitor and then everyone forgets Claude Code ever existed.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
that #12446 PR hasn't even been resolved to won't merge and last change was a week ago (in a repo with 1.8k+ open PRs)
Must be a karmic response from “Free” /s
The choice isn't "telemetry or you're blindfolded", the other options include actually interacting with your userbase. Surveys exist, interviews exist, focus groups exist, fostering communities that you can engage is a thing, etc.
For example, I was recruited and paid $500 to spend an hour on a panel discussing what developers want out of platforms like DigitalOcean, what we don't like, where our pain points are. I put the dollar amount there only to emphasize how valuable such information is from one user. You don't get that kind of information from telemetry.
We all know it’s extremely, extremely hard to interact with your userbase.
> For example I was paid $500 an hour
+the time to find volunteers doubled that, so for $1000 an hour x 10 user interviews, a free software can have feedback from 0.001% of their users. I dislike telemetry, but it’s a lie to say it’s optional.
- a company with no telemetry in either our downloadable or our cloud product.
On the contrary, your users will tell you what you need to know, you just have to pay attention.
> I dislike telemetry, but it’s a lie to say it’s optional.
The lie is believing it’s necessary. Software was successful before telemetry was a thing, and tools without telemetry continue to be successful. Plenty of independent developers ship zero telemetry in their products and continue to be successful.
Is Claude Code like this too? I wonder if Pi is any better.
A big downside would be paying actual cost price for tokens but on the other hand, I wouldn't be tied to Google's model backend which is also extremely flaky and unable to meet demand a lot of the time. If I could get real work done with open models (no idea if that's the case yet) and switch providers when a given provider falls over, that would be great.
I'm very happy with Pi myself (running it on a small VPS so that I don't need to do sandboxing shenanigans).
Interesting you say this because I'd say the opposite is true historically, especially in the systems software community and among older folks. "Do one thing and do it well" seems to be the prevailing mindset behind many foundational tools. I think this is why so many are/were irked by systemd. On the other hand newer tools that are more heavily marketed and often have some commercial angle seem to be in a perpetual state of tacking on new features in lieu of refining their raison d'etre.
OpenCode has been much more stable for me in the 6 months or so that I’ve been comparing the two in earnest.
On top of that, OpenCode Go was a complete scam. It was not advertised as having lower quality models when I paid, and glm5 was broken compared to another provider, returning gibberish and acting very dumb on the same prompt.
That being said, I do prefer OpenCode to Codex and Claude Code.
(I'm also hating on TS/JS: but some day some AI will port it to Rust, right?)
CC I have the least experience with. It just seemed buggy and unpolished to me. Codex was fine, but there was something about it that just didn't feel right. It seemed fine for code tasks, but just as often I want to do research or discuss the code base, and for whatever reason I seemed to get terse, less useful answers using Codex even when it's backed by the same model.
OpenCode works well, I haven't had any issues with bugs or things breaking, and it just felt comfortable to use right from the jump.
Tbf, this seems exactly like Claude Code, they are releasing about one new version per day, sometimes even multiple per day. It’s a bit annoying constantly getting those messages saying to upgrade cc to the latest version
It's annoying how I always get that "claude code has a native installer xyz please upgrade" message
I then tried running other options like picoclaw/picocode etc but they were all really hard to manage/create
The UI/UX I want is that I can just put my free openrouter api key in and then I am ready to go to get access to free models like Arcee AI right now
After reading your comments/I read this thread, I tried crush by charmbracelet again and it gives the UI/UX that I want.
I am definitely impressed by crush/ the charm team. They are on HN and they work great for me, highly recommended if you want something which can work on low constrained devices
I do feel like Charm's TUI's are too beautiful in the sense that running a connection over SSH can delay so when I tried to copy some things, the delay made things less copy-able but overall, I think that I am using Crush and I am happy for the most part :-)
Edit: That being said, just as I was typing this, Crush used up all the free requests I get from OpenRouter, so that might be a bit of a minor issue, but it's not really an issue on Crush's side. My point still stands that Crush is worth checking out.
Kudos to the CharmBracelet team for making awesome golang applications!
[1] https://github.com/badlogic/pi-mono/tree/main/packages/codin...
I build VT Code with Tree-sitter for semantic understanding and OS-native sandboxing. It's still early but I'm confident it's usable. I hope you'll give it a try.
But we did a lot of work on improving the experience, both on UX, performance, and the actual reliability of the agent itself.
I would suggest you to give it a try.
Also, non-interactive support, useful for some workflows:
Using AI to generate all your code only really makes sense if you prioritize shipping features as fast as possible over the quality, stability and efficiency of the code, because that's the only case in which the actual act of writing code is the bottleneck.
Personally, I find this idea that "coding isn't the bottleneck" completely preposterous. Getting all of the API documentation, the syntax, organizing and typing out all of the text, finding the correct places in the code base and understanding the code base in general, dealing with silly compiler errors and type errors, writing a ton of error handling, dealing with the inevitable and ineradicable boilerplate of programming (unless you're one of those people that believe macros are actually a good idea and would meaningfully solve this), all are a regular and substantial occurrence, even if you aren't writing thousands of lines of code a day. And you need to write code in order to be able to get a sense for the limitations of the technology you're using and the shape of the problem you're dealing with in order to then come up with and iterate on a better architecture or approach to the problem. And you need to see your program running in order to evaluate whether its functionality and design are satisfactory and then to iterate on that. So coding is actually the upfront cost that you need to pay in order to even start properly thinking about a problem. So being able to get a prototype out quickly is very important. Also, I find it hard to believe that you've never been in a situation where you wanted to make a simple change or refactor that would have resulted in needing to update 15 different call sites to do properly in a way that was just slightly variable enough or complex enough that editor macros or IDE refactoring capabilities wouldn't be capable of.
That's not to mention the fact that if agentic coding can make deploying faster, then it can also make deploying the same amount at the same cadence easier and more relaxing.
Which one do you think companies prefer? Or if you're a consulting business, which one do you think your clients prefer?
I have yet to actually see a single example of the latter, though. OpenCode isn't an isolated case - every project with heavy AI involvement that I've personally examined or used suffers from serious architectural issues, tons of obvious bugs and quirks, or both. And these are mostly independent open source projects, where corporate interests are (hopefully) not an influence.
I will continue to believe it's not actually possible until I am proven wrong with concrete examples. The incentives just aren't there. It's easy to say "just mindlessly follow X principle and your software will be good", where X is usually some variation of "just add more tests", "just add more agents", "just spend more time planning" etc. but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
That's a complete strawman of what I — or others trying to learn how to use coding agents to increase quality, like Simon Willison or the Oxide team — am saying.
> but I choose to believe that good software cannot be created without the involvement of someone who has a passion for writing good software - someone who wouldn't want to let an LLM do the job for them in the first place.
This is just a no true Scotsman. I prefer to use coding agents because they don't forget details, or get exhausted, or overwhelmed, or lazy, or give up, ever, whereas I might. Therefore, they allow me to do all of the things that improve code and software quality more extensively and thoroughly, like refactors, performance improvements, and tests among other things (because yes, there is no single panacea). Furthermore, I do still care about the clarity, concision, modularity, referential transparency, separation of concerns, local reasonability, cognitive load, and other good qualities of the code, because if those aren't kept up a) I can't review the code effectively or debug things as easily when they go wrong, b) the agent itself will struggle to make changes without breaking other things, and struggle to debug, c) those things often eventually affect the quality of the end state software.
Additionally, what you say is empirically false. Many people who do deeply value quality software and code quality, such as the creators of Flask, Redis, and SerenityOS/Ladybird, all use and value agentic coding.
Just because you haven't seen good quality software with a large amount of agentic influence doesn't mean it isn't possible. That's very close minded.
To change that, you need to set a custom "small model" in the settings.
https://old.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
They also don't let you run all local models, but specific whitelisted by another 3rd party: https://github.com/anomalyco/opencode/issues/4232
Everything you read on the internet seems exaggerated today. Especially true for reddit, and especially especially true for r/LocalLLaMA, which is a shadow of its former self. Today it's mostly sockpuppets pushing various tools and models, and other sockpuppets trying to push misinformation about their competitors' tools/models.
Imagine someone using it at work, where they are only allowed to use a GitHub Copilot Business subscription (which is supported in OpenCode). Now they have sent proprietary code to a third party, and don't even know they're doing it.
I really like how their subagents work, as a bonus I get to choose which model is in which agent. Sadly I have to resort to the mess that Anthropic calls Claude Code
From what I've heard, the metrics used by Anthropic to detect unauthorized clients are pretty easy to sidestep if you look at the existing solutions out there. Better than getting your account banned.
The belief is that the subscriptions are subsidized by them (or just heavily cut into profit margins) so for whatever reason they're trying to maintain control over the harness - maybe to gather more usage analytics and gain an edge over competitors and improve their models better to work with it, or perhaps to route certain requests to Haiku or Sonnet instead of using Opus for everything, to cut down on the compute.
Given the ample usage limits, I personally just use Claude Code now with their 100 USD per month subscription because it gives me the best value - kind of sucks that they won't support other harnesses though (especially custom GUIs for managing parallel tasks/projects). OpenCode never worked well for me on Windows though, also used Codex and Gemini CLI.
You can point Claude Code at a local inference server (e.g. llama.cpp, vLLM) and see which model names it sends each request to. It's not hard to do a MITM against it either. Claude Code does send some requests to Haiku, but not the ones you're making with whatever model you have it set to - these are tool result processing requests, conversation summary / title generation requests, etc - low complexity background stuff.
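A rough idea of what such an inspection endpoint could look like, using only the Python standard library. This is a sketch, not a working proxy: it records the `model` field of each JSON request body and answers with a stub status instead of forwarding anything. The port, the class name, and the helper name are arbitrary choices, and how a given client is configured to talk to a custom endpoint is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional


def extract_model(body: bytes) -> Optional[str]:
    """Return the "model" field of a JSON request body, if there is one."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return None
    if isinstance(payload, dict) and isinstance(payload.get("model"), str):
        return payload["model"]
    return None


class ModelLogger(BaseHTTPRequestHandler):
    """Logs which model each POST asks for; does not forward the request."""

    seen: List[str] = []  # model names observed so far, shared by all requests

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        model = extract_model(self.rfile.read(length))
        if model is not None:
            self.seen.append(model)
            print(f"{self.path} -> {model}")
        # Stub reply: this sketch only observes, it does not serve completions.
        self.send_response(501)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen on localhost only and log requests until interrupted."""
    HTTPServer(("127.0.0.1", port), ModelLogger).serve_forever()
```

In practice a real local inference server with request logging turned on gives the same information while still answering the requests, which is what the comment above describes.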
Now, Anthropic could simply take requests to their Opus model and internally route them to Sonnet on the server side, but then it wouldn't really matter which harness was used or what the client requests anyway, as this would be happening server-side.
Actually curious to hear what others think about why Anthropic is so set on disallowing 3rd party tools on subscriptions.
So they have to move up the stack to higher margin business solutions. Which is why they offer subsidized subscription plans in the first place. It’s a marketing cost. But they want those marketing dollars to drive up the stack not commodity inference use cases.
When setting your token limits, their economics calculations likely assume that those optimizations are going to work. If you're using a different agent, you're basically underpaying for your tokens.
Build the single pane of glass everyone uses. Offer it under cost. Salt the earth and kill everything else that moves.
Nobody can afford to run alternative interfaces, so they die. This game is as old as time. Remember Reddit apps? Alternative Twitter clients?
In a few years, CC will be the only survivor and viable option.
It also kneecaps attempts to distill Opus.
API = way more expensive, allowed to use on your terms without anthropic hindering you.
One-price-per-month subscriptions (Claude Code Pro/MAX @ $20/$100/$200 a month) use a different authentication mechanism, OAUTH. The useful difference is you get a lot more inference than you can for the same cost using the API but they require you to use Claude Code as a client.
Some clients have made it simple to use your subscription key with them and they are getting cease and desist letters.
(Ok, technically o1-pro is even more expensive, but I'm assuming that's a "please move on" pricing)
I'd rather switch to OpenAI than give up my favorite harness.
For me it's $0.8/kWh during peak, $0.47 off peak, and super off peak of $0.15. I accidentally left a little mini 500W heater on all day, while I was out, costing > 5% of your whole month!
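To put numbers on that, a small worked calculation in Python. The three rates and the 500 W figure come from the comment. The 24 hour duration is an assumption, since "all day" is not more specific, and the comment does not say how the hours split across the tiers, so this only gives the lower and upper bound.

```python
def heater_cost(watts: float, hours: float, rate_per_kwh: float) -> float:
    """Cost in dollars of running a constant load for the given time."""
    kwh = watts / 1000 * hours
    return round(kwh * rate_per_kwh, 2)


# Rates quoted in the comment, in dollars per kWh.
RATES = {"peak": 0.80, "off_peak": 0.47, "super_off_peak": 0.15}

# Assumption: the heater ran for a full 24 hours (12 kWh in total).
bounds = {name: heater_cost(500, 24, rate) for name, rate in RATES.items()}
```

Under that assumption the day costs somewhere between $1.80 (entirely super off peak) and $9.60 (entirely peak), with the real figure depending on how the hours fall across the three tiers.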
Musk was the largest individual political donor of the 2024 election [1] and Greg Brockman was the largest donor to Trump's "MAGA Inc" super PAC [2]
[1] https://www.washingtonpost.com/technology/2024/12/06/elon-mu...
[2] https://www.theverge.com/ai-artificial-intelligence/867947/o...
EDIT: The system I bought last summer for $1980 and just took delivery of in October, Beelink GTR 9 Pro, is now $2999.... wow...
https://www.youtube.com/live/z0JYVTAqeQM?si=oLvyLlZiFLTxL7p0
Long tool outputs/command outputs everything in my harness is spilled over to the filesystem. Context messages are truncated and split to filesystem with a breadcrumb for retrieving the full message.
Works really well.
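A minimal sketch of that spill-over idea, in Python. The function name, the 2000 character limit, the hashed file name, and the breadcrumb wording are all made up for illustration; the comment does not say how its harness does any of this.

```python
import hashlib
from pathlib import Path


def spill_if_long(text: str, spill_dir: Path, limit: int = 2000) -> str:
    """Return text unchanged if short; otherwise save it and return a stub.

    The stub keeps the first `limit` characters plus a breadcrumb line
    naming the file that holds the full output.
    """
    if len(text) <= limit:
        return text
    spill_dir.mkdir(parents=True, exist_ok=True)
    # Content hash as file name, so identical outputs reuse one file.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    path = spill_dir / f"{digest}.txt"
    path.write_text(text, encoding="utf-8")
    breadcrumb = f"\n[truncated: {len(text)} chars total, full output in {path}]"
    return text[:limit] + breadcrumb
```

The model only sees the stub, and can read the named file with its normal file tool if it decides it needs the rest.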
Assuming you pay per token, which seems like a really strange workflow to lock yourself into at this point. Neither paid monthly plans nor local models suffer from that issue.
I tried once to use APIs for agents but seeing a counter of money go up and eventually landing at like $20 for one change, made it really hard to justify. I'd rather pay $200/month before I'd be OK with that sort of experience.
Pi is refreshingly minimal in terms of system prompts, but still works really well and that makes me wonder whether other harnesses are overdoing it. Look at OpenCode's prompts, for instance - long, mostly based on feels and IMO unnecessary. I would've liked to just overwrite OC's system prompts with Pi's (to get other features that Pi doesn't have) but that isn't possible today (without maintaining a custom fork)
JS is not something that was developed with CLI in mind and on top of that the language does not lend itself to LLM generation, as it has pretty weak validation compared to e.g. Rust, or even C, or even Python.
Not to mention memory usage or performance.
I can’t find the tweet from Mario (the author), but he prefers the Typescript/npm ecosystem for non-performance critical systems because it hits a sweet spot for him. I admire his work and he’s a real polyglot, so I tend to think he has done his homework. You’ll find pi memory usage quite low btw.
Also python ones would also allow self modifying. I'm always puzzled (and worried) when JS is used outside of browsers.
I'm biased as I find JS/TS a rather ugly language compared to basically anything else (PHP is a close second). Python is clean, C has performance, Rust is clean and has performance, Java has the biggest library and can run anywhere.
The simplicity of extending pi is in itself addictive, but even in its raw form it does the job well.
Before finding pi I had written a lot of custom stuff on top of all the provider specific CLI tools (codex, Claude, cursor-agent, Gemini) - but now I don’t have to anymore (except if I want to use my anthropic sub, which I will now cancel for that exact reason)
I’m sure there’s a more elegant way to say this, but OpenCode feels like an open source Claude Code, while pi feels like an open source coding agent.
I used it recently inside a CI workflow in GitLab to automatically create ChangeLog.md entries for commits. That + Qwen 3.5 has been pretty successful. The job starts up Pi programmatically, points it at the commits in question, and tells it to explore and get all the context it needs within 600 seconds... and it works. I love that this is possible.
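The time-boxing part of such a CI job can be sketched in Python like this. The actual Pi command line is not given in the comment, so `cmd` is a placeholder the caller has to supply, and `run_with_budget` is a made-up helper name.

```python
import subprocess
from typing import Optional, Sequence


def run_with_budget(cmd: Sequence[str], budget_seconds: float = 600) -> Optional[str]:
    """Run cmd and return its stdout, or None if it exceeds the time budget.

    A non-zero exit status raises CalledProcessError, which would fail
    the surrounding CI job.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=budget_seconds,  # the child is killed when this expires
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

A job would call this with whatever non-interactive invocation the agent supports and write the returned text to the changelog, treating `None` as "ran out of time".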
What I'd want to see from any of these tools is a clear permissions model — which files the agent can read vs write, whether it can execute commands, and an audit log of what it actually did. Claude Code's hooks system at least gives you deterministic guardrails before/after agent actions, but it's still early days for this whole category.
> they need broad file system access to be useful, but that access surface is also the attack surface
Do they? You give them access to one directory typically (my way is to create a temporary docker container that literally only has that directory available, copied into the container on boot, copied back to the host once the agent completed), and I don't think I've needed them to have "broad file system access" at any point, to be useful or otherwise.
So that leads me to think I'm misunderstanding either what you're saying, or what you're doing?
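The copy-in, run, copy-out setup described above could look roughly like this. The Python function only builds the `docker` command lines rather than running them. The image, the agent command, the container name, and the `/work` path are placeholders, not anything the commenter specified.

```python
from typing import List


def sandbox_commands(project_dir: str, image: str, agent_cmd: List[str],
                     name: str = "agent-sandbox") -> List[List[str]]:
    """Build the docker CLI calls for a copy-in, run, copy-out sandbox.

    Nothing is bind-mounted, so the container only ever sees a copy of
    project_dir, and the host copy changes only in the copy-back step.
    """
    return [
        # Create a stopped container whose working directory is /work.
        ["docker", "create", "--name", name, "--workdir", "/work", image, *agent_cmd],
        # Copy the project contents in ("/." means the contents of the dir).
        ["docker", "cp", f"{project_dir}/.", f"{name}:/work"],
        # Run the agent and stay attached until it exits.
        ["docker", "start", "--attach", name],
        # Copy the results back to the host, then discard the container.
        ["docker", "cp", f"{name}:/work/.", project_dir],
        ["docker", "rm", name],
    ]
```

Each list can be passed to `subprocess.run(..., check=True)` in order. Note that this limits file system access only; the container still has network access unless that is restricted separately.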
OpenCode has no sandboxing, as far as I know.
That makes Codex a much better choice for security.
I sprinkle in some billed API usage to power my task-planner and reviewer subagents (both use GPT 5.4 now).
The ability to switch models is very useful and a great learning experience. GLM, Kimi and their free models surprised me. Not the best, not perfect, but still very productive. I would be a wary shareholder if I owned a stake in the frontier labs… that moat seems to be shrinking fast.
It's been a moving target for years at this point.
Both open and closed source models have been getting better, but not sure if the open source models have really been closing the gap since DeepSeek R1.
But yes: If the top closed source models were to stop getting better today, it wouldn't take long for open source to catch up.
The big expensive models are great at planning tasks and reviewing the implementation of a task. They can better spot potential gotchas, performance or security gaps, subtle logic and nuance that cheaper models fail to notice.
The small cheap models are actually great (and fast) at generating decent code if they have the right direction up front.
So I do all the spec writing myself (with some LLM assistance), and I hand it to a Supervisor agent who coordinates between subagents. Plan -> implement -> review -> repeat until the planner says “all done”.
I switch up my models all the time (actively experimenting) but today I was using GPT 5.4 for review and planning, costing me about $0.4-$1 for a good sized task, and Kimi for implementation. Sometimes my spec takes 4-5 review loops and the cost can add up over an 8 hour day. Still cheaper than Claude Max (for now, barely).
Each agent retains a fairly small context window which seems to keep costs down and improves output. Full context can be catastrophic for some models.
As for the spec writing, this is the fun part for me, and I’ve been obsessing over this process, and the process of tracking acceptance criteria and keeping my agents aligned to it. I have a toolkit cooking, you can find in my comment history (aiming to open source it this week).
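The plan -> implement -> review loop described above can be sketched in Python as follows. The three callables stand in for the planner, implementer, and reviewer subagents. Their signatures, the `Review` type, and the round limit are assumptions made for the sketch; in particular, here it is the reviewer's verdict that ends the loop, which may differ from the commenter's setup.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    done: bool          # True when the work is accepted
    feedback: str = ""  # what to fix on the next round


def supervise(
    spec: str,
    plan: Callable[[str, str], List[str]],       # (spec, feedback) -> steps
    implement: Callable[[str], str],              # step -> change description
    review: Callable[[str, List[str]], Review],   # (spec, changes) -> verdict
    max_rounds: int = 5,
) -> List[str]:
    """Loop plan -> implement -> review until accepted or out of rounds."""
    feedback = ""
    for _ in range(max_rounds):
        steps = plan(spec, feedback)
        # Each step goes to the implementer on its own, so the implementer
        # only ever sees a small slice of context.
        changes = [implement(step) for step in steps]
        verdict = review(spec, changes)
        if verdict.done:
            return changes
        feedback = verdict.feedback
    raise RuntimeError(f"not accepted after {max_rounds} rounds")
```

With this shape, the expensive model is only called in `plan` and `review`, once per round, while the cheap model handles every `implement` call.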
I'm building a full stack web app, simple but with real API integrations with CC.
Moving so fast that I can barely keep a hold on what I'm testing and building at the same time, just using Sonnet. It's not bad at all. A lot of the specs develop as I'm testing the features, either as an immediate or a todo / gh issue.
How can you manage an agentic flow?
I used Claude with a paid subscription and Codex as well, and settled on OpenCode with free models.
I feel that if you want to build a coding agent / harness the first thing you should do is to build an evaluation framework to track performance for coding by having your internal metrics and task performance, instead I see most coding agents just fiddle with adding features that don't improve the core ability of a coding agent.
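As a starting point, the core of such an evaluation framework can be very small. This Python sketch assumes an agent can be wrapped as a function from prompt to output and that each task has a programmatic check; both are simplifications, and the `Task` and `pass_rate` names are invented here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable


def pass_rate(agent: Callable[[str], str], tasks: List[Task]) -> Dict[str, object]:
    """Run every task through the agent and report which ones passed."""
    results: Dict[str, bool] = {}
    for task in tasks:
        try:
            results[task.name] = bool(task.check(agent(task.prompt)))
        except Exception:
            # A crash counts as a failure rather than aborting the run.
            results[task.name] = False
    passed = sum(results.values())
    return {"results": results, "rate": passed / len(tasks) if tasks else 0.0}
```

Running the same task list before and after a harness change gives a number to compare, which is the kind of tracking the comment is asking for.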
I considered creating a PR for that, but found that creating new agents instead worked fine for me.
Now I just started looking into OpenCode yesterday, but seems you can override the system prompts by basically overloading the templates used in for example `~/.opencode/agents/build.md`, then that'd be used instead of the default "Build" system prompt.
At least from what I gathered skimming the docs earlier, might not actually work in practice, or not override all of it, but seems to be the way it works.
The changes I've made locally are:
- Added a discuss mode with almost no tools except read file, ask tool, web search only based no heuristics + being able to switch from discuss to plan mode.
Experiments:
- hashline: it doesn't bring that much benefit over the default with gpt-5.4.
- tried scribe [0]: It seems worth it as it saves context space but in worst case scenarios it fails by reading the whole file, probably worth it but I would need to experiment more with it and probably rewrite some parts.
The nice thing about opencode is that it uses sqlite and you can do experiments and then go through past conversation through code, replay and compare.
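Since the database layout is not described here, a safe first step for this kind of experiment is to let SQLite report what is in the file. The Python sketch below makes no assumption about table names; the only assumption is that you know the path to the database file, which is not given in the comment.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Map each table in a SQLite file to its row count.

    The file is opened read-only so a live database is not modified.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        # Table names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

From there, inspecting the `sql` column of `sqlite_master` for the interesting tables shows the real columns to query for replaying or comparing conversations.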
It started getting increasingly flaky with Anthropic's API recently, so I switched back to Claude Code for a couple of days. Oh my, what a night and day difference. Tokens, MCP use, everything.
For anyone reading at OpenAI, your support for OpenCode is the reason I now pay you 200 bucks a month instead.
But I don't use MCP, don't need anything complicated, and I'm not sure what OpenCode actually offers on top. The UI is slightly nicer (but oh so much heavier on resource usage), both projects' source code seems vibecoded and the architecture is held together with hopes and dreams, but in reality, it's a minor difference really.
Also, didn't find a way in OpenCode to do the "Fast Mode" that Codex has available, is that just not possible or am I missing some setting? Not Codex-Spark but the mode that toggles faster inference.
I'm guessing that a model which only covers a single language might be more compact and efficient vs a model trained across many languages and non-programming data.
If you want it to stick to better practices you have to write skills, provide references (example code it can read), and provide it with harnessing tools (linters, debuggers, etc) so the agent can iterate on its own output.
The OpenCode docs suggest it's possible, but it only works with their extension (not in an already open VS Code terminal) with a very specific keyboard shortcut and only barely at that.
- With love The Official Pink Eye #ThereIsNoOther
I use it with Qwen 3.5 running locally when my daily limits run out on my other subscriptions.
The harness is great. Local models are just slow enough that the subscription models are easier to use. For most of my tasks these days, the model's capability is sufficient; it is just not as snappy.
I just did a one hour vibe session today, ripping out a library dependency and replacing it with another and pushing the library to pypi. I should take my task list and let the local model replicate the work and see how it works out.
Hugely grateful for what they do.
What caused the switch was that we're building AI solutions for sometimes price-conscious customers, so I was already familiar with the pattern of "Use a superior model for setting a standard, then fine-tuning a cheaper one to do that same work".
So I brought that into my own workflows (kind of) by using Opus 4.6 to do detailed planning and one 'exemplar' execution (with 'over documentation' of the choices), then after that, use Opus 4.6 only for planning, then "throw a load of MiniMax M2.5s at the problem".
They tend to do 90% of the job well, then I sometimes do a final pass with Opus 4.6 again to mop up any issues. This saves me a lot of tokens/money.
This pattern wasn't possible with Claude Code, thus my move to OpenCode.
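For anyone who wants to see the shape of that workflow, here is a rough sketch, assuming a "model" is just a function from prompt to text. All names (`run_pipeline`, `looks_ok`, the fake models) are hypothetical and no real provider API is called. It also simplifies the final pass into a per-task escalation, which is one possible reading of the pattern and not necessarily how the commenter runs it.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "model" is any callable from prompt text to output text. Real API clients
# are left out on purpose; everything here is a hypothetical stand-in.
Model = Callable[[str], str]


@dataclass
class Result:
    task: str
    output: str
    handled_by: str  # "expensive" or "cheap"


def run_pipeline(goal: str, tasks: List[str], expensive: Model, cheap: Model,
                 looks_ok: Callable[[str], bool]) -> List[Result]:
    # 1. The expensive model plans once and executes one documented exemplar.
    plan = expensive(f"Write a detailed plan.\nGoal: {goal}")
    exemplar = expensive(f"{plan}\nDocument every choice.\nTask: {tasks[0]}")
    results = [Result(tasks[0], exemplar, "expensive")]

    # 2. Cheap workers handle the remaining tasks, guided by plan and exemplar.
    for task in tasks[1:]:
        output = cheap(f"{plan}\nExemplar:\n{exemplar}\nTask: {task}")
        handled_by = "cheap"
        # 3. Mop-up: only outputs that fail the check go back to the
        #    expensive model.
        if not looks_ok(output):
            output = expensive(
                f"{plan}\nFailed attempt:\n{output}\nTask: {task}"
            )
            handled_by = "expensive"
        results.append(Result(task, output, handled_by))
    return results


def fake_expensive(prompt: str) -> str:
    """Stand-in for the planning model: echoes the last prompt line."""
    return "expensive: " + prompt.splitlines()[-1]


def fake_cheap(prompt: str) -> str:
    """Stand-in for a cheap worker that gives up on 'hard' tasks."""
    last = prompt.splitlines()[-1]
    return "TODO" if "hard" in last else "cheap: " + last


def demo() -> List[Result]:
    return run_pipeline(
        "swap library X for library Y",
        ["port module a", "port module b", "port hard module c"],
        fake_expensive,
        fake_cheap,
        looks_ok=lambda out: not out.startswith("TODO"),
    )


if __name__ == "__main__":
    for result in demo():
        print(result.handled_by, "|", result.task)
```

The savings come from step 2: the expensive model is only paid for the plan, one exemplar, and whatever the cheap workers get wrong.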
Edit: it's not. https://github.blog/changelog/2026-01-16-github-copilot-now-...
They must be eating insane amounts of $$$ for this. I wouldn't expect it to last
See https://models.dev for a comparison against the normal "vanilla" API.
They shouldn't, as long as your terminal emulator doesn't. Why do you think it's Wayland related?
It works perfectly fine on Niri, Hyprland and other Wayland WMs.
What problem do you have?
I didn't dig further
Seems like there are many GitHub issues about this, actually
https://github.com/anomalyco/opencode/issues/14336
If you respond twice to their theme query probes, the whole thing bricks. Or if you're slightly out of order. It's very delicate.
And then the official docs: https://opencode.ai/docs/troubleshooting/#linux-wayland--x11...
> Linux: Wayland / X11 issues
> On Linux, some Wayland setups can cause blank windows or compositor errors.
> If you’re on Wayland and the app is blank/crashing, try launching with OC_ALLOW_WAYLAND=1.
> If that makes things worse, remove it and try launching under an X11 session instead.
OC_ALLOW_WAYLAND=1 didn't work for me (Ubuntu 24.04)
Suggesting a different display server in order to use a TUI (!!) seems a bit wild to me. I didn't put a lot of time into investigating this, so maybe there is a reason other than Wayland. Anyway, I'm using Pi now.
Even as a CC user I’m glad someone is forcing the discussion.
My prediction: within two years ‘model neutrality’ will be a topic of debate. Creating lock-in through discount pricing is anti-competitive. The model provider is the ISP; the tool, the website.
That is not the point. That is a mere technicality.
You signed a contract. If you ignore the terms of the contract and use the product in a way that is explicitly prohibited, you're abusing the product. It is as simple as that.
They offer a separate product (API) if you don't like the terms of the contract.
Also, if you really want to get technical: the limits are under the assumption that caching works as intended, which requires control of the client. 3P clients suck at caching and increase costs. But that is not the overarching point.
> Creating lock-in through discount pricing is anti-competitive.
Literally everyone does this. OpenAI is doing this with Codex, far more than Anthropic is. It's not great but players much bigger than Anthropic are using discount pricing to create an anti-competitive advantage.
And yet, OpenAI have publicly said they welcome OpenCode users to use their subscription package. So how are they being anti-competitive "far more" than Anthropic?
Reading through his X comments and GitHub comments he is behaving immaturely. I don't trust what he's saying here. Ripping out Claude API support was just throwing a tantrum. Weird given his age - he's old enough to be more mature.
I'm not a US citizen, so both companies are the same, as far as I'm concerned.
Still, I feel like "will commit illegal mass murder against their own citizens" is a significant enough degree more evil. I think lots of corporations will help their government murder citizens of other countries, but very few would go so far as to agree to murder their own (fellow) citizens ... just to get a juicy contract.
We also have taboos against betraying/murdering/whatever people of other tribes, but those taboos are much weaker and get relaxed sometimes (eg. in war). My point is, it takes significantly more anti-social (ie. evil) behavior to betray your own tribe, in the deepest way possible, than it does to do horrible things to other tribes.
This is just as true for Russians murdering Ukrainians as for Ukrainians murdering Russians, or any other conflict group: almost all Russians would consider a Russian who helps kill Russians to be more evil than a Russian who kills Ukrainians (and vice versa).
https://www.washingtonpost.com/technology/2026/03/04/anthrop...
Many folks coming from other tools only get exposed to the functionality they are already used to, but it offers much more than other harnesses, especially for remote coding.
You can start a service via `opencode serve`; it can be accessed from anywhere and offers a great experience on mobile, apart from a few bugs. It's a really good way to work with your agents remotely, and it goes really well with Tailscale.
The WebUI that they have can connect to multiple OpenCode backends at once, so you may use multiple VPS-es for various projects you have and control all of them from a single place.
Lastly, there's a desktop app, but TBH I find it redundant when WebUI has everything needed.
Make no mistake though, it's not a perfect tool. My gripes with it:
- There are random bugs with loading/restoring state of the session
- Model/Provider selection switch across sessions/projects is often annoying
- I had a bug making Sonnet/Opus unusable from mobile phone because phone's clock was 150ms ahead of laptop's (ID generation)
- Sometimes the agent gets randomly stuck. It especially sucks for long/nested sessions
- WebUI on laptop just completely forgot all the projects one day
- `opencode serve` doesn't pick up new skills automatically, it needs to be restarted
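The clock-skew gripe above is easy to reproduce in miniature. The sketch below is a guess at the general mechanism, not OpenCode's actual ID scheme: it assumes IDs start with a millisecond timestamp and that message order is derived by sorting IDs. `make_id` and `conversation_ids` are hypothetical names.

```python
def make_id(clock_ms: int, counter: int) -> str:
    """Hypothetical ID: zero-padded millisecond timestamp plus a counter."""
    return f"{clock_ms:013d}-{counter:04d}"


def is_ordered(ids):
    """True if the IDs sort in the order they were created."""
    return all(a < b for a, b in zip(ids, ids[1:]))


def conversation_ids(server_now_ms: int, client_skew_ms: int,
                     reply_delay_ms: int):
    """IDs for a user message (client clock) and its reply (server clock)."""
    user_id = make_id(server_now_ms + client_skew_ms, 0)   # made on the phone
    reply_id = make_id(server_now_ms + reply_delay_ms, 0)  # made on the server
    return [user_id, reply_id]


if __name__ == "__main__":
    now = 1_700_000_000_000
    print(is_ordered(conversation_ids(now, client_skew_ms=0, reply_delay_ms=50)))
    print(is_ordered(conversation_ids(now, client_skew_ms=150, reply_delay_ms=50)))
```

With a client clock 150 ms ahead and a reply created 50 ms after the request, the reply's ID sorts before the message it answers, which is the kind of inconsistency that could plausibly break a session.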
- GH copilot API is a first class citizen with access to multiple providers’ models at a very good price with a pro plan
- no terminal flicker
- it seems really good with subagents
- I can’t see any terminal history inside my emacs vterm :(
There's also a request and a PR to add such an option, but it was closed due to "not adhering to community standards"
Curious how the context window management works in practice. With large repos, the "what files to include" problem tends to dominate — does it have a strategy beyond embedding-based retrieval, or is that the main approach here?
I wonder why they used TypeScript and not a more resource-efficient language like C, C++, Rust or Zig?
Since their code is generated by AI, human preferences shouldn't matter much, and AI is happy to work with any language.
I really should look into more "native" Emacs options as I find using vterm a bit of a clunky hack. But I'm just not that excited about this stuff right now. I use it because I'm lazy, that's all. Right now I'm actually getting into woodwork.
There is nothing open about it. Please do not abuse the term "open" like in OpenBSD.
> Please do not abuse the term "open" like in OpenBSD.
this is such a pet peeve of mine; all these "open" products (except when they're not)
IMO, the web UI is a killer feature - it’s got just enough to be an agent manager - without any fluff. I run it on my remote VMs and connect over HTTP.
https://www.reddit.com/r/LocalLLaMA/comments/1rv690j/opencod...
?
> there isnt any telemetry, the open telemetry thing is if you want to get spans like the ai sdk has spans to track tokens and stuff but we dont send them anywhere and they arent enabled either
> most likely these requests are for models.dev (our models api which allows us to update the models list without needing new releases)
> opencode will proxy all requests internally to https://app.opencode.ai
> There is currently no option to change this behavior, no startup flag, nothing. You do not have the option to serve the web app locally, using `opencode web` just automatically opens the browser with the proxied web app, not a true locally served UI.
> https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
OpenCode just has more bugs, it's incredibly derivative so it doesn't really do anything else than Codex.
The advantage of OpenCode is that it can use any underlying model, but that's a disadvantage because it breaks the native integration. If you use Opus + Claude Code, or Gpt-Codex + Codex App, you are using it the way it was designed to be used.
If you don't actually use different models, or plan to switch, or somehow value vendor neutrality strategically, you are paying a large cost without much reward.
This is a general rule: vendor neutrality is often seen as a generic positive, but it is actually a tradeoff. If you just build on top of AWS, for example, you make use of its features and build much faster and simpler than if you use Terraform.
I only boot my Windows 11 gaming machine for DRM games that don’t work with Proton. Otherwise it’s hot garbage.
I do not understand the insistence on using JavaScript for command line tools. I don't use rust at all, but if I'm making a vibe coded cli I'm picking rust or golang. Not zig because coding agents can't handle the breaking changes. What better test of agentic coders' conviction in their belief in AI than to vibe a language they can't read.
At least you can easily turn off telemetry in Claude Code - just set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC to 1.
You can use Claude Code with llama.cpp and vLLM too, right out of the box, with no additional software necessary: just point ANTHROPIC_BASE_URL at your inference server of choice, with any value in ANTHROPIC_API_KEY.
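As a rough sketch of that setup, the snippet below builds the environment and launches the `claude` binary from Python. The three variable names come from the comments above; the helper names, the placeholder key and the `http://localhost:8080` address are assumptions for illustration, so use whatever address your llama.cpp or vLLM server actually listens on.

```python
import os
import subprocess


def local_claude_env(base_url: str, base_env=None) -> dict:
    """Return a copy of the environment pointing Claude Code at a local server."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url
    # Any value works here per the comment above; this one is a placeholder.
    env["ANTHROPIC_API_KEY"] = "unused-placeholder"
    # Turns off telemetry and other non-essential traffic.
    env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
    return env


def launch(base_url: str) -> subprocess.CompletedProcess:
    """Start `claude` (assumed to be on PATH) with the modified environment."""
    return subprocess.run(["claude"], env=local_claude_env(base_url), check=False)


# Example (assumed address of a local inference server):
#     launch("http://localhost:8080")
```

Building the environment in a separate function keeps the current shell untouched, so the overrides only apply to the launched process.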
Some people think that Anthropic could disable this at any time, but that's not really true - you can disable automatic updates and back up and reuse native Claude Code binaries, ensuring Anthropic cannot change your existing local Claude Code binary's behavior.
With that said, I like the idea of an open source TUI agent that won't spy on me (without my consent and with no way to disable it) much better than a closed source TUI agent whose telemetry I can effectively neuter, but sadly, OpenCode is not the former. It's just another piece of VC-funded spyware that's destined for enshittification.
¹https://github.com/anomalyco/opencode/blob/4d7cbdcbef92bb696...
You'd be surprised how useless datasets become with like 10% garbage data when you don't know which data is garbage