On macOS it's much better. But most teams either end up locked into Mac-only or go cross-platform with Electron.
You don't need to use Microsoft's or Apple's or Google's shit UI frameworks. E.g. see https://filepilot.tech/
You can just write all the rendering yourself using Metal/GL/DX. If you don't want to write the rendering yourself, there are plenty of libraries like Skia, Flutter's renderer, NanoVG, etc.
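If you want a concrete feel for how small these APIs are: Skia, for instance, ships TypeScript bindings as CanvasKit (the wasm build, so this is the web-flavored path rather than Metal/DX, but the drawing model is the same Skia one). A minimal sketch, assuming a `<canvas id="app">` element exists on the page:

```ts
import CanvasKitInit from 'canvaskit-wasm';

const CanvasKit = await CanvasKitInit();
const surface = CanvasKit.MakeCanvasSurface('app'); // wraps <canvas id="app">
if (!surface) throw new Error('no drawing surface available');

const paint = new CanvasKit.Paint();
paint.setColor(CanvasKit.Color(66, 133, 244, 1));
paint.setAntiAlias(true);

// Immediate-mode drawing: you describe the frame, Skia rasterizes it.
surface.drawOnce((canvas) => {
  canvas.clear(CanvasKit.WHITE);
  const button = CanvasKit.RRectXY(CanvasKit.LTRBRect(20, 20, 220, 68), 8, 8);
  canvas.drawRRect(button, paint);
});
```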
You will be outcompeted if you waste your time reinventing the wheel and optimizing for stuff that doesn't matter. There is some market for highly optimized apps like Sublime Text, but you can clearly see that the companies behind them are struggling.
When was the last time complaining about this did anything?
I see complaints about RAM and sluggishness against Slack and countless other Electron apps every fucking day, same as with Adobe forcing web-rendered UI parts into Photoshop, and other such cases. Forums are full of them, and colleagues complain about it constantly.
They do, but they don't know what's causing it. 8GB of RAM usage for Codex App is clown-level ridiculous.
I use web-tech apps because I have to, and because they're adequate, not because it's an optimal user experience.
> You will be outcompeted if you waste your time reinventing the wheel and optimizing for stuff that doesn't matter. There is some market for safe, environmentally-friendly products, but you can clearly see that the companies that make them are struggling.
ok.
I insist on good UI as well, and, as a web developer, have spent many hours hand rolling web components that use <canvas>. The most complicated one is a spreadsheet/data grid component that can handle millions of rows, basically a reproduction of Google Sheets tailored to my app's needs. I insist on not bloating the front-end package with a whole graph of dependencies. I enjoy my NIH syndrome. So I know quality when I see it (File Pilot). But I also know how tedious reinventing the wheel is, and there are certain corners that I regularly cut. For example there's no way a blind user could use my spreadsheet-based web app (https://github.com/glideapps/glide-data-grid is better than me in this aspect, but there's no way I'm bringing in a million dependencies just to use someone else's attempt to reinvent the wheel and get stuck with all of their compromises).
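For flavor, here is roughly the shape of the virtualization trick that makes millions of rows feasible; the names are illustrative, not my actual component:

```ts
// Only rows intersecting the viewport are drawn, so the cost per frame is
// O(visible rows), not O(total rows).
const ROW_HEIGHT = 24;

function drawGrid(
  ctx: CanvasRenderingContext2D,
  getCell: (row: number, col: number) => string, // backing data source
  totalRows: number,
  colEdges: number[], // x-offsets of column boundaries, e.g. [0, 120, 260]
  scrollTop: number,
  viewportHeight: number,
): void {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  const first = Math.floor(scrollTop / ROW_HEIGHT);
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / ROW_HEIGHT),
  );
  for (let row = first; row <= last; row++) {
    const y = row * ROW_HEIGHT - scrollTop; // map grid space to viewport space
    for (let col = 0; col < colEdges.length - 1; col++) {
      ctx.strokeRect(colEdges[col], y, colEdges[col + 1] - colEdges[col], ROW_HEIGHT);
      ctx.fillText(getCell(row, col), colEdges[col] + 4, y + ROW_HEIGHT - 8);
    }
  }
}
```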
The answer to your original question about why these billion dollar companies don't create artisanal software is pretty straightforward and bleak, I imagine. But there are a few actually good reasons not to take the artisanal path.
That's only for Windows though, it seems? Maybe the whole "just write all the rendering yourself using metal/gl/dx" is slightly harder than you think.
Even though OpenAI has a lot of cash to burn, they're not in a good position now and are getting butchered by Anthropic, and possibly by Gemini later.
If any major player in this AI field has the power to do it, it's probably Google. But again, they've done the Flutter part, and the result is somewhat mixed.
At the end of the day, it's only HN people and a fraction of Redditors who care. Electron is tolerated by the silent majority. Nice native or local-first alternatives are often separate, niche value propositions when developers can squeeze themselves into over-saturated markets. There's a long way to go before the AI stuff loses its novelty and becomes saturated.
(I was a Swing developer for several years)
Use native for macOS. Use .NET Framework for Windows. Use whatever on Linux.
It's just being lazy and ineffective. I also do not care about whatever "business" justification anyone can come up with for half-assing it.
Maybe an app as complex as Outlook needs the pixel-perfect tweaking of every little button, such that they need to ship their own browser for an exact version match. But everything else can use the *system native browser*. Use Tauri or Wails or one of the many other solutions like them.
That said, I do agree with the other comments about TUIs etc. Yes, nobody cares about the right abstractions, not even the companies that literally depend on automating these applications.
It would be nice if someone made a way to write desktop apps in JavaScript with a consistent, cross-platform modern UI (e.g. swipe to refresh, tabs, beautiful toggle switches, not microscopic checkboxes) but without resorting to rendering everything inside a bloated WebKit browser.
- UE5 has its own custom UI framework, which definitely does not feel "native" on any platform. Not really any better than Electron.
- You can easily call native APIs from Electron.
I agree that Electron apps that feel "web-y" or hog resources unnecessarily are distasteful, but most people don't know or care whether the apps they're running use native UI frameworks, and being able to reassign web developers to work on desktop apps is a significant selling point that will keep companies coming back to Electron instead of native.
A full-fledged app that does everything I want is ~10MB. I know Tauri+Rust can get it to probably 1MB. But it is a far cry from these Electron-based apps shipping 140MB+. My app, at 10MB, does a lot more and has tons of screens.
Yes, it can be vibe coded, and that's especially not an excuse these days.
Microsoft Teams, Outlook, Slack, Spotify? Cursor? VS Code? I have like 10 copies of Chrome on my machine!
One of Electron's main selling points is that you control the browser version. Anything that relies on the system web view (like Tauri and Wails) will either force you to aggressively drop support for out-of-date OS versions, or constantly check caniuse.com and ship polyfills like you're writing a normal web app. It also forces you to test CSS that touches form controls or window chrome on every supported major version of every browser, which is just a huge pain. And you'll inevitably run into bugs with the native -> web glue that you wouldn't hit with Electron.
It is absolutely wasteful to ship a copy of Chrome with every desktop app, but Tauri/Wails don't seem like viable alternatives at the moment. As far as I can tell, there aren't really any popular commercial apps using them, so I imagine others have come to the same conclusion.
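Concretely, the "normal web app" dance looks something like this hedged sketch; the fallback module is hypothetical, standing in for whatever polyfill you'd ship:

```ts
// Runtime feature detection, because the system webview's engine version is
// whatever the user's OS happens to provide.
async function getCopyText(): Promise<(text: string) => Promise<void>> {
  if (navigator.clipboard?.writeText) {
    return (text) => navigator.clipboard.writeText(text);
  }
  // Hypothetical local polyfill module for older webviews.
  const { legacyCopyText } = await import('./polyfills/legacy-clipboard');
  return legacyCopyText;
}
```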
Sure, you could have some specific need, but I find that hard to believe for these simple apps.
Even a full-featured TUI like Claude Code is highly limited compared to a visual UI. Conversation branching, selectively applying edits, flipping between files, all are things visual UI does fine that are extremely tedious in TUI.
Overall it comes down to the fact that people have to use the TUI, and that's more important than it being easy to train on; there's a reason we use websites and not terminals for rich applications these days.
c Do this programming task for me.
Right in the shell.

That is not correct. One of the major selling points of Electron is precisely that you can call native APIs.
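A hedged sketch of what that looks like: the main process is plain Node, so Electron's own native bridges, OS binaries, and native addons are all reachable (the IPC channel name and the pmset call here are just illustrative):

```ts
// main.ts (Electron main process)
import { app, ipcMain, Notification } from 'electron';
import { execFile } from 'node:child_process';

app.whenReady().then(() => {
  // Electron's built-in bridge to the OS notification center.
  new Notification({ title: 'Native call', body: 'Sent from the main process' }).show();

  // Or shell out to any OS facility directly; here, macOS battery status.
  ipcMain.handle('battery:status', () =>
    new Promise<string>((resolve, reject) =>
      execFile('pmset', ['-g', 'batt'], (err, stdout) =>
        err ? reject(err) : resolve(stdout),
      ),
    ),
  );
});
```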
This is what you get when you build with AI, an electron app with an input field.
I guess you get an Electron app if you don't prompt it otherwise. Probably because it's learned from what all the humans are putting out there these days.
That said... unless you know better, it's going to keep happening. Even more so when folks aren't learning the fundamentals anymore.
This is just bad product management.
All I see is hype blog posts and pre-IPO marketing by AI companies, not much being shipped though.
I've got a medical doctor handwriting decipherer, a board game simulator that takes a PDF of the rulebooks as input, and accounting/budgeting software that can interface with my bank via email because my bank doesn't have an API.
None of that is of any use to you. If you happen to need similar software, it will be easier for you to ask your own AI to make a custom one for you than to adapt the ones I had my AI make for me.
Under the circumstances, I would feel bad shipping anything. My users would be legitimately better off just vibe coding their own versions.
Edit: I'm not going to keep addressing your comment if you keep editing it. You asked for an example & I found two very easily. I am certain there are many others so at this point the onus is on you to figure out what exactly it is you are actually arguing.
The second example is a Twitter post of a crypto bro asking people to build something using his crypto API. Nothing shipped.
Literally nothing shipped, just Twitter posts of people selling a coding bootcamp and crypto.
Would genuinely love your thoughts if you try it. Early users have been surprised by how native it feels!
Shock horror, the waste adds up, and it adds up extremely quickly.
E.g. just say "write a c++ gui widget library using dx11 and win32 and copy flutters layout philosophy, use harfbuzz for shaping, etc etc"
LLM output is called slop for a reason.
No one outside of a small sliver of the tech community cares if an app is built with web tech.
Electron also opens up easier porting to Linux, which almost certainly wouldn't happen if companies insisted on native only.
Atlassian products are a great example of this. Everyone knows Atlassian has garbage performance. Everyone complains about it. Never gets fixed though. Everyone I know could write customer complaints about its performance in every feedback box for a year, and the only thing that would happen is that we’d have wasted our own time.
Users _care_ about this stuff. They just aren’t empowered to feedback about it, or are trained to just sigh and put up with it.
For example, I tried opening a 200MB log file in Apple's Console.app and it hung. Opened right up in VS Code.
It's slow and stupid. It does not do proper research. It does not follow instructions. It randomly decides to stop being agentic and instead just dumps the code for me to paste. It has the extremely annoying habit of just doing stuff without understanding what I meant, making a mess, then claiming everything is fine. The outdated training data is extremely annoying when working with Nuxt 4+. It is not creative at solving problems. It doesn't show its thinking. The undo feature doesn't give proper feedback on the diff or whether it actually did undo. And I hate the personality. It HAS to be better than it comes off to me, because I am actually in a bad mood after having worked with it. I would rather YOLO code with Gemini 3 Flash, since it's actually smarter in my assessment, and at least I can iterate faster, and it feels like it has better common sense.
Just as an example, I found an old, terrible app I made years ago for our firm that handles room reservations. I told it to update from Bootstrap to Flowbite UI. Codex just took forever to make a mess, installed version 2.7 when 4.0.1 is the latest, even when I explicitly stated that it should use the absolute latest version. Then it tried to install it and failed, so it reverted to the outdated CDN.
I gave the same task to Claude Code. Same prompt. It one-shotted it quickly. Then I asked it to swap out ALL the fetch logic to have SPA-like functionality with the new beta 4 version of HTMX, and it one-shot that too in the time Codex spent just trying to read a few files in the project.
This reminds me of the feeling I had when I got the Nokia N800. It was so promising on paper, but the product was so bad and terrible to use that I knew Nokia was done for. If this was their take on what an acceptable smartphone could be, it proved that the whole foundation was doomed. If this is OpenAI's take on what an agentic coding assistant should be, something that can run by itself and iterate until it completes its task in an intelligent and creative way... then OpenAI is doomed.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
I have yet to hit usage limits with Codex. I continuously hit them with Claude. I use them both the same way: hands on the wheel and very interactive, small changes, and I tell them both to update a file to keep track of what's done and what to do next as I test.
Codex gets caught in a loop more often when trying to fix an issue. I tell it to summarize the issue and what it's tried, and then I throw Claude at it.
Claude can usually fix it. Once it's fixed, I tell Claude to note it in the same file, and then I go back to Codex.
[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
But doing that with AI feels like hiring an outsourcing firm for a project: they come back with an unmaintainable mess that's hard to reason through 5 weeks later.
I very much micro manage my AI agents and test and validate its output. I treat it like a mid level ticket taker code monkey.
I’m not fully sure what’s worse, something close to garbage with a short shelf life anyone can see, or something so close to usable that it can fully bite me in the ass…
That’s how I used to deal with L4, except codex codes much faster (but sometimes in the wrong direction)
1. I like being hands on keyboard and picking up a slice of work I can do by myself with a clean interface that others can use - a ticket taking code monkey.
2. I like being a team lead /architect where my vision can be larger than what I can do in 40 hours a week even if I hate the communication and coordination overhead of dealing with two or three other people
3. I love being able to do large projects by myself, including dealing with the customer, where the AI can do the grunt work I used to have to depend on ticket taking code monkeys to do.
Moral of the story: if you are a ticket taking “I codez real gud” developer - you are going to be screwed no matter how many b trees you can reverse on the whiteboard
Each and every one of us is able to write their own story and come up with their own 'Moral'.
Settling for less (if AI is a productivity booster, which is debatable) doesn't equal being screwed. There is wisdom in reaching your 'enough' point.
By definition, this is the worst AI coding will ever be, and it's pretty good now.
From all the data I have seen, the software industry is poised for a lot more growth in the foreseeable future.
I wonder if we are experiencing a local minimum on a longer upward trend.
Those that do find a job in a few days aren't online to write about it, so based on what is online we are led to believe that it's all doom and gloom.
We also come out of a silly growth period where anyone who could sort a list and build a button in React would get hired.
My point is not that AI-coding is to be avoided at all costs, it's more about taming the fear-mongering of "you must use AI or will fall behind". I believe it's unfounded - use it as much or as little as you feel the need to.
P.S.: I do think that for juniors it's currently harder and requires intentional effort to land that first job, but that is the case in many other industries. It's not impossible, but it won't come on a silver platter like it did 5-7 years ago.
I once tried minimizing the changes to get the most out of it. I did countless tests and variations. It didn't really matter much whether I told it to do it all or to change one line. I feel Claude Code tries to fill the context as fast as possible anyway.
I am not sure how much Claude is worth right now. I still prefer it over Codex, but I am starting to feel that's just a bias.
Codex and Gemini are both good, but slower and less “smart” when it comes to our code base
Most of my tokens are used arguing with the hallucinations.
I’ve given up on it.
I find it quite hard to hit the limits with Claude Code, but I have several colleagues complaining a lot about hitting limits and they use Cursor. Recently they also seem to be dealing with poor results (context rot?) a lot, which I haven't really encountered yet.
I wonder if Claude Code is doing something smart/special.
Codex at least 'knows' to give up in half the time and 1/10th of the limits when that happens.
https://hyperengineering.bottlenecklabs.com/p/the-infinite-m...
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
But overall it does seem to be consistently improving. Looking to see how this makes it easier to work with.
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
I.e., I think the Codex web app on a self-hosted machine would be great. This is important when you need a beefier machine (potentially with a GPU).
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
I know coding on a phone sounds stupid, but with an agent it’s mostly approvals and small comments.
Here's the Codex tech stack in case anyone was interested like me.
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
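For anyone curious how those main-process pieces typically fit together, here's a hedged sketch; names and wiring are hypothetical, not OpenAI's actual code:

```ts
// main.ts (Electron main process)
import { app, BrowserWindow, ipcMain } from 'electron';
import * as pty from 'node-pty';
import Database from 'better-sqlite3';

const db = new Database('sessions.db'); // better-sqlite3: synchronous local store
db.exec('CREATE TABLE IF NOT EXISTS sessions (id INTEGER PRIMARY KEY, log TEXT)');

app.whenReady().then(() => {
  const win = new BrowserWindow({ width: 1200, height: 800 });
  win.loadFile('index.html');

  // node-pty: one pseudo-terminal per agent session, streamed to the renderer.
  const shell = pty.spawn(
    process.platform === 'win32' ? 'powershell.exe' : 'bash',
    [],
    { cols: 120, rows: 40, cwd: process.env.HOME, env: process.env as Record<string, string> },
  );
  shell.onData((data) => win.webContents.send('pty:data', data));
  ipcMain.on('pty:input', (_event, data: string) => shell.write(data));
});
```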
The git and terminal views are a big plus for me. I usually have those open and active in addition to my codex CLI sessions.
Excited to try skills, too.
It raises the question of whether Anthropic will follow up with a first-class Claude Code "multi agent" (git worktree) app themselves.
I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.
0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20
Using slash commands and agents has been a game changer for me for anything from creating and executing on plans to following proper CI/CD policies when I commit changes.
To Codex more generally, I love it for surgical changes or whenever Claude chases its tail. It's also very, very good at finding Claude's blindspots on plans. Using AI tools adversarially is another big win in terms of getting things 90% right the first time. Once you get the right execution plan with the right code snippets, Claude is essentially a very fast typer. That's how I prefer to do AI-assisted development personally.
That said, I agree with the comments on tokens. I can use Codex until the sun goes down on $20/month. I use the $200/month pro plan with Claude and have only maxed out a couple of times, but I do find the volume-to-quality ratio to be better with Claude. So far it's worth the money.
What I like is that the sessions are highly configurable from their plan.md, which translates a markdown document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen around hooks and such, but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes to dynamically add steps, and even add "hooks" as needed based on the problem.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
I wonder what it was doing with all those tokens?
From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.
I created this using PLANS.md and it basically replicates a kanban/scrum process with gated approvals per stage, locked artifacts when it moves to the next stage, etc. It works very well and it doesn't need a UI. Sure, I could have several agents running at the same time, but I believe manual QA is key to keeping the codebase clean, so time spent on this today means that future requirements can be implemented 10x faster than with a messy codebase.
But what is your concept of "stages"? For me, the spec files are a MECE decomposition, each file is responsible for its unique silo (one file owns repo layout, etc), with cross references between them if needed to eliminate redundancy. There's no hierarchy between them. But I'm open to new approaches.
They pioneered so many of the things that paved the way for the truly good ones (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new, flashy, should get the attention of all of us. And yet, they only launch to Apple devices?
Then there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, a couple of examples: "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Claude yes, but Codex is much better than Gemini in every way that matters except speed in my experience.
Gemini 3 Flash is an amazing model, but Gemini 3 Pro isn't great. It can do good work, but it's pretty random if it will or it will go off the rails and do completely the wrong thing. OTOH GPT 5.2 Codex with high thinking is the best model currently available (slightly better than Opus 4.5)
Codex gets complex tasks right and I don't keep hitting usage limits constantly. (This is comparing the $20 ChatGPT plan to the $200 Claude Pro Max plan, fwiw.)
There's less tooling around ChatGPT and Codex, but their models are far more dependable imo than Anthropic's at this very moment.
Please don't.
People burning through their tokens allowance on Claude Code is one thing.
People having their agent unknowingly provision thousands of dollars of cloud resources is something completely different.
the big boys probably don't want people who don't know sec deploying on their infra lol.
[1] https://firebase.google.com/docs/ai-assistance/mcp-server
- workspace agent runner apps (like Conductor) get more and more obsolete
- "vibe working" is becoming a thing - people use folder based agents to do their work (not just coding)
- new workflows seem to be evolving into folder based workspaces, where agents can self-configure MCP servers and skills + memory files and instructions
kinda interested to see if openai has the ideas & shipping power to compete with anthropic going forward; anthropic does not only have an edge over openai because of how op their models are at coding, but also because they innovate on workflows and ai tooling standards; openai so far has only followed in adoption (mcp, skills, now codex desktop) but rarely pushed the SOTA themselves.
Linux/Windows requires extra testing as well as some adjustments to the software stack (e.g. liquid glass only works on Mac); to get the thing out the door ASAP, they released on macOS first.
Like, I notice that Codex in PhpStorm uses Get-Whatever style PowerShell commands, but firstly, I have a perfectly working Git Bash installed that's like 98% compatible with Linux and Mac. Could it not use that instead of being retrained on Windows-centric commands?
But better yet, probably 95% of the commands it actually needs to run are like cat and ripgrep. Can't you just bundle the top 20 commands, make them OS-agnostic and train on that?
The last tiny bit of the puzzle I would think is the stuff that actually is OS-specific, but I don't know what that would be. Maybe some differences in file systems, sandboxing, networking.
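A hedged sketch of what that "top 20 commands" shim could look like; every name here is hypothetical:

```ts
// Map the handful of POSIX-ish commands an agent actually uses onto portable
// implementations, so the model never has to care about the host OS.
import { readFile, readdir } from 'node:fs/promises';

type Tool = (args: string[]) => Promise<string>;

const tools: Record<string, Tool> = {
  // Same behavior on every OS, because Node abstracts the filesystem.
  cat: async (args) =>
    (await Promise.all(args.map((f) => readFile(f, 'utf8')))).join(''),
  ls: async ([dir = '.']) => (await readdir(dir)).join('\n'),
  // ...rg, head, and the rest would round out the "top 20".
};

export async function run(cmd: string, args: string[]): Promise<string> {
  const tool = tools[cmd];
  if (!tool) throw new Error(`unsupported command: ${cmd}`);
  return tool(args);
}
```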
I know/hope some OpenAI people are lurking in the comments and perhaps they will implement this, or at least consider it, but I would love to be able to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I will get the same prompt as if I had typed it on my keyboard, including the file pill UI element shown in the input box. Same for slash commands. Voice is a great input modality, please make it a first class input. You are 90% there, this way I don't need my dictation app (Handy, highly recommended) anymore.
Also, I see myself often using the built-in console to ls, cat, and rg, still following old patterns, and I would love to pin the console to a specific side of the screen instead of having it at the bottom. And please support terminal tabs, or I'll need to learn tmux.
> For a limited time we're including Codex with ChatGPT Free
Is this the first free frontier coding agent? (I know there have been OSS coding agents for years, but not Codex/Claude Code.)
Apple is great, but this is OpenAI devs showing their disconnect from the mainstream. It's complacent at best, contemptuous at worst.
SamA or somebody really needs to give the product managers here a kick up the arse.
Wouldn’t native give better performance and more system integration?
Would I love to see SwiftUI on macOS, WPF/WinUI on Windows, whatever Qt hell it is on Linux? Sure. But it is what it is.
I am glad the codex-cli is Rust and native. Because Claude Code and opencode are not: React, SolidJS, and what have you for a tree layer.
—
Then again, if codex builds codex, let it cook and port if AI is great. Otherwise, it’s claim chowder
My ire was provoked by this following on from the Windows ChatGPT app that was just a container for the webpage compared to the earlier bells and whistles Mac app. Perceptions are built on those sorts of decisions.
To me this still feels like the wrong way to interact with a coding agent. Does this lead people to success? I've never seen it not go off the rails in some way unless you provide clear boundaries as to what the scope of the expected change is. It's gonna write code if you don't even want it to yet, it's gonna write the test first or the logic first, whichever you don't want it to do. It'll be much too verbose or much too hacky, etc.
Weaker models give the experience you describe; or, when using a 100% LLM codebase, I think it can end up in a hall of mirrors.
Now I have an idea to try, have a 2nd LLM processing pass that normalizes the vibe-code to some personal style and standard to break it out of the Stack Overflow snippet maze it can get itself in.
First phase: Plan. Mandatory to complete, as well as get AI feedback from a separate context or model. Iterate until complete.
Only then move on to the Second Phase: make edits.
Better planning == Better execution
With Codex, I increasingly can skip the plan step, and it just toils along until it has finished the issue. It can be more "lazy" at times and ask before going ahead more often, but usually in a reasonable scope (and sometimes at points where I think other services would have gone ahead on a wrong tangent and burnt more tokens of their more limited usage).
I wouldn't be surprised that with the next 1-2 model iterations a plan step won't be worth the effort anymore, given a good enough initial written issue.
> gh-address-comments address comments
Inspiring stuff. I would love to be the one writing GH comments here. /s
But maybe there's a complementary gh-leave-comments to have it review PRs for you too.
Did they fix it?
Otherwise I'm not interested.
Is there more information about it? For how long and what are the limits?
I’m aware Mac OS has some isolation/sandboxes but without running codex via docker I wouldn’t be running codex.
(Appreciate there are still risks)
I keep coming back to my basic terminal with tmux running multiple sessions. Recently, though, I forked this https://github.com/tiann/hapi and have been loving using Tailscale to expose my setup on my mobile device for convenience (plus the voice input there).
also this feels like a unique opportunity to take some of that astronomical funding and point it towards building the right tooling for building a performant cross-platform UI toolkit in a memory-safe language—not to mention a great way for these companies to earn some goodwill from the FOSS community
looks like the same framework they used to build chatgpt desktop (electron)
edit - from another comment:
> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!
So much valuation, so much internal competition and shenanigans, that the creatives left.
I am glad to not depend on AI. It would annoy me to no end how it tries to assimilate everything. It's like systemd on roids in this respect. It will swallow up more and more tasks. Granted, in a way this is saying "then it wasn't necessary to have these things anymore now that AI solves it all", but I am skeptical of "the promised land" here. Skynet was not trusted back in 1982 or so. I don't trust AI either.
I got invites to seven AI-centered meetings late last week.
Eric Schmidt has spoken a lot recently about how it's one of the biggest advances in human history and it's hard to disagree with him, even if some aspects make me anxious.
Apparently, the Codex app itself is proof that AI is not that good at doing what people think it does.
Replacing workers with things you can’t beat, sue, intimidate, or cajole? Someone is gonna do something to make that not cool in MBA land. I think if one of my employees LL-MessedUp something, and I were upset, watching that same person stealing my money haplessly turn to an LLM for help might land me in jail.
I kinda love LLMs, I’ve always struggled to write emails without calling people names. There’s some clear coding tooling utility. But some of this current hype wave is insano-balls from a business perspective. Pets.com X here’s-my-ssh-keys. Just wild.
I recently had an issue, "add VNC authentication", which covers adding VNC password auth to our in-house VNC server at work.
This is not hard, but just a bit of tedious work getting the plumbing done, adding some UI for the settings, fiddle with some bits according to the spec.
But it's (at least to me) not very enjoyable; there is nothing to learn, nothing new to discover, not much creativity necessary, etc., and this is where Codex comes in. As long as you give it clearly scoped tasks in an environment where it can use existing structures and conventions, it will deliver. In this case it implemented 85% of the feature perfectly and I only had to tweak minor things, like refactoring 1-2 functions. Obviously I read, understood, and checked everything it wrote; that is an absolute must for serious work.
So my point is, use AI as the "code monkey". I believe most developers enjoy the creative aspects of the job, but not the "type C++ on your keyboard" part. AI can help with the latter: it will type what you tell it, and you can focus on the architecture and the creative part of the whole thing.
You don't have to trust AI in that sense, use it like autocompletion, you can program perfectly fine without it but it makes your fingers hurt more.
Then there will be the AI wranglers who act almost like DevOps engineers for the AI - producing software in a different way ...
I may give this a go, and Claude Code desktop as well, but the Cursor guys are still working the hardest to keep themselves alive.
My experience with Cursor is generally good and I like that it gives me UX of using VS Code and also allows selection of multiple models to choose if one model is stuck on the prompt and does not work.
I remember the days when it was worth reading about their latest research/release. Halcyon days indeed.
From the video, I can see how this app would be useful in:
- Creating branches without having to open another terminal, or creating a new branch before the session.
- Seeing diff in the same app.
- Working on multiple sessions at once without switching CLIs
- I quite like the “address the comments”, I can see how this would be valuable
I will give it a try for sure
A bunch of the features you listed were already in the Codex extension too. False outrage at its finest.
Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" but on a remote machine, I'll test it.
[1] https://survey.stackoverflow.co/2025/technology/#1-computer-...
One cool thing about this: upon installing it immediately found all previous projects I've used with Codex and has those projects in the sidebar with all of the "threads" (sessions) I've had with Codex on these projects!
Is it in the main Codex build? There doesn’t seem to be an experiment for it.
But overall, looks very nice and I'm looking forward to giving it a try.
Does it somehow gain some superpower from being left alone?
I will try it out, but is it just me, or is the product/UX side of recent OpenAI products sort of ... skipped over? It is good that agents help ship software quickly, but please no half-baked stuff like Atlas 2.0 again ...
Why not flex some of those codex skills for a proper native app…
They do *very* well at things like: "Explain what this class does" or "Find the biggest pain points of the project architecture".
No comparison to regular ChatGPT when it comes to software development. I suggest trying it out, and not by saying "implement game" but rather by giving it clearly scoped tasks where the AI doesn't have to think or abstract/generalize. So, as some kind of code monkey.
I think it's clear now that the pace of model improvements is asymptotic (or has at least reached a local maximum), and the model itself provides no moat. (Every few weeks last year, the perception of "the best model" changed, based on basically nothing other than random vibes and hearsay.)
As a result, the labs are starting to focus on vertical integration (that is, building up the product stack) to deepen their moat.
As much as I wish it were, I don't think this is clear at all... it's only been a couple months since Opus 4.5, after all, which many developers state was a major change compared to previous models.
The models are definitely continuing to improve; it's more of a question of whether we're reaching diminishing returns. It might make sense to spend $X billion to train a new model that's 100% better, but it makes much less sense to spend $X0 billion to train a new model that's 10% better. (Numbers all made up, obviously.)
Session A knocks it out of the park. Chef’s kiss.
Session B just does some random vandalism.
Sure, I could move to opencode and use them as commodities, but I've gotten used to Claude Code and I like using the vendors' first-party apps.
I love competition
No worries. I'm not their target demographic, anyway.
Like, seriously, this is the grand new vision of using a computer, this is the interface to these LLMs we're settling on? This is the best we could come up with? Having an army of chatbots chatting to each other running basic build commands in a terminal while we what? Supervise them? Yell at them? When am I getting manager pay bumps then?
Sorry. I'll stick with occasionally chatting with one of these things in a sandboxed web browser on a single difficult problem I'm having. I just don't see literally any value in using them this way. More power to the rest of you.
Codex has really grown on me lately. I re-signed up to try it out on a project I have, and it turned out to be a really great addition to my toolkit.
It isn't always perfect, and its CLI (how I mostly use it) isn't as sophisticated as OpenCode, which is my default.
I am happy with this app. I am using Superset, a terminal app which, surprisingly, is well positioned to help you if you work in the CLI like I do. But like I said, the new desktop app seems like a solid addition.
But you can already do that, in the terminal. Open your favourite terminal, use splits or tmux and spin up as many claude code or codex instances as you want. In parallel. I do it constantly. For all kinds of tasks, not only coding.
Translated from Marketingspeak, this is presumably "we're also desperate for some people to actually use it because everyone shrugged and went back to Claude Code when we released it".
That's like calling Coca Cola a random beverage vendor