Claude Code itself is complete trash. They had a massive head start, are now routinely lapped by open source harnesses, and they STILL double down on not allowing e.g. OpenCode usage with the Max plan. Meanwhile, OpenAI lets you use whatever harness you want, and it's a beast. I recently did some testing, and OpenAI's Pro plan on an OpenCode harness (GPT 5.5 XHigh) with parallel agent delegation absolutely smokes Claude Code 4.7 Max. These days Claude Code can barely even remember its CLAUDE.md instructions. I'd say Opus 4.7 Max over the API is slightly better than GPT 5.5 XHigh, but not by nearly enough to justify the API token price.
Claude, I think, is still better for business things like document generation, design, etc., especially via the claude.ai interface (the GDrive integration and things like that are very useful). But for code generation and dev workflows, Claude Code is dropping the ball so hard it's starting to look like a generational fumble.
But even Codex has these super weird time limits. Judging by all the recent limits and degradation, it's really starting to show that these companies must have been losing a ton of money.
I'm still in the camp that most of these unicorns will be F'ed by open and local models in the next few years, at least in the coding/chatbot niches, and then they'll just be perpetually (re)searching for AGI :shrug:
Anthropic’s recent postmortem described several Claude Code regressions around default reasoning effort, context/thinking retention, and a system prompt change to reduce verbosity. Those seem more likely to explain the “less careful / forgetful / worse follow-through” behavior than the context window alone.
I would rerun the same task in a fresh session with the effort setting pinned, ideally across a few repeatable tasks from your own codebase (see the sketch below). Otherwise it's very hard to tell whether the regression is in the model, Claude Code's harness, the context management, or just a stale session.
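For anyone who wants to do this semi-rigorously, here's a minimal Python sketch. It assumes a headless CLI invocation along the lines of `claude -p "<prompt>"` (Claude Code's print mode); the `TASKS` dict and `EXTRA_ARGS` list are hypothetical placeholders you'd replace with your own repo's tasks and whatever model/effort pinning flags your harness actually exposes.

```python
#!/usr/bin/env python3
"""Rerun a fixed set of prompts in fresh headless sessions and save the
outputs, so model/harness behavior can be diffed across days.

Assumptions (swap for your setup): the CLI supports a non-interactive
print mode like `claude -p`; TASKS and EXTRA_ARGS are placeholders for
your own repeatable tasks and model/effort pinning flags.
"""
import datetime
import pathlib
import subprocess

CLI = ["claude", "-p"]       # assumed headless invocation; adjust per harness
EXTRA_ARGS: list[str] = []   # e.g. model/effort flags, if your CLI has them
TASKS = {                    # hypothetical repeatable tasks from your codebase
    "rename_fn": "Rename parse_config to load_config across the repo and list touched files.",
    "add_test": "Write a unit test for utils/retry.py covering the backoff path.",
}

out_dir = pathlib.Path("bench_runs") / datetime.date.today().isoformat()
out_dir.mkdir(parents=True, exist_ok=True)

for name, prompt in TASKS.items():
    # Each subprocess call is a fresh session: no stale context carried over.
    result = subprocess.run(
        CLI + EXTRA_ARGS + [prompt],
        capture_output=True, text=True, timeout=600,
    )
    (out_dir / f"{name}.txt").write_text(result.stdout)
    print(f"{name}: exit={result.returncode}, {len(result.stdout)} bytes saved")

# Later: diff -u bench_runs/<day1>/rename_fn.txt bench_runs/<day2>/rename_fn.txt
```

Crude, but diffing the same pinned task across a week separates an actual model/harness regression from the "my session went stale" effect, which is exactly the ambiguity the postmortem regressions create.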