Don't get me wrong. I've been using Claude Code and Codex CLI for quite some time now, and it's amazing what they can sometimes do. (I skipped the "Copilot" phase, where AI was just a "better" auto-complete.)
Emphasis on sometimes. And you really have to double check everything they do. So much.
And literally this week, Claude turned "dumb". Things I'd have expected it to handle before now result in stuff I really just throw out the window. I thought maybe I'd started prompting differently or something, so I tried multiple times on the same task. But no, it just went nowhere this week. Codex worked fine on the same problem, but it tried to cheat real bad on the test cases. Luckily I caught it, but otherwise the tests would've been completely useless. Essentially "always green".
And this is on the "pay-per-token" work account, so I can't simply explain it away with "they're saving on compute for free / bulk pay".