No, it's from last October, and it probably lasted until at least codex5.3/sonnet 4.6, so it's 6 months old at best, maybe less (you can't say the same for Opus for sure, though). I felt faster for sure, but all metrics (except LOC) from my team were worse, and we spent a lot of extra time bugfixing. We're better now, but it's hard to say whether that's because the models are that much better or because the people generating 100% of their code now actually review it themselves before opening a PR (we are also a bit more coercive during PR reviews: what used to be non-blocking, like style choices, is now blocking, and we dig way more).
I don't know who made those numbers up, but for me... I can almost certainly guarantee I have never been so relaxed before. I'm doing multiple paid projects simultaneously thanks to AI, still leaning back, and customers are happy. I can confidently say: if you know how to leverage it properly, you can be both more efficient and more relaxed at the same time. I'd also argue that if you use a combination of SOTA models to code and review, and put in some thought of your own too, then the code is also GG.
The updated METR study on this gives different results, but they should still be quite sobering.
But I don't really understand why MAREF is supposed to be the answer. If we adopt MAREF, then to pass MAREF, those metrics become the target, right? But let's think about Goodhart's Law: 'When a measure becomes a target, it ceases to be a good measure.' AI will just produce all sorts of bad code just to pass those checks. If you tighten things too much, people will resort to workarounds just to fit through that narrow gap.
And is all GenAI code garbage? Honestly, I don't think so. I agree that in the long term, if AI training data gets contaminated, it will degrade, but clearly code that has been reviewed by humans is actually better. The case of AlphaDev is a good example. Optimizations like sort 3, 4, and 5 exist precisely because an AI found them.
If that's the case, wouldn't it be better to just create an open source project that only accepts human‑written code and funnel all the funding into that? In other words, 'people who create uncontaminated AI datasets'.
Controversial default to say the least.
But the problem they're so clumsily trying to monetize is absolutely real. GitHub is rapidly turning from a place with battle-tested solutions into a dumpster fire of hallucinations. And no crutches like MAREF are gonna fix that because platforms profit from showing growth in repo and commit counts even if it's all dead plastic code.
To use an AI-ism: HN isn't an AI blog dump, it's a community.