Because of this, for the next larger task I gave it, I ran its work through Codex, which identified 7 glaring unfinished parts of the task.
The trend was starting a part of the task but then leaving a "skeleton" of what I had requested, without any of the actual working parts.
The way I would describe it is a kid cramming his 3-month project into a Sunday evening for Monday's due date.
In reality, as they scale up, the models lose nuance and become noisier. The boosters do not want to admit this.
We need highly-specialised models/interfaces. Not one thing and trying to force-fit it.
> Not one thing and trying to force-fit it.
Agree, but then they become glorified IDE plugins and can't justify the huge valuations that a magic box that does and knows everything can justify...
> The frustrating part is that it's not a workflow _or_ model issue, but a silently-introduced limitation of the subscription plan. They switched thinking to be variable by load, redacted the thinking so no one could notice, and then have been running it at ~1/10th the thinking depth nearly 24/7 for a month. That's with max effort on, adaptive thinking disabled, high max thinking tokens, etc etc.
So Boris' explanation isn't really an explanation.