A thing that feels under-discussed: systems improvements (retrieval, caching, distillation, better evals, tighter feedback loops) can move the frontier even if raw pretraining scaling slows. To users, that looks like progress even without a new giant model.
I’m more worried about a plateau in reliability than in average capability: eliminating the last 5% of error rate is brutally expensive.
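To make the reliability point concrete, here’s a minimal sketch of why a few points of per-step reliability matter so much for multi-step tasks. The numbers and the `end_to_end_success` helper are my own illustration (assuming independent, identical per-step success probabilities, which real pipelines rarely satisfy), not from any benchmark:

```python
# Illustrative only: assumes each step succeeds independently with the
# same probability, so end-to-end success is just per_step ** steps.

def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps

for p in (0.90, 0.95, 0.99, 0.999):
    print(f"per-step {p:.3f} -> 20-step task succeeds "
          f"{end_to_end_success(p, 20):.1%}")

# per-step 0.900 -> 20-step task succeeds 12.2%
# per-step 0.950 -> 20-step task succeeds 35.8%
# per-step 0.990 -> 20-step task succeeds 81.8%
# per-step 0.999 -> 20-step task succeeds 98.0%
```

Under those (admittedly idealized) assumptions, going from 95% to 99% per-step reliability more than doubles end-to-end success on a 20-step task, which is why shaving the last few points of error is worth so much even when average capability barely moves.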