Stop anthropomorphizing intermediate tokens as "reasoning" when all the model can do is rationalize.
E.g. "This test script failed but probably for an unrelated reason. I'll mark it done and move on."
Does AI ever display creativity in any sense other than behaviors that follow a statistical spread around the training space?
If it can, why doesn't this make the news? These pseudo-aphoristic stories of how to cajole AIs are told 100 times a day!
To those uninitiated in computing, it might seem amazing that a machine can perform combinatorial operations at rates many orders of magnitude faster than a human, but the measure of creativity is insight: hitherto-unknown ideas that simplify the complex and bring the inexplicable within the range of intelligibility.
The cycle is familiar: the AI is found to be ludicrously bad by some ordinary or trivial measure, demonstrating that nothing like human thinking is occurring in the machine; the failure gets published; the mechanical Turk gets the message; and suddenly the AI "knows" what it could not figure out on its own.
The intelligence is an externality of the humans enslaved to its care and to feeding it data.
The machine is basically a storage system with a semantic mode of retrieval. And because its access mode is semantic, it looks like it is thinking, when there is no innate reasoning in the sense of human reasoning.
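The storage-with-semantic-retrieval claim can be sketched in a toy form (every vector and entry below is a made-up stand-in for learned embeddings, not any real model): lookup by similarity produces answer-shaped output with no reasoning involved.

```python
import math

# Hypothetical "semantic store": keys are vectors, values are canned text.
# Retrieval is nearest-neighbor by cosine similarity, nothing more.
store = {
    "the capital of France": ([0.9, 0.1, 0.0], "Paris"),
    "the boiling point of water": ([0.0, 0.8, 0.2], "100 C at sea level"),
}

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec):
    # Return the stored answer whose key vector is most similar to the query.
    best = max(store.values(), key=lambda kv: cosine(query_vec, kv[0]))
    return best[1]
```

A query vector near the "capital of France" key retrieves "Paris" without anything resembling thought; the semantic access mode merely makes the lookup feel like an answer.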
Build a house of mirrors and you will find you can get lost in it, but don't confuse a house that reflects your own mind with one that has a mind of its own.
A great danger resides in a system, operating at worldwide scale, that can confuse people into believing it thinks.