I've been following Ed Zitron's reporting on AI costs and profits. When Mike Pound starts talking about costs at 4:12, he hints that none of us know the actual cost of handling a token. Ed's reporting suggests it may be quite a bit higher than we've been led to believe. It may, in fact, be far above current retail prices, with the difference subsidized and booked as "marketing" expenses on the AI companies' books in an effort to gain market share.
It seems we're in for a "come to Jesus" moment, and a big pop in the markets as a result. This video is part of the structure that will do the popping.
I think storing the model state after each query/result will become standard practice, to avoid reprocessing tokens. Switching between models would then be discouraged, because doing so would lose that state. It wouldn't be impossible, it just wouldn't be cheap. I could see the entire AI state, including "thinking", model version, etc., being stored alongside the code in the GitHub repository, right next to the commit message. Git's delta compression should make storing all that binary data tractable.
We're living in interesting times. Agentic coding certainly seems valuable, but it may turn out to be ruinously expensive compared to just using us human programmers. We'll have to wait and see.