Post author here.
Happy to answer questions and discuss further. The essay has an appendix with the model's own self-report on its reasoning (the most load-bearing evidence, IMO), so it's worth scrolling to the end if you're skeptical of the rest.
Curious what alternative explanations you'd propose. Pointers to related literature are especially welcome.