It's interesting and honestly encouraging that this kind of thing can be discovered and understood using just "simple linear methods" and high-level analysis of patterns in layer activations.
Previously:
2 years ago: https://news.ycombinator.com/item?id=37794996
1 year ago: https://news.ycombinator.com/item?id=40329675
Fwiw, I tried multiple global tokens in my chess neural net and didn't see any uplift compared to my baseline of just having one.