You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.
I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.
The link is to a famous YouTuber called PewDiePie, who uses a local LLM to parse his email and save time on it. He has an autoreply system and gets notified about urgent matters.