We seem hyperfocused on finding more tasks to train neural networks to do. That, of course, leads to the moving-goalpost effect described in the article, but the goalposts are moving along an axis that doesn't measure intelligence.
My other comment: https://news.ycombinator.com/item?id=46445511
Mine is: write an nroff document that executes at least one macro and is a quine.
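For anyone unfamiliar with the term: a quine is a program that outputs an exact copy of its own source. The sketch below is a standard two-line Python quine, purely to illustrate the concept; it is not the nroff task above, and it deliberately has no comments, since extra lines would break the self-reproduction:

    s = 's = %r\nprint(s %% s)'
    print(s % s)

Running it prints exactly those two lines: the %r substitution reproduces the string literal, quotes and all, and %% formats back to a single literal %. The nroff version is harder because the document has to reproduce itself while also expanding at least one macro.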
Based on what LLMs have given me for answers so far, I'd look harder for the human-written source of the nroff code. I have written what I believe to be the only quine in the GPP macro-processing language; LLMs only refer me to my own code when I ask for a GPP quine. Google, Meta, and OpenAI really have strip-mined the entire web.
If I genuinely thought anything creative or new had appeared, I'd probably be at a loss as well.
(I am assuming that the task is actually possible to accomplish. If it isn't possible, then it isn't a very good goalpost!)
If it's not possible, I'd love to see an explanation, so that task can quit weighing on me.