In pointing models at scientific discovery, we will have to achieve the capabilities today's LLMs lack:
- long-horizon planning
- continual adaptation
- reasoning about uncertainty
- information-efficient learning
- and creative exploration.
Some of these capabilities may emerge from large-scale training. Others will require changes in how we implement and train AI systems. I don't yet know exactly what such a training loop would look like. So consider this post a conjecture.
But science offers a few unique properties at its foundation:
- large open data
- verifiability
- truth-seeking (instead of power-seeking) incentives.
And thus I think scientific discovery is the ideal successor to internet-scale pretraining. It's not just an application; it may be the means of building what we're missing. Maybe that's why we have @openai @GoogleDeepMind @periodiclabs @futurehouse etc. all focusing on it.