To my mind, inference at the edge is what will kill inference in the datacenter. Inference at the edge is more secure, faster, and uses less electricity. People share vulnerable and personal information in their chats; why hand it to OpenAI, which will use it to sell ads?
In a world where most inference happens at the edge, what do we need all of these data centers for? You might say we need them to keep pre-training ever-bigger models. And yet pre-training has hit a performance plateau.
Inference in a data center never made sense. It's a massive investment of resources when we're all carrying computers in our pockets. As someone who values privacy, I will switch to on-device inference exclusively as soon as I can.