3 points by sichengo 2 hours ago | 2 comments
  • chrisjj 2 hours ago
    > we somehow observe behaviors that resemble reasoning, abstraction, and even creativity.

    Puleeze. You type a search into Google and it returns a news article. Did you observe Google being creative?

    > where exactly does the apparent intelligence come from?

    The user's gullibility.

    • sichengo an hour ago
      Hmm, good point, but Google only retrieves information from the web, whereas LLMs generate new continuations from learned distributions (see the toy sketch below). I would say the interesting question is why modeling reasoning traces at scale produces reasoning-like behavior at all.
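
      A minimal toy sketch of that distinction, in Python; the two-gram table here is hypothetical, standing in for a distribution a real model would learn from data:

          import random

          # Toy sketch, not a real LLM: a hand-written conditional
          # distribution standing in for probabilities a model would learn
          # during training. A search engine looks up stored documents; a
          # language model samples the next token from a distribution like
          # this, so it can emit continuations that appear verbatim nowhere
          # in its training data.
          LEARNED = {
              ("the", "cat"): {"sat": 0.6, "ran": 0.3, "slept": 0.1},
              ("the", "dog"): {"barked": 0.7, "slept": 0.3},
          }

          def sample_next(context):
              # Condition on the last two tokens of the context, then draw
              # a token in proportion to its learned probability.
              probs = LEARNED[tuple(context[-2:])]
              tokens, weights = zip(*probs.items())
              return random.choices(tokens, weights=weights)[0]

          print(sample_next(["the", "cat"]))  # e.g. "sat" -- generated, not retrieved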
    • verdverm an hour ago
      Dogs have intelligence, bees have intelligence, even slime molds, some would argue. They are all different, yet still recognized. For you, why is AI different?
  • _wire_ 15 minutes ago
    The confusion arises from the question begging an arbitrary distinction: it treats a token in isolation from all the state in the model and from that state's progression in response to a prompt.

    They work because, while the process generates one token at a time, each token has a location in an N-dimensional matrix of overlaid state networks covering all the tokens in the training data, the tokens in the given prompt, and the sequence of tokens emitted so far for this prompt.
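
    As a minimal sketch of that structure (in Python; `score_next` is a hypothetical stand-in for the trained network, whose learned weights would hold the state distilled from the training data):

        def score_next(context):
            # Stand-in scorer: a real model computes these scores from
            # learned weights; here we just fabricate one candidate per step.
            return {f"token_{len(context)}": 1.0}

        def generate(prompt, n_steps=4):
            context = list(prompt)            # prompt tokens seed the state
            for _ in range(n_steps):
                scores = score_next(context)  # conditioned on ALL context so far
                context.append(max(scores, key=scores.get))  # emitted token joins the state
            return context

        print(generate(["one", "token", "at", "a", "time"]))

    The point is structural: even though output is emitted one token at a time, every step conditions on the prompt plus everything emitted so far, so the per-token process still acts on the whole accumulated state.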

    As an analogy, an image on the screen is emitted a pixel at a time, but each pixel's state is coded as part of a network in a matrix that includes all the residual state from the point of image capture.

    And just as with an image on your screen, the computer has no "ideas" about the contents of the presentation, but other subsystems may use mathematical approaches to select and categorize images, apply filters, etc.

    The common regard of AI as thinking is purely a matter of appearances and idiomatic terminology.

    As to why we tend to become troubled by the resemblance of AI behavior to thought or creativity, yet are not at all troubled by how entire worlds exist within our TV sets: that is a matter of surprise and of conditioning to the medium.

    • sichengo a few seconds ago
      Yes, I do admit that I simplified the mechanism in the article, but my question is why next-token prediction at scale yields reasoning-like behavior.