I actually worked on a similar tree viewer as part of an NLP project back in 2005, in college, but that was for rule-based machine translation systems. Chapter 4 in the final report: https://www.researchgate.net/profile/Declan-Groves/publicati...
It walks through the core stages — tokenization, POS tagging, dependency parsing, embeddings — and visualizes how meaning gets fragmented and simulated along the way.
Built with Streamlit, spaCy, BERT, and Plotly. It’s fast, interactive, and aimed at anyone curious about how LLMs turn your sentence into structured data.
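If you're curious what each stage actually returns under the hood, here's a rough sketch of the kind of spaCy + Hugging Face calls the app builds on (illustrative only, not copied from the app; the real code is in the repo):

    # Sketch of the pipeline stages; assumes spaCy's en_core_web_sm model
    # and the transformers/torch packages are installed.
    import spacy
    import torch
    from transformers import AutoTokenizer, AutoModel

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The dog chased the ball")

    # Tokenization, POS tags, and dependency arcs from spaCy
    for tok in doc:
        print(tok.text, tok.pos_, tok.dep_, tok.head.text)

    # Contextual embeddings from BERT
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer("The dog chased the ball", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    print(out.last_hidden_state.shape)  # (1, num subword tokens incl. [CLS]/[SEP], 768)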
Would love thoughts and feedback from the HN crowd — especially devs, linguists, or anyone working with or thinking about NLP systems.
GitHub: https://github.com/jdspiral/meaning-machine
Live Demo: https://meaning-machine.streamlit.app
Subject–Verb–Object triples, POS tagging, and dependency structures are not used by LLMs. One of the fundamental differences between modern LLMs and traditional NLP is that hand-crafted heuristics like these are never explicitly defined.
And assuming that those specific heuristics are the ones LLMs would converge on after training is incorrect. [1]
It also feels like motivated reasoning to make them seem dumb, because in reality we mostly have no clue what algorithms are running inside LLMs.
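To make the contrast concrete: here's roughly what a decoder-only LM actually receives as input, using the Hugging Face GPT-2 tokenizer as a stand-in (the specific model doesn't matter):

    from transformers import AutoTokenizer

    # A language model's input is just subword IDs, nothing more.
    tok = AutoTokenizer.from_pretrained("gpt2")
    enc = tok("The dog chased the ball")
    print(enc["input_ids"])                             # integer token IDs
    print(tok.convert_ids_to_tokens(enc["input_ids"]))  # BPE pieces
    # No POS tags, dependency arcs, or SVO triples are supplied anywhere;
    # whatever structure the model exploits is learned implicitly in its weights.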
> When you or I say "dog", we might recall the feeling of fur, the sound of barking [..] But when a model sees "dog", it sees a vector of numbers
When o3 or Gemini sees "dog", it might recall the feeling of fur, the sound of barking [..] But when a human says "dog", they see electrical impulses in neurons
The stochastic parrot argument has been had a million times over, and this doesn't feel like a substantial contribution. If you think vectors of numbers can never be true meaning, then either (a) no amount of silicon can ever make a perfect simulation of a human brain, or (b) a perfectly simulated brain would not actually think or feel. Both seem very unlikely to me.
There are much better resources out there if you want to learn our best current understanding of what algorithms run inside LLMs [2][3]. It's a whole field called mechanistic interpretability, and it's way, way, way more complicated than tagging parts of speech.
[1] Maybe attention learns something like this, but it's doing a whole lot more than just that.
[2] https://transformer-circuits.pub/2025/attribution-graphs/bio...
[3] https://transformer-circuits.pub/2022/toy_model/index.html
P.S. The explainer has em dashes aplenty. I strongly prefer to see disclaimers (even if it's a losing battle) when LLMs are used heavily for writing, especially for more technical topics like this.
The more salient point is that when a model reads "dog", it associates a bunch of text and images vaguely related to dogs. But when a human reads "dog", they associate their experiences with dogs, or with other animals if they haven't ever met a dog. In particular, cats who have met dogs also have some concept of "dog", without using language at all. Humans share this intuitive form of understanding, and use it with text/speech/images to extend our understanding to things we haven't encountered personally. But multimodal LLMs have no access to this form of intelligence, shared by all mammals, and in general they have no common sense. They can fake some common sense with huge amounts of text, but it is not reliable: the space of feline-level common-sense deductions is not technically infinite, but it is incomprehensibly vast compared to the corpus of all human text and photographs.
LLMs do have language-agnostic understandings in their latent space. "Dog" and "Perro" have largely the same representation (depending on the model; I believe more advanced ones show this more strongly), as does a picture of a dog. I'm not sure if that's exactly the form of understanding you're referring to?
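If you want to poke at this yourself, here's a quick sketch using the sentence-transformers library (the multilingual model name is just one illustrative choice):

    from sentence_transformers import SentenceTransformer, util

    # Compare embeddings of the same concept across languages.
    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    emb = model.encode(["dog", "perro", "car"])
    print(util.cos_sim(emb[0], emb[1]))  # dog vs perro: typically high
    print(util.cos_sim(emb[0], emb[2]))  # dog vs car: typically lower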
I agree the human text/image corpus is very small compared to evolution's millions of years of learning from interacting with the environment, which is why I'm excited about applying RL to LLMs: it opens up that same trove of interaction data.