> „But pattern‑matching is not system understanding, and plausibility is not correctness.“
Why not? Who says that? Who proved that system understanding is not just more complex pattern matching?
> „LLMs predict tokens, not consequences“
Same here. LLMs output tokens, but who says that they don’t form some internal group of token-predicting tensors that move together and constitute the internal model of a „consequence“? It is like saying humans don’t have thoughts, they just have electrical impulses moving their tongues.
I too think that LLMs seem to be a very specific form of intelligence, maybe resembling the parts of our brain that do language processing, but it is a fact that they at least fake intelligence very convincingly, and that we don't actually know how they do it.
> Why not? Who says that? Who proved that system understanding is not just more complex pattern matching?
I'm not in the camp of "system understanding is just more complex pattern matching", but I am absolutely in the camp of "there are many tasks where pattern matching is just as effective as actual understanding".
What if „being effective at something with pattern matching but not understanding it“ just means that you have identified only 90% of the patterns and keep failing to learn the rest, for whatever reason?
What I want to say is: yes, real understanding is a fascinating topic, but I think we have more pressing issues.
I think the naysayers already decided that the burden of proof is on the other side.
For LLMs the null hypothesis would be that there is no relationship between the input and output tokens. That is so obviously false that it isn't even worth calculating how many sigmas away from the null hypothesis LLMs are.
So clearly we discarded the null hypothesis sometime in 2017. Now we have a system that is really, really good at pattern matching and seems to understand consequences. Is that "seeming" just a ruse, or does it really understand stuff? A proper scientist would look at that evidence, put forward the hypothesis that maybe it really does understand stuff, and begin working on experiments that could disprove that alternative hypothesis, proceeding on the assumption that it holds until it is disproven or a better hypothesis is proposed that explains the existing evidence more accurately. Naysayers saying "you haven't proven to my satisfaction that pattern matching becomes understanding" is not a rebuttal. They need an alternative hypothesis that makes testable predictions which better fit the model's observed behavior.
The only rebuttals I've heard are "AI can't actually understand stuff and therefore can't do X", which is at least a testable hypothesis. But invariably AI eventually does X, just in a different way than anyone really expected.
Yes indeed. That's a perplexing statement, considering that a central concept of software engineering is architecture patterns.
> central concept of software engineering is architecture patterns.
Both RUP and PSP/TSP are grounded in defect prevention: all sorts of defects, from incorrect sets of requirements to memory corruption. Architecture patterns can help in that regard, but they can also be very error-prone; right now I am in the process of removing a bug introduced through a misunderstanding of a rather old singleton.
Was this comment generated by a hallucinating agent? It reads like poorly pieced-together word soup.
closes tab
It's obviously true... and yet when the next word is the completion of a chat template, suddenly they can talk to you. I don't know how far that will ultimately go, but "they're fundamentally just X" isn't providing useful information anymore.
There are many times when writing a feature that my spidey senses flare up and tell me that this thing is a lot more painful to code than I was expecting (and will be painful to maintain) and that a more elegant process may actually solve the problem, at which point I'll draw up an alternative option and talk to the product owner.
I've definitely started to see the consequences of the converse: large amounts of shite, brittle code that solved the original spec narrowly but is now an elephant on our back when we need to add other, cross-cutting concerns to the system.
(BTW, this isn't against the use of coding agents entirely; it's more against high-level agentic usage. I tend to use Claude Code to do little, well-defined tasks whilst I reflect on it.)
If we're going to stabilize the software industry, we need to have more discussions like this that identify what constraints apply. (We should have had those discussions before pushing AI out this widely, but that wouldn't have gotten anyone rich.)
I actually think that there's a world of software systems agents can change, but it's materially different from the one we have now, and has a different set of constraints that we've also mostly done a poor job identifying. So hopefully the discussion can help those of us on both sides. ;)
My gut feeling is that it will take at least a couple of orders of magnitude of improvement before these LLMs can even hold large systems fully in their context, much less understand them holistically. And I don't see an order-of-magnitude improvement coming any time soon; it feels like the last one was GPT-3.5.
Use them as capability enhancers, not drones who go do all the things without review.
AI has no judgement or critical thinking, even if it seems to, so we have to be wary of letting AI do this, because the result will be poor quality and not innovative at all.
Rough example: have an LLM generate a plan. Have a skill that refines the plan considering security risks, another that ensures codebase structures are followed, another that considers the infrastructure and usage demands, etc. Then write code and tests. Another process to validate the tests, validate all the above, simplify the logic, etc.
The key is that an LLM can do every task capably, even in a complex system. We simply have not built reasonable orchestration of all the human intent behind each filter, and many of them are constantly in flux. It may be that some elements resist encoding because the complexity of encoding is not worth the hassle to maintain.
For better or worse, managing intent, orchestrating narrow agentic tasks and solidifying patterns into deterministic code (i.e. validation/tests) is going to be the focus of engineers going forward.
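The plan-then-refine workflow described above could look something like this as a minimal sketch. It assumes an "LLM" is just a function from prompt text to response text (a hypothetical interface, not any particular vendor's API), and the concern list and prompt wording are illustrative placeholders, not a real skill format.

```python
from typing import Callable, List, Sequence

# Any function from prompt text to response text counts as an "LLM" here.
# This interface is hypothetical; a real setup would wrap a model client.
LLM = Callable[[str], str]

# Hypothetical refinement concerns, mirroring the examples in the comment.
DEFAULT_CONCERNS: Sequence[str] = (
    "security risks",
    "existing codebase structure and conventions",
    "infrastructure and usage demands",
)


def refine(plan: str, concern: str, llm: LLM) -> str:
    """One narrow pass: ask the model to revise the plan for a single concern."""
    prompt = (
        f"Revise the following plan, considering only {concern}. "
        f"Return the full revised plan.\n\n{plan}"
    )
    return llm(prompt)


def plan_pipeline(task: str, llm: LLM, concerns: Sequence[str] = DEFAULT_CONCERNS) -> str:
    """Generate an initial plan, then pass it through each refinement step in order."""
    plan = llm(f"Write an implementation plan for: {task}")
    for concern in concerns:
        plan = refine(plan, concern, llm)
    return plan


if __name__ == "__main__":
    calls: List[str] = []

    def fake_llm(prompt: str) -> str:
        # Deterministic stub: records each prompt and returns a numbered plan.
        calls.append(prompt)
        return f"plan v{len(calls)}"

    print(plan_pipeline("add CSV export", fake_llm))  # plan v4
    print(len(calls))  # 4
```

Keeping each pass narrow (one concern per prompt) is what makes the individual filters swappable as the human intent behind them changes; the code-writing, test-validation and simplification stages would chain on the same way.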
But I pretty much agree with what they are saying. The missing "thing" is the developer context. Each agent I kick off needs a nonlinearly increasing amount of coaching, as a function of feature complexity. The sweet spot for productivity is currently the first 3 steps (from TFA), to get things into _my head_, then using the writing abilities more as ubersed or ubergrep with LSP integration. Love it for that.
For example, I'll often write the first fifth to third of a feature by hand, then ask the agent to extrapolate from there. The "core" contains the important bits, but in a large system there are a lot of corner cases and wiring, and agents are good at discovering those. I interrupt when it tries to fix things by departing from the design, and instead nudge it or quickly write a better solution myself.
I absolutely hate the "Spin a cadre of agents to design/implement a feature from a concise spec" workflow. It involves so much planning to get the automatic execution working that it's often just easier to switch to hybrid planning/execution with both AI and people.
I’ve been trying out this cadre of agents idea with PR stacking and while I think it’s going to end up working fine, it took so much massaging to get it to where I needed it to be. Whereas with the hybrid approach, the problem space is a lot narrower and easier for me to define and the LLM to implement.
The human or LLM needs some form of error-correction signal: a corpus of tests or a proof system that prevents regressions.
If you have a working system with no tests or validation and let a human loose on it then it will break. How is this different?
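As a minimal sketch of that error-correction signal, assume the "corpus of tests" is simply a list of input/expected-output pairs, and a proposed change (from a human or an LLM) is only accepted if it passes all of them. The slug functions and test cases below are hypothetical illustrations, not from any real codebase.

```python
from typing import Callable, List, Sequence, Tuple

Case = Tuple[str, str]  # (input, expected output)
Impl = Callable[[str], str]


def find_regressions(candidate: Impl, cases: Sequence[Case]) -> List[Tuple[str, str, str]]:
    """Run every recorded case and return (input, expected, got) for each failure."""
    failures = []
    for inp, expected in cases:
        got = candidate(inp)
        if got != expected:
            failures.append((inp, expected, got))
    return failures


def accept_change(current: Impl, candidate: Impl, cases: Sequence[Case]) -> Impl:
    """Keep the candidate only if it passes every case; otherwise keep the current version."""
    return current if find_regressions(candidate, cases) else candidate


# Hypothetical example: a slug function and a small regression corpus.
CASES: List[Case] = [("Hello World", "hello-world"), ("  Trim  me ", "trim-me")]


def current_slug(s: str) -> str:
    return "-".join(s.lower().split())


def proposed_slug(s: str) -> str:
    # A plausible-looking "simplification" that mishandles extra whitespace.
    return s.lower().replace(" ", "-")


if __name__ == "__main__":
    print(find_regressions(proposed_slug, CASES))
    print(accept_change(current_slug, proposed_slug, CASES) is current_slug)
```

The gate doesn't care who wrote the candidate; without the corpus, the plausible-looking `proposed_slug` would have shipped.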
> This is causal reasoning, not pattern extension. LLMs predict tokens, not consequences — and that is why the leap from writing code to producing a safe, system‑aware PR‑ready diff is not incremental but a shift into a fundamentally different problem space.
This is well said. We need a new paradigm. I could go into the shortcomings of the current agent-oriented approaches but it would turn into a huge post. If you want to read it, I wrote it up here: http://safebots.ai/agents.html
Jesus, it's fucking 2026. Even LeCun would never say this again.