> A diff can show what changed in the artifact, but it cannot explain which requirement demanded the change, which constraint shaped it, or which tradeoff caused one structure to be chosen over another.
That's not true... diffs are traceable to commits and PRs, which in turn are traceable to tickets. And then there are the tests. With all of that, understanding the whys becomes trivial.
You need both the business requirements and the code. One can't replace the other. If you attempt to describe technical requirements precisely, you'll inevitably end up writing the code, or at the very least pseudocode.
As for regenerating the deleted code from business requirements alone, that won't work cleanly most of the time, because there are technical constraints and technical debt that the requirements alone don't capture.
So what did you say about version control?
> By regenerable, I mean: if you delete a component, you can recreate it from stored intent (requirements, constraints, and decisions) with the same behavior and integration guarantees.
That statement just isn't true, and so you need to keep track of the end result: _what_ was generated. The why is also important, but it's not sufficient.
Also, and unrelated: the "reject whitespace" part bothered me. It's perfectly acceptable to have whitespace in an email address; RFC 5322 allows spaces inside a quoted local part.
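A minimal sketch of the problem (the address below is legal per RFC 5322, thanks to the quoted local part):

```python
# A naive "reject whitespace" check wrongly rejects this legal address.
valid_address = '"john smith"@example.com'  # quoted local part, RFC 5322

def naive_validate(addr: str) -> bool:
    return " " not in addr  # the kind of check being objected to

print(naive_validate(valid_address))  # False -- a false rejection
```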
How different the output is each time you generate something from an LLM is a property called 'prompt adherence'. It's not really a big deal in coding LLMs, but in image generation some of the newer models (Z Image Turbo for example) give virtually the same output every time if the prompt doesn't change. To the point where some users claim it's actually a problem because most of the time you want some variety in image gen. It should be possible to tune a coding LLM to give the same response every time.
Note: when Fabrice Bellard built his LLM-based text compressor (ts_zip), he had to make sure it was deterministic. It would be terrible if it corrupted files in slightly different ways each time it decompressed.
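Determinism here just means greedy decoding: always take the argmax token, so the output is a pure function of the prompt and the weights. A minimal sketch, with toy logits standing in for a real model:

```python
import numpy as np

def greedy_decode(logits_fn, prompt_tokens, steps):
    """Temperature-0 decoding: always pick the argmax token.
    Same prompt + same weights -> identical output on every run."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        logits = logits_fn(tokens)             # model forward pass
        tokens.append(int(np.argmax(logits)))  # deterministic choice
    return tokens

def toy_logits(tokens, vocab=50):
    # Toy stand-in for a model: logits depend only on the context.
    rng = np.random.default_rng(sum(tokens))
    return rng.normal(size=vocab)

print(greedy_decode(toy_logits, [1, 2, 3], steps=5))
print(greedy_decode(toy_logits, [1, 2, 3], steps=5))  # identical
```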
I strongly disagree. Nowadays most LLMs support updating context with chat history. This means the output of an LLM will be influenced by what prompts you have been feeding it. You can see glaring changes in what a coding agent does based on what topics you've researched.
To take the example a step further, some LLMs even update their system prompts to include context such as where you are in the world at that precise moment and the time of year. Once I had ChatGPT generate a complete example project based around an event taking place in a city I happened to be cruising through at that moment.
What the author actually wants is ADRs: https://github.com/joelparkerhenderson/architecture-decision...
That's a way to version control requirements.
Their main idea is to version control the reasoning, which, OK, cool. They want to graph the reasoning and requirements; sounds nice, but there are already graph languages that fit conveniently into git to achieve this...
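For anyone unfamiliar, an ADR is just a short document checked into the repo; a minimal (entirely hypothetical) example:

```
# ADR 0007: Use event sourcing for the order service

Status: Accepted
Date: 2024-03-01

## Context
Compliance requires an audit trail of every state change.

## Decision
Persist domain events instead of mutating rows in place.

## Consequences
Reads need projections; replay gives us the audit trail for free.
```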
I also fundamentally disagree with the notion that the code is "just an artifact". The idea of specifying a model is cute, but these are nondeterministic systems that don't produce reliable output. A compiler may have bugs, yes, but generally speaking the same code will always produce the same machine instructions, which is something the proposed scheme does not...
A higher-order reasoning language is not unreasonable; however, the imagined system does not yet exist...
The way we solve the why/what separation (at minfx.ai) is with a top-level PLAN.md document recording why the commit was built, plus regenerating the README.md files on the paths to every file touched in the commit. Admittedly, this still leans more into the "what" than the "why". I will need to think about this more, hmm.
This helps us keep things well-documented and LLM-token-efficient at the same time. It also helps that Rust forces you into a reasonable code structure with its pub/private modules, so things are naturally more encapsulated, which helps the documentation as well.
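The README-regeneration step is roughly this shape, assuming the touched paths come from git diff --name-only (summarize() is hypothetical, standing in for whatever LLM pass actually writes the file):

```python
from pathlib import Path

def dirs_to_refresh(touched_files: list[str]) -> set[Path]:
    """Every directory on the path from the repo root to a touched
    file gets its README.md regenerated."""
    dirs = {Path(".")}  # repo root always included
    for f in touched_files:
        dirs.update(p for p in Path(f).parents if str(p) != ".")
    return dirs

def regenerate_readmes(touched_files: list[str]) -> None:
    for d in sorted(dirs_to_refresh(touched_files)):
        readme = d / "README.md"
        # readme.write_text(summarize(d))  # summarize() = hypothetical LLM pass
        print(f"would regenerate {readme}")

regenerate_readmes(["src/orders/db.rs", "src/orders/api.rs"])
```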
I think commenters here identified many of the issues we would face with it today, but thinking of a future where LLMs are indeed writing virtually all code and very fast, ideas like these are interesting. Our current tooling (version control, testing, etc.) will certainly need to adapt if this future comes to pass.
People need to remember how good it feels to do precise work when the time comes!
I'm sorry, but it feels like I got hit in the head when I read this; it's that bad. For decades, people have been dreaming of making software where you can just write the specification and don't have to actually get your hands dirty with implementation.
1. AI doesn’t solve that problem.
2. If it did, then the specification would be the code.
Diffs of pure code never really represented human decisions and reasoning very well in the first place. We always had human programmers who would check in code that just did stuff, without really explaining what the code was supposed to do, what properties it was supposed to have, why the author chose to write it that way, etc.
AI doesn't change that. It just introduces new systems which can, like humans, write unexplained, shitty code. Your review process is supposed to catch this. You just need more review now than you did before.
You capture decisions and specifications in the comments, test cases, documentation, etc. Yeah, it can be a bit messy because your specifications aren’t captured nice and neat as the only thing in your code base. But this is because that futuristic, Star Trek dream of just giving the computer broad, high-level directives is still a dream. The AI does not reliably reimplement specifications, so we check in the output.
The compiler does reliably regenerate functionally identical assembly, so that's why we don't check in the assembly output of compilers. Compilers are getting higher and higher level, and we're getting a broader range of compiler tools to work with, but AI is just a different category of tool, and we work with it differently.
Except you can't run English on your computer. Also, the specification can be spread out through various parts of the code base or internal wikis. The beauty of AI is that it is connected to all of this data, so it can figure out the best way to implement something today, as opposed to regular code, which requires constant maintenance to stay current.
At least for the purposes I need it for, I have found it reliable enough to generate correct code each time.
And before you say "that's indirect!", it genuinely does not matter how indirect the execution is or how many "translation layers" there are. Python, for example, goes through at least three translation layers: raw .py -> Python bytecode -> bytecode interpreter -> machine code. Adding one more automated translation layer does not suddenly make it "not code."
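You can watch the first of those layers happen with the standard-library dis module:

```python
import dis

def add(a, b):
    return a + b

# Disassemble to the bytecode the interpreter actually executes.
dis.dis(add)
# Prints LOAD_FAST a / LOAD_FAST b / BINARY_ADD (BINARY_OP on 3.11+) / RETURN_VALUE
```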
> noun: A system of symbols and rules used to represent instructions to a computer; a computer program.
On the other hand, the prompt is for the AI. It's not meant as instructions to a computer.
> Except you can't run english on your computer.
I can't run C on it either without translating it to machine code first. Is C code?
And C is for the compiler, not "the computer".
It's commonplace for a compiler on one computer to read C code created on a second computer and output (if successfully parsed) machine code for a third computer.
Would love people's thoughts on this: https://0xmmo.notion.site/Preventing-agent-doom-loops-with-p...
Do you have a working implementation for this? Just a one-to-one index of files and reasoning traces? I'd like to trace these changes easily back to a feature or technical spec too (and have it change that spec if it needs to? I suppose the spec would have its own reasoning trace).
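Something like this hypothetical index is what I'm picturing (names and fields are illustrative, not from any real tool):

```python
# Source file -> reasoning trace -> spec, one entry per file.
INDEX = {
    "src/orders/api.py": {
        "trace": "traces/orders-api.md",   # why this file looks the way it does
        "spec": "specs/checkout-flow.md",  # the feature spec it serves
    },
    "src/orders/db.py": {
        "trace": "traces/orders-db.md",
        "spec": "specs/checkout-flow.md",
    },
}

def impacted_specs(changed_files):
    """Given changed files, find the specs that may need updating too."""
    return {INDEX[f]["spec"] for f in changed_files if f in INDEX}

print(impacted_specs(["src/orders/api.py"]))
```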
This would not be unfamiliar to mechanical engineers who work with CAD. The 'histories' in many CAD tools (successive line-by-line drawing operations: align to a spline of such-and-such dimensions, put a bevel here, put a hole there) are known to reflect design intent more so than the final 3D model the operations ultimately produce.
Fixing this in CAD is already a massive pain, fixing it with black-box LLMs sounds nearly impossible.
Please, please don't get me started: https://github.com/ricksher/ASimpleMechatronicMarkupLanguage
If you use an LLM and agents to regenerate code, a minor change in the "specification" may result in huge changes to the code, even if it's just due to forcing regeneration. OK, got that.
But there may be no "specification", just an ongoing discussion with an agentic system. "We don't write code any more, we just yell at the agents." Even if the entire sequence of events has been captured, it might not be very useful. It's like having a transcript of a design meeting.
There's a real question as to what the static reference for the design should be, or what it should look like. This is going to be difficult.
I think commit messages should actually have a concise "what" in them.
Frequently enough, I end up looking at git log to sort out what changed (to track down a bug or regression) and then, based on the commit message, doing a git show to see the actual diffs.
So in that context, at least, knowing what changed in a commit is actually quite useful, and why is arguably less so.
I suspect my idea of "what" and your idea of "why" overlap in this scenario.
Edit: and after typing all that, I realized your comment doesn't imply there shouldn't be a "what" described anyway so maybe I'm just discussing nothing at all.
Deltas are just an implementation detail, and thinking of Git as diffing is specifically shunned in introductions to Git versioning.
The only practical obstacle is:
> Non-deterministic generators may produce different code from identical intent graphs.
This would not be an obstacle if you restricted yourself to a single version of a local LLM, turned off all nondeterminism, and recorded the initial seed. But for now, the kinds of frontier LLMs that are useful as coding agents run on Someone Else's box, meaning they can produce different outcomes each time you run them -- and even if the providers promise not to change the models, I can see no way to verify that promise.
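With a local model this is already doable; a sketch using Hugging Face transformers (the checkpoint name is illustrative), where greedy decoding does most of the work and pinning the seed covers any sampling paths -- modulo hardware-level float nondeterminism on GPUs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative: any pinned local checkpoint works

torch.manual_seed(0)  # fix the seed for any stochastic code paths
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

inputs = tok("def parse_email(s):", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy
print(tok.decode(out[0]))
# Same weights + same prompt + greedy decoding -> same output, every run.
```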
Looking at individual line changes produced by AI is definitely difficult. And going one step higher to version control makes sense.
We're not really there yet though, as the generated code currently still needs a lot of human checks.
Side thought: this requires the code to be really well modularized. It makes me imagine a world where, when designing a system, multiple agents discuss changes. Each agent would be responsible for a subsystem (component, service, module, function), and they would chat about the API format that works best for all of them, etc. It would be like Smalltalk at the agent level.
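A toy sketch of the shape of it (everything here is hypothetical; propose()/review() would be LLM calls in a real system):

```python
# One agent per module, negotiating an API until everyone signs off.
class ModuleAgent:
    def __init__(self, module: str):
        self.module = module

    def propose(self, change: str) -> str:
        return f"{self.module} proposes: expose '{change}' as a versioned JSON endpoint"

    def review(self, proposal: str) -> bool:
        # A real agent would reason over its module's constraints here.
        return "JSON" in proposal

agents = [ModuleAgent("orders"), ModuleAgent("billing"), ModuleAgent("auth")]
proposal = agents[0].propose("POST /charge")
# Adopt the API only when every other module's agent agrees.
if all(a.review(proposal) for a in agents[1:]):
    print("agreed:", proposal)
```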
Not sure what stops you from doing that right now.
CONGRATULATIONS: you have just 'invented' documentation, specifically a CHANGELOG.
Anyone want to try and lmk how far you get?