Compilers literally made your project possible!
https://reproducible-builds.org/docs/source-date-epoch/
(although Nix sets it as a default)
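For anyone who hasn't used it, here is a minimal sketch of how a build step might honor `SOURCE_DATE_EPOCH`. The variable itself (integer seconds since the Unix epoch) comes from the spec linked above; the `build_timestamp` helper is a hypothetical name used only for illustration.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the timestamp a build step should embed, as an aware UTC datetime.

    If SOURCE_DATE_EPOCH is set, use it (integer seconds since the Unix epoch).
    Otherwise fall back to the current time, which makes the output differ
    from one build to the next.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw is not None else int(time.time())
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# With the variable pinned, every build embeds the same date.
stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1700000000"})
print(stamp.isoformat())
```

Passing the environment as a parameter is just a convenience so the behavior can be checked without mutating the real process environment.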
Here, since any whatwg cartel web engine is an issue, the author should not bother.
Umm, there are something like 60,000 lit tests checked into LLVM that verify that the output is a deterministic function of the input, at least for that compiler.
> Even though the source code had the same bytes, the output of the compiler was wildly different.
This is the goofiest thing I've seen written unironically in quite a long time. The C preprocessor is not part of the compiler. The "pre" in "preprocessor" should probably give it away.
Just a tip: you should probably actually understand something before you decide you hate it.
I for one enjoyed the article and understand what you're getting at.
This is true but doesn't seem relevant; does replacing the word "compiler" with "build chain" change anything? Because that seems like the clear meaning.
If you want to have users trust that someone else hasn't modified it, then sign it with your identity.
Being able to reproduce the binary from the source code and being able to verify that it's the same as the original is quite important in some contexts.
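In practice the verification step can be as simple as comparing digests. A rough sketch, assuming you have a locally rebuilt binary and the published one on disk (the function names and file paths are hypothetical, not from any particular tool):

```python
import hashlib


def sha256_of(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_artifact(rebuilt, published):
    """True if the locally rebuilt binary is byte-identical to the published one."""
    return sha256_of(rebuilt) == sha256_of(published)
```

This only tells you anything if the build is reproducible in the first place; a single embedded timestamp is enough to make the two digests differ.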
I disagree. The contexts that people come up with are purely theoretical, and are not practically important. Please do try and convince me otherwise by sharing such a context. From my view the juice of trying to accomplish this is nowhere near worth the squeeze.
That tooling is a compiler. The higher level, the better chance the LLM can be steered to good output. Machine code is hopeless, don’t bother.
Also, there are dynamic compilers where the shape of the machine code changes as the code executes, and each single execution will certainly generate different sequences, depending on the program's execution and where it is running.
Deterministic JIT compiler code generation, at least for optimising ones, is not a solved problem.
I don't see why that's the case. An LLM trained on binaries would totally see it, no?
Also the tool can also be running the test and a debugger.
It would not. You find the correct version by counting the number of bytes to the destination. LLMs are famously bad at this kind of problem (counting).
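To make the counting concrete, here is a sketch using x86 unconditional jumps as an assumed example (the thread doesn't say which encoding is meant). The short form is `EB rel8` (2 bytes) and the near form is `E9 rel32` (5 bytes), and the displacement is measured from the end of the jump instruction itself. `encode_jmp` is a hypothetical helper, not part of any real assembler.

```python
import struct


def encode_jmp(instr_addr, target_addr):
    """Encode an unconditional x86 JMP, choosing the short or near form.

    The displacement is relative to the address of the *next* instruction,
    so the length of the jump itself feeds into the offset.
    """
    rel8 = target_addr - (instr_addr + 2)   # short form is 2 bytes: EB rel8
    if -128 <= rel8 <= 127:
        return b"\xEB" + struct.pack("<b", rel8)
    rel32 = target_addr - (instr_addr + 5)  # near form is 5 bytes: E9 rel32
    return b"\xE9" + struct.pack("<i", rel32)


print(encode_jmp(0x1000, 0x1010).hex())  # close target, short form
print(encode_jmp(0x1000, 0x2000).hex())  # distant target, near form
```

Which form fits depends on an exact byte count, and in a real assembler changing one jump's size can shift every later address, which is the kind of bookkeeping being described.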
> Also the tool can also be running the test and a debugger.
The test needs to provide a good amount of signal. That’s too hard if you are throwing machine code at the wall.
In order for debuggers to work, you need some kind of model that describes what the code should do and what state the computer should be in after each instruction. That model is high-level code.
I can understand the intuitive appeal of training LLMs with machine code, but all of my experience with LLMs suggests that they are incredibly ill-suited to the task, and we just don’t have the capacity to train them to make useful machine code.
It applies to humans too. Calculus is “simple” but it takes something like sixteen years to train a human to do it, if all goes well. Meanwhile, most humans think that inverse kinematics is, like, the easiest thing in the world (it’s a super complicated task).
You can have LLMs help you optimize code but I don’t think you can do this unattended for non-trivial code.