To me that implies the input isn't deterministic, not the compiler itself
You might argue that this redefines the question in a way that changes the answer, but I'd argue that's also an academic objection. Pragmatically, the important thing isn't the exact language but the intent behind the question, and for an engineer being asked it, the asker almost certainly has context that cares about more than the literal phrasing of "are compilers deterministic?"
If we're not going to assume the input state is known, then we definitely can't say what the intent behind the question is - for many engineering applications the compiler is deterministic. Debian has the whole reproducible builds effort going, which has been a triumph of pragmatic engineering on a remarkable scale, and it suggests that, pragmatically, compilers can be deterministic.
And that's just one really low-hanging-fruit example; there are many more, for instance selecting a different optimization path when memory pressure is high, and so on.
Or the system upon which the compiler is built (as well as the compiler itself) has made some practical trade-offs.
The source file contents are usually deterministic. The order in which they're read and combined, and the build-time metadata injections, often are not (and can be quite difficult to make so).
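A minimal C illustration of the metadata point (the snippet is mine, not from the thread; the macros are standard, though whether a given build embeds them is project-specific):

    #include <stdio.h>

    int main(void) {
        /* __DATE__ and __TIME__ expand to the moment of compilation, so two
           otherwise-identical builds differ byte for byte. Reproducible-build
           efforts pin this; GCC, for example, honors the SOURCE_DATE_EPOCH
           environment variable for these macros. */
        printf("built %s %s\n", __DATE__, __TIME__);
        return 0;
    }

Compile it twice a minute apart and the binaries differ, even though the source never changed.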
Either way it's a nitpick: a compiler hypothetically can be deterministic; an LLM just isn't. I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.
The point being that determinism of a particular form is expected and required in the instances where they do that.
(I'm not arguing for or against that, I'm simply saying I've seen it in real life projects over the years.)
Determinism would help you. With a bit of engineering, you could make LLMs deterministic: basically, fix the random seed for the PRNG and make sure none of the other sources of entropy mentioned earlier in the article contribute.
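A minimal sketch of the seed-fixing idea, in C to keep with the rest of the thread (sample_token and the toy logits are hypothetical, and a real inference stack would also need deterministic kernels, batching, and reduction order):

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Toy softmax sampler: with a fixed seed and identical logits,
       every run picks the same token. */
    static int sample_token(const double *logits, int n, unsigned seed) {
        srand(seed);                       /* the only entropy source */
        double max = logits[0];
        for (int i = 1; i < n; i++)
            if (logits[i] > max) max = logits[i];
        double p[n], sum = 0.0;            /* C99 VLA; fine for a sketch */
        for (int i = 0; i < n; i++) { p[i] = exp(logits[i] - max); sum += p[i]; }
        double r = sum * ((double)rand() / RAND_MAX);
        for (int i = 0; i < n; i++) { r -= p[i]; if (r <= 0.0) return i; }
        return n - 1;
    }

    int main(void) {
        double logits[4] = {0.1, 2.0, 1.5, -0.3};
        printf("token %d\n", sample_token(logits, 4, 42)); /* same index every run */
        return 0;
    }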
But that barely impacts any of the issues people bring up with LLMs.
Determinism is a red herring. What matters is how rigorous the relationship is between the input and the output. Compilers can be used in automated pipelines because that relationship is rigorous.
The real issue is prompt instability (chaos). A one-word change to a prompt/spec will produce a drastically different program. Until that is solved, there's no world where we just check in the prompt and almost no one ever has to worry about the code.
So in other words, determinism (or lack thereof) is the hard problem!
I think this is the more important property and I'm not sure if it has a well-known name. The article obliquely calls it reliability, but regardless it's the key difference from LLMs. Compilers mostly achieve it, ignoring an endless list of exceptions you learn with experience.
LLMs usually don't, even with 0 temperature and floating point determinism.
Sure, everything you have unit tests for might stay the same, but unless your unit tests are testing all observable behavior (and if they are, they'll be 100x longer than the code), users will notice incredibly confusing differences in every build.
But let's all hope these are not vital systems we end up depending on.
What is included in the 'verify' step? Does it involve changing the generated code? If not, how do you ensure things like code quality, architectural constraints, efficiency and consistency? It's difficult, if not (economically) impossible, to write tests for these things. What if the LLM does not follow the guidelines outlined in your prompt? This is still happening. If this is not included, I would call it 'brute forcing'. How much do you pay for tokens?
Compilers aren't deterministic in small ways: timestamps, paths encoded into debug information, etc. These are trivial, annoyances to the reproducible-builds people and little else.
You cannot take these trivial reproducibility issues and extrapolate out to "determinism doesn't matter, therefore LLMs are fine". You cannot throw a ball in the air, determine it is trivial to launch an object a few feet, and thus conclude a trip to the moon is similarly easy.
The magnitude matters, not merely the category. Handwaving away magnitude is a massive red flag that a speaker has no idea what they're talking about.
Lots of engineering effort goes into making this be true.
TFA argues that you can't control the inputs perfectly, and so the behavior may differ if you fail to control the inputs. Yeah sure.
But the answer to the clickbaity question in the title is simply "Yes".
A compiler, making my job harder by being unpredictable? All the time.
So did other programmers, users with creative input, random parallel processes running at the same time.
LLMs are actually kind of tame in comparison.
Maybe you don't build or tinker with things enough to have warranted making a dartboard out of the gcc contributor graph, but damnit, some of us do. That compiler is not magic, it continually shifts under you, and when you're just trying to get something from the stage of "doesn't exist at all" to "exists", it absolutely does throw curve balls your way. I start with -O0 -g and then crank up the optimization level once everything works. Otherwise, come debug time, shit's missing, stuff happens at weird times, etc. If you don't treat the compiler as spooky, you haven't paid enough attention to it.
Also having an -O0 debug build is standard practice.
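A toy illustration (my snippet, not from the thread): at -O0 the loop below runs as written, while at -O2 it is usually deleted as dead code, so a breakpoint inside it never fires.

    #include <stdio.h>

    int main(void) {
        int sum = 0;
        /* `sum` is never used afterwards, so at -O2 GCC/clang will
           typically delete the whole loop; a breakpoint on the line
           below simply never hits. At -O0 it runs 1000 times. */
        for (int i = 0; i < 1000; i++) {
            sum += i;
        }
        printf("done\n");
        return 0;
    }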
My point isn't that compilers are super easy to use and never frustrating, my point is that the notion that LLMs "compile" english to code is a bad analogy. Compilation is a translation from one formal representation to another. LLMs are an interpretation of informal language into a formal language. They just are not at all the same thing.
Attempting to get consistent results from floating-point code is another rabbit hole. GCC and clang have various flags for "fast math" which can enable different optimisations that reduce precision.
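A small example of why that matters (the flags are real GCC/clang flags; whether a given compiler actually reassociates this exact expression under -ffast-math is up to it, but IEEE doubles are genuinely not associative):

    #include <stdio.h>

    int main(void) {
        double a = 1e16, b = -1e16, c = 1.0;
        /* -ffast-math (specifically -fassociative-math) lets the compiler
           pick either grouping, and they give different answers. */
        printf("(a + b) + c = %g\n", (a + b) + c);  /* 1 */
        printf("a + (b + c) = %g\n", a + (b + c));  /* 0: b + c rounds to -1e16 */
        return 0;
    }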
Before SSE, floating point on x86 was done by the "x87" FPU, which always computed at 80-bit precision even if the type in the source code was 32 or 64 bits, and it used to be accepted that you'd sometimes get more precision than you asked for. Java got its "strictfp" mode mainly because of x87.
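A concrete way to see the 80-bit effect (on x86-64, SSE is the default, so you'd need something like gcc -O0 -m32 -mfpmath=387 to get the old behavior; -m32 and -mfpmath are real GCC flags, though results can vary with when the compiler spills registers to memory):

    #include <stdio.h>

    int main(void) {
        double huge = 1e308;
        /* huge * 10.0 overflows a 64-bit double (max ~1.8e308) but fits
           easily in the x87's 80-bit registers, so x87 math may print
           1e308 here while SSE math prints inf. */
        double result = huge * 10.0 / 10.0;
        printf("%g\n", result);
        return 0;
    }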
I wrote "We have not remotely solved the halting problem in the formal sense", which does not read like a claim that LLMs have solved the halting problem to me, but I'm open to rewording it. How would you put it?
I added in a bit about compiler contract, wdyt?
When did the girlfriend enter the discussion? Did I miss something?
The OP brings up the testimony of someone other than himself who prefers to have software drive their car rather than driving it themselves.