8 points by beardyw 8 hours ago | 5 comments
  • luckystarr 4 hours ago
    I tried the approach they used, verifying an implementation against known-good "oracles", and I must say: this works.

    Not all implementations are re-implementations, so this approach won't work for everything. But for a new implementation in a different programming language than the original, it works great. Built myself a MIB compiler and checked it against smidump. Now mine is more correct than the original, because smidump still crashes on some inputs while mine does not.

    So the news is not so much that Anthropic built a C compiler, but HOW.
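
    For anyone curious what that looks like in practice, the oracle check is basically "run both tools on the same input and compare the output." A rough sketch in Rust (the harness language doesn't really matter), where `mibc` is a stand-in name for my compiler and the file names and exact smidump invocation are just examples:

        use std::process::Command;

        /// Run a tool on one input file and capture its stdout.
        fn run(tool: &str, input: &str) -> String {
            let out = Command::new(tool)
                .arg(input)
                .output()
                .unwrap_or_else(|e| panic!("failed to run {tool}: {e}"));
            String::from_utf8_lossy(&out.stdout).into_owned()
        }

        fn main() {
            // `mibc` is the new implementation; `smidump` is the known-good oracle.
            let inputs = ["IF-MIB.txt", "SNMPv2-MIB.txt"];
            for input in inputs {
                let oracle = run("smidump", input);
                let mine = run("mibc", input);
                if oracle == mine {
                    println!("{input}: OK");
                } else {
                    println!("{input}: MISMATCH ({} vs {} bytes)", oracle.len(), mine.len());
                }
            }
        }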

  • fuhsnn 7 hours ago
    Didn't see people mention this in the initial discussion, but despite not having access to the internet, the agents actually had access to the source code of GNU binutils (which includes the assembler, linker, and readelf) and of many C compilers (SDCC, PCC, chibicc, cproc, etc.) in their working directory [1]. These are supposedly there for testing compilation, but there's no way to prove that Claude didn't get crucial info from these projects.

    I also found the compiler to bear an uncanny resemblance to chibicc. With 30-ish edge cases [2] yielding the same behavior, it's hard to believe chibicc had no influence on its algorithms.

    [1] https://github.com/anthropics/claudes-c-compiler/blob/6f1b99...

    [2] https://github.com/anthropics/claudes-c-compiler/issues/232
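
    The kind of check behind [2] is easy to reproduce yourself: feed both compilers the same edge-case snippet and compare what the resulting binaries do. A rough sketch (the `./claude-cc` path is a guess, and both drivers are assumed to take a gcc-style `-o`, which may not match their real CLIs):

        use std::fs;
        use std::process::Command;

        fn main() {
            // Substitute one of the snippets from [2] here; this placeholder
            // only exercises the compile-and-compare flow.
            let snippet = "int main(void) { return 42; }\n";
            fs::write("edge.c", snippet).expect("write test file");

            for (name, compiler) in [("chibicc", "./chibicc"), ("claude", "./claude-cc")] {
                let build = Command::new(compiler)
                    .arg("-o")
                    .arg(format!("edge_{name}"))
                    .arg("edge.c")
                    .status()
                    .expect("failed to spawn compiler");
                if !build.success() {
                    println!("{name}: did not compile the snippet");
                    continue;
                }
                let run = Command::new(format!("./edge_{name}"))
                    .status()
                    .expect("failed to run test binary");
                println!("{name}: exit code {:?}", run.code());
            }
        }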

    • rPlayer6554 5 hours ago
      The binary would not be nearly as useful as the source code. And even if the AI read the binary and copied it, the story is still that it reverse-engineered and rewrote a C compiler all on its own, which is still pretty impressive to me and has real-world use cases.

      Maybe Anthropic could release the logs to show how the AI was accessing the files.

  • nuc1e0n 3 hours ago
    > but it's much closer to an "interesting lab demo" than an "obituary for human programmers."

    Was it being sold as the latter? I thought it was strange that a fuss was being made over this tech demo, and that explains why.

    Parsing regular languages into other regular languages is exactly what transformer-based LLMs should be good at.

    A while ago an AI system was publicized that had been trained to generate animations of the game Doom and was sold as AI-created computer games. But the output made no sense if you watched it for more than a few seconds. Isn't this the same kind of scare tactic dressed up as innovation?

  • geldedus 2 hours ago
    AKA: moving the goalposts
  • logicprog 7 hours ago
    It's really funny how the goalposts shift over time. Before this happened, they would all have been insisting that it was impossible for an agent swarm to do this. Now that it has happened, they're finding all the flaws with it, ignoring the trajectory of improvement over time, and insisting it isn't truly impressive just because it isn't perfect.

    Basically, every point they make feels like pointless nitpicking in order to deny what's going on. It essentially feels like cope.

    The argument that the model depended on extensive test suites and a custom harness doesn't really hold up for me, because that harness was just a simple hacked-together version of concepts that people experimenting with Ralph loops, agent swarms, and agent orchestrators have been using for a while now, and it would be easy to build an out-of-the-box generalized version of it, since very little of the actual harness was custom to the project itself.
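
    To make that concrete, the generic shape of such a harness is just "run the agent, run the tests, repeat until green." A bare-bones sketch, where `run_agent_once` is a placeholder for whatever agent CLI or API call you'd actually use, and `make test` stands in for the project's test suite:

        use std::process::Command;

        /// Placeholder: in a real harness this would shell out to an agent CLI
        /// or call an API with the repository and a fixed prompt.
        fn run_agent_once(prompt: &str) {
            println!("(agent would run here with prompt: {prompt})");
        }

        /// The "Ralph loop" idea at its simplest: re-run the agent against the
        /// same prompt until the test suite passes.
        fn main() {
            let prompt = "Make the compiler pass the test suite; fix one failure at a time.";
            for iteration in 1.. {
                run_agent_once(prompt);
                let tests = Command::new("make")
                    .arg("test")
                    .status()
                    .expect("failed to run the test suite");
                if tests.success() {
                    println!("tests green after {iteration} iteration(s)");
                    break;
                }
                println!("iteration {iteration}: tests still failing, looping again");
            }
        }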

    Similarly, the argument that this only worked because of extensive test suites, including torture tests, doesn't really make sense to me, because we have a long tradition of systems that can provide that kind of test suite to specify any product you want to produce, from BDD to PBT to DST (behavior-driven development, property-based testing, and deterministic simulation testing), and the blog post explicitly acknowledges that part of the point is to show that the job of software engineers from now on might end up being about specifying a problem sufficiently rather than directly writing the code that satisfies it; even that would vastly change the entire industry.
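
    As a concrete example of "the test suite is the spec": a property-based differential test just generates small programs, compiles them with both the compiler under test and a reference compiler, and requires the results to agree. A toy sketch using only the standard library (a real setup would use proptest/Hypothesis-style generation and shrinking, and the `./ccc` path for the compiler under test is made up):

        use std::fs;
        use std::process::Command;

        /// Compile `src` with the given compiler and return the exit code of
        /// the resulting program, or None if compilation fails.
        fn compile_and_run(compiler: &str, src: &str, tag: &str) -> Option<i32> {
            let c_file = format!("case_{tag}.c");
            let exe = format!("./case_{tag}");
            fs::write(&c_file, src).ok()?;
            let built = Command::new(compiler)
                .arg("-o").arg(&exe).arg(&c_file)
                .status().ok()?;
            if !built.success() {
                return None;
            }
            Command::new(&exe).status().ok()?.code()
        }

        fn main() {
            // Trivial deterministic "generator": arithmetic expressions from a seed.
            for seed in 0u32..20 {
                let (a, b, c) = (seed % 7, (seed * 3) % 5 + 1, (seed * 5) % 11);
                let src = format!("int main(void) {{ return ({a} + {b} * {c}) % 256; }}\n");
                let reference = compile_and_run("gcc", &src, "ref");
                let candidate = compile_and_run("./ccc", &src, "new");
                assert_eq!(reference, candidate, "divergence on seed {seed}: {src}");
            }
            println!("all generated programs agree with the reference compiler");
        }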

    Similarly, I find it very funny that even this article is forced to admit that the code is of pretty solid quality, even if it isn't as beautiful or elegant as something a Rust expert might write (and it's always easy to criticize code quality from the peanut gallery, so I take that point with a grain of salt).

    In a similar manner, I don't find convincing the argument that this doesn't matter because things like TCC and GCC were in the model's training data. Previous C compilers that would be in its training data were implemented in C, almost certainly not Rust, and implementing almost anything in Rust that was originally implemented in C requires a substantially different architecture in the large, to account for lifetimes and borrow checking, and, to maintain even basic Rust code quality and avoid unsafe, typically completely different idioms and approaches to algorithms in the small as well. I say this having written several tens of thousands of lines of Rust in the past.

    This means that, in my opinion, it's difficult to call what these LLMs did mere retrieval and reorganization; the most you can plausibly say is that they picked up some general approaches to compiler algorithms and structure, and an understanding of what a compiler is and roughly how it should work, from the codebases in their training set. But you can't say they are regurgitating or translating them directly. And at that point, it's the equivalent of someone who has taken courses on or read books about compilers producing a new compiler. It is still impressive.
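
    One concrete example of that architectural gap: a chibicc-style C codebase threads raw pointers between mutable AST nodes, while the natural borrow-checker-friendly shape in Rust is an arena of nodes addressed by plain indices (or Box-based enums). That's a structural decision, not a line-by-line translation. A minimal illustration of the index-based approach (my own toy example, not code from the Claude compiler):

        /// Handle into the AST arena; Copy indices sidestep the aliasing rules
        /// that make a pointer-linked, mutable tree (the typical C layout)
        /// painful under the borrow checker.
        #[derive(Clone, Copy, PartialEq, Eq, Debug)]
        struct NodeId(usize);

        #[derive(Debug)]
        enum Expr {
            Num(i64),
            Add(NodeId, NodeId),
            Mul(NodeId, NodeId),
        }

        /// All nodes live in one Vec; children refer to each other by NodeId.
        #[derive(Default)]
        struct Arena {
            nodes: Vec<Expr>,
        }

        impl Arena {
            fn push(&mut self, e: Expr) -> NodeId {
                self.nodes.push(e);
                NodeId(self.nodes.len() - 1)
            }

            fn eval(&self, id: NodeId) -> i64 {
                match &self.nodes[id.0] {
                    Expr::Num(n) => *n,
                    Expr::Add(a, b) => self.eval(*a) + self.eval(*b),
                    Expr::Mul(a, b) => self.eval(*a) * self.eval(*b),
                }
            }
        }

        fn main() {
            // Build 2 + 3 * 4 without any raw pointers or reference cycles.
            let mut arena = Arena::default();
            let two = arena.push(Expr::Num(2));
            let three = arena.push(Expr::Num(3));
            let four = arena.push(Expr::Num(4));
            let product = arena.push(Expr::Mul(three, four));
            let sum = arena.push(Expr::Add(two, product));
            println!("result = {}", arena.eval(sum)); // prints 14
        }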

    Similarly, I find it very strange to argue that this is not impressive because there is a course, much cheaper than $20,000, that teaches you how to write a basic C compiler. The whole reason this is impressive is that it's something computers could not do autonomously before and now they can. The price of doing it with a computer compared to having a human do it isn't really relevant yet, and the price will come down[0]. I also think it's very likely that the basic C compiler you'd get from a course like the one they linked would not actually be able to compile SQLite, DOOM, or Linux kernel 6.9 if you put it to the test.

    Also, it's really funny to me that they complain that this compiler project didn't also implement a linker and assembler. The entire point was to implement a compiler. That was the project under discussion. The fact that it uses an external linker and assembler is not a point against it. It's a complete non-sequitur.