48 points by tortilla 2 days ago | 5 comments
  • hebejebelus 16 hours ago
    This was mentioned recently and I had a look[0]. It seems like the benchmark is not quite saying what people think it's saying, and the paper even mentions it. The benchmark is constructed by using a model (DeepSeek-Coder-V2-Lite) to filter out problems it can already solve, so that only harder ones remain. That may result in easier problems for "low-resource" languages such as Elixir and Racket, since the filter model fails even on simple problems in those languages and so never filters them out (a rough sketch of this filtering step is included below). From the actual paper:

    > Section 3.3:

    > Besides, since we use the moderately capable DeepSeek-Coder-V2-Lite to filter simple problems, the Pass@1 scores of top models on popular languages are relatively low. However, these models perform significantly better on low-resource languages. This indicates that the performance gap between models of different sizes is more pronounced on low-resource languages, likely because DeepSeek-Coder-V2-Lite struggles to filter out simple problems in these scenarios due to its limited capability in handling low-resource languages.

    At the same time I have used Claude Code on an Elixir codebase and it's done a great job. But for me, it's unclear whether it would have done a worse job if I had picked any other stack.

    [0]: https://news.ycombinator.com/item?id=46646007
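
    A rough sketch of what that filtering step amounts to (not the paper's actual code; `DifficultyFilter` and `passes_tests?/2` are made-up names for illustration):

    ```elixir
    # Keep only problems the weaker "filter" model fails on; anything it can
    # already solve is considered too easy for the benchmark.
    defmodule DifficultyFilter do
      def keep_hard(problems, filter_model) do
        Enum.reject(problems, &passes_tests?(filter_model, &1))
      end

      # Hypothetical helper: run the filter model on the problem and check its
      # answer against the problem's tests. On a low-resource language the model
      # fails even easy problems, so those easy problems survive the filter and
      # the resulting benchmark ends up easier overall.
      defp passes_tests?(_filter_model, _problem), do: false
    end
    ```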

  • podlp 15 hours ago
    I tried Elixir a few months back with several different models (GPT, Claude, and Gemini). I’m not an Elixir or BEAM developer, but the results were quite poor. I rarely got them to generate syntactically correct Elixir (let alone idiomatic code). They often hallucinated standard library functions that didn’t exist. Since I had very little prior experience, steering the models didn’t go well. I’ve since been using them for JS/TS, Kotlin/Java, and a few other tasks where I’m much more familiar.

    My takeaway was that these models excel at popular languages where there’s ample training material, but struggle where the languages change rapidly or are relatively “niche.” I’m sure they’ve since gotten better, so perhaps my perception is already out of date.

  • pjm331 a day ago
    I’ve had a fantastic experience building out an internal AI agent service using Elixir and Phoenix - after only dabbling with them in side projects for almost a decade.

    OTP fits agents like a glove.
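
    A minimal sketch of the fit, assuming each agent conversation runs as its own supervised process (the module and `call_llm/1` below are hypothetical, not from the article):

    ```elixir
    defmodule AgentSession do
      use GenServer

      def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

      # Client API: send a user message, get the assistant reply back.
      def send_message(pid, text), do: GenServer.call(pid, {:message, text}, 30_000)

      @impl true
      def init(_opts), do: {:ok, %{messages: []}}

      @impl true
      def handle_call({:message, text}, _from, state) do
        messages = state.messages ++ [%{role: :user, content: text}]
        reply = call_llm(messages)
        {:reply, reply, %{state | messages: messages ++ [%{role: :assistant, content: reply}]}}
      end

      # Placeholder for the real model call; a crash here takes down only this
      # session, and its supervisor can restart it.
      defp call_llm(_messages), do: "(model reply)"
    end
    ```

    Sessions like this can be started under a DynamicSupervisor, one process per conversation, so a crash in one agent doesn't affect the others.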

  • TomBers 18 hours ago
    Wanted to second this. Been using AI extensively on a relatively large Phoenix / Elixir code base, and it mostly produces excellent results.

    The features of Elixir that lead to good software are amplified with LLMs.

    One thing that I would perhaps add to the article (or emphasise) is the clarity and quality of error messages in Elixir. In my opinion it's some of the best error reporting in the game. The vast majority of the time the error gives enough information to very quickly fix the problem.
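
    For a flavour of what's meant, here is a tiny made-up example (error output paraphrased from memory, so details may differ slightly between Elixir versions):

    ```elixir
    defmodule Checkout do
      def total(%{items: items}), do: Enum.sum(Enum.map(items, & &1.price))
    end

    Checkout.total(nil)
    # ** (FunctionClauseError) no function clause matching in Checkout.total/1
    #
    #     The following arguments were given to Checkout.total/1:
    #
    #         # 1
    #         nil
    #
    #     Attempted function clauses (showing 1 out of 1):
    #
    #         def total(%{items: items})
    ```

    The error names the module, function, and arity, shows the exact arguments that were passed, and lists the clauses that were attempted, which is usually enough to pinpoint the problem immediately.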

  • pjmlp a day ago
    If it doesn't target GPUs with the same kind of tooling that already exists for C++, Python, and Julia, it isn't.
    • flexagoon 19 hours ago
      Did you even open the article? It claims it's the best language to use with AI, not the best language to develop AI.
      • pjmlp 19 hours ago
        I did. For me, "to use with AI" also implies developing it.