150k lines of vibe coded Elixir: The good, the bad and the ugly (getboothiq.com)

97 points by InternetGiant 13 days ago | 16 comments

viktorcode13 days ago
It's the second time today that I've seen a higher LoC count presented as something positive. I would put it strictly in the "Ugly" category. I understand the business logic: as long as you can vibe code your way out of any problem, what's the point of even looking at the code?
- dkersten13 days ago
  As the saying goes:
  Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs.
  150k sounds like a lot. I do have to wonder what the program does exactly to see if that’s warranted, but it sounds bloated.
- esafak13 days ago
  Think of it as 60 man-years of work.
  - idle_zealot13 days ago
    If that's true then I can ship 60 man-years of work with
    yes 'println("a very important and useful line of code");' >> main.c
    in under a second!
    esafak12 days ago
    How is that 60 man-years of work? You are not going to replicate what the LLM generated in under a second without the LLM.
- pjmlp13 days ago
  Remember, there used to be a time when programmers' productivity was measured in LoC per hour.
  As such, this is high productivity! /s
  - michaelcampbell13 days ago
    > Remember, there used to be a time programmers productivity was measured in LoC per hour.
    Do you remember such a time or company? I have been developing professionally since the early 1990's (and hobbyist before then), and this "truth" has been a meme even back then.
    I'm sure it happened, but I'm not sure it was ever as widespread as this legend would make it sound.
    But, there were decades of programmers programming before I started, so maybe it just predated even me.
    pjmlp13 days ago
    I do. Besides the sibling comment, there is hacker lore about this kind of issue:
    > They devised a form that each engineer was required to submit every Friday, which included a field for the number of lines of code that were written that week.
    https://www.folklore.org/Negative_2000_Lines_Of_Code.html
    kryptiskt13 days ago
    IBM had such a culture back in the day, where they feted 1 kloc/day programmers. That was what Bill Gates sneered at with the "Measuring software productivity by lines of code is like measuring progress on an airplane by how much it weighs" quote.
- njhnjhnjh13 days ago
  [flagged]
  - miningape13 days ago
    Yes, as we all know, when evaluating which programming language to use, you should get a line count of the compiler's repo. More lines = more capabilities.
    Why would I ever want a language with less capabilities?
    williamcotton13 days ago
    I mean, awk? jq? SQL?
    OoooooooO13 days ago
    APL
  - adw13 days ago
    > to the extent that our systems' world models are effectively indistinguishable from the real world.
    https://genius.com/Jorge-luis-borges-on-exactitude-in-scienc...
  - enricotr13 days ago
    'Means' according to what? Put some (laughable) reference so I can laugh louder.
  - quietbritishjim13 days ago
    Genuinely hard to tell if satire.
    Just in case not, consider whether the short function
    def is_even(x): return (x%2) == 0
    Handles a wider range of input conditions than the higher LOC function
    def is_even(x):
        if x == 0: return True
        if x == 2: return True
        if x == 4: return True
        ...
        return False
pmontra13 days ago
> In Elixir tests, each test runs in a database transaction that rolls back at the end. Tests run async without hitting each other. No test data persists.
And it confuses Claude.
This way of running tests is also what Rails does, and AFAIK Django too. Tests are isolated and can be run in random order. Actually, Rails randomizes the order, so if there are tests that for any reason depend on the order of execution, they will eventually fail. To help debug those cases, it prints the seed, which can be used to rerun those tests deterministically, including the calls to methods returning random values.
I thought that this is how all test frameworks work in 2026.
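The seed-and-replay idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Rails or ExUnit implement it; the function name `shuffled_test_order` and the test names are made up for the example.

```python
import random


def shuffled_test_order(test_names, seed=None):
    """Return (seed, order): the tests shuffled by a seed that can be replayed."""
    if seed is None:
        seed = random.randrange(2**32)  # pick a fresh seed so it can be reported
    order = sorted(test_names)  # normalise first, so the seed alone decides the order
    random.Random(seed).shuffle(order)
    return seed, order


# Hypothetical test names, for illustration only.
tests = ["test_signup", "test_login", "test_checkout", "test_refund"]

seed, first_run = shuffled_test_order(tests)
print(f"Randomized with seed {seed}: {first_run}")

# Re-running with the printed seed reproduces the exact same order.
_, replay = shuffled_test_order(tests, seed=seed)
```

An order-dependent failure found in one run can then be reproduced by passing the printed seed back in.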
- netghost13 days ago
  I did too, and I've had a challenging time convincing people outside of those ecosystems that this is possible, reasonable, and something we've been doing for over a decade.
  - gavmor13 days ago
    Story of my life in so many dimensions.
  - njhnjhnjh13 days ago
    [flagged]
- Winsaucerer13 days ago
  Some database tests can't be done within transactions. With postgres, I create a copy of the database via WITH TEMPLATE, and each test runs in its own copy of the database. Then it can use or avoid transactions as it pleases, because the whole thing is local to that one test anyway.
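A rough, runnable analogue of the copy-per-test approach, using SQLite's backup API from the Python standard library instead of Postgres. This swaps in a different database purely so the sketch is self-contained; in Postgres the copy would be made with a statement along the lines of `CREATE DATABASE test_copy WITH TEMPLATE template_db` (both names hypothetical). The helpers `make_template`, `fresh_copy`, and `count_users` are invented for the example.

```python
import sqlite3


def make_template():
    """Build the 'template' database once, with schema and seed data."""
    template = sqlite3.connect(":memory:")
    template.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    template.execute("INSERT INTO users (name) VALUES ('seed-user')")
    template.commit()
    return template


def fresh_copy(template):
    """Give a test its own private copy of the template database."""
    copy = sqlite3.connect(":memory:")
    template.backup(copy)  # copies every page of the template into the new database
    return copy


def count_users(db):
    return db.execute("SELECT COUNT(*) FROM users").fetchone()[0]


template = make_template()
db_a = fresh_copy(template)  # database for "test A"
db_b = fresh_copy(template)  # database for "test B"

# Test A commits for real; nothing leaks into test B or the template.
db_a.execute("INSERT INTO users (name) VALUES ('only-in-a')")
db_a.commit()
```

Because each test owns its copy, it is free to commit, which a rollback-based setup cannot allow.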
- pdimitar12 days ago
  I use Claude Opus and it gets the idea of Elixir's Ecto DB test case isolation through ultimately-rolled-back transactions just fine, FWIW.
- dnautics13 days ago
  > And it confuses Claude.
  I've never had this problem.
- whalesalad13 days ago
  This is a skill issue in prompt engineering. You explain how that works in CLAUDE.md and it is a non issue.
- vmg1213 days ago
  Why not just write to the db? Just make every test independent, use uuids / random ids for ids.
  - mystifyingpoi13 days ago
    > Just make every test independent
    That's easier said than done. Simple example: API that returns a count of all users in the database. The obvious correct implementation that will work would be just to `select count(*) from users`. But if some other test touches users table beforehand, it won't work. There is no uuid to latch onto here.
    christophilus13 days ago
    That’s why you run each test in a transaction with proper isolation level, and don’t commit the transaction— roll it back when the test ends. No test ever interferes with another that way.
    dnautics13 days ago
    yes, Now this test also has to check that your redis-based cache is populated correctly. And/or sends stuff down your RabbitMQ/Kafka pipeline.
    pmontra13 days ago
    That looks like an integration test. A possible way to handle that scenario is to drop all the databases after it ends and create them again, or truncate all the tables, or whatever makes sense for that particular set of data stores.
    That could run on developer machines but maybe it runs only on a CI server and developers run only unit tests.
    dnautics13 days ago
    so in elixir you can do this async alongside your unit tests.
  - vladraz13 days ago
    Frankly this is the better solution for async tests. If the app can handle multiple users interacting with it simultaneously, then it can handle multiple tests. If it can’t, then the dev has bigger problems.
    As for assertions, it’s not that hard to think of a better way to check if you made an insertion or not into the db without writing “assert user_count() == 0”
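One way to read the suggestion above: assert on the specific row a test created, keyed by a random id, rather than on a table-wide count. A sketch with Python's `sqlite3` and `uuid` modules; the helpers `create_user` and `user_exists` are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")


def create_user(name):
    """Insert a user under a random UUID and return that id."""
    user_id = str(uuid.uuid4())
    conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
    conn.commit()
    return user_id


def user_exists(user_id):
    row = conn.execute("SELECT 1 FROM users WHERE id = ?", (user_id,)).fetchone()
    return row is not None


# Rows committed by other tests sharing the same database.
create_user("someone-else")
create_user("another-test")

# This test only looks at ids it generated itself, so the rows above
# cannot make its assertions pass or fail.
my_id = create_user("alice")
never_inserted = str(uuid.uuid4())
```

This does not help with an endpoint whose contract really is "count all users", which is the case raised in the sibling thread.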
    MikeNotThePope13 days ago
    I don’t disagree with you, but there are diminishing returns on making your test suite complex. To make async tests work properly, you need to know what you’re doing in regards to message passing, OTP, mocks, shared memory, blah blah blah. It can get really complicated, and it still isn’t a substitute for real user traffic. You’re going to have to rely on hiring experienced Elixir developers (small talent pool), allow for long onboarding time (expensive), or provide extensive training (difficult). Personally, for most cases, writing a sync test suite and just optimizing it so it doesn't get too slow is probably more practical in the long term.
  - szundi13 days ago
    [dead]
jonator13 days ago
I can attest to everything. Using Tidewave MCP to give your agent access to the runtime via REPL is a superpower, especially with Elixir being functional. It's able to proactively debug and get runtime feedback on your modular code as it's being written. It can also access the DB via your ORM Ecto modules. It's a perfect fit and incredibly productive workflow.
- ogig13 days ago
  Some MCPs do give the models superpowers. Adding the Playwright MCP took my CC from mediocre frontend skills to really, really good. It also gives CC a way to check what it has done and, many times, correct obvious errors before coming back to you. Big leap.
- ch4s313 days ago
  Which models are you using? I’ve had mixed luck with GPT 5.2.
  - barkerja13 days ago
    Opus 4.5 with Elixir has been remarkably good for me. I've been writing Elixir in production since ~2018, and the quality of the code it produces continues to amaze me.
    I've been tweaking my skills to avoid nested cases, better use of with/do to control flow, good contexts, etc.
    ch4s313 days ago
    I'll have to check it out. I've found GPT to be adequate at producing running code that I can improve either by hand, or very specific prompting.
    What does your workflow look like?
    barkerja12 days ago
    I don't have a fancy workflow per se, but I have started leaning into git workspaces more which has really been a boon with Elixir (especially in large projects where compile times can be in the many tens of seconds).
  - jonator13 days ago
    I've been using Opus 4.5 via Claude Code
- manmal13 days ago
  Is an MCP really required for this?
  - dnautics13 days ago
    sure, you could in principle write a script that calls into the running vm, executes code, and just make this a text-based command attached to a script + skill.
    Six of one, half a dozen of the other.
    At the point where you have a phoenix project in dev, you're already exposing an http endpoint, so the infra to not have to do a full on "attach to the VM and do RPCs" is nice, and you just pull tidewave in as a single dependency, instead of downloading a bunch of scripts, etc.
botacode13 days ago
Great article that concretizes a lot of intuitions I've had while vibe coding in Elixir.
We don't 100% AI it but this very much matches our experience, especially the bits about defensiveness.
Going to do some testing this week to see if a better agents file can't improve some of the author's testing struggles.
tossandthrow13 days ago
It seems like the 100% vibe coded is an exaggeration given that Claude fails at certain tasks.
The new generation of code assistants is great. But when I dogmatically try to let only the AI work on a project, it usually fails and shoots itself in its proverbial foot.
If this is indeed 100% vibe coded, then there is some magic I would love to learn!
- dnautics13 days ago
  I think by 100% vibe coded most people on hn mean that 100% of the code is written not by hand. The hand only does the delete key and prompting. We're mostly not talking about amateurs with no CS background just prompting and shitting out software with all sorts of bugs they would never be able to see.
- ogig13 days ago
  My last two projects have been 100% coded using Claude, and one has a certain complexity. I don't think there is any going back for me.
  - tossandthrow13 days ago
    What is your secret sauce? How do you organize your project?
    ogig13 days ago
    I decided to really learn what is going on, and started with: https://karpathy.ai/zero-to-hero.html That gives useful background for understanding what the tool can do, what context is, and how models are post-trained. Context management is an important concept. Then I gave several tools a shot, including Copilot and Gemini, but followed the general advice to use Claude Code. It's way better than the rest at the moment. Then I dove deep into the Claude Code documentation and various YouTube videos; there is plenty of good content out there. There are ways to customize the process and make it more deterministic by using the tools properly.
    Overall my process is: define a broad spec, including architecture. Heavy usage of standard libraries and frameworks is very helpful, as are typed languages. Create skills according to your needs, and use MCP to give CC a feedback mechanism; Playwright is a must for web development.
    After the environment and initial seed are in place in the form of a clear spec, it's a process of iteration via conversation. My sessions tend to go "Let's implement X, plan it"; CC offers a few routes, and I pick what makes the most sense, or on occasion I need to explain the route I want to take. After the feature is implemented we go into a cleanup phase: we check whether anything is getting out of hand, recheck security, and create tests. Repeat. Pick small battles instead of huge features. I'm doing quite a lot of hand-holding at the moment, saying "no" a lot, but the process is on another level compared with what I was doing before, and the speed at which I can get features out is insane.
    tossandthrow12 days ago
    Thanks! Very valuable insights.
    I have been through Karpathy's work - however, I don't find that it helps with large scale development.
    Your tactics work for me at small scale (around 10 kLOC) but start to break down beyond that, especially when refactorings are involved.
    Refactoring happens when I see that the LLM is stumbling over its own decisions _and_ when I get a new idea. So the ability to refactor is a hard requirement.
    Alternatively refactoring could be achieved by starting over? But I do have a hard time accepting that idea for projects > 100klocs.
    nprateem13 days ago
    It is until it's not. That's the problem. The AI gets tripped up at some point, starts frigging tests instead of actually fixing bugs, starts looping then after several hours says it's not possible. If you're lucky.
    Then on average your velocity is little better than if you just did it all by hand.
    ogig13 days ago
    The "AI gets tripped up" phenomenon is something I've experienced, and I think it's again related to context usage. Using more agents and skills reduces the pollution of the main context and delays the moment where things go weird. /clear after each small mission. As said above, CC needs heavy guidance, but even with these issues, I'm way faster.
dnautics13 days ago
ok, so im "vibe-" building out my company's lab notebook in elixir ahead of the first funding check coming in.
im doing some heavy duty shit, almost everything is routed through a custom CQRS-style events table before rollup into the db tables (the events are sequentially hashed for lab notebook integrity). editing is done through a custom implementation of quill js's delta OT. 100% of my tests are async.
I've never once run into the ecto issues mentioned.
I haven't had issues with genservers (but i have none* in my project).
claude knows oban really well. Honestly I was always afraid to use oban until claude just suggesting "let's use oban" gave me the courage. I'll be sending Parker and Shannon a first check when the startup's check comes in.
article is absolutely spot on on everything else. I think at this point what I've built in a month-ish would have taken me years to build out by myself.
biggest annoyance is the over-defensiveness mentioned, and that Claude keeps trying to use Jason instead of JSON. Also, Claude has some bad habits around aliases that it does even though it's pretty explicitly mentioned in CLAUDE.md, other annoying things like doing `case functioncall() do nil -> ... end` instead of `if var = functioncall() do else`
*none that are written, except liveviews, and one ETS table cache.
[0] CQRS library: https://hexdocs.pm/spector/Spector.html
[1] Quill impl: https://hexdocs.pm/otzel/Otzel.html
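For readers unfamiliar with the "sequentially hashed" events idea mentioned above, here is a minimal hash-chain sketch in Python. It is a generic illustration, not the design of the Spector library linked in the comment; the field names, the all-zero genesis value, and the use of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


def event_hash(prev_hash, payload):
    """Hash the previous hash together with a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log, payload):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": event_hash(prev_hash, payload)})


def verify(log):
    """Recompute every hash in order; any edited or reordered event breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != event_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"type": "note_created", "text": "sample A weighed"})
append_event(log, {"type": "note_edited", "text": "sample A weighed: 1.2 g"})
intact = verify(log)

# Quietly rewriting an earlier entry is detected on the next verification.
log[0]["payload"]["text"] = "sample A weighed: 9.9 g"
still_valid_after_tampering = verify(log)
```

Because each hash covers the previous one, changing any past event invalidates every entry from that point on, which is what makes the scheme useful for notebook integrity.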
phplovesong13 days ago
"It writes 100% of our code"
- Silently closes the tab, and makes a remark to avoid given software at any cost.
- Ronsenshi13 days ago
  You're not missing much. Seems to me like they wrote 150k lines of code for some glorified photo app with ChatGPT in the backend for image processing. Oh and some note-taking it seems.
  - timacles13 days ago
    I await (also doubt) the day this produces something truly useful and not just generic derivative functionality glued together
  - andnand12 days ago
    I was confused by this. Watching the demos on their page, it looks extremely slow. And it just does some image recognition to fill a form?
deadbabe13 days ago
Everyone always ends these articles with “I expect it will get better”
What if it doesnt? What if LLMs just stay mostly the same level of usefulness they are now, but the costs continue to rise as subsidization wears off?
Is it still worth it? Maybe, but not worth abandoning having actual knowledge of what you’re doing.
- CyberDildonics13 days ago
  If it's not working now when extravagant amounts of money are being put into it, it might be time to just accept what it is and work around that instead of keeping all the grand predictions.
  Anyone can sell the future.
  - deadbabe13 days ago
    All the money in the present has been taken, you can now only make money from the future.
- solumunus13 days ago
  I expect the costs at source will go down even if model performance doesn’t improve much, and hopefully that will offset the unraveling of subsidisation. I’d be happy enough with that outcome, I don’t really need them to be any better although of course it would be nice. I would love for them to be faster and cheaper.
logicprog13 days ago
It's interesting that Claude is able to effectively write Elixir, even if it isn't super idiomatic without established styles in the codebase, considering Elixir is a pretty niche and relatively recent language.
What I'd really like to see though is experiments on whether you can few shot prompt an AI to in-context-learn a new language with any level of success.
- dnautics13 days ago
  I gave a talk about this. Without evidence, I suspect it's due to the "poisoning" phenomenon: only a few examples (~250 IIRC) are enough to move the needle, seemingly independent of LLM parameter count. Elixir has some really high-quality examples available, so there is likely a "positive poisoning" effect.
  - logicprog13 days ago
    I'd love to see that talk!
    dnautics8 days ago
    https://m.youtube.com/watch?v=YZa5GqrZeao
- d3ckard13 days ago
  I would dispute the effectiveness point.
  It's certainly helpful, but has a tendency to go for very non idiomatic patterns (like using exceptions for control flow).
  Plus, it has issues which I assume are the effect of reinforcement learning - it struggles with letting things crash and tends to silence things that should never fail silently.
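The "silencing" habit described above is not Elixir-specific. Here is a small Python illustration of the contrast, with hypothetical function names; it only mirrors the complaint and is not Elixir's supervision model.

```python
def parse_port_defensive(raw):
    """Over-defensive style: every failure is swallowed and replaced by a default."""
    try:
        return int(raw)
    except Exception:
        return 0  # the caller can no longer tell bad input from a real 0


def parse_port_strict(raw):
    """'Let it crash' style: bad input raises immediately, at the point of failure."""
    return int(raw)


silent_result = parse_port_defensive("80x")  # bug hidden behind a default

try:
    parse_port_strict("80x")
    failed_loudly = False
except ValueError:
    failed_loudly = True  # the error surfaces where a caller or supervisor can act on it
```

In the first version a malformed value flows onward as a plausible-looking port; in the second, the failure is visible as soon as it happens.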
  - troupo13 days ago
    > has a tendency to go for very non idiomatic patterns (like using exceptions for control flow).
    It tends to always write Java even if it's Elixir. Usage rules help: https://hexdocs.pm/usage_rules/readme.html
- ch4s313 days ago
  You can accurately describe elixir syntax in a few paragraphs, and the semantics are pretty straightforward. I’d imagine doing complex supervision trees falls flat.
- majoe13 days ago
  I tried different LLMs with various languages so far: Python, C++, Julia, Elixir and JavaScript.
  The SOTA models do a great job for all of them, but if I had to rank the capabilities for each language it would look like this:
  JavaScript, Julia > Elixir > Python > C++
  That's just a sample size of one, but I suspect, that for all but the most esoteric programming languages there is more than enough code in the training data.
  - ogig13 days ago
    I've used CC with TypeScript, JavaScript and Python. Imo TypeScript gives the best results. Many times CC will be alerted by, and act on, the TypeScript compile process, which is another useful layer in its context.
- dist-epoch13 days ago
  Unless that new language has truly esoteric concepts, it's trivial to pattern-match it to regular programming constructs (loops, functions, ...)
te_chris13 days ago
The imperative thing is so frustrating. Even the latest models still write elixir like a JS developer, checking nils, maybe_do_blah helper functions everywhere. 30 lines when 8 would do.
- cpursley13 days ago
  Try these:
  - https://github.com/agoodway/.claude/blob/main/skills/elixir-...
  - https://github.com/agoodway/.claude/blob/main/agents/elixir-...
  - https://github.com/agoodway/.claude/blob/main/agents/elixir-...
  Getting pretty good results so far.
  - simmanian13 days ago
    Haven't used skills so far -- do you simply store them in your skills directory and have them automatically get used or do you have to specify one of the skills every time?
    cpursley13 days ago
    Yes regarding directory. They merged the concept of slash commands so I often do /elixir-genius to force it. Or if I just need subagents tell it to use "elixir-expert" or "elixir-qa" in parallel with other appropriate subagents. Also helps to put a mention in the Claude.md file.
  - barkerja13 days ago
    These should get added to https://skills.sh/?q=elixir
    cpursley12 days ago
    https://skills.sh/agoodway/.claude/elixir-genius
- njhnjhnjh13 days ago
  [flagged]
  - ahub13 days ago
    Are you trolling ?
Sharlin13 days ago
I don’t understand how the author can simultaneously argue that Claude is great at Elixir because it’s a small language with only one paradigm, and also that Claude is bad at Elixir, spewing out non-idiomatic code that makes little sense in the functional paradigm?
- davidclark13 days ago
  The secret is that the author is also Claude.
epolanski13 days ago
I'm a bit lost on a few of the bad and ugly points.
They could've been sorted with precise context injection of claude.md files and/or dedicated subagents, no?
My experience using Claude suggests you should spend a good amount of time scaffolding its instructions in documents it can follow and refer to if you don't want it to end in the same loops over and over.
Author hasn't written on whether this was tried.
calvinmorrison13 days ago
I dont know erlang. My hobby LLM project is having it write a fully featured ERP in Erlang.
An ERP is practically an OS.
It now has
- pluggable modules with a core system
- Users/Roles/ACLs/etc.
- an event system (IE so we can roll up Sales Order journal entries into the G/L)
- G/L, SO, AR, AP
- rollback/retries on transactions
i havent written a line of code
andnand12 days ago
Im so confused by this. Watching the demos it seems like they just do some OCR to fill in a form? Then grab an email and linkedin url? The demo takes ~10s to do that.
alecco13 days ago
Async or mildly complex thread stuff is like kryptonite for LLMs.
- catlifeonmars13 days ago
  Also for humans.
  - omnicognate13 days ago
    [flagged]