No LLM Code in Dependencies (joeyh.name)

108 points by edward9 hours ago | 15 comments

StableAlkyne5 hours ago
Clicking through to https://git-annex.branchable.com/no_llm_code/
It looks like git after 2.22 was dropped because it took an LLM commit. Same with ghc.
If I have to choose between this or git and the latest ghc, I think I'm going to just wait for someone to fork annex.
I don't even feel strongly one way or the other on AI stuff; pragmatically, I'm just not going to stop using the most widely used version control system, or Haskell, just for some guy's (forkable, AGPL-licensed) hobby project.
- remywang4 hours ago
  > This will probably prevent git-annex from taking advantage of most new improvements to the Haskell language going forward. That is deeply unfortunate. This is the main reason why git-annex is not guaranteed to never change to depend on LLM generated code, because cutting it off from all future Haskell language improvements may be worse than the alternative.
  It looks like they are aware, and git-annex has been around for decades, written by one of the best Haskellers. “Some guy's hobby project” is not fair.
- pseudalopex4 hours ago
  They said the non-LLM dependency build was not the default and could become untenable.
  They said git-annex supports git back to 2.22, not that git after 2.22 was dropped.
  An incompatible change in ghc would also break compilation of other software.
- zahlman4 hours ago
  TFA is about the dependencies of this project. How does that prevent you from using those things yourself?
hypfer2 hours ago
What confuses me about this stance is that LLMs are basically indistinguishable from any mid-to-low-tier dev.
And those we've let into our codebases with no concerns. Hell, some even threw parties inviting in more of them.
At least LLMs don't call HR on you when you rightfully tell them that they're full of shit. Though... well. Claude probably might.
- lmkg2 hours ago
  Godot's recent announcement spelled something out clearly: when a mid-tier rando contributes, you can provide feedback to that person and possibly help them grow into being a senior contributor or even a maintainer. That possibility of helping the human behind the code is part of the motivation for doing open-source. Mentoring shitty devs is itself giving back to the community, in a different form than the code itself is. And that is qualitatively different than giving feedback to an LLM.
  - pooploop64an hour ago
    I think I'm also going to refer people to the Godot Foundation's statement on this from now on. Too often people try to lay it out like a moral conundrum, or some kind of purity test for "real" programmers vs LARPers, but that's all just lips flapping. Meanwhile there are real-life practical consequences that follow from taking AI contributions, and the Godot Foundation has done a great job articulating what those are. It's very nice to have a voice like that saying these things.
  - hypfer2 hours ago
    That is a good point indeed.
    I am wondering, though, if that was really the world we were living in just before ChatGPT launched, given that the whole OSS thing was already being harvested super hard.
    The "mentoring opportunities" often were just extracting free consulting out of experts + building a portfolio for getting hired by big tech.
    Would we really want to go back to that?
    So I agree with the idea but only in a vacuum, I think.
- bigstrat20032 hours ago
  LLMs are worse at programming than any dev I've ever worked with. Yes, even $latest_model. They have no understanding or ability to reason, and they make mistakes no human would make. They are, in short, bad at programming.
  - sscaryterry2 hours ago
    You've not worked with average developers then, or this is a purely reactionary/emotional statement.
  - hypfer2 hours ago
    I know what you mean regarding the class of mistakes, but strong disagree on the "worse" part.
    Or rather I envy you for your experience with humans so far.
- gspr2 hours ago
  > What confuses me about this stance is that LLMs are basically indistinguishable from any mid-to-low-tier dev.
  I disagree. Behind an LLM sits a developer. They steer the LLM. For them, directions to the LLM are the preferred form of modification of the software. The output of the LLM is not a preferred form anymore. This poses a huge problem for free software, especially when the LLM that translates the preferred form into "source code" is not FOSS.
  The low-tier dev was not used in this way.
- sscaryterry2 hours ago
  This. So many assumptions. If you disclose you used an LLM, it is immediately assumed all of it is done by an LLM.
  If there is a bug, it's because you are a lazy piece of shit, not because humans make mistakes and you missed it. It is branded slop.
  We're living in interesting times socially. OSS will die because of this.
  Contributors are dwindling, and will continue to do so. If you want to play in your sandbox, please do. Don't open-source, keep it to yourself.
  - mcculleyan hour ago
    > OSS will die because of this
    OSS will not die.
  - gspr2 hours ago
    I think you're wrong. And I think that FOSS is our last best hope to keep software under the control of the individual.
    The sloppers are diving head-first into a world where not knowing how a basic idea translates to code is embraced. This is not true of every slopper, but it is true of enough that sloppers are a threat.
    sscaryterry2 hours ago
    I hear you, but again there are a lot of assumptions in this statement: "sloppers are diving head-first into a world where not knowing".
    The problem is you've redefined LLM-coding as slopping. "This is not true of every slopper".
    lsaferite28 minutes ago
    I find your comment here interesting. The parent never called out LLM-coding, they said "sloppers". If we take that choice of word as deliberate, it stands to reason there's a distinction there between "sloppers" and LLM assisted coding in general. You quoting "This is not true of every slopper" as proof they are equating the two seems like a weakly defended assertion. It's entirely possible there are 3 broad classes of LLM users in the parent's explicit and implicit beliefs. The thing is, you don't know any more than I know. You are attributing a held belief to someone that you inferred from incomplete information. That being said, if you based your assertion on external, unreferenced knowledge, then you could potentially know they hold that belief.
    I'd venture to say that a large number of developers are using LLM tooling at this point. Not all of those developers are out there generating massive, poorly engineered PRs and wasting project maintainer time. For me there are at least those 3 broad categories of user of LLMs for software development, maybe more if I sat and thought about it for a while.
    sscaryterry25 minutes ago
    The article is about LLM code. I’m sure you can condense the many lines into fewer than 5 lines. I’m not sure what you are trying to say.
jsnell4 hours ago
It's nicely symmetrical, because conversely I prefer my LLM-generated code to have no dependencies.
- chollida13 hours ago
  > It's nicely symmetrical, because conversely I prefer my LLM-generated code to have no dependencies.
  How do you get your code to the point where it has no dependencies? How do you do any sort of database writing without a library, or web access without sockets from an OS library?
  What sort of code has no dependencies? I'm now very curious, as I can't see how you can do anything without at least including the std lib from your OS to do any file I/O.
  - tayo423 hours ago
    Write assembly to do the syscall instruction with whatever params you need.
    jaggederest3 hours ago
    Relevant, reader mode recommended: https://www.ee.torontomu.ca/~elf/hack/recovery.html
- nancyminusone3 hours ago
  You mean aside from previous work it was trained on?
  - 20k3 hours ago
    Dependencies: stolen from all code ever written without permission, including extremely illegal content
    But other than that, totally dependency free!
  - moffkalast3 hours ago
    Beats copy pasting from stackoverflow and calling it yours.
    NewJazz2 hours ago
    Does it? Seems roughly equivalent. At least with SO there is a clear problem and solution being solved.
    moffkalastan hour ago
    Well one is silent and the other has a big "Co-authored by Claude" label on it, so it's at least more visible.
    With SO there's an unclear problem and a closed as duplicate being served if we're being real.
    sscaryterry2 hours ago
    Mic drop
- moffkalast3 hours ago
  It's March 18th, 2087, npm and conda are considered crimes against humanity in 23 countries...
- 12hasgt3 hours ago
  It isn't your code, it is stolen.
  - sscaryterry2 hours ago
    I guess code you get from a book too? Or learning it, and typing it out after you've mastered a particular algorithm?
    (FYI I'm not disputing that the LLM vendors stole; that doesn't mean the technology is shit.)
- peteforde4 hours ago
  Bingo.
  These days, my only deps are TinyUSB and LVGL - stuff that would be completely pointless and absurd to recreate.
bwestergard3 hours ago
Git annex is a remarkable piece of software and I've been inspired by lead developer joeyh's approach to both FOSS and life. For example:
https://joeyh.name/offgrid/
- jaapz2 hours ago
  I am unreasonably irritated by their use of kw instead of kW
neutrinobro5 hours ago
Was this done by manually reviewing commit messages? I think it would be interesting/useful to have a tool that could use some basic heuristics about LLM generated code to detect code-blobs even if they are not explicitly called out in a commit message.
- jonathrg4 hours ago
  The diff of the linked commit in git is completely trivial, clearly it just got tagged because of the signoff in the commit message: https://github.com/git/git/commit/d7971544fe17378f44f4998301...
  I would be surprised if there is no LLM-assisted code in there prior to this commit, this is just the first where the author chose to disclose it.
- wrs3 hours ago
  Apparently, though not very carefully. The "particularly large LLM generated code churn" in the ram library, for example, is the LLM being used to simply git-revert a change that was not originally done by an LLM.
- dijksterhuis5 hours ago
  when i was reading this i thought of writing some quick and dirty cli tool that checks commit co-authors. wouldn't be perfect, but would eliminate a good chunk of low hanging fruit.
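A quick-and-dirty version of that co-author check might look like the Python sketch below. It assumes that AI assistance, when disclosed at all, shows up as a Git trailer (`Co-authored-by:`, `Assisted-by:`, or `Generated-by:`) whose value names a tool; both the trailer keys and the `AI_MARKERS` list are hypothetical choices, not any standard. Like the git commit discussed above, it only catches usage the author chose to disclose.

```python
import re
import subprocess

# Hypothetical substrings suggesting an AI tool in a trailer value (an assumption, not a registry).
AI_MARKERS = ("claude", "copilot", "chatgpt", "openai", "gemini", "codex")

# Trailer keys to inspect; assumed conventions, compared case-insensitively.
TRAILER_KEYS = ("co-authored-by", "assisted-by", "generated-by")

# A trailer is a "Key: value" line.
TRAILER_RE = re.compile(r"^([A-Za-z-]+):\s*(.+)$")

def ai_trailers(message):
    """Return trailer lines in a commit message that mention a known AI marker."""
    hits = []
    for line in message.splitlines():
        m = TRAILER_RE.match(line.strip())
        if not m:
            continue
        key, value = m.group(1).lower(), m.group(2).lower()
        if key in TRAILER_KEYS and any(marker in value for marker in AI_MARKERS):
            hits.append(line.strip())
    return hits

def scan_repo(path="."):
    """Yield (commit hash, matching trailers) for every commit in a local repository."""
    # %H = full hash, %B = raw message; %x00 and %x1e emit separator bytes.
    out = subprocess.run(
        ["git", "-C", path, "log", "--format=%H%x00%B%x1e"],
        capture_output=True, text=True, check=True,
    ).stdout
    for record in out.split("\x1e"):
        record = record.strip()
        if not record:
            continue
        sha, _, body = record.partition("\x00")
        hits = ai_trailers(body)
        if hits:
            yield sha, hits
```

Plain `Signed-off-by:` lines are ignored on purpose, since they normally name a human.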
- api5 hours ago
  Just like with writing, any kind of AI detection is going to be inaccurate to the point of snake oil.
  LLM detection in writing is basically today's polygraph-test pseudoscience. There was a blog post a while ago where someone fed classic literature into one and it was detected as probably AI.
  - neutrinobro5 hours ago
    I'm not sure that is the case in this instance. Certainly general writing is a lot more variable and harder to classify, and at the other extreme certain one-line code changes don't have enough information to say anything. However, a blob with a 500+ line code change and 200+ lines of comments is a dead ringer for some of the current class of LLMs. That isn't to say this behavior couldn't be obfuscated, but some basic categorization could probably separate the majority of human-authored commits from AI commits. Heck, you could probably train an AI to detect commit style just by using pre-2022 code archives and existing known-to-be-AI edits/commits.
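The size-and-comments signal in that comment can be sketched as a crude filter over a unified diff. The 500/200 thresholds come from the comment itself; treating added lines that start with `#`, `//`, or `--` as comments is an assumption, and it will misfire (C `#include` lines, for instance, count as comments here).

```python
def added_line_stats(diff_text, comment_prefixes=("#", "//", "--")):
    """Count added lines, and added lines that look like comments, in a unified diff."""
    added = comments = 0
    for line in diff_text.splitlines():
        # Only "+" lines are additions; "+++ " is the new-file header, not content.
        if not line.startswith("+") or line.startswith("+++ "):
            continue
        added += 1
        if line[1:].lstrip().startswith(comment_prefixes):
            comments += 1
    return added, comments

def looks_bulk_generated(diff_text, min_added=500, min_comments=200):
    """Heuristic flag: a large addition that is also unusually comment-heavy."""
    added, comments = added_line_stats(diff_text)
    return added >= min_added and comments >= min_comments
```

A diff could be fed in from `git show <commit>`; the result is a hint for a human to look at, not a verdict.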
  - zahlman4 hours ago
    The heuristics that would be used to "detect AI" here would be things that shouldn't be happening anyway, so false positives wouldn't matter.
  - perrygeo4 hours ago
    It's not just "the code itself looks LLM generated" - it's also LOC/hr by a particular author which suggests vibe coding. You could look at the author's github contributions to identify time periods when the author was generating code at super-human speeds. Combine the two signals and you might get something better than a pseudoscience?
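The LOC-per-hour signal might be approximated by bucketing each author's added lines into clock hours and flagging buckets above a cutoff. This is a sketch under assumptions: the `Commit` record and the 1000-lines-per-hour default are hypothetical, and in practice the per-commit counts could be gathered from `git log --numstat`.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple

class Commit(NamedTuple):
    author: str
    when: datetime
    lines_added: int

def hourly_output(commits):
    """Sum lines added per (author, clock hour)."""
    buckets = defaultdict(int)
    for c in commits:
        hour = c.when.replace(minute=0, second=0, microsecond=0)
        buckets[(c.author, hour)] += c.lines_added
    return dict(buckets)

def superhuman_hours(commits, max_lines_per_hour=1000):
    """Return (author, hour, total) for hours whose output exceeds the cutoff, sorted."""
    return sorted(
        (author, hour, total)
        for (author, hour), total in hourly_output(commits).items()
        if total > max_lines_per_hour
    )
```

Clock-hour buckets are a simplification: a burst straddling 10:59 and 11:01 gets split across two buckets, so a sliding window would be a natural refinement before combining this with a diff-shape signal.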
  - verdverm5 hours ago
    An agent doesn't have to be perfect to be useful. If it can find clear examples of stuff you don't want to see in a (potential) dependency quickly, that will save you time. Give it search tools and some policies, then have it go find things. You then check them out, ask followups.
    Agents as a super powered (re)search assistant is underrated.
InTheArena3 hours ago
This is completely infeasible in the age of mythos. The reality is that the velocity is just not going to be feasible from a security PoV without leveraging these tools.
- 20k3 hours ago
  Analysing codebases with LLMs to find security vulnerabilities is completely unrelated to committing code generated with LLMs
  - alchemism3 hours ago
    It's a fair comparison. There's a fair amount of plausible-sounding bullshit being peddled as a transparent advertisement for an ai-driven "code security" firm.
  - bsamuels2 hours ago
    and how do you propose fixing the hundreds, if not thousands, of valid, impactful security bugs that frontier models will find?
    gspr2 hours ago
    If you can't fix them without LLMs, then you can't fix them. You probably shouldn't be trusted with maintaining the codebase in the first place.
    simonw2 hours ago
    How about if you don't have time to fix them without LLMs?
    gspr2 hours ago
    Then you don't have time to maintain the codebase. Sad, but sometimes true.
    sscaryterry2 hours ago
    When given the choice between putting food on the table, and being a purist, I'd take some bread. It is hard out there.
    simonw2 hours ago
    Welcome to volunteer-driven open source.
    (Update: you're a Debian developer so you're even more familiar with how that world works than I am.)
    slopinthebag2 hours ago
    That seems like an unfounded assumption. Why should one assume that Git Annex has hundreds or thousands of critical, exploitable security vulnerabilities?
    bsamuels2 hours ago
    This isn't a problem that is isolated to Git Annex. There are many maintainers out there taking anti-LLM stances, and you don't have to look very far to find OSS projects drowning from the wave of bugs.
    https://daniel.haxx.se/blog/2026/05/26/the-pressure/
    slopinthebag2 hours ago
    A wave of bug reports, which is quite a different thing.
    If you aren’t happy with their stance towards LLMs you can fork and fix yourself if you feel it’s necessary.
- moffkalast3 hours ago
  In ten years we'll look at human written code like the unreliable garbage it is, and never rely on anything that wasn't at least seriously looked over by an LLM. It won't be even close.
  - KronisLV2 hours ago
    > never rely on anything that wasn't at least seriously looked over by an LLM
    I can imagine LLMs becoming a mainstay, but what you are describing isn't wholly different from sufficiently advanced static code analysis - where you'd want more determinism than most LLMs normally provide.
    The problem is that such a thing might take a decade and billions of dollars of investment to create per language (e.g. actually useful code analysis for Java, for Spring Boot, for processing and validating form data, and DB schemas and document processing and rendering reports etc., literal domain checks for anything and everything that is common across various enterprises), so nobody wants to do that, and it's easier to throw LLMs at it and call it good enough.
    moffkalastan hour ago
    I remember back in the pre-2023 days where SonarQube was a big deal for Java static analysis, and I let it rip across an entire 120k line project at one point upon which it found something like seven issues, out of which only one or two were actual bugs. It was almost entirely useless. I think even Qwen would've done leagues better today.
    Most bugs are far too nuanced to be caught by static analysis imo, you do need to actually understand what's going on in the program, the intent, the environment, etc. instead of blindly verifying if everything technically checks out, compilers already do a perfect job at that.
  - hypfer2 hours ago
    Yes, the same way 10 years from now-10years, we'd all be looking back at how insane it was for people to drive cars.
    Man do I enjoy my totally real full self driving.
    BurritoKingan hour ago
    I find this attitude really weird. I just checked and I have 8,793 miles driven on my car and of that I'd say ~8750 of them were done by fsd (self driving). These days most of my interventions are to pick a different parking spot, and I can't remember the last time I had a serious disengagement, it's always just me wanting to drive a different route, park in a different location, or sometimes to handle things like car washes and the like.
    For me, for all intents and purposes, self driving is here today.
    karahime2 hours ago
    You're being sarcastic, but I do enjoy it. I just took a Waymo recently and it was thrilling, it felt great to feel the wind and the sun, listen to music, and get where I was going without having to drive there. I still like driving, obviously, but being able to decide one or the other is wonderful.
  - gspr2 hours ago
    Alternate take: in ten years we'll be pulling our hair out cursing at the world over how we could possibly accept "10k lines added, 8k lines removed" as the normal everyday churn of software development. We'll curse the morons who gave up understanding our own code.
bitbasher3 hours ago
I agree we need to address the elephant in the room, but our community is about as polarized as politics in America.
- tumetab12 hours ago
  Which elephant?
  - kzrdude2 hours ago
    I think the copyright and license question is one of the elephants in the room that hasn't had a satisfying conclusion. It's very important to have a clear idea of this for the open source movement.
periodjet17 minutes ago
These arguments are increasingly smelling strawman-ish to me. The authors seem to pick the absolute worst possible examples of LLM usage in software development, ignoring the fact that it is ultimately just a tool, and that all the blame for a shitty product continues to lie at the feet of the one wielding it like a giant doofus.
kstenerud4 hours ago
This is a hill many people will choose to die on.
And they shan't be missed.
- tuvix4 hours ago
  They will absolutely be missed, maybe not by any individual but the impact of them leaving will be felt. People willing to go to bat for code quality and who are also careful about copyright and the community aspect of open source is why this whole thing worked in the first place.
  - kstenerud4 hours ago
    Copyright won't be a problem. There's enough big business wrapped up in AI usage that the laws will bend towards them. Code quality and community don't die just because people haven't quite figured out how to use the new tools properly yet; quality merely dips for a while, and the community continues as before. We survived PHP. We can survive this.
    theLiminator3 hours ago
    If anything I think discipline and rigor will go up.
    I think it will force us to adopt stronger type systems, formal methods, and more automated verification.
  - peteforde3 hours ago
    I disagree, but not in a downvote sort of way. I think your position is defensible, but there is a valid second perspective.
    The sorts of folks who "won't be missed" put pedantry over productivity. To paint with a very broad brush, it's been my experience that they also tend to be stubborn and frustrating team members who don't understand that there's a time to debate and the rest of the time is for shipping.
  - kordlessagain4 hours ago
    5 years ago, I would have agreed with you. But when you go ALL IN on LLM development, and use annealing with multi-agent harnesses, these issues disappear. One caveat: I build everything off other things that originated with my own hand-written code. Auth for my site, for example. Also, most of my current projects are packed with advice I've rendered to the LLM on how git commits go down and the cadence of those commits into deployments. Claude Code rarely fucks this up, and has memories and plan files that it updates if we find a hole. So, I'm comfortable with an occasional hiccup in the process. It'll get caught, eventually. Maybe. ;)
    A recent analysis on my Claude Code prompts showed 1.5B input tokens over the last few months. I use 4-5 provider agents (all CLI) DAILY, so this is a small subset. I spend a lot of time using transcription services to drone on about how some agent fucked things up and how I want it fixed and how to do it.
    To assist with that process, I'm currently building out a search engine that is exposed via MCP to allow auditing of the dev runs. I already have the foundation of file changes (à la Splunk) that let me keep an eye on the agents, and an agentic terminal that allows one agent to keep an eye on what the other agent is whacking on. Combined with my constant badgering for proper systems development, these things are improving the process at an accelerated rate.
    Look, I get being an "engineer" on these types of things, and I think there is an absolute purity in pushing LLM-generated code out of a codebase you control. That said, that's not the ONLY way to do things, and your mileage will vary based on your systems thinking hat. I prefer to push hard on getting the outcomes and sacrifice the exhaustive process of reviewing every single line of code.
    Consider frameworks. They make things easier to do, if they are complete and stable. There's an argument here that LLM harnesses should probably not ALSO be maintained by LLMs (something I'm completely ignoring, so it's probably ironic that I'm mentioning it). But the point is that the harnesses SHOULD have eyes on most lines of code. Eyes on every package though? Hard to say. I've settled on doing most stuff in Rust nowadays, just because it keeps the LLM more honest. And, we can build most "packages" by hand so we can change them to match our outcomes without code bloat. By bitching at it about code refactoring constantly, annealing the codebase by high-level overview, not exhaustive review, I've found things get easier to work on as I go and still stay sane.
    I do catch the LLMs occasionally hard coding things that belong in their own file or configs, and am a hardass about that and file length. I do read some code and hate it being overly long (and it sucks for burning tokens).
    FWIW, I typed all this out on my keyboard myself. However, if I ran it through an LLM for cleanup or whatever, the very wall of text itself helps FORCE the LLM to stick to the substantive argument and steers it away from slop prompts. The same applies to code, if you are careful.
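A file-length rule like the one described in that comment is easy to enforce mechanically, for example as a pre-commit or CI step. This is a minimal sketch assuming a 400-line cap and `.rs` sources; both are hypothetical defaults chosen only for illustration.

```python
import tempfile
from pathlib import Path

def oversized_files(root, max_lines=400, suffixes=(".rs",)):
    """Return (path, line_count) for source files under root longer than max_lines."""
    results = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps an oddly encoded file from crashing the scan.
            with path.open(encoding="utf-8", errors="replace") as f:
                count = sum(1 for _ in f)
            if count > max_lines:
                results.append((str(path), count))
    return results

if __name__ == "__main__":
    # Tiny demo on a throwaway directory with one short and one long file.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "short.rs").write_text("fn main() {}\n")
        Path(d, "long.rs").write_text("// filler\n" * 401)
        for path, count in oversized_files(d):
            print(f"{path}: {count} lines")
```

A non-empty result could fail the build, which turns "being a hardass about file length" into a rule the agent hits on its own.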
- jurgenaut233 hours ago
  The fact that you think that way is probably because they have something that they care enough about to go to such extremes. I think they deserve a lot of admiration.
- pull_my_finger4 hours ago
  Ethically, selling code or programs built on other peoples code without consent is wrong.
  Legally, it's probably also unlawful, unless you buy the smoke they're selling that it was trained only on code that was openly licensed or in the public domain.
  Professionally, it's a poor choice to ship code that wasn't produced with human care and consideration or even thorough oversight or understanding based on recent trends.
  Software developers like to call themselves "engineers", but more and more they're showing they're more than happy to be configurators of black boxes of modular software. Whether that means pulling random NPM packages with thousands of other random packages as dependencies (none of which are even browsed or licenses checked), or "vibe coding" slop the LLM spits out.
  When the main problem was people assembling random packages, I always likened it to "sandwich artists" at Subway. They just stand behind the counter and configure the product of random combinations of ingredients (someone else's NPM packages). Now it's like they can't even see the selection of ingredients, they just grab handfuls and shove it together until they get something sandwich shaped. Bad times in software.
  - scotty792 hours ago
    Most software of the future will have a userbase of 1.
    You won't be selling software. You'll be selling a service of assisting someone so they can build software for themselves.
- slopinthebag3 hours ago
  They will be missed. The people who won’t be missed are those who delegate their thinking and knowledge building to LLMs. They’re already obsolete.
- LandoCalrissian4 hours ago
  Ah yes, open source will be better with less people who can actually write code.
- bigstrat20032 hours ago
  If all the people who actually know how to program and care about quality get pushed out by the AI bros, software will collapse. I'll certainly miss them when that happens.
porphyra2 hours ago
How come all the open source projects are fretting over the copyright status of LLM code but big companies are just vibe coding slop all day for their internal closed source projects without a care in the world?
- dehugger2 hours ago
  Risk exposure of "internal closed source" vs "open source". No one (external) cares about, or can inspect, a company's pile of internal utilities and code. As long as the code works, there's no problem.
  Everyone and their cat can look at open source projects, which can and will result in being called out publicly. This can also have legal ramifications on the project itself.
- scotty792 hours ago
  Because the open source community is idealistic and corporations are pragmatic.
skybrian5 hours ago
Maybe an LLM could be used to check for this :)
verdverm5 hours ago
We are all figuring this new technology out and people will make mistakes. Would seem overreactionary to swear things off completely because of a single commit and reversion. Look for patterns in dependencies and your own work.
botfriendsarent5 hours ago
I think this is a fair and normal reaction to AI slop. A lot of work, though. I think OSS projects are at serious risk of implosion due to the vigilance required, which honestly may end up being a fool's errand anyway.
But maybe we are thinking about it backward. Have you ever wondered why there is so much "free software"? Beware of strangers bearing gifts.
I have always wondered about, and been suspicious of, people who are so eager for you to use their software. Which isn't to say OSS isn't high quality. I'm just saying that maybe when people are pushing free software on you they are kind of in it for themselves.
As for what's next, me personally: last year I pulled all my personal repos (about 80 of them) off of Bitbucket and now self-host them all. I think OSS projects should set up a paywall and charge money to create PRs.
Like 10-100 bucks per PR to cover the cost of the extra vigilance. Also I could see migrations away from GitHub, to AI-free dependency hosting or something like that. It's an interesting challenge. But it's not insurmountable.
Either paywall OSS projects or take them off the interwebs. Also, one option I don't think the OP explored is forking and freezing the dependencies. Huge maintenance burden, but it's better than source corruption.
Also use fewer dependencies. Maybe set a limit of 5.
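The "limit of 5" and "freeze the dependencies" suggestions could be enforced as a simple policy check. This sketch assumes a Python-style `requirements.txt`, where freezing means an exact `name==version` pin; the limit, the regex, and the function name are illustrative, and environment markers, URL requirements, and `-r`/`-e` lines are deliberately not handled (they are reported as unpinned).

```python
import re

# An exact pin: a package name (optionally with [extras]) followed by "==version".
PINNED_RE = re.compile(r"^[A-Za-z0-9_.\-]+(\[[A-Za-z0-9_,.\-]+\])?==[^=\s;]+$")

def check_dependencies(requirements_text, max_deps=5):
    """Return a list of policy violations for a requirements.txt-style file."""
    deps = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            deps.append(line)
    problems = []
    if len(deps) > max_deps:
        problems.append(f"too many dependencies: {len(deps)} > {max_deps}")
    problems += [f"unpinned: {dep}" for dep in deps if not PINNED_RE.match(dep)]
    return problems
```

Pinning only freezes the version you resolve against; actually forking means vendoring or mirroring those exact sources as well.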
- haywalk3 hours ago
  > when people are pushing free software on you they are kind of in it for themselves
  I strongly disagree with this. The free (as in both freedom and as in free beer) software movement was to provide an alternative to proprietary and closed-source software, which is developed by people and corporations who are openly in it for themselves.
  > Like 10-100 bucks per PR to cover the cost of the extra vigilance. Also I could see migrations away from GitHub, to AI-free dependency hosting or something like that. It's an interesting challenge. But it's not insurmountable.
  You could just leave your project where it's at, keep it open source, and simply not accept outside contributions. Lots of open source software operates this way. The Ladybird browser notably switched to this model recently as a reaction to AI pull requests.
- scotty792 hours ago
  Paywalling contributions is an interesting idea. You could auction maintainer capacity. I'm super curious how much people would pay to get their code into a popular OSS project.
gravatron3 hours ago
Funny enough, if you spent just a few minutes with an LLM working on the design of your website, it wouldn't look like complete shit.
- NooneAtAll33 hours ago
  wdym?
  looks like a normal html-only website to me
- bitbasher3 hours ago
  function > form
  - hombre_fatal2 hours ago
    If you had to pick one, sure. But it's trivial to have both which is the point of those cringey websites like http://bettermotherfuckingwebsite.com