54 points by pixelmonkey 7 hours ago | 11 comments
  • cheesecompiler an hour ago
    > I personally think all of this is exciting. I’m a strong supporter of putting things in the open with as little license enforcement as possible. I think society is better off when we share, and I consider the GPL to run against that spirit by restricting what can be done with it.

    I like sharing too, but couldn't permissive-only licenses backfire? GPL emerged in an era where proprietary software ruled and companies weren't incentivized to open source. GPL helped ensure software stayed open, which helped it become competitive against the monopoly proprietary giants resting on their laurels. The restriction helped innovation, not the supposedly free market.

  • rzerowan 40 minutes ago
    The strange thing with this whole incident, apart from the rewrite/LLM part, is the general misunderstanding of the licences. LGPL is a pretty permissive one, going as far as allowing one to incorporate it in proprietary code without the linking reciprocity clause [1], and MIT is even more permissive. Importantly, these were meant to protect the USER of the code. Not the dev, or the company, or the CLA holder: the USER is primary in the Free Software world. Or at least was supposed to be. OSS muddied the waters, and forgetting the old lessons, learned when things were basically bigcorp vs. indie hacker trying to get their electronic device to connect to what they want to connect to and do what they need, is why we're here.

    Bikeshedding to eventually come full circle to understand why those decisions were made.

    In a world where the large OEMs and bigcorps are increasingly locking down firmware, bootloaders, kernels and the internet, I would think a reappraisal of more enforcement that benefits the USER is paramount.

    Instead we have devs looking to tear down the few user protections FLOSS provides and usher in a locked-down, hacker-unfriendly future.

    [1] https://licensecheck.io/blog/lgpl-dynamic-linking

  • nomdep 3 hours ago
    In this emerging reality, the whole spectrum of open-source licenses effectively collapses toward just two practical choices: release under something permissive like MIT (no real restrictions), or keep your software fully proprietary and closed.

    These are fascinating, if somewhat scary, times.

    • f33d5173 22 minutes ago
      I don't think it changes much about licensing in particular. People are going on about how since the AI was trained on this code, that makes it a derivative work. But it must be borne in mind that AI training doesn't usually lead to memorizing the training data, but rather learning the general patterns of it. In the case of source code, it learns how to write systems and algorithms in general, not a particular function. If you then describe an interface to it, it is applying general principles to implement that interface. Its ability to succeed in this depends primarily on the complexity of the task. If you give it the interfaces of a closed source and an open source project of similar complexity, it will have a roughly equally easy time implementing them.

      Even prior to this, relatively simple projects licensed under share-alike licenses were in danger of being cloned under either proprietary or more permissive licenses. This project in particular was spared, basically because the LGPL is permissive enough that it was always easier to just comply with the license terms. A full-on GPLed project like GCC isn't in danger of an AI being able to clone it anytime soon. Never mind that it was already cloned under a more permissive license by human coders.

    • embedding-shape an hour ago
      > or keep your software fully proprietary and closed.

      I guess it depends on your intention, but eventually I'm not sure it'll even be possible to keep it "fully proprietary and closed" in the hopes of no one being able to replicate it, which seems to be the main motivation for many to go that road.

      If you're shipping something, making something available, others will be able to use it (duh) and therefore replicate it. The barrier to replicating things like this, either together with LLMs or by letting the LLM straight up do it itself with the right harness, seems to be getting lowered real quick; there's been a massive difference in just a few years already.

    • vintagedave an hour ago
      Or GPL. Which I’m increasingly thinking is the only license. It requires sharing.

      And if anything can be reimplemented and there’s no value in the source any more, just the spec or tests, there’s no public-interest reason for any restriction other than completely free, in the GPL sense.

    • measurablefunc 2 hours ago
      If you listen to the people who believe real AI is right around the corner then any software can be recreated from a detailed enough specification b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior. Real AI is more brilliant than whatever algorithm you could ever think of so if the real AI can interact w/ your software then it can recreate a much better version of it w/o looking at the source code b/c it has access to whatever knowledge you had while writing the code & then some.

      I don't think real AI is around the corner but plenty of people believe it is & they also think they only need a few more data centers to make the fiction into a reality.

      • pixl97 an hour ago
        Real AI will never be invented, because as AI systems become more capable we'll figure out humans weren't intelligent in the first place, therefore intelligence never existed.
        • measurablefunc an hour ago
          Don't worry, just 10 more data centers & a few more gigawatts will get you there even if the people building the data centers & powerplants are unintelligent & mindless drones. But in any event, I have no interest in religious arguments & beliefs so your time will be better spent convincing people who are looking for another religion to fill whatever void was left by secular education since such people are much more amenable to religious indoctrination & will very likely find many of your arguments much more persuasive & convincing.
      • GaggiX an hour ago
        >Real AI is more brilliant than whatever algorithm you could ever think of

        So with "Real AI" you actually mean artificial superintelligence.

        • measurablefunc an hour ago
          I wrote what I meant & meant what I wrote. You can take up your argument w/ the people who think they're working on AI by adding more data centers & more matrix multiplications to function graphs if you want to argue about marketing terms.
          • GaggiX an hour ago
            I was just thinking that calling artificial superintelligence "Real AI" was funny.
            • measurablefunc an hour ago
              Corporate marketing is very effective. I don't have as many dollars to spend on convincing people that AI is when they give me as much data as possible & the more data they give me the more "super" it gets.
      • HappyPanacea an hour ago
        > b/c whatever special sauce is hidden in the black box can be inferred from its outward behavior.

        This is not always true; for an extreme example, see indistinguishability obfuscation.

  • erelong 2 hours ago
    hopefully this continues to show how awkward the idea of "intellectual property" (IP) is until people abandon it

    IP sounds good in theory but enables things like "patent trolling" by large corps, and creates all kinds of goofy barriers and arbitrary questions like the ones we're asking about whether re-implementations of ideas are "really ours"

    (maybe they were never anyone's in the first place, outside of legally created mentalities)

    ideas seem to fundamentally not operate like physical things, so asserting they can be considered "property" opens the door to all kinds of absurdities like those pondered in the OP

    • AuthAuth an hour ago
      I have no data to back this up, but patent trolling seems to happen far less often than companies that already own significant infra/talent ripping off products from smaller companies and outcompeting them with their scale. I'd rather have patent trolling than have Amazon manufacture everything I launch.

      The problem with IP laws and the US is that the big companies already do what IP is supposed to protect against, and the US refuses to legislate effectively against them.

    • moralestapia 36 minutes ago
      Is there anything you have created, spending considerable resources and time, that you ended up giving up for free? For the betterment of humanity?

      If so, let's see it!

  • cheesecompiler an hour ago
    After cloning a test suite you're still left with ongoing maintenance and development, maintaining feature parity, etc. There's a lot more to it than passing a test suite. If the rewrite is truly superior, it deserves to become the new Ship of Theseus. But, e.g., I doubt anyone's AI rewrite of SQLite will ever put a dent in its market share.
  • 7777777phil an hour ago
    The legal question is a distraction. GPL was always enforced by economics: reimplementation had to cost more than compliance. At $1,100 for 94% API coverage, it doesn't. Copyleft was built for a world where clean-room rewrites were painful but they aren't anymore.
    • badc0ffee 16 minutes ago
      I don't think it's been established that clean-room rewrites are no longer painful. We don't know if chardet could have been rewritten so easily if the original code wasn't in the training set.
  • coldtea 2 hours ago
    >Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by only pointing it to the API and the test suite.

    Only "pointing it". But the LLM, which can recite over 90% of a book in its training set verbatim *, would also have been trained on the original code.

    Maybe "the slop of Theseus" is a better title.

    * https://the-decoder.com/researchers-extract-up-to-96-of-harr...

    • logicprog 7 minutes ago
      Also, from that exact same study (why not cite the actual study? It's quite readable), the LLMs couldn't recite more than a small fraction of many other books, often ones just as well known[0]. In fact, from the bar charts shown in the exact news article you cited, it's pretty clear that Sonnet 3.7 was a massive outlier, and so was Harry Potter and the Sorcerer's Stone, so that really seems to me like an extremely unrepresentative example. If all the other LLMs couldn't recite even a small fraction of all the other books except that one outlier pairing, despite them being widely reproduced classics, why would we expect LLMs to regurgitate regularly, especially a relatively unknown open source project that probably hasn't been separately reproduced that many times?

      Not to mention that, as the other commenters point out, regurgitation appears to just... not have happened at all in this case, so it's a moot point.

      [0]: https://arxiv.org/pdf/2601.02671

    • the_mitsuhiko 2 hours ago
      Maybe, but the LLM did not recite the chardet source code so that argument does not appear to apply here.
      • 4star3star 25 minutes ago
        I agree. If we look to music, how can a musician unhear what they've heard? We celebrate musicians when they cite their influences. In the case of a software library, it is a tool, not a work of art. Its beauty is in accomplishing a specific, useful task. If we can accept musicians drawing inspiration from all the music they've ever listened to, we should be able to do the same for software, especially when its internal code is unrecognizable from a similar tool.
      • irishcoffee an hour ago
        This whole "today" fascination with chardet is a classic example of manipulation. I suggest you disregard this term instead of defending it.
  • scuff3d 3 hours ago
    The solution to this whole situation seems pretty simple to me. LLMs were trained on a giant mix of code, and it's impossible to disentangle it, but a not insignificant portion of their capabilities comes from GPL licenced code. Therefore, any codebase that uses LLM code is now GPL. You have a proprietary product? Not anymore.

    Not saying there's a legal precedent for that right now, but it's the only thing that makes any sense to me. Either that, or retrain the models on only MIT/similarly licenced code or code you have explicit permission to train on.

    • nkmnz 2 hours ago
      What about the code that wasn't even GPL, but "all rights reserved", i.e., without any license? That's even stronger than GPL and based on your reasoning, this would mean that any code created by an LLM is not licensed to be used for anything.
    • keithnz 2 hours ago
      if you train yourself by looking at GPL code then go implement your own things, is that code GPL?
      • dec0dedab0de an hour ago
        it can be, depending on if it is different enough to convince a jury that it is not a copyright violation. See the lawsuits from Marvin Gaye's family to see how that can be unpredictable.
      • AberrantJ 2 hours ago
        Of course not, because everyone making these arguments wants people to have some magic sauce so they get to ignore all the rules placed on the "artificial" thing.
        • bakugo 2 hours ago
          If you genuinely believe that you are not above a literal text completion algorithm and do not deserve any more rights than it, that says more about you than anything else.
      • estimator7292 an hour ago
        If you copy and paste one line from a thousand different GPL projects, is the resulting program GPL?

        Let's be honest about what's happening here.

    • moralestapia 39 minutes ago
      100% agree, if we are fair and honorable.

      In practice, well ... you saw what's been going on with the Epstein files, etc... we are far from finding ourselves in a world that's fair and honorable.

      (I'm not condoning it, I think it's massively trashy to steal code like this then pretend you're the good guy because of some super weird mental gymnastics you're doing)

  • moralestapia 5 hours ago
    >I personally have a horse in the race here because I too wanted chardet to be under a non-GPL license for many years.

    Ugh, it's so disgusting to see people who are either malicious or non mentally capable enough to understand what is the purpose of software licenses.

    "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old?

    Licenses exist for a reason, which is to enforce them. When the author of a project chooses a specific license, s/he is making a deliberate decision. S/he wants these terms to be reigning over his/her work, in perpetuity. People who pretend they didn't see it or play dumb are in for some well-deserved figuring out.

    • the_mitsuhiko 2 hours ago
      > "But I wish that car was free", sure pal, but it's not. Are you like, 8 years old?

      Just because things are not as one wants does not stop the desire from being there.

      > When the author of a project chooses a specific license, s/he is making a deliberate decision.

      Potentially, potentially not. I used to release software under GPL and LGPL but changed my mind a few years after that. I did so in part because of conversations I had with others that convinced me that my values are more closely aligned with permissive licenses.

      So engaging in friendly discourse with a maintainer to ask them to relicense is a perfectly fine thing to do, and there has been an issue on chardet about the license for many, many years.

    • coldtea 2 hours ago
      >Licenses exist for a reason

      Yes, and the choice of license for a project is made for a reason that not necessarily everybody agrees with.

      And the people who don't agree have every right to implement a similar, even file-format or API compatible, project and give it another license. Gnumeric vs Excel, for example, or forks like MariaDB and Valkey.

      But whether they do that alternatively licensed project or not, it's perfectly rational not to like the license the original is under. They legally have to respect it, but that doesn't mean there's anything irrational about disliking it or wishing it were changed.

      And it's not merely idle wishing: sometimes it can make the original author/vendor reconsider and switch license. Qt is a big example. Blender. Or even proprietary to open (Mozilla to MPL).

      "It's so disgusting to see people who are either malicious or non mentally capable enough to understand this"

      • moralestapia an hour ago
        Hmm ... you don't have to ask for consent. You just slap the license you want to your code and that's it.

        It's not some sort of democracy, lol, it's a set of exclusive rights that are created the moment the work being copyrighted is produced.

        (For a quick intro I recommend: https://www.youtube.com/watch?v=bxVs7FCgOig)

        In the case of the license in question (L/GPL), it's one of the most strict ones out there; it explicitly forbids relicensing code under a different non-compatible license, like MIT. Let me say that again: L/GPL EXPLICITLY FORBIDS the thing that happened here from happening.

        I sympathize with the guy that spent 12 years of his life maintaining the code, thank you for your service or something, but that does not make a difference. The wording of the (L/GPL) license is clear and the original author and most of the other 50 or so contributors did not approve of this.

        • coldtea 7 minutes ago
          >Hmm ... you don't have to ask for consent

          Nobody said you have.

          >You just slap the license you want to your code and that's it.

          Nobody said you can't.

          >It's not some sort of democracy, lol

          Nobody said it is, lol.

          I'm responding to what you actually wrote, that those expressing their dislike of a project having a specific license are "either malicious or non mentally capable enough" to understand what licenses are for.

          That's a stupid argument putting other people down with a silly strawman.

          One can be perfectly capable of understanding what licenses are for and still think a project made a mistake choosing a specific license, or want it to change to another (and sometimes, like in the examples I gave, the latter works too).

    • jimmaswell 3 hours ago
      This entirely misses the point. Re-implementing code based on API surface and compatibility is established fair use if done properly (Compaq v. IBM, Google v. Oracle). There's nothing wrong with doing that if you don't like a license. What's in question is doing this with AI that may or may not have been trained on the source. In the instance in the article where the result is very different, it's probably in the clear regardless. I'm sympathetic to the author as I generally don't like GPL either outside specific cases where it works well like the Linux kernel.
      • blell 3 hours ago
        This reminds me of people crying over toybox https://en.wikipedia.org/wiki/Toybox#Controversy
      • trueismywork 2 hours ago
        The real test would be to see how much of generated code is similar to the old code. Because then it is still a copyright violation. Just because you drew Mickey Mouse from memory doesn't absolve you if it looks close enough to the original Mickey Mouse.
        • the_mitsuhiko an hour ago
          > The real test would be to see how much of generated code is similar to the old code.

          I looked at the project earlier today; there is effectively no resemblance other than the public API.
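
The similarity comparison discussed in this subthread could be roughly sketched in Python. This is only an illustration, assuming both versions are Python source available as strings: it tokenizes each one, drops comments and layout, and reports a `difflib` ratio. The helper names `code_tokens` and `similarity` are hypothetical, nothing here comes from chardet or the linked article, and a score like this is a rough signal, not a legal test of copying.

```python
import difflib
import io
import tokenize

# Token types that carry no code content (comments and layout only).
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def code_tokens(source: str) -> list[str]:
    """Return the token strings of Python source, ignoring comments and layout."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in _SKIPPED
    ]


def similarity(old_source: str, new_source: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two token sequences."""
    matcher = difflib.SequenceMatcher(
        None, code_tokens(old_source), code_tokens(new_source), autojunk=False
    )
    return matcher.ratio()
```

Calling `similarity(old_text, new_text)` on matching files from the two codebases gives one number per file pair. Renamed identifiers or reordered functions will lower the score even when the structure was copied, so a low ratio is weak evidence at best.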