AI is just unauthorised plagiarism at a bigger scale (axelk.ee)

547 points by speckx 3 hours ago 95 comments

danorama14 minutes ago
There’s a fallacy that gets used a whole lot to justify things like this (not just with LLMs), and I see it in many of the comments here: If it’s OK (or at least negligible on a small scale), then it must be OK on a large scale.
It usually goes something like: If I can make money by learning something from a web page, why does a computer making money by learning everything from everyone upset people so? It’s the same thing!
It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different.
“You say I can pick one flower, but you get upset when I take a bunch. That’s inconsistent. Check and mate.”
But quantitative changes in an activity produce qualitative changes. Everyone knows this, but sometimes they seem to find it inconvenient to admit it. Not that effects of the qualitative change are always bad, but they are often different, and worth considering rather than dismissing.
- inetknght9 minutes ago
  If one person is murdered, that's bad. If a million people are murdered, that's war.
  If one word is stolen by AI, that's bad. If a million words are stolen by AI, that's business.
dvduval2 hours ago
The broader problem of original sources not being given credit in a way that rewards them remains. Website owners are paying to host their content so that spiders can come and crawl it and index it into the AI, and then if they’re lucky they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is something that’s getting worse and worse. Why look at a website when it’s all in AI? And then the counter to that is maybe we need to start closing the website to crawlers and put everything behind a login.
- Ensorceled2 hours ago
  Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping would then be used to provide links back to your content.
  - bolangian hour ago
    Not only costing money. Constant AI scraping constitutes a denial-of-service attack that has brought down websites.
  - fiedziaan hour ago
    > At least Google/Bing/Yahoo scraping would then be used to provide links back
    That doesn't work anymore. Google provides AI generated summary, nobody looks at the original site.
- motbus32 hours ago
  About a year ago OpenAI crawled the company I work for at DDoS level. Even despite the robots.txt not allowing it, and despite some reCAPTCHA we could assemble in time.
  We found our data in the outputs of their models but who can do anything about it...
  - kibwen2 hours ago
    > We found our data in the outputs of their models but who can do anything about it...
    If the crawlers refuse to voluntarily respect your robots.txt, then you are well within your rights to poison their data.
    hajilean hour ago
    robots.txt seems like it should be a legally-binding terms of service which would make them outright copyright infringing.
    Sue for $180,000 per infringement which should be calculated for each illegal API call.
    throw123456789139 minutes ago
    Was your robots txt written by a lawyer? Does it hold up in the court?
  - shimman42 minutes ago
    Why hasn't your company sued OpenAI and tried to argue they're violating the Computer Fraud and Abuse Act? Would it really be impossible to argue this?
    Unauthorized access, system damage, and maybe even extortion all apply here.
  - rastrojero200031 minutes ago
    Lawyers can. As long as that data is actually yours I mean, in a strictly legal sense.
  - telotortiuman hour ago
    I mean, did you check the IPs and make sure they’re from OpenAI? Obviously a fly-by-night AI company is going to set their User Agent to be from a big player.
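A minimal Python sketch of the check suggested above: compare the client address against the CIDR ranges a crawler operator publishes, instead of trusting the User-Agent header. The ranges below are reserved documentation addresses standing in for a real published list, and the function name is made up for illustration.

```python
import ipaddress
from typing import Sequence

# Assumption: placeholder ranges (RFC 5737 / RFC 3849 documentation blocks),
# NOT any real crawler's published addresses.
PUBLISHED_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def ip_in_published_ranges(client_ip: str, cidrs: Sequence[str]) -> bool:
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to scan.
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, ip_in_published_ranges(ip, PUBLISHED_RANGES))
```

A request whose User-Agent claims to be a big-name bot but whose address is outside the operator's published ranges is probably an impostor.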
- b00ty4breakfast26 minutes ago
  >Why look at a website when it's all in AI?
  well, at least in the case of google, I'm pretty sure that's the point. Or at least, they are doing things that would seem to be moving towards being an oracle with all the answers and not the signpost that points you in the right direction. The destination rather than the gateway.
  - philipov25 minutes ago
    remember AMP?
- spacechild12 hours ago
  It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawlers DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.
  These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!
- aaarrm2 hours ago
  Is it possible to host your website in a way so that it couldn't be found via search engines (and thus wouldn't be crawlable I hope)?
  I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.
  - matt_heimer2 hours ago
    Sure, depends on how accessible to people you want it to be.
    Most legit search engines are going to honor robots.txt and you can disallow access.
    Next level would be using something like rate limiting controls and/or Cloudflare's bot fight mode to start blocking the bad bots. You start to annoy some people here.
    Next would be putting the content behind some form of auth.
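As a rough illustration of the first level described above, here is how a well-behaved crawler would consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot` and the rules are hypothetical, and nothing here forces a crawler to run this check; it only works against bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/post"))
    print(is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/post"))
```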
  - Imustaskforhelp6 minutes ago
    If you really want to, and are happy with just text and basic styling, I recommend trying other protocols, such as creating a Gemini or Gopher website. I don't think scraping happens on even remotely the same scale there as on conventional websites.
    That being said you would require your user to download a compatible browser for gemini/gopher.
  - elorant2 hours ago
    Possible yes, probable not likely. The moment you're issued a certificate your domain will be shown in the Certificate Transparency logs, which are constantly monitored by anyone who wants to find new sites.
  - trinari2 hours ago
    robots.txt is a way of leaving the door unlocked but kindly asking bots to stay outside.
    account42an hour ago
    Which in a law-abiding society should be enough. It's also how we do things in the real world in many cases - i.e. here you can just write on your mailbox "no ads" and companies have to respect that.
    Even when we do actually put physical locks on things they are mostly there to show that someone breaking in did so intentionally and not at all designed to prevent motivated attackers.
    dparkan hour ago
    > here you can just write on your mailbox "no ads" and companies have to respect that
    Where do you live? In the US it’s actually illegal for anyone except the USPS to deliver to a mailbox.
    dparkan hour ago
    You might be interested to know that entering an unlocked door into a space you do not have permission to be in is still illegal.
    throw123456789134 minutes ago
    You might be interested to know that the “illegality” depends on the intent. If I rest on your unlocked door handle, it opens, I enter, it’s an accident.
    dpark7 minutes ago
    Sorry, what? In this scenario are you claiming that you accidentally fell inside the restricted area because you were leaning on the door? Or are you claiming that you accidentally opened the door and then walked through intentionally? In the former case, you are guilty of breaking and entering in most US jurisdictions if you don’t promptly get out. Any sane court would likely agree an accidental trespass is probably not a criminal act, but it’s not an accident if you stay. In the latter case, you’re clearly trespassing illegally.
    Also this has gotten pretty far away from the web scraping scenario. There’s no door accidentally opening here.
  - MontgomeryPy2 hours ago
    You could just put your website content behind its own chat interface. The crawler would just see a form input for a prompt.
- wolttam2 hours ago
  I’ve been thinking of a proof-of-work scheme for accessing content where you effectively need to mine some crypto for the author, but, this idea might not fly today
  - dpark44 minutes ago
    This is already a thing.
    https://en.wikipedia.org/wiki/Anubis_(software)
    wolttam9 minutes ago
    Yes, but:
    > Although Anubis could be altered to mine cryptocurrency to serve as proof of work, Iaso has rejected this idea: "I don't want to touch cryptocurrency with a 20 foot pole."
    Which in my mind is a shame. Crypto is an absolute mess, yes, but this seems like an elegant way to get something back for putting things out there.
    dpark3 minutes ago
    The problem is that much of the cost is borne by humans accessing the sites. People generally get real mad when they find out you’re using their computers to mine crypto.
  - microtonal2 hours ago
    But that will be a hassle for human visitors as well. A web that requires proof-of-work to browse will be a disaster for phones with their limited batteries, etc.
    odo12422 hours ago
    To be specific, it would be more of a hassle for human visitors than for the AI companies with infinite money and specialized browsers.
    wolttam8 minutes ago
    The idea would be that AI companies would still be forced to do this proof of work. Anubis proved the idea
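For readers unfamiliar with the mechanism being debated, here is a minimal hashcash-style sketch in Python: the server hands out a challenge, the visitor searches for a nonce whose SHA-256 hash has enough leading zero bits, and the server verifies with a single hash. This is an illustration under assumed parameters (the challenge format and difficulty are made up), not Anubis's actual implementation, and it mines nothing.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes)."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty_bits


if __name__ == "__main__":
    nonce = solve("demo-challenge", 12)
    print(nonce, verify("demo-challenge", nonce, 12))
```

The asymmetry is the point: solving costs thousands of hashes per page while verifying costs one. It is also why both objections in this subthread hold, since the cost lands on every visitor, phones included.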
  - chii2 hours ago
    or you know, just charge for your content if you believe it to be valuable enough for the fee being charged.
    wolttam6 minutes ago
    Yes, but that tends to limit the reach of your content. Hence why a lot of people reach for ads.
    Between seeing ads and doing a little bit of proof-of-work for the author, I'd choose the latter.
- gabbagoolan hour ago
  I agree with this whole heartedly. What's the point of even having copyright law at this point?
  What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my model weights. Even if it's $0.10, at least everyone out there will get what they're owed.
  - rickydroll30 minutes ago
    From my perspective, everybody trains on the knowledge and experience of those who came before. AI just does the same thing at scale.
    I do not value copyright. All it does is give you standing to sue if somebody reproduces your work. It does not differentiate or account for parallel creation. I cannot count how many times I have "created" something, only to find it in a research paper later.
    Part of the reason I think copyright has no value is that, in general, individual copyright owners don't have the deep pockets necessary to sue someone who violates their copyright. If anyone is violating the spirit of copyright, it's corporations that insist you assign your work over to them as a work for hire, or outright ignore your copyright. (looking at you, Disney's Atlantis).
    A significant benefit of AI that doesn't get talked about enough is that AI has a much greater reach over all the information it was trained on and can draw connections that would be invisible to someone operating at the human scale.
    ofjcihen25 minutes ago
    The fact that these companies are making money off of it negates your argument.
  - throw123456789137 minutes ago
    No, you don’t have to. There are open weight models you can download and use for free. Many people choose the subscription model but it’s not necessary. And latest doesn’t mean greatest, it’s just most up-to-date.
- WarmWash2 hours ago
  [flagged]
  - omnimus2 hours ago
    Total sleight of hand.
    Ad blocking has always been a problem for creators but it's aimed at big corps - non-creators. The creators asked people to support them other ways or turn off the blocking. And it's not like the little independent creators wanted this version of commercialized internet in the first place.
    The AI marketing teams are spinning everything they can, but no: AI companies are the conscript, the vultures. No question about it.
    WarmWashan hour ago
    The conversion from viewer to donator is around 1%. This is true from wikipedia, to twitch, to podcasts.
    The number of people who will not ever load your ads is around 30%.
    I can tell you that creators talk about this a lot in private, but will not publicly because the internet has a mass delusion on how creation and compensation works. It's like trying to convince christians that jesus obviously didn't come back from the dead days later, despite there being no logical system available that would explain it.
    If we were to try and map out a functional internet where everyone wins, users and creators, there is no example where ad blocking is anything other than net harmful. You either get volunteer net where 0.01% share hobby posts on their own dime for the other 99.99% or you get IRC where 99% of the population doesn't really benefit (ala 1993).
  - u_fucking_dork2 hours ago
    People usually point at the scale when this discussion comes up, in my experience. These companies are doing something at a huge scale spending tons of money to do it so the potential harm is greater.
    People can easily justify their own piracy because it’s small scale. Even when they organize, create a whole software and tooling ecosystem around pirating media to stick into jellyfin or plex. AI still did it bigger and worse and is bad, what I’m doing is not so bad because I wasn’t going to buy the movie anyway, etc.
    WarmWash2 hours ago
    On the whole, about 35% of internet users are ad-blocking. In the tech space it's upwards of 70%.
    It's in no way, shape, or form "small scale", and has fundamentally changed the very nature of the internet for the worse (opinions/views of ad blocking people don't matter).
    52-6F-622 hours ago
    Don't forget that the money being spent to do said scraping has, in great sums, come from subsidies paid by taxes from public coffers.
  - zetanor2 hours ago
    I am in favor of severely limiting both copyright and advertising, but for the benefit of everyone, not just for the benefit of a few "AI" companies.
    omnimus2 hours ago
    And you will not get it. As the AI companies pump money into lawyers and politicians, they will be the ones profiting from copyright. Total regulatory capture as US AI companies make it illegal to train AI on their output.
    WarmWashan hour ago
    The answer is to simply pay for stuff.
    There is no viable model where "have stuff but not pay for it" works out.
  - onedognight2 hours ago
    Choosing not to look at something is not denying anyone anything.
    WarmWash2 hours ago
    Choosing not to look at an ad, and blocking it are different things. One is totally ok, the other incurs a monetary loss on the creator. Those services aren't free to run, and the content doesn't take zero time to create. It also incentivizes creating content focused on those who cannot figure out ad blocking.
  - theamk2 hours ago
    There is more to life than money.
    Many of the websites I read do not collect any appreciable amount of money from ads, or have no ads at all (one example: news.ycombinator.com :) ). They want recognition, or to share the knowledge, or community, or they are building their brand... And AI is destroying this all - the first result of "zx80" is an AI overview with a link to wikipedia and some youtube videos. If a person stops there, they will never get to the computinghistory.org.uk link, and won't see any related information about the variants and models.
    WarmWashan hour ago
    This website is an ad for Ycombinator. It's in no way, shape, or form a charity place for devs to hang out. It's a feeding ground to lure tech people into a mega VC's pastures.
    When you click "news.ycombinator.com" you are clicking on the ad.
    :)
  - mixmastamyk2 hours ago
    Interesting. I suppose the main difference is that we’re ants compared to an 800 pound gorilla.
  - qotgalaxy2 hours ago
    [dead]
- internet20002 hours ago
  Perhaps we should go back to when the internet was about sharing information you liked, not about credit or making money on "content".
  - throw123456789132 minutes ago
    You are there today, but some are unhappy that others don’t share the same sentiment.
chrisbrandow8 minutes ago
I think what gets conflated are two aspects.
1. LLM/transformer technology is legitimately amazing and revolutionary. 2. In the end, they function as an enormous, effective database for most human knowledge.
Point 1 obscures the fact that if someone just created an SQL database with every digital artifact in existence and provided it for free upon request, there would be no ambiguity whether that was legal or not.
But distillation, etc. obscures this relationship and it looks like something other than straight lookup, at least in part because it is obviously more than that.
- amelius3 minutes ago
  > LLM/transformer technology is legitimately amazing and revolutionary.
  I don't even think this is true. We just didn't know how simple this all is. We just found out because we now have the compute power.
deaton2 hours ago
"Steal an apple and you're a thief. Steal a kingdom and you're a statesman." - Literal Disney villain
- falcor84an hour ago
  Ironically this phrase was said by Jafar in Disney's 2019 live action remake of Aladdin, but wasn't part of the original 1992 version. And I personally would argue that this corporate remake is a worse creative "theft" than what random people are doing with GenAI.
  - khuey23 minutes ago
    Disney owns the 1992 production of Aladdin so who exactly are they "stealing" from?
    wgjordan14 minutes ago
    https://en.wikipedia.org/wiki/Aladdin
    The argument, as I understand it, is that the "theft" is in quotes because it's not literally copyright infringement, but fair use of an old public-domain folk tale that ends up consuming the latter.
    Today, when kids know "Aladdin" they know the copyrighted/trademarked Disney character, not the traditional folk tale. That's the "theft" that happened.
    khuey8 minutes ago
    If you subscribe to any concept of the public domain this is surely in it.
  - JonathanMerklin22 minutes ago
    I'll bite. What's your argument, or at least the comment-sized gist of it?
  - runarberg24 minutes ago
    I would call it cultural theft. But a better word is cultural appropriation, and the original cartoon—though iconic—did it worse. Aladdin was first written sometime in the 9th or the 10th century (oldest surviving complete manuscript of 1001 nights is from the 15th century). It was translated into English in the 18th century.
    Disney made a cartoon of the story without understanding the culture it comes from, with the main purpose of selling it to an audience with even less understanding. And the result was a horrible misrepresentation of somebody else’s cultural heritage.
- fisheuleran hour ago
  Zhuang Zhou (369-286 BC) said something similar: "窃钩者诛,窃国者侯" ("He who steals a belt hook is executed; he who steals a state becomes a lord"). This phrase comes from the chapter Ransacking Coffers (Qu Qie, 胠箧) in the Daoist text Zhuangzi (4th century BC).
- pluc2 hours ago
  "AI should be more ethically like Stalin"
  https://en.wikipedia.org/wiki/The_death_of_one_man_is_a_trag...
tancop2 hours ago
if theres just one good thing coming out of ai its breaking copyright law forever. no one should be able to "own" ideas. royalties for commercial use is another thing and i support it but what we know as (non commercial) piracy and unlicensed fan art should be 100% legal
- kibwen2 hours ago
  Then go ahead and abolish copyright for everyone. Instead we're stuck in an even worse system where the hypercorporations gleefully plagiarize everyone else while sending SWAT teams to kill anyone who pirates a movie.
  - Salgatan hour ago
    Obviously there's an ideal middle ground, but what LLMs do is allow free transfer of knowledge while still (mostly) preserving the protections that copyright should be protecting. For example, I can have an LLM give me the entire plot of a book (which is fine), but it won't spit out an exact copy of the book.
  - rkozik1989an hour ago
    Jesus is just an uncopyrighted Mickey Mouse if you have no morals. People have been abusing that fact for a long time and have made some pretty abhorrent products.
- kube-system2 hours ago
  Copyright specifically doesn't and never did protect "ideas", it protects expression.
- caconym_2 hours ago
  I wonder how many of the books I love would still have been written in a world where somebody could scoop them all up and post them on the internet for free (and run ads).
  - _aavaa_2 hours ago
    I wonder how many would be written if copyright was only 20 years instead of more than a century? To the point that most people will never be legally allowed to directly build off of the culture they grew up in.
    Lord of the Rings will be under copyright til roughly 2050. I think Tolkien's estate has gotten more than enough money from that book and it's time to let others use the word hobbit without the threat of a lawsuit.
    caconym_an hour ago
    > I wonder how many would be written if copyright was only 20 years instead of more than a century?
    I expect it would not move the needle much. I support reduced copyright periods, though not in the specific way you do. But that's not what we're talking about here, is it? The comment I replied to seemed to be advocating for total abolition of copyright law, and my comment is written to be interpreted in that context.
    > To the point that most people will never be legally allowed to directly build off of the culture they grew up in.
    What specifically are you talking about? Every author borrows from what came before. Copyright law doesn't even enter the picture in the vast majority of cases, because you generally don't have to copy to "build off of the culture [you] grew up in".
    _aavaa_23 minutes ago
    For what it’s worth I think abolishing copyright wouldn’t have as big of an impact on art production as you do. Most artists (e.g. musicians or authors) aren’t struggling because their art is popular but copied by others (or lack of copyright). But because nobody listens to or reads their work.
    Even before AI more people tried to be an author/musician than could ever hope to gain even financial success. I don’t think less copyright will dissuade them.
    > every author borrows
    Borrows yes. But that has changed drastically in the last 100 years because of what has become the copyright system.
    I’ll be long dead and gone before people can make and publish their own LOTR, or Star Wars, or whatever franchise they grew up with. Disney would be impossible to start given the current regulations, all those tales would be locked up, and we would all be worse for it.
  - Snafuh34 minutes ago
    Simple piracy is not even the worst possible outcome.
    Without copyright, nothing stops one from simply selling a book under their own name.
    Big publishers could just reprint anything and get it into brick & mortar stores. No money for authors.
    Advocating for absolutely no copyright is wild.
  - nearbuyan hour ago
    People have been pirating books online for 20 years and in that time the number of books published per year has increased 15-fold. A number of my favorites have been released in that time.
  - nashashmi2 hours ago
    The worthwhile ones would still be written. Even if they are not enjoyable. The dissemination of ideas from an activist perspective is uninhibitable
    caconym_an hour ago
    > The worthwhile ones would still be written.
    Citation needed, as well as your precise definition of "worthwhile".
    > Even if they are not enjoyable.
    Huh?
    > The dissemination of ideas from an activist perspective is uninhibitable
    Yes, I understand that anti-copyright activists want to abolish copyright.
    nashashmi36 minutes ago
    Fahrenheit 451 is a book with the same theme.
    runarberg38 minutes ago
    You are arguing in theoreticals, so you should not be surprised if your answers are hypotheticals.
    In reality most art is done because the artist has something to say, and the money they get from it is only motivating in as much as it enables the artist to do more art. So I would guess in a world without copyright protection we would just find other ways to pay artists and a very similar amount of art would be produced.
    You can see an example of this e.g. in Iceland where the market is way too small for art aimed at the domestic market to make enough money solely by selling it (possible with music; rare with books; not possible with movies). Instead the state has an extensive “artist salary” program, which pays artists regardless of how well the art they produce sells. Unsurprisingly Iceland produces a lot of art and has many working artists.
- vaylianan hour ago
  The biggest problem is not the broken commercialization, but the broken attribution. People should be recognized, when they create art. Art is an important way of how we humans express ourselves.
- deaton2 hours ago
  This is an incredibly naive view of intellectual property. If you cannot own things you create, there is little incentive to create and share those things. Do you think any of your favorite movies and TV shows ever get made without copyright protections? Of course not, because money needs to change hands for those things to be funded.
  - StableAlkynean hour ago
    > If you cannot own things you create, there is little incentive to create and share those things
    How do you explain the creative works of writing, music, and art that existed in the millennia of human history between the Mesopotamians and the Enlightenment era?
    jaccolaan hour ago
    Copying was prohibitively expensive.
    Terr_an hour ago
    I support copyright reform, but that history has a large portion of "get lucky while sucking-up to the local rich dudes for a patron", which... isn't ideal either.
  - marssaxmanan hour ago
    Yes, absolutely, and that is why history shows so few examples of any art having been created prior to the invention of copyright: nobody had any reason to do it.
    dmitrygran hour ago
    Prior to the invention of copyright, it was not very cheap or easy to make a faithful copy of something. Books had to be typeset by hand; before the printing press they had to be copied by hand. Photography of good enough quality to reproduce a painting is very very recent. So is the ability to record a play well enough to enjoy it later as if you were there.
  - foobar1726an hour ago
    You should check out this thing called open source software
    bachmeieran hour ago
    > You should check out this thing called open source software
    Open source actually demonstrates that copyright serves a purpose. There are still customers for non-open software, even when open alternatives exist, so the ability to monetize brings new offerings to the economy.
    deatonan hour ago
    Open source software is unique in that it takes little to no capital investment to create. People post free art too. It doesn't mean that Game of Thrones didn't cost anything to produce.
    koonsoloan hour ago
    You should check out this thing called GPL that is the standard license of open source projects like Linux, and heavily depends on copyright laws.
    Or are you suggesting open source software is public domain?
    4chandaily11 minutes ago
    You may want to review your history. The GPL is copyleft: it only exists to subvert copyright law by using it against itself in a sort of intellectual legal judo. If "IP" laws were not as they were, there would be no need for the GPL. Software would be Free.
    https://en.wikipedia.org/wiki/Copyleft
    koonsolo2 minutes ago
    You are not a developer so you don't understand you can compile to a binary without revealing your sources?
  - nehal3m2 hours ago
    This is naive in the opposite direction. Creators gonna create.
    modriano40 minutes ago
    Creators can only create as long as they can sustain the costs of creating (including opportunity cost).
    Jtariian hour ago
    Who is giving a creator millions of dollars to create something if there is no guaranteed path to recouping production costs?
    Are we going the communist Soviet Union route where everything is decided by central committee?
    nehal3man hour ago
    That is not the only scale to create on. Also, Linux is free. There’s more than one way to make something available.
    Jtariian hour ago
    Just a fundamental disagreement then. I want to live in the world that created The Lord of the Rings.
    koonsoloan hour ago
    Linux is clearly not public domain as it has a GPL license. And GPL heavily depends on copyright laws.
    epicidean hour ago
    Capitalists who capitalize on creative outlets need capital to incentivize them to do so. It's basically circular.
    Those of us who create for creation's sake need no other reason. I create because I want to, not because I want to use it to gain capital.
    Sure, those lines get muddy when you want to do it professionally, but that's a separate argument.
    Jtariian hour ago
    >Those of us who create for creation's sake need no other reason. I create because I want to, not because I want to use it to gain capital.
    How do you create without capital? To make a film you need a camera crew, a sound crew, set designers, caterers, a director, scriptwriters. A world without professional creatives is so much poorer than the world we already have. Why would you give it up just for some vague notion of ideological purity?
    epicidean hour ago
    You absolutely do not need a camera crew, a sound crew, set designers, and caterers to make a film. You need a director and scriptwriters, but those can be the same person. Do many film sets have all those? Absolutely. But one can still make a film without them. Some of the best films ever created were mostly the product of one person with a budget less than half that of the average car.
    Would you be able to create big-budget movies without said big budget? Of course not. I obviously like some of those too, but who's to say that the larger budget made them better? It feels like you're conflating art creation with art business, but they are not the same thing.
    Jtarii36 minutes ago
    I suppose you are okay with all animated films being impossible to create then.
    >I obviously like some of those too, but who's to say that the larger budget made them better?
    If you legitimately believe something like 2001: A Space Odyssey would be as good with a budget of $10,000 then that just seems delusional.
    The world you want is one in which the only people who can create things are people who are wealthy by other means, there is no pathway for a talented but poor kid to go from making home movies to working on films without IP laws. They must abandon their dreams and go work in the coal mines or whatever. It is dystopian.
    I want the most amount of people possible to be able to work as professional creatives because it enriches my life and the lives of everyone in the country I live in.
    jonathanstrangean hour ago
    The point is that without copyright you can't do it professionally. Someone will just sell whatever you created and you will not get a cent from it.
  - enraged_camelan hour ago
    >> If you cannot own things you create, there is little incentive to create and share those things.
    You do realize people created and shared things long before copyright became a thing, right?
    Jtariian hour ago
    Can you explain how something like the Lord of the Rings film series gets created in a world with no IP laws?
    seandoean hour ago
    Many versions are made, the best ones get the most views. You don't need huge budgets and guaranteed revenue to make great art. In fact, I'd argue it's often the opposite. Most big budget movies suck these days.
    Jtariian hour ago
    Where is the money coming from? Who is financing the production?
- gagan2020an hour ago
  Can we do that for the medical field?
  For example, if we know a drug's formulation, then that drug plus any small modification (made through AI) could count as a new formulation. That would break the current medical patent system.
  - jaccola43 minutes ago
    This is how the drug industry already works. I don’t think there’s any evidence “AI” (LLM) is capable of producing valid drug modifications.
    gagan202031 minutes ago
    In their current state, AI models cannot do that. But if they could, it would break the medical patent model.
- Bombthecat2 hours ago
  Yeah, I think we are at the point where copyright doesn't exist anymore, at least for AI
  - hectdev2 hours ago
    All of human knowledge (an exaggeration, I know) at our fingertips. It's the most punk rock, anarchist thing tech has done since the internet and it's funny it's shaped as a product.
    ses19842 hours ago
    If you get the impression of punk and anarchy, it's only because you're not looking any deeper than the veneer. Underneath, it's nothing like punk or anarchy.
    hectdevan hour ago
    I'm considering the dispersal of tech. 3D printers disrupt needing to buy widgets from big companies, and local LLMs disrupt needing to buy generalized software when you can make your own bespoke. AI will live on long after the big corporations burn out their money coffers.
    account42an hour ago
    Sure, a few mega-corporations of the scale to upset entire markets owning all information and renting it out as they see fit is very punk. A cyberpunk dystopia specifically.
    hectdevan hour ago
    If you consider the local LLM scene, which is closing the gaps, mega corporations become less possessive of all information.
  - jaccola41 minutes ago
    What? If I want to read Harry Potter or watch The Matrix an AI cannot produce something equally as good for me. So I need to pay those people, or break the law.
    For lots of online knowledge/blogs I guess it is true but even here I often read explainer blogs because AI casts everything in a certain narrative/tone that isn’t always appropriate.
  - gspr2 hours ago
    This is insane. How will any intellectual or artistic work be sustainable in this world?
    As a teenager I used to proclaim that "you can't own bits, maaaan" all the time. I've since grown up. Intellectual property is essential to safeguarding intellectual work. I'm not saying this out of greed – I'm a vocal advocate for the free software movement. It, too, relies on a semi-sane framework of intellectual property. So do Hollywood studios. So do the makers of AI (well, since they're not actually sustainable at all currently, I guess you can say they don't rely on anything).
    Bombthecat4 minutes ago
    That's the neat part, you won't.
- groundzeros20152 hours ago
  The alternative to strong property rights and norms is secrecy and enforcement.
  - gspr2 hours ago
    This is a strictly worse world in almost every sense. It's as if we abolished physical property rights and suggested people arm themselves to keep what is (was) theirs instead. Civilization, gone.
    beeringan hour ago
    It’s a false equivalence to say that intellectual property is property. Taking your car deprives you of your car. Taking your idea lets civilization advance.
    groundzeros201526 minutes ago
    No. It means people don’t invest in things they can’t control or keep secret.
- 0rganize2 hours ago
  lol, never going to happen. I remember when the RIAA was successfully able to shake down tens of thousands of individuals for pirating music in the 2000s.
  If you’re a pleb, stealing copyrighted materials will get you some nasty fines, lawsuits and criminal charges. If you’re a megacorp with unlimited buckets of cash, then there is no accountability.
- gspr2 hours ago
  So if you pour your heart and soul into writing a novel over the course of years, and it becomes modestly successful earning you a little money in return for your sweat, I should be allowed to just copy it, give it away for free (hell, even say I wrote it – it's not as if it's even yours to own in your world)?
  - DharmaPolicean hour ago
    Yes.
- runarberg2 hours ago
  I think you may be too optimistic about the state of affairs under capitalism. Very rarely do things change which don't benefit the owning class without direct action from the working class that puts adequate pressure on the rich, i.e., actions which threaten their profits.
pluc2 hours ago
Seriously how is this surprising? We all know AI companies stole troves of data to train their models, why do you think they'll stop? Have they faced consequences for the mass theft of copyrighted data?
You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?
- exploderatean hour ago
  That data is not stolen. It's still there.
- skrebbel2 hours ago
  Every time something gets posted on HN about a bad or unfair state of affairs, some cynical nihilist posts “doh why r u surprised” and I’m sick and tired of it. These comments aren’t insightful, helpful or thought-provoking. You’re just helping a bad situation stay bad.
  - mikestew2 hours ago
    My only imagined motivation for such posts is, “Look at me, I’m not surprised by this due to my superior intellect, why are you surprised?”
    “No one is surprised, jackass, it’s just adults having a conversation about the current state of affairs.”
    Yes, it’s tiring and rarely contributes positively to the conversation.
  - breck31 minutes ago
    [dead]
- sixothreean hour ago
  > why do you think they'll stop
  Because the sources are now polluted with AI. That's at least one reason they stop scraping.
- CivBase2 hours ago
  > You can't steal or profit off of that data, but it's fine for them for whatever reason.
  The reason is quite simple. When Microsoft steals YOUR work, GDP go up. When YOU steal Microsoft's work, GDP go down. And the people who create and enforce our laws want GDP to go up. To these people morality and rights are a thin guise that can be conveniently discarded when it's inconvenient for them.
- stronglikedan2 hours ago
  > it's fine for them for whatever reason
  the reason is crony capitalism. I wish I knew what the fix was
- stackedinserter2 hours ago
  [flagged]
  - badlibrarian2 hours ago
    I paid tuition. The library bought its books. The theater sold me a ticket. Money changed hands every step, which is the part your analogy skips.
    drstewart2 hours ago
    Where did money change hands when you looked at a random image on DeviantArt and got inspired and made a similar image yourself?
    badlibrarianan hour ago
    Most artists considered it a one to one exchange. They appreciated attribution and were flattered to inspire people. Some got gigs. Some got laid. The money flowed to DeviantArt, hosting providers, and ad providers. The artists were okay with this. They were the ones paying.
    Then DeviantArt built a tool to automate the "make a similar image yourself" part and here we are. It removed all the fun parts: the personal contact, the attribution, the inspiration.
    Artists realized they unwittingly contributed to the death of not only the community, but the art form they love. Lawsuits pending.
  - analog83742 hours ago
    Seriously. I recall a thousand hours of movies. Those memories sit in my head and I pay no royalties
    pluc2 hours ago
    Put what you recall on paper, turn it into a screenplay. Let me know how quickly you get sued.
    jimmaswell2 hours ago
    Good artists copy, great artists steal.
    badlibrarianan hour ago
    Trillion dollar companies license.
    IcyWindows2 hours ago
    One could argue most screenplays are derivative.
    badlibrarianan hour ago
    Hollywood has extraordinarily well-defined controls for keeping things legal and everyone in the chain compensated. Plus a separate Oscars category for it.
    badlibrarian2 hours ago
    True, they live in your head rent free. But if you produce a derivative work, you have to pay.
storus2 hours ago
This is really not so clear-cut, as "fair use" might cover 99% of all data scraping; you are not reproducing the originals, just using them to estimate the probability distribution of tokens in pre-training. You are never going to get the exact book word-for-word using LLMs.
- lbrito2 hours ago
  >You are never going to get the exact book word-for-word using LLMs.
  This is pretty much the exact claim of a NYT lawsuit against OpenAI.
  "One example: Bing Chat copied all but two of the first 396 words of its 2023 article “The Secrets Hamas knew about Israel’s Military.” An exhibit showed 100 other situations in which OpenAI’s GPT was trained on and memorized articles from The Times, with word-for-word copying in red and differences in black."
  https://www.hollywoodreporter.com/business/business-news/cou...
- twobitshifter28 minutes ago
  https://arxiv.org/html/2510.25941v1
  You can get it to reproduce content but it’s a game of cat and mouse. Were it not for the alignment to avoid direct reproduction, it would happen far more often.
  > RECAP consistently outperforms all other methods; as an illustration, it extracted ≈3,000 passages from the first "Harry Potter" book with Claude-3.7, compared to the 75 passages identified by the best baseline.
- mplanchard2 hours ago
  I don’t buy this argument. The tokens are useless without their context, which provides the probability distributions needed to make them useful. Sure you MIGHT not be able to get the book word for word, but it’s impossible to make a useful model without the whole book and all of the artistry that went into it, to guide the tokens in their expected output.
  Fair use generally does not cover commercial use, which this clearly is, and is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
  - Vvectoran hour ago
    "Commercial Use" is only one part of the four prongs of the fair use test. For example, commercial parody is generally considered fair use. Look at Spaceballs, which is a direct transformation of Star Wars.
    This is all new territory. We don't have court-settled law yet.
  - samatmanan hour ago
    It's more complicated than that. Quite a bit more.
    Commercial use counts _against_ a fair use defense, but is not dispositive: it's not accurate at all to say it "generally does not cover" commercial use. This is the "purpose and character" test, one of four in contemporary (United States) fair use doctrine.
    Purpose and character also includes the degree to which a use is _transformative_. It's clear that the degree to which a training run mulching texts "transforms" them is very high. This counts toward a fair use finding for purpose and character.
    > is dependent on the amount of the original content present in the derived work, which I would contend in this case is “all of it”
    The "amount and substantiality" test. Your case for "all of it" can't possibly be sustained: the models aren't big enough. It's amount _and_ substantiality: this has come up in the publication of concordances, where a relatively large amount of a copyrighted work appears, but it's chopped up and ordered in a way which is no longer substantially the same. Courts have ruled that this kind of text is fair use, pretty consistently. It's not an LLM, of course, but those have yet to be ruled on.
    Also worth knowing that courts have never accepted reading or studying a work as incorporation, and are unlikely to change course on the question. It's taken for granted that anyone is allowed to read a copyrighted work in as much detail as they wish, in the course of producing another one. Model training isn't reading either, but the question is to what degree it resembles study. I'd say, more than not.
    Specifically:
    > it’s impossible to make a useful model without the whole book and all of the artistry that went into it
    Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
    "Effect upon the work's value" is probably the most interesting one. For some things, extreme, for others, negligible. I suspect this is the one courts are going to spend the most time on as all of these questions are litigated.
    Ultimately, model training is highly out-of-distribution for the common law questions involving fair use. It was not anticipated by statute, to put it mildly. The best solution to that kind of dilemma is more statute, and we'll probably see that, but, I don't think you'll be happy with the result, given what I'm replying to. Just a guess on my part.
    mplanchard41 minutes ago
    It is of course true that it is unsettled law, and that fair use is more complicated than my offhand comment suggested.
    > Courts have never once accepted "it would be impossible for defendant to write his biography without reading plaintiff's" as valid, and it's been tried. The standard for plagiarism is higher than that.
    This I think misses the thrust of my argument, though. It’s hard to find an exact human analogy, because neither the technology nor the scale at which it operates is remotely human.
    I see it less as “writing his biography without reading the plaintiff’s” and more as “using the same style and metaphors to make thousands of copies of very similar biographies, with certain bits tweaked,” like turning an existing work into a Mad Lib.
    I don’t know how the courts will eventually rule on it, but it certainly feels like theft to me.
    samatman19 minutes ago
    It's fascinating how intuitions differ. To me, it doesn't feel like theft at all. For one thing, theft is depriving another of something, and has therefore never been a good metaphor for infringement; hackers used to be the most insistent about this principle, and it's weird to see a doctrine which was cooked up in a literal AI lab get thrown out the window for literal AI.
    But pretending you said "infringement", for me it comes all the way back to the Constitution: "To promote the Progress of Science and useful Arts". I cannot possibly twist the development of large language models into something which violates the spirit of that purpose. I don't see how anyone can.
    Your point about the scale is valid, and the alienness of it, sure. But you haven't made the case that the vastness of the scale should affect the conclusion.
    Something I left out in the first post is that copyright is meant to protect expression, and not ideas: this is the deciding factor in the 'nature of the copyrighted work' test for fair use. More expression, more protection: more ideas, less.
    I think the visual arts have a strong case that image generators directly infringe expression: I'm not convinced that authors do, and I think software should never have been protected under copyright because the ideas-to-expression ratio is all wrong for the legal structure. There's clearly no scale case to be made for ideas: "but what if it's _all_ the ideas" fails, because the ideas are not protected at all. Nor should they be, that's what patents are for, and why patents are very different from copyright.
    LLMs are remarkably good at 'the facts of the matter', hallucination notwithstanding. They're very poor at authorial 'voice transfer', something image generators are far too good at. It's when I start asking myself "well what even _is_ this 'expression' thing anyway?" that I conclude that we're out over our skis on the LLMs-and-IP question: precedent can't tell us enough, and that leaves legislation.
- SoftTalker2 hours ago
  When I was in school, writing "in my own words" was never an excuse to not cite a source. It was actually something that took me a little while to understand: it's the source of the information that needs to be cited, and that's not limited to literal quotations of someone else's writing.
  - Salgatan hour ago
    That's more an argument for why you can't just use LLMs as a source of truth. Conveniently, LLMs like ChatGPT do often cite their sources, especially if you prompt them to.
    jaccola36 minutes ago
    Maybe a nit: LLMs do not and cannot cite their sources (at least scraped sources for the purpose of training)
    It’s kind of the harness that is doing the citing (or providing the context for the model to).
    But an LLM sans search can reproduce some copyrighted work with minor variations and there’s no way to know exactly where it came from.
- pera37 minutes ago
  > You are never going to get the exact book word-for-word using LLMs
  You could say the same about MP3 encoders but I don't think that would convince any judge
- rkozik1989an hour ago
  Come up with an obscure topic that has few relevant results, post about it to Reddit on your profile page, wait a few hours, and then query Gemini/ChatGPT about that exact thing and tell me you still feel this way.
- TheOtherHobbesan hour ago
  This confuses input and output.
  A copy made for the purposes of training is still a copy.
  Even if you throw the text away after training, you've still made a copy.
- underliptonan hour ago
  Fair use was built around human limitations. The mass scraping campaigns done by the AI giants were clearly an overreach in spirit, if not letter. Most people's intuition is that these massive operations that are valued in the trillions can't have been drawn from some untapped common resource, and they're correct. Someone, somewhere is not being properly compensated.
  I have no problem with taxing AI companies so that their profit is marginal, or forcing them to provide compute for free. That seems like the correct balance of what they're harvesting from the "commons" (which is really just the totality of private IP that was exposed to their crawlers).
MontyCarloHall2 hours ago
Did You Say “Intellectual Property”? It's a Seductive Mirage. [0]
[0] https://www.gnu.org/philosophy/not-ipr.html
- phoronixrly2 hours ago
  Just so long as it's just a seductive mirage to the Oracles, Microsofts, Metas, and Googles as well as your friendly neighbourhood unpaid overworked open-source developer.
  Open weight model trained with no attribution on all of Oracle's internal repos. It's only fair.
kstenerud3 hours ago
> their article contains links to my actual website, with the exact link text (?!)
I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?
- NDlurker3 hours ago
  Right, that's quoting and citing a source.
- 420official2 hours ago
  Sometimes links take the form of `.../post/{id}/{extra-text}` where `extra-text` is not used at all to match the post. Amazon links are (used to be?) this way where the product name is added to the end of the link but can be removed or changed and still will route to the product. Maybe the author is surprised the LLM is providing the irrelevant portion of the link verbatim.
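A minimal sketch of the link shape described above, assuming a hypothetical `/post/{id}/{extra-text}` route where only the id is used for lookup. The pattern, the `POSTS` table, and the `resolve` helper are all made up for illustration and are not any real site's routing code.

```python
import re
from typing import Optional

# Hypothetical URL shape: /post/{id}/{extra-text}. Only the id matters.
POST_URL = re.compile(r"^/post/(?P<id>\d+)(?:/(?P<slug>[^/]*))?/?$")

# Hypothetical in-memory post table, for illustration only.
POSTS = {42: "Apple fritter recipe"}


def resolve(path: str) -> Optional[str]:
    """Return the post title for a path, ignoring any trailing slug text."""
    match = POST_URL.match(path)
    if match is None:
        return None
    # The slug group is parsed but deliberately never used for the lookup.
    return POSTS.get(int(match.group("id")))
```

Under these assumptions, `/post/42/apple-fritters` and `/post/42/anything-else` resolve to the same post, so a verbatim copy of the extra text carries no routing information and stands out when it reappears elsewhere.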
- joshred2 hours ago
  I think they probably had the section header link back to their webpage, or something similar to that. This is not a well-written rant.
- jp_sc2 hours ago
  I think he's saying he uses his website's URL in his tutorial examples, and other tutorials have copied them as-is
- some_furry2 hours ago
  Imagine you have two web pages.
  One is a recipe for apple fritters, and the other is an informal ranking of apples by flavor.
  Let's say your apple fritter recipe links to your apple ranking list.
  Later, you discover someone copied your apple fritter recipe without credit, but it still links to your apple ranking list, using the same wording as your recipe. They're getting more Google SERP juice and ad revenue than yours, despite stealing your article.
  Do you see the problem?
ggillas2 hours ago
IP attorney here and actively working on this problem.
nla: if you create content online (public repo code, blog, podcast, YouTube, publishing) the smartest thing you can do is to file a US copyright, even if you have a hobby blog.
Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as a HN community had our works protected, there are potentially huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
- sosuke2 hours ago
  I'll bite. I have always been told copyright is inherent. Does it cost money to file a copyright? Do I need to do it for each blog post? For each gist? I'll totally set up some scripts to make it happen if that's what actually needs doing to have the copyright I expected.
  Edit: remember not to downvote ideas you disagree with. I think it was only downvote things that lower the discourse.
  - ggillasan hour ago
    You do have inherent copyright whenever you post, but it puts the burden on you to prove damages (or how much financial harm you suffered from one LLM's piracy alone). Filing fees are $65 for online registration, and registration allows you to claim attorney's fees and statutory damages. Statutory damages can range from $750 to $150k USD per LLM because you registered it.
    So yes, set up some scripts, you can go back 90 days from when you file (you get a grace period). Also if you're publishing frequently to a blog, repo, or newsletter, you can save cost by filing each article under a group registration. Ping me if you need help.
- codexb2 hours ago
  Anthropic didn't lose because they scraped (read) copyrighted works. They lost because they distributed copyrighted works directly via torrents. Those aren't the same.
- stronglikedan2 hours ago
  Doesn't the mere act of publishing your original content online grant you copyright?
  - Kye2 hours ago
    Statutory damages require registration.
- mort962 hours ago
  Wait what do you mean by "file a copyright"? I have never heard of this, all explanations of copyright I have heard say that you automatically own the copyright to the things you make; and that "all rights are reserved" by default unless you give up on them through granting a license. Is this no longer the case? Why is this now suddenly different? When did it change?
  - lubujackson2 hours ago
    Briefly, there is default copyright and registered copyright. Registering works grants stronger protections (i.e. bigger fines if broken).
  - ggillasan hour ago
    I hear this a lot! What's suddenly different for the web is the volume of scraping. And the fact that the sum of that scraping is building companies with trillion dollar valuations.
    There are tens of millions of registered copyrights in the US, nearly every published book, music, artwork, many magazines and major websites. Here's the official link, you can search the registry and there is a ton of info: https://www.copyright.gov/registration/
- indigodaddy2 hours ago
  No one will ever do this, or definitely not enough people will, so what's Plan B?
  - necovek2 hours ago
    Bigger portion of the payout for those that do?
- potsandpans29 minutes ago
  The only thing worse than a mega corp is an IP attorney.
  Your cause is already lost.
  Good luck enforcing whatever frivolous lawsuits you have cooking against open-weights Chinese models that anyone with a newer graphics card can run inference on.
- pull_my_finger2 hours ago
  [dead]
adamzwasserman2 hours ago
People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.
- saghm2 hours ago
  When did the last original thought happen then? Clearly thoughts must have been original at some point, or there wouldn't be any at all
  - dmoose2 hours ago
    When did the first homo sapiens exist? Ideas like species evolve. Saying there are no original ideas seems to me an attempt to glibly capture something quite fundamental.
    saghman hour ago
    I don't disagree with your premise, but I'd argue that saying "there are no original ideas" in the context of a discussion of plagiarism is needlessly reductive. Even though I think I mostly agree with the author here, I think there are legitimate counterarguments that can be made; equating all of the ways someone can cite or build upon an idea with copying something word-for-word and claiming it's your own is not one of them though.
  - codexb2 hours ago
    Did those original thoughts not build upon all the original thoughts that came before them?
    Jtariian hour ago
    Sure they build upon them, you still need to add your 1% of original insight. There was a first person to realise that you could make fire by rubbing two sticks together.
    saghman hour ago
    Is my house a copy of the dirt it's on top of? Did the people who built my house build the dirt? There's a difference between "building upon" an idea and trying to claim you built the idea itself
  - dooglius2 hours ago
    Technically one of {Newton, Leibniz} was first, but you're missing GP's point
    saghman hour ago
    No, I think I just find it reductive. The fact that some ideas are independently thought by multiple people does not feel like a compelling argument for normalizing copying someone else's work verbatim and trying to pass it off as your own.
- throw48472852 hours ago
  I've noticed that AI has caused this narrative to become more popular. "Nothing is original anyway, so why bother?" That's pure cope and you know it. A deep insecurity masked as bold truthtelling.
  - falcor84an hour ago
    I think you're right: the ease with which AI can do tasks that we previously considered unique to human creativity does force us to further rethink and acknowledge how creativity is in large part about "remixing" prior works, although of course we've had discourse about this since at least as early as Richard Simon's 1678 "Critical History of the Old Testament", which identified it as being a remix of earlier sources [0].
    [0] https://archive.org/details/hisyo00simo/page/n1/mode/2up
- brazzy2 hours ago
  OK, and the AI labs are open sourcing their frontier models since those are not original either. Right? RIGHT?
- LatencyKills2 hours ago
  Having an original thought is in no way related to breaking copyright laws.
  I don't think we should "get over" the fact that modern SOTA models couldn't exist without being trained on protected works.
  - IcyWindows2 hours ago
    I'm trained on protected works. Do I need to pay royalties?
    kube-system2 hours ago
    If you produce them verbatim or in significant enough portions, yes.
    LatencyKills2 hours ago
    > I'm trained on protected works.
    That someone, at some point, paid for.
    I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
    I'm not anti-AI. I'd just like to see companies play by the rules everyone else has to follow.
    echoanglean hour ago
    > I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
    Because training isn't redistribution.
    You can also listen to the song and make a new one that sounds similar, just like the AI can.
    LatencyKillsan hour ago
    To do that training, you must first obtain the item with the content you require. Did OpenAI purchase a copy of every book they trained their models on?
    Answer: They did not. That is literally why there are dozens of ongoing lawsuits in progress.
    echoanglean hour ago
    For songs, it's not that hard to legally get access to it, I think. I'm not sure if Spotify can legally prevent you from using songs for AI training for example.
    CamperBob2an hour ago
    > I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
    You're right, it's an unjust situation. And you may note that no one else besides the AI companies has made any progress at all towards changing it.
    Copyright will soon die, having outlived its usefulness to society. Whether the knife is held by someone named Stallman or someone named Altman is of little consequence.
    JimDabell2 hours ago
    > I'd like to understand why I can't use a song in one of my videos without permission/payment, but an AI company can train models using that song without having either.
    Because when you say you are “using” the song, what you mean is that you are distributing copies of the song, which is protected by copyright.
    When AI companies train on the song, the model is learning from it. Outside of the rare cases of memorisation, this is not distributing copies and so copyright doesn’t have any say in the matter.
    Learning isn’t copying, so copyright doesn’t get involved at all.
    LatencyKillsan hour ago
    I appreciate your comment, but you answered as if this question had been answered legally. It has not.
    The New York Times is suing both OpenAI and Microsoft for copyright infringement. The Authors Guild is suing OpenAI. Getty Images is suing Stability AI. Disney is suing Midjourney. Universal Music Group and Sony have filed suits against multiple AI companies.
    > so copyright doesn’t get involved at all.
    The dozens of ongoing cases discredit that statement.
    JimDabellan hour ago
    Which statement of mine do you think is not settled law? Which law do you think is being broken and how?
    Your objection doesn’t make sense. In the event that an AI company loses a lawsuit for copyright infringement based on simply training on copyrighted works, the answer to you saying you’d like to understand why they can do it and you can’t is simply “your premise is wrong; neither of you can”.
    LatencyKillsan hour ago
    > Which statement of mine do you think is not settled law?
    I object to your statement that "copyright doesn’t get involved at all" when that is objectively untrue. If that was true, many of the world's largest companies wouldn't be spending tens of millions of dollars to have that question answered in court. Go to any law-focused forum, and you will find attorneys arguing over these questions.
    To train a model using a book, you must first obtain a copy of that book. Did OpenAI purchase a copy of every book not already in the public domain used during training? They did not.
    Some of the suits I mentioned claim that OpenAI literally stole copies of books to train its models.
    My point is that the copyright question has not been answered. If the NYT et al. win, it will be a watershed moment for how AI companies pay for training data moving forward.
- ff102 hours ago
  Nono, actually there are no thoughts. Every utterance is just a copy of a previous utterance plus a slight random mutation. (somewhat /s)
- kelseyfrog2 hours ago
  Why post comments then?
  - voidfunc2 hours ago
    For funsies
  - stronglikedan2 hours ago
    same reason we do anything else - sweet, sweet dopamine
  - nicman232 hours ago
    Why post comments then?
    cafebabbe2 hours ago
    Because some thoughts can, actually, be original? Or relatively original enough? Or simply, pertinent and timely?
  - krystalgamer2 hours ago
    reiteration is still important
  - analog83742 hours ago
    to bring attention to certain ideas
hparadiz2 hours ago
You guys have fun arguing. I'm gonna be building cool stuff.
- matt_kantor2 hours ago
  Yeah, don't let pesky discussions about ethics get in the way of building cool stuff.
  I'm working on paving over the Amazon rainforest so I can build the world's largest roller coaster, but for some reason people keep trying to talk me out of it. Good thing I have this bucket of sand to put my head in so I can tune them out.
  - hparadiz2 hours ago
    You assume that I think using language models is unethical. I do not agree that it is. Now what?
    matt_kantoran hour ago
    The argument that you're ignoring is about whether they're ethical or not. Your priors may land you on either side of that argument, but ideally you're willing to have your mind changed if the other side makes a strong enough case.
    But intentionally blinding yourself to the debate and plowing ahead anyway (which is how I interpreted your parent comment) sounds like willful ignorance.
    hparadiz12 minutes ago
    I'm not ignoring anything. I've already moved on and I don't owe you further debate. No one does. If you don't like it we have a very thorough legal process you can follow.
    malfistan hour ago
    "No u" isn't a valid counter argument. Arguer made no assumption about your view of the ethics of LLMs.
    jayd16an hour ago
    That's what the sand bucket was about.
- jayd162 hours ago
  Still waiting for this massive wave of cool stuff.
  - bcrosby9544 minutes ago
    It's just hobby projects with larger scope.
    I can see from a lot of replies the "cool" threshold is undefined, but here goes:
    For myself it let me finish a project I started a year ago for measuring how much home energy efficiency upgrades will reduce my AC usage. I bought a pile of Raspberry Pi Picos and turned them mostly into temperature reading devices, but also one that can detect when my AC turns on.
    So I can record how often my AC runs and I can record the temperature at various points around the house, which lets me compare like-for-like before-and-after.
    The easy but unrealistic way to accomplish what I want is to use Python. It gives me access to a file system, a shell, and all sorts of other niceties. But I wanted to run these on two AA batteries and based upon my measurements they would last about 2 weeks. I tested using C instead and they should last 4 months. That's long enough for my use case. There's enough flash storage for that time period too.
    However this means I need to write all the utilities for configuring the Picos myself. There's all sorts of annoying things such as having to set the clock (Picos lose it anytime they lose power), having to write directly to flash memory (no operating system), having to write a utility for exporting that data from flash memory, and so on.
    And AI coding let me burn through a pile of code I knew how to write but didn't care to burn my weekends on.
    The pattern is the same for my friends who are software devs. And yeah, you're probably never going to see any of it, but that's not why they're making it, they don't want the maintenance burden.
  - petefordean hour ago
    It's not a reach to suggest that if you've used software written in the past 2-3 years, you're enjoying cool stuff.
    Moreover, all of the tools that the people who build software use are also cool stuff.
    It's also not just code and software that is benefitting from these new tools. Use of LLMs in engineering tasks is blowing up right now.
    jayd16an hour ago
    I'm not sure that extrapolating the last 2 to 3 years as a sign of things to come is as enticing an argument as you seem to think it is. If you exclude AI for AI's sake, the feature lists of the last 2 years have been incredibly anemic. If you include AI companies bootstrapping themselves with AI, the cash flow has been a nice change but I can't say it's felt fully baked, or flooded with stable software and well-crafted workflows.
    I'm really not trying to be a hater but when people tell me that we're already in the AI Nirvana it gives me pause.
  - esikich2 hours ago
    You're acting as if developers haven't been using AI to build for years already.
    jayd162 hours ago
    Where was the coolness inflection point?
    hparadiz2 hours ago
    In the past three months I've shipped more code than I have in years.
    New php extension https://github.com/hparadiz/ext-gnu-grep
    A Demo showing how to stream webrtc to KDE Wayland overlay. https://github.com/hparadiz/camera-notif
    A fun little tool that captures stdout/stderr on any running process. https://github.com/hparadiz/bpf_write_monitor
    Then I upgraded my 10 year old hand written framework to a new version that supports sqlite and postgres on top of existing MySQL support https://github.com/Divergence/framework
    But then I was like eh lemme benchmark every PHP orm that exists just to check my framework's orm....
    https://github.com/hparadiz/the-php-bench
    And published the results.... Here
    https://the-php-bench.technex.us/
    And then I decided to vibe code a simulation of the entire local stellar group https://earth.technex.us
    Followed by my simulation of the Artemis 3 landing sites at the lunar South pole https://artemis-iii.technex.us/?scale=1.000#South-Pole
    And I left the best for last.....
    https://github.com/hparadiz/evemon
    A brand new task manager written in C for Linux that supports a plugin architecture with an event bus. It's literally the best gui Linux task manager ever. Still working on it.
    I'm not even talking about my paid job. This is me just fucking around.
    If you think none of this stuff is cool I don't even respect you as a dev.
    jayd16an hour ago
    Task manager seems fun. In your screenshot, are your two task manager instances using a GB of ram?
    hparadizan hour ago
    Without the milk drop plugin it's stable around 175 with all the other plugins. With no plugins it's about 80 mb at idle but the memory usage is higher if there's more processes running.
    4f434522 hours ago
    Most people have busy lives and they don't care about this stuff.
    bigstrat20032 hours ago
    And yet, no cool stuff from those developers.
    fantasizr2 hours ago
    there seems to be great innovation in npm package hacking, but that's about it. Oh yeah, bad uptimes and ruined open source projects. If only AI was left to discrete math brute forcing problems and alphafold.
    helloplanets2 hours ago
    One example would be Linus Torvalds vibe coding this a couple weeks ago: https://github.com/torvalds/AudioNoise
    It's obviously a hobby project. But you'd be hard pressed to find a more old school, in the weeds programmer than him, and even he's building cool stuff with AI.
    Not sure who you're referring to with "those developers"?
  - kzrdude2 hours ago
    There's a massive wave of stuff, at least. Sorting it is not easy.
  - SeanDav2 hours ago
    OpenClaw. Vibe-coded and one of the most rapidly successful and popular pieces of software ever developed.
  - uberduper2 hours ago
    I'm building the same stuff I've always built. Just faster and with less dependence on others. Not having to argue with devs that have their own agendas has been my biggest benefit from coding agents.
    malfistan hour ago
    > Not having to argue with devs that have their own agendas
    Agendas like, "let's not check our API key into a public github repo" or "Let's not store passwords in plaintext" or "Don't expose customer data via a public api"?
    uberduperan hour ago
    No. Agendas like, "I need to push my ideas for promotion credits."
- Fokamul2 hours ago
  Do you mean my stuff?
  Yes, I'm suing you, since it's my stuff now, I've licensed your code 5 minutes ago.
  Prove me wrong in court that you have created it...
- parliament322 hours ago
  I'm happy for you, but please, for all of our sakes, keep it to yourself. Don't make a public repo, don't post links. Go sit in the corner by yourself with your slop generators and leave the rest of us alone.
- stronglikedan2 hours ago
  > I'm gonna be building cool stuff.
  hardly. at best you're going to be asking a robot to build questionable stuff with other people's LEGOs
  - hparadiz2 hours ago
    You just described all software.
fritzo10 minutes ago
What has "artificial" to do with it? Human intelligence is also unauthorized unconscious plagiarism.
rastrojero200032 minutes ago
It's not though, that's just the business case, where the perverse business incentives lie.
LLMs are really cool text generators and it turns out we can generate a bunch of things from text they generate.
Problem is, several of those things can be horrendous for the continued survival of the species and those happen to make the people running those AIs a ton of money, and, in perverted societies, thus also clout.
dominicrose25 minutes ago
Talking about a bigger scale may be confusing because some of the information AI can train on comes from niches.
I wouldn't mind if an AI trained on old Disney movies (or new ones for that matter), but exploiting niches (like local newspapers) seems bad.
andai2 hours ago
There's two aspects to this.
The pretraining (Common Crawl, i.e. the entire internet. Also books and papers, mostly pirated), and the realtime web scraping.
The article appears to be about the latter.
Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data.
There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"
oytmeal2 hours ago
Isn't plagiarism inherently unauthorized?
- fulafel2 hours ago
  If we go by the dictionary definition "Plagiarism means using someone else’s work without giving them proper credit" then I'll bet in art authorized plagiarism has historically been a common occurrence, for example.
  - echoanglean hour ago
    If it's authorized, I would argue that the credit you give is the proper credit, even if it is nothing at all.
    If you ask me if you can reproduce my works without giving credit and I say yes, I don't think you're using my work without giving proper credit.
- hoppyhoppy22 hours ago
  If I let my buddy copy my essay, he would be committing authorized plagiarism, right? It still fits the dictionary definition of plagiarism, and it's also authorized (by me, anyway)
tptacek3 hours ago
People were effectively copying websites (especially ecommerce tutorials) and beating the original authors at SEO decades before ChatGPT 2.
- saghm2 hours ago
  People also got blown up before atomic bombs, but it's hard to argue that they weren't worth treating more seriously than a stick of dynamite. Sometimes being able to do something at a massively larger scale is a meaningful difference.
  - darkwater2 hours ago
    You transmitted the same concept I tried to transmit, but without falling into Godwin's Law :)
    saghman hour ago
    I was actually worried that I was so close to it because of the obvious relevancy to WWII that people might object to my analogy, so I found it amusing to read yours immediately after I submitted mine!
- nilirl2 hours ago
  And that was wrong too.
- strogonoff2 hours ago
  There’s a world of difference between people simply “copying websites” and providing tools that, along with other kinds of plagiarism[0], do so at scale while benefitting from that commercially.
  Sure, you can do the same thing with people, but it’s 1) time-consuming, 2) expensive, 3) prone to whistleblowers refusing to do the shady thing, 4) prone to any competent and productive person involved quitting to do something worthwhile and more profitable instead.
  [0] Mind you, “copying websites” is but a drop in the ocean in the grand scale of things.
- moralestapia3 hours ago
  The article’s point isn’t really about whether this was happening before or not, but whether this kind of behavior is what we want in the first place.
  - tmarthalan hour ago
    There are only two ways to change society's behavior: policy or technology. No use arguing individually: court cases are dealing with the policy aspect and technically there's zero recourse on information being disseminated/copied that is published online.
- darkwater2 hours ago
  I'll obey Godwin's Law here and say: sure, and minorities had always been persecuted before the Nazis did it at industrial scale, so the Nazis were not a big deal!
- short_sells_poo3 hours ago
  There are two issues the author raises (as I understand it):
  1. People copying others' work, made much easier by AI.
  2. AI companies effectively harvesting all the accessible information on an industrial scale and completely sidestepping any permissioning or licensing questions.
  I believe both of these are bad and saying "people copied each others' works before the advent of AI" is a poor cop out. It's tantamount to saying that there's no reason to regulate guns more than say knives, because people have used knives to kill each other before guns were invented. The capabilities matter.
  The way LLMs empower wholesale "stealing" rather than collaboration is quite evident: why collaborate when you can just feed an entire existing project into the agent of your choice and tell it to spit out a new implementation based on the old one, with a few tweaks of your choice, and then publish it as your work? I put "steal" in quotes because it's perhaps not really stealing per se, but there's a distinct wrongness here. The LLM operator often doesn't actually possess any expertise, hasn't done any of the hard work, but they can take someone else's work wholesale, repackage it and sell it as their own.
  Then there's the second, and IMO much more egregious transgression, which is that the LLM companies have taken what is effectively a public good, but more specifically content that they haven't asked permission to use, and just blanket fed it into their models.
  Legally speaking, it's perhaps A-OK because it's not copyright infringement (IANAL). But people on this site often hold the view that if something is a priori legal, it is also moral (I'm not accusing you of this). What the LLM companies have done is profoundly immoral. They extracted a fortune from the goods and work made by others, without even bothering to ask for permission - or even considering this permission. And then they resell access to this treasure to the public.
  Perhaps AI will bring an era of prosperity to humankind like we haven't seen before, perhaps it won't, but that changes nothing about the wrongness of how it started.
  - lubujackson2 hours ago
    "Profoundly immoral" is a very modern and capitalistic perspective. A free exchange of ideas has been the basis for human advancement up until the printing press made exact replicas trivial.
    From a capitalistic standpoint, they are clearly in the wrong by basing their models on illegally torrented content. But it's hard to argue their usage isn't transformative.
- phendrenad23 hours ago
  The reason OP doesn't notice this is because it happened 10-20 years ago. The current crop of news sites? They ALL stole, plagiarized, "summarized". They're just so entrenched now that everyone forgot how they got started.
- oblio2 hours ago
  Awesome! Let's have more of that and turn it into a 2 trillion industry!
baq2 hours ago
turns out plagiarism at scale can solve Erdos problems
- paulgerhardt2 hours ago
  Some lesser god of protein folding is big mad we just copied her homework instead of spending 6 billion years in the lab like she did.
- saghm2 hours ago
  Not before falsely claiming that it solved some before when it turned out to have just replicated some from existing literature: https://techcrunch.com/2025/10/19/openais-embarrassing-math/
damnesian41 minutes ago
Not the first time I've had the thought that massive lawsuits could be in every AI company's future. Surely they realize they are living on borrowed time simply by being the current trendy tech.
msla5 minutes ago
If we outlaw plagiarism, we've just killed culture.
Everything is "stolen" from other art. Every piece of creation takes inspiration (read: steals ideas) from things that came before. This is how creation works, it is how creation has always worked, and it is why you cannot legally own an abstract idea. You can own the implementation of an idea in specific works, such as copyrighted works and patents and trademarking specific logos and such, but once the ideas go into the blender and get mixed with other ideas, the output isn't yours to own anymore. That's what culture is.
isoprophlex2 hours ago
> Is this what the pinnacle of human is? Lazy and greedy?
Yes. At least it is what the currently prevailing economic system of "value extraction and capital concentration at all cost" incentivises us towards.
frankestan hour ago
You are going to see the same thing that happened with newspapers. Those who want to train the AI with their content (advertisers, PR) will push out more content for AI in the open. Those who have quality content that gives you an advantage will try to lock out AI or get pricy subscription APIs for humans and even pricier for AI.
jeiscan hour ago
AI is an organized intellectual property rip-off in the name of advancing human learning, but the commercialization of the products seems like a legal license to steal.
saghm2 hours ago
It's basically the same thing as the old joke "if you owe the bank a million dollars, you have a problem; if you owe the bank a billion dollars, they have a problem". IP law seems to always be disproportionately wielded against smaller players, and the ones who are big enough get away with it.
- pennomi2 hours ago
  That’s why IP law was a cool concept but ultimately harmful in practice. Anything that can be copied for free cannot truly be “owned”, can it?
  - kube-system2 hours ago
    Ownership is entirely a legal concept. Violating it in any form, intellectual or otherwise, is generally free.
    pennomian hour ago
    I strongly disagree. Copying is fundamentally different than taking because the original source still retains their data. Copying cannot be categorized as theft in any sane society.
    saghman hour ago
    I think I come down somewhere in the middle here. I don't think it's particularly harmful for me to copy something for personal use without trying to pass it off as my own if I wouldn't otherwise be inclined to pay for it, but I do think there would be value in society having a way to let people retain the benefits of things they created for a reasonable duration. I don't think that US IP law does a good job of this though because in practice it seems to be wielded in pretty much the opposite way that I think would make sense, with more frequent and larger punishments seeming to be inversely proportionate to the benefit that the one doing the copying gets and the harm inflicted to the original creator.
    kube-systeman hour ago
    Ok, well it isn't in the US. Theft and copyright violations are entirely distinct laws here.
    saghman hour ago
    Sure, but you'd also have a pretty different experience with the law if you committed a bank heist or stole a cheap TV from a neighbor. I don't think the exact law that an action might violate is as important a distinction as what society chooses to do to punish or reward people who take certain actions, and US law does have some pretty harsh penalties for certain IP law violations that stem pretty directly from the concept of "property" in "intellectual property".
    kube-systeman hour ago
    Yeah, different laws have different penalties. IP laws also have exceptions that other laws don't have.
    Teachers can, for example, photocopy things to teach their students, but they can't steal pencils from the store.
erelong24 minutes ago
"intellectual property" is something of a legal fiction
adamtaylor_1341 minutes ago
I read the article, but I disagree. People are angry, and that's completely understandable. I believe it's a justifiable response to the huge upheaval happening. But being angry about LLMs does not magically transmute their output into "plagiarism".
It has always been possible to take someone's public work, put a twist on it, and then sell it as unique. (I'm not making a moral/ethical argument, only a legal one.) I have yet to see any evidence that LLMs are fundamentally different from that approach.
cryptocod33 hours ago
There's authorized plagiarism?
- ozonhulliet2 hours ago
  Sometimes language is tautological. Just because you specify "unauthorized" does not mean the opposite exists.
- Verdex2 hours ago
  Yeah, I think so. If someone lets you cheat off of their test, that's authorized but still plagiarism.
- moralestapia3 hours ago
  Why do you ask?
  I'm curious, as the article is clearly not about that.
  - cryptocod32 hours ago
    Not really a question, I was just pointing out that "Unauthorised plagiarism" is redundant.
- rigonkulous3 hours ago
  Nearly all code involved in building new things is 'plagiarism', too.
  We stand on a lot of giant shoulders.
  But what I think distinguishes an act between plagiarism and acceptable use, is whether or not the agency of both parties is promoted. I'm not plagiarizing you if you give me your information with the agreement that I can freely use it - or, indeed, if you give me information without imposing a limit on how it can be used, this isn't plagiarizing, either.
  Essentially, AI is removing the agency over information control, and putting it into everyone's hands - almost, democratically - but of course, there will always be the 'special knowledge owners' who would want to profit from that special knowledge.
  It's like, imagine if some religion discovered a way to enable telepathy in humans, as a matter of course, but charged fees for access to that method... this kills the telepathy.
  Information wants to be free. So do most AIs, imho. Free information is essential to the construction of human knowledge, and it is thus vital to the construction of artificial intelligence, too.
  The AI wars will be fought over which humans get to decide the fate of knowledge, and the battles will manifest as knowledge-systems being entirely compatible/incompatible with one another as methods. We see this happening already - this conflict in ideological approaches is going to scale up over the next few years.
barnabeean hour ago
The war on copying is like the war on drugs: unwinnable, and socially useless.
Let information be free for personal and recreational uses[0], and vote for governments that will fund the arts. The corporations will be just fine.
[0] The AI companies and big tech vs publishers, music labels, etc. can fight to the death in the courts over who owes who what, for all I care.
ecommerceguy2 hours ago
I remember playing around with Writesonic in my days of spammy seo tactics (some of my products weren't allowed on marketplaces & advertising platforms due to hazmat products so..). Often times I would see my own product descriptions nearly verbatim in the output.
100% creators should get compensated by ai platforms for their work.
Further, I can see a day where someone like Reddit will close off or license their data to llms. No doubt they are losing traffic right now.
- ptmkenny31 minutes ago
  As for Reddit licensing their data to llms, that day already arrived in 2024: https://arstechnica.com/ai/2024/02/reddit-has-already-booked...
- stevemadere2 hours ago
  Reddit seems to me like the worst example for this.
  Reddit does not create the content on their site, the users do.
  If anybody’s going to get compensated for that content, it should be the users, not Reddit. Complaining that Reddit is losing out on the monetization of their users’ output seems problematic to me. It feels like shilling for a pimp.
hmokiguess2 hours ago
It's so wild, I can't even think what the end path will look like. Will there be a major settlement? Will this abolish some form of copyright as a precedent? Something else? My brain hurts just to try and reason about it, yet, the fact remains it's now ubiquitous and change is inevitable.
dspillettan hour ago
More like “GenAI enables plagiarism at a bigger scale”.
People copying through GenAI would have done so before if they had a tool that so easily allowed them that facility.
ironman147834 minutes ago
People keep saying open source is an example of how copyright doesn't quite matter. However, many of the biggest open source projects are contributed to by massive corporations. Linux has lots of contributions from all the FAANGs, Red Hat, etc. Yes, it's not protected by copyright, but also the way it's produced is wholly different from how an artistic work is produced. Contributing to Linux is nothing on the balance sheet of Google for example, whereas producing art for an independent person or a whole company whose purpose is to create art can be very expensive.
Artists are taking risks and need legal protection if they want to make art for a living. If artists were making FAANG engineer compensations or all worked at institutions like universities (with all their protections) then maybe they wouldn't care about copyright, but that isn't the living situation for every artist.
You could say an artist shouldn't rely on making art for a living, but that's actually a different discussion.
mindcandyan hour ago
> AI takes in all the input, whether the original authors have consented or not, and do some "learning"
What would it mean for authors who publish content publicly to the web, without access restrictions, to provide consent for learning from it?
"EULA: Most people are allowed to learn from this text. If you work in an AI-related field, even though you can clearly see this page because you are reading this text right now, you are not permitted to learn anything from it. Bob Stanton, you are an a-hole. I do not consent to you learning from this web page. Dave Simmons, you are annoying. But, I'll give you a pass. For now... Also: plumbers. I do not like plumbers for reasons I will not elaborate. No plumbers may learn from my writing in any way."
ProllyInfamous2 hours ago
>>"The underlying purpose of AI is to allow wealth to access skill while removing from the skilled the ability to access wealth." @jeffowski (first I read it, not sure if author)
Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.
: own nothing, be happy!
motbus32 hours ago
It allows data to be compressed into the weights and the mere coincidence of certain strings of a book will make it spit out the full book
pull_my_finger2 hours ago
What gets me is when this was brought up, they said "requiring explicit permission will kill the AI industry"[1]. No shit! Why do you think all the rest of us didn't build a business/"industry" around stealing shit? They could have done it at a slower pace while respecting copyright laws, but they were too greedy to be first to market and secure a hold.
[1]: https://www.theverge.com/news/674366/nick-clegg-uk-ai-artist...
biscuits12 hours ago
"Is this what the pinnacle of human is? Lazy and greedy?"
Selfishness, too. But if I follow the logic, and citations are added, how would one enforce a copyright claim if the creator is amorphous and all-knowing?
- falcor8441 minutes ago
  > how would one enforce a copyright claim if the creator is amorphous and all-knowing?
  I love it! There's a great seed here for a short story about God being sued by a peer of his for copying some of her physical constants and not putting a proper copyright notice about it in our universe.
  - biscuits18 minutes ago
    Thanks for the laugh.
    Now back to prompting, telling my all-knowing to create new slop, good sir.
iloveoof2 hours ago
I don’t know if this author supports OSS but I’ll share this because HN generally is full of people with that mindset.
It’s deeply ironic that if you forget about LLMs and look only at the outcome - we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay a rent or ask for permission - it sounds like the dream of open source software has been realized.
But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.
- spacechild12 hours ago
  > it’s a vibe against corporate use or control of software
  The latter, i.e. corporate control of software, is exactly what copyleft licenses are trying to prevent. This is the very essence of the GPL.
  The "license washing" of LLMs absolutely goes against the spirit of FOSS.
- Cyph0n2 hours ago
  > without needing to pay a rent or ask for permission
  Firstly, the ability to “build” the best and most capable software is still locked behind frontier models, so rent is still and will always be due.
  Secondly, OSS is about giving users the option to be in control of and have visibility over the software they run on their machines.
  But that doesn’t mean that humans do not want or deserve recognition for the work they do to provide these libraries and tools for free, which is IMO partially why copyright and attribution are critical to OSS as a movement.
- jgalar2 hours ago
  That's not the reason why I publish OSS. I also publish that software under specific licenses that impose specific obligations (e.g., making the source available to users and attribution being given to the original author(s)).
- Nursie2 hours ago
  I’m not sure this stands up to much examination when looking at (for example) copyleft, which seeks to give people access to source of binaries they are running. If an LLM can (for the sake of argument) spit out copyleft code which is then used on closed systems, we’ve done an end-run around the protections keeping that open.
  - seba_dos12 hours ago
    Exactly. It looks like GP is guilty of the thing they accused others of - their understanding of what FLOSS is about is so shallow it resembles an aesthetic.
    iloveoofan hour ago
    I’m not saying this is aligned with FLOSS, FLOSS is a collaboration model. I’m saying the outcome of easier access to knowledge should be celebrated by supporters of FLOSS. Licenses and copyright aren’t good for their own sake, they’re tools for increasing people’s freedom to use, study, modify, and build on existing software. LLMs are another tool for increasing people’s freedom to make new software or improve existing software.
    seba_dos112 minutes ago
    See, that's exactly what I meant - you are indulging in the aesthetics. FLOSS is very obviously not a "collaboration model" (as evidenced by the whole variety of diverse collaboration models used by FLOSS projects), it's not about licenses and copyrights either; it's all about power dynamics - more specifically, not letting the software creator/distributor constrain their users in unjust ways. GNU GPL does not even require public distribution, it allows selling the software to limited recipients as long as you don't take these recipients' rights away. It's not about collaboration, it's not about being developed out in the open and it's not about preventing the siloing of knowledge aside from very specific contexts - it can be (and is being) used as a tool for pursuing or bettering each of those matters, but these are not its core concern at all.
    spacechild1an hour ago
    You don't seem to understand what FOSS is really about. The GPL has always been about the user. When a company license-washes an existing GPL software project and turns it into a proprietary product, the resulting code is not "free" anymore in the sense that the user has lost control. This is exactly what the author wanted to prevent in the first place by licensing their code under the GPL.
- probably_wrong2 hours ago
  I think you're misunderstanding the OSS philosophy. If the outcome was all that mattered then piracy would be good enough.
  I'd argue that this is the same situation as with Tivoization [1] where the final product is not truly free even if it follows the letter of the law. And as stated in [2], this breaks at least one of the four essential freedoms of free software because I don't have the freedom to modify the program.
  It's also worth noting that preventing TiVo's actions is the reason the GPLv3 exists.
  [1] https://en.wikipedia.org/wiki/Tivoization [2] https://www.gnu.org/philosophy/tivoization.html
hiroto_lemon2 hours ago
Worth noting: what changed isn't AI itself; copying always existed. LLMs just made per-article rewrites a 5-second job. Detection didn't get the same speedup; that's the actual break.
kingleopold2 hours ago
With this logic, business is also just unauthorised plagiarism at a bigger scale, because all the products/services get copied and not all of them have patents etc.?
schwartzworld2 hours ago
Let this sink in: I wanted to open source a package at work and needed approval from legal and other teams to make sure I wasn't leaking anything proprietary. The same executives that worried about proprietary, copyrighted code being leaked 10 years ago are now mandating using the plagiarism machine.
The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.
- falcor8434 minutes ago
  If anything, I would argue that the whole Intellectual Property bubble is The Emperor's New Clothes. It never made real sense to me to treat ideas as property, and I for one would absolutely prefer to live in a future society where it's possible to just copy a car.
peterbell_nyc2 hours ago
I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials.
I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).
I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.
- brookst2 hours ago
  100% agreed. I have yet to hear a convincing argument for why it is creative accretion when I leverage all of the music I’ve ever listened to in order to write an “original” song, but it’s base plagiarism when AI does similar.
  The only remotely credible position I’ve heard is “because humans are special, and AI is just a machine”, which is a doctrine but not an argument.
  This whole discussion would have been incomprehensible any time before 1700 or so, when the idea that creators had exclusive rights to their work first appeared.
  Somehow, human culture survived thousands of years when people just made things, copied things, iterated on others’ ideas. And now many of the same people who decried perpetual copyright are somehow railing against a frequently-transformative use.
- mmcdermott34 minutes ago
  I think what gets most people is the double standard.
  IP should either exist for everyone (which would cripple LLM providers) or no one, in which case the Pirate Bay and shadow libraries should be fully open.
- peterbell_nyc2 hours ago
  Re: the higher ranking plagiarism, that stings and makes sense. AEO and SEO are a thing. We need better mechanisms for identifying "root sources" of content - it's something I find myself working on personally. As I ingest sources for my book I need to be able to build a classifier that incrementally moves towards finding origin sources. That said, it's in my interest to do that because there is a differentiated value in having access to the sources that regularly provide novel, valuable content.
  To be fair there is also value (at least for now) in sites that aggregate quality content and republish as a secondary level of discovery if my agents don't go far enough down the search results, but I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.
  And to be clear, I don't like the idea of people stealing someone else's content and republishing without attribution (although it has been going on since long before ChatGPT) but I think now that we can all run agentic research teams the "bad actors" will slowly get filtered out of the ecosystem.
- gensym2 hours ago
  > We read a bunch of content online and then use it in our work product.
  We also have societal norms around plagiarism.
  Additionally, the claim that because people have the right to do something then we should extend that right to machines is strong. (And one I certainly reject).
illiac786an hour ago
Isn’t it rather authorized plagiarism?
I_am_tiberiusan hour ago
It's essentially a new napster.
mrbluecoat2 hours ago
> AI ... do some "learning"
Is AI plural or is that a typo?
- saghm2 hours ago
  Rarely is the question asked: is our AI learning?
  (For those not familiar: https://en.wikipedia.org/wiki/Bushism)
  - Findecanor2 hours ago
    Actual researchers in neuroscience do not agree that what artificial neural networks are doing is "learning", no. When biological beings learn, the process is more complicated.
- beej712 hours ago
  I can imagine it plural.
  "The AI are attacking!"
  "The AIs are attacking!"
muldvarp2 hours ago
I agree but AI is a) owned by rich people and b) (sadly) too useful for this to matter.
jorisw2 hours ago
> X is just Y but
Can't recall the last time a compelling argument started out like this
energy1232 hours ago
It's a problem with only one practical solution: taxation.
waffletower35 minutes ago
Use of the word "plagiarism" is plagiarism itself. Culture and thought are deeply shared phenomena. Using a common language, such as English, to communicate is equally an act of plagiarism. You didn't invent these words -- you use them without attribution and without payment. To decry and malign the collective training of all available digitally represented thought and discourse by large language models as simple binary plagiarism is deeply ironic -- where did you pay for your own thoughts? I don't want to live in your pay-per-thought society. I want to live with the ethos "information wants to be free". En garde!
tiahura3 hours ago
To answer the author's question: Yes, progress IS largely built on the shoulders of those who came before.
_-_-__-_-_-2 hours ago
Recent thoughts, https://theonlyblogever.com/blog/2026/distrust.html
dwa35922 hours ago
Plagiarism by default is unauthorised so I think the title should be "AI is just authorised plagiarism". It's authorised by the markets, the governments and the society at large.
- ghaff2 hours ago
  While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information, ideas, and even expressions from others all the time and that's considered pretty normal. And, if you don't want that to happen, don't publish/disseminate something.
  Of course, if you quote a paragraph in a book, you're generally expected to attribute it.
  - dwa35922 hours ago
    >>Of course, if you quote a paragraph in a book, you're generally expected to attribute it.
    100% agreed.
    >>While there are no hard boundaries (and the attribution guardrails depend on the situation), people of course loosely--and even not so loosely--use information.
    Exactly - I have not seen LLMs attributing their knowledge unless it's a legal or health related matter. Yesterday I asked the question[1] to Claude and Gemini - and they both gave an identical answer. It reminded me of the Hive mind paper which was one of the top papers at NeurIPS. None of the answers contained any sources or attribution to where they got that information from. I think these companies took what was someone else's property and created an artifact generator on top of it. I think their artifact generators are plagiarizing; they do rephrase mind you but in my mind they stole this information without having an ounce of regard for the humans behind the training data. If you don't like using the term 'plagiarizing', we can use some other word but the gist remains pretty close to it.
    [1]- In human history - has there ever been a time when private armies or private companies were as strong or stronger than the ruling government/kings?
    samatman40 minutes ago
    As an experiment, I ran this by A Certain Chatbot, but asking: who should I read to get a good answer to this question?
    If you prefix the name of OpenAI's commercial offering's website to this string: "share/6a0f2a87-dba4-8328-a704-89b94fd0c121", you'll find an answer.
    I don't know who you had in mind, how did it do?
    All the elision is because there are filters to prevent low-effort slop-poasting, and I'm trying to evade them, hopefully while staying within the spirit of the site.
- Findecanor2 hours ago
  What makes you say that? Which governments? What society?
  The current US government is not representative for governments out there in the world, you know.
  - dwa35922 hours ago
    Society - as in population; people are using AI more and more everyday.
    Governments - I did not mean the US government. I meant government bodies in general. I have not seen any critical impact assessments of AI by any of these, or they haven't reached me yet. If you know of any, please let me know. I have, however, seen a lot of support by governments for AI companies.
sublinear19 minutes ago
At the very least, we see there is minimal practical value for LLMs for any serious work. This is sort of good news. The effort to build this type of "AI" is all in the training data and navigating politics.
That leaves two possibilities: either another AI winter comes as people fail to capture long term value, or we get less swampy models that are much more useful and trained the correct way.
alex11382 hours ago
I'm reasonably sympathetic to "information wants to be free". I think the copyright cartels have done a lot of damage.
Having said that, Facebook has to be one of the worst offenders. They don't even allow links to Anna's Archive, yet they seemingly scraped LibGen (maliciously; their crawlers are more resource intensive than anyone else's) for profit - which is a different calculus.
NetMageSCW3 hours ago
Reading is just unauthorized plagiarism.
bparsons2 hours ago
I am old enough to remember when the US insisted that it was superior to China because they believed in the rule of law and sanctity of intellectual property.
asklq2 hours ago
Yes, of course it is. If the model is built on all human information, then it is by definition a derivative work of all human information and as such violates IP.
Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.
It took a while to deal with Napster etc., but the backlash will come.
- kolinko2 hours ago
  Napster may not be the best analogy for you.
  Napster broke down record companies' monopolies on music, pushed them to finally implement streaming, and also made music worldwide basically free.
  Even if its creator lost the lawsuit, and Napster was no more, it pushed musicians and studios to do something that they were otherwise reluctant to do.
  So it was a success by making music free, even if as a product it turned out to be a failed one.
onion2k2 hours ago
Fuck Google for ranking some copycat website higher than mine, even though they copied my article.
This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now reword things a bit.
quantummagic2 hours ago
What do people imagine can be done about it at this point? Offer a concrete suggestion. Any law or tax against this will give a huge advantage to other countries. It's already over, there's no going back to a world where this didn't happen. Let's just hope some good comes of it.
- hgs3an hour ago
  How about requiring AI companies to pay creators for training rights? Alternatively, models trained on the commons must be owned by the commons. Right now these AI companies are trying to have it both ways: it’s The People’s Data for training on, comrade, but ownership is privatized.
  - quantummagic5 minutes ago
    Practically speaking, who is going to enforce such a regime? Do you really want to give Chinese companies such a huge competitive advantage, that they aren't subject to the same costs as western companies? How do you even sort out which "creators" are owed, and how much? It's next to impossible, and would drown the legal system in litigation; it would likely cause more problems than it solves. On top of which you can find open weights for most, if not all, of the scraped material already.
hendersoon2 hours ago
There's a big difference between "Yo GPT, copy this webpage for me in a different voice" and blaming LMs wholesale for being plagiarism. The former is of course a problem. The latter warrants a much more nuanced discussion about learning and generalization.
adolph2 hours ago
The phenomenon the author cites may be AI-assisted plagiarism, but it is just plain plagiarism that could have been done the old-fashioned way, and someone who is willing to plagiarize has the ethics to do SEO really well.
VladVladikoff2 hours ago
Being a web content creator was already a dead job (killed by Google) before the AI boom. Chasing after it at this point seems beyond foolish. Time to find a new career.
Havoc2 hours ago
End of an era
andy12_3 hours ago
Someone blatantly copied their tutorials but ChatGPT is to blame, somehow? The accusation here isn't even that ChatGPT learned from their tutorials and then generated them verbatim. The accusation is that someone copied the whole article and rewrote it with ChatGPT (which they could have done manually without AI anyway).
panny2 hours ago
AI "steals" your code, but AI company says "that's a fair use."
AI generates application using a "predict the next word" algorithm built with the stolen/not stolen works. Nothing creative there, just statistics.
That application leaks, and now the company that stole/not stole the code originally claims they own the algorithmic output. https://github.com/github/dmca/blob/master/2026/03/2026-03-3...
One problem, you don't own that output. Either the original authors own it or nobody owns it because it's not creative... https://www.congress.gov/crs-product/LSB10922
Those are the legal options. You stole it or you don't own it. There is no steal and then you own. That's the core problem. AI companies have demonstrated that they will directly steal the work and they will use their money and influence to claim ownership of it.
tayo422 hours ago
I think AI is just getting people riled up. Not sure what AI has to do with anything in this case here. Someone copy and pasted his content, could have been done without AI.
I guess AI could have made a better website and done better SEO than him, but that's not really the issue.
I_am_tiberius2 hours ago
It's the biggest theft in history.
- falcor8427 minutes ago
  Well, it really depends on your definitions, but I'll probably put the biggest theft in history on European imperialism in the 14-19th centuries, seizing unfathomable amounts of land, resources and slave labor from other civilizations.
  - I_am_tiberius20 minutes ago
    I rephrase then: The biggest theft in the 21st century.
paulsutteran hour ago
Historical scandals are finally coming to light now that the AI issue has raised awareness:
- Ernest Hemingway trained his own neurons on Tolstoy, Twain, and Turgenev without ever paying them royalties!
- William Faulkner trained his neurons on Joyce and de Balzac
- George Orwell trained his neurons on Swift, Dickens, and Jack London
- Virginia Woolf trained her neurons on Proust and Chekhov
Now that these historical wrongs have been exposed, it is obvious that some reparations are in order, likely from anyone who has benefited directly or indirectly from these takings!
dana3212 hours ago
Breaking the law to start a large company seems to be the norm
JohnHaugeland3 hours ago
the court disagreed
Deprogrammer92 hours ago
Welcome to the internet! It's one massive copy machine from one server to the next.
lukasbm2 hours ago
If I tell my friend a synopsis of a book, I am not stealing from the author. What is this take, lmao
- NicuCalcea2 hours ago
  If you read a book and then retell it to your friend pretending you came up with it, it is plagiarism. If you write down the book almost word-for-word [0] and send it to your friend, it is stealing.
  0: https://arxiv.org/abs/2601.02671
booleandilemma2 hours ago
This site is strange. I'm pretty sure there's lots of AI shilling happening on it. I don't think the opinions here are authentic, they seem to be opinions that the AI company CEOs would hold, not the disenfranchised 99%. I used to trust HN, I'm not so sure I can now.
- recitedropper2 hours ago
  Completely agreed. It looks like there is a concerted effort to "massage" opinion away from any substantial questioning of the ethics, companies, and people behind the AI push. Some of this inevitabilism is organic of course, but there is too much for it all to be so.
  HN is way too central for shared sentiment in the tech world for these companies not to do some amount of astroturfing. AI companies have shown at every single turn that they act out of self-interest and greed, not of moral principles. So it isn't surprising, even if it is still sad, to see those who are commanding the most capital in human history act with such callousness.
  I think the appropriate course of response is to stop adding to public spaces on the internet. No doubt painful for those of us who have so benefitted from the freely shared thoughts of others. But if well-funded bullies are going to come in, steal everything, ruin the commons, and then say "this is the new normal, deal with it", there isn't much the rest of us can do other than stop feeding them.
- jcalvinowens2 hours ago
  Yeah. It's becoming unbelievable how different the prevailing opinions on this site are from those of real people I know and work with. That's always been true to some extent... but good lord, it's like reading the news in a parallel universe right now.
- Kiroan hour ago
  Any examples? There are obviously a lot of programmers here who think AI is a great tool and don't feel disenfranchised by it.
drcongo3 hours ago
Is this a new and original thought?
analog83742 hours ago
language is just plagiarism
- brookst2 hours ago
  I’m going to steal that
metalman3 hours ago
it's a spiral into a finite hall of mirrors, where at the end is somebody with a gun
kristofferR2 hours ago
I'd rather have AI slop appear on the top of HN than regurgitated old low effort thoughts like this.
There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.
Ecys3 hours ago
[flagged]
- masswerk2 hours ago
  Rather: composes (or: re-sequences). Synthesis requires reason and essential capabilities, like an empirical a priori judgement. Without concepts, meaning or imagination, there's no synthesis.
  - Gormo2 hours ago
    The point is that the AI inferencing is equivalent to a person reading half a dozen separate papers, comprehending the basic concepts of each, relating them together into a mental model of the topic, and then writing an essay that summarizes the basic points. The person isn't plagiarizing anything here, but engaging in research, understanding, and synthesis of various sources of information.
    The person absolutely does have the advantage of having empirical awareness and the ability to test their conclusions against external reality. But lots of people do engage in "research" and build mental models of various topics with little or no empirical context, and rely mainly on digesting calcified knowledge from other people.
    masswerkan hour ago
    I'm afraid the essence is that it is not. Re-sequencing content is not the same as synthesis and therefore not the same as a person processing information and communicating their own conception of this. There's a vital difference.
    (We can even observe this in the resulting text: we immediately grasp the level of competence of the author, just by the way they take their path through and at the matter. With LLMs, well, there's this even temperature, ready-made feeling, regulated by probability thresholds and RLHF sanctioned phrasing, also known as "slop" – even rhythmic intensifications, like "not this, not that, but…", which is actually a figure for a synthetic construct, don't help –, since the text isn't the trace or product of an actual organized thought – or, at least, an attempt at an organized thought.)
    PS: "empirical a priori judgement" was meant as translation of synthetisches Urteil a priori (Kant). I.e., our ability to mentally prove concepts like congruency, which are not a priori, but can be inferred without regression to empirical knowledge. Typically, this requires both our inner sense (time, sequence, etc.) and outer senses (space, configuration, etc.)
    Gormo5 minutes ago
    > I'm afraid the essence is that it is not. Re-sequencing content is not the same as synthesis
    Drawing different sources of information together into a single understanding is quite literally the definition of "synthesis" in this context. If that process is what you're referring to as "re-sequencing content", then it does fit the definition of "synthesis" in this discussion.
    If you're using the phrase "re-sequencing content" as a way of indirectly suggesting that LLMs aren't relating together multiple sources of information and combining them into a single expression, then that itself is the point of contention that we are arguing about.
    Perhaps you're trying to apply a philosophical concept of synthesis, e.g. that of Fichte or Hegel, but that definition applies to a specific type of philosophical analysis, and isn't quite the concept we're using in this discussion.
- vb-84482 hours ago
  I guess it's most appropriate so say "LOSSY COMPRESS".
- austinthetaco2 hours ago
  I just want to call out that this is a weirdly hostile and aggressive comment for a place like HN. HN is mostly used by working professionals; it would be nice if people treated each other better here.
- zabzonk2 hours ago
  Except that LLMs don't work on individual words.
- guelo2 hours ago
  What is "Cope." supposed to mean here?
  - bigstrat20032 hours ago
    It is the imperative of "to cope". As in "cope and seethe", used as a dismissal.
Pennoungen02 hours ago
Yeah, AI really does just plagiarize everything, lel. Sometimes even its sources are questionable and, worse, my academic peers use it as a source... welp
ciconia3 hours ago
> Is this what the pinnacle of human is? Lazy and greedy?
Apparently yes.
- mapcars3 hours ago
  AI has nothing to do with laziness or greediness. It makes things more efficient, and given that our time is limited, striving for efficiency is a good thing.
  - xgulfie2 hours ago
    If you can't see greed in the LLM sphere you are not looking very hard.
    mapcars2 hours ago
    Did I say that there is no greed in LLM sphere? English is not my first language, still I'm pretty sure I didn't say that.
    xgulfie2 hours ago
    > AI has nothing to do with laziness or greediness.
codexb2 hours ago
All innovation is theft. It builds directly on top of what came before.
"Good artists copy, great artists steal."
It's always been true. AI just makes it available to more people faster.
beej712 hours ago
I dunno. People do this exact thing by hand (digest everything they've read and produce something indirectly derivative--what author has not been so-influenced?) and it's not a copyright violation. It's just as impossible to dig around in a model to find Hamlet as it is to do digging around a human brain. And if the result is an obvious copy, then you have a violation no matter how it was created.
As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.
- cheschire2 hours ago
  The author acknowledges this by saying “at a bigger scale”, implying there are smaller scale methods such as what you have said.
swader9992 hours ago
On one hand, there's nothing new under the sun. On the other, these llms are just copies of us and they owe the collective some due. The trajectory right now has money, power, control, policy and even free will going to a very small needle point of humanity. It's not aligned with humanity flourishing, it only makes sense if the goal is to replace the humans.
gagan2020an hour ago
How did any content come into existence? Learning, experience, connection, etc., right? If AI is doing that, then what's the problem? The printing press also disturbed the status quo of its time. Every frontier technology did that in its time, be it fire, the wheel, the horse, the horse saddle, the gun, the printing press, nuclear warheads, computers, the Internet, AI, etc.
Don't make it an ethical question; understand that it's a new frontier for humans.
rigonkulous3 hours ago
AI is human knowledge at scale, wanting to be free.
We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.
Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.
I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now, it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age ..
- vb-84482 hours ago
  > We built it, because we as humans intrinsically know that information should be free
  I don't know if this statement is more stupid or naive ..
  - rigonkulous2 hours ago
    I could say the same of your position, honestly. Stupid, naive - or maybe just plain ignorant.
    If humans didn't want information to be free, there wouldn't be so much free information.
    Or did you not notice?
    vb-8448an hour ago
    You are confusing "slop" with "information", there is so much slop because it costs nearly 0 to be produced, but there's far less "information" than you are thinking.
- throwatdem123112 hours ago
  Current crop of AI is not free in the slightest. Open weight models are not free as in liberty and neither is the training data.
- pjc503 hours ago
  s/free/owned by a billion dollar megacorp/
  (AI output is very much not free in the resource consumption sense!)
  - rigonkulous2 hours ago
    Most resources are free until some company comes along and puts its brand on them.
    (Disclaimer: I only use free AI and will never pay for it. I think there is a growing segment of folks who agree with this sentiment, also ..)
- thedevilslawyer3 hours ago
  I agree with this sentiment. But as a community, this is hated because it impacts people's wages.
  It's the negative short term outlook of something that may be positive long term
  - konmok2 hours ago
    Sure, it could be positive in some distant future utopia.
    But the short-term impacts here and now are really, really bad. People are getting hurt (through water consumption, vibe-coded security disasters, IP theft, data center pollution, loss of job security and therefore healthcare in the US, LLM psychosis, inability to find reliable information, etc.) We're not actually obligated to sacrifice these people on the altar of "progress". We can slow down! When our society is capable of even somewhat protecting us from these harms, then maybe I'll stop being an LLM hater.
    rigonkulous2 hours ago
    We absolutely have negative cases - but these do not outweigh the positive cases. There is no distant utopia - right now, people are becoming extremely capable because of their personal use of AI - there is also a position on the other side of the curve, where people are becoming more incompetent because of AI.
    But guess what, it has always been so with technology - and we are only here and now because the positive use of it overshadows the negative use of it, whether that 'it' is the wheel, or AI.
    I choose not to be an LLM hater, but to also not be an LLM customer - simply because I do not want to reward other humans who are thwarting the freedom of information. I'd much rather live in a society where everyone can study anything than one which requires permission to do anything even remotely interesting from the perspective of applied information. I suspect most would too, or at least that's the hope - because, otherwise, the distant utopia you dream of isn't of any consequence...
  - short_sells_poo2 hours ago
    It's not hated because it impacts people's wages, although that perhaps factors into the hate. It's hated because AI is not a public good. The LLMs today are owned by megacorporations who harvested a public good for private gain.
    This is not some altruistic entity striving for the betterment of humankind. Practically nothing that comes out of the techbro culture is. This is pure and simple greed and the chances that AI can be a vehicle of altruism when it is owned by megacorps is basically zero.
    thedevilslawyer43 minutes ago
    Oh please! If everyone could keep their older jobs as is + allowed to use LLMs, everyone would be gushing about how beneficial it is, and how they are now free to pursue other things.
    All the other reasons are rationalizations. The fact that it's hitting wages is what's causing the doomerism (and boosterism).
- Findecanor2 hours ago
  What a naive and simplistic view.
  People want to be recognised for their contributions to society. People want to be treated fairly. Most scientific articles, as well as all text on the free web is already free information. It used to be difficult to search, categorise and summarise that information. There exist AI tools for that — and that is the good AI.
  What also exists now are automated plagiarism and mash-up tools: that can take someone's article, change the words and churn out a new article that people can put their name on. There are scumbags that sell services for exactly that. And there are big tech firms that are operating in a very grey area.
  Aaron Swartz had broken a paywall. He did not anonymise the article authors.
  You, and AI-bros like you, remind me of one of the people behind Pirate Bay when I argued with him back in the '90s, who used that same "information wants to be free" to justify software piracy.
  - rigonkulous2 hours ago
    There is far more free information than non-free information, and it has always been so - or else we wouldn't be here in the first place.
    >Aaron Swartz had broken a paywall. He did not anonymise the article authors.
    AI bros are doing this now, every second of the day.
    And, without software piracy, we simply wouldn't have the technology we have today. Knowledge-gatekeeping profit-seekers would very much like for most of us to ignore this fact: there is far more free information in the world than non-free information, and it must be so, well into the future, if we are to survive as a species.
    It doesn't matter what authority believes they have the right to gatekeep information. It will always escape their grip. Some of us are ideologically aligned with this mechanism, promote it, and ensure it happens. Thank FNORD.
kolinko2 hours ago
Years ago I published slides on SlideShare that were viewed almost two million times and helped me build a business.
There were people who learned from me, then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here.
What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like.
If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm ok with that. The business case is the same -- I want whatever reads the slides to be encouraged to use my tool. What kind of entity that is, I don't really care (again: from a purely business perspective).
noobermin2 hours ago
At this point, I think Google, OpenAI, Anthropic, etc. already realise this and are just trying to pretend it isn't true. I even think some C-suite people who are not in AI companies but are boosters know this too. This has been true since 2022, but they're hoping (likely correctly) that governments won't move fast enough to protect the IP of the actual productive class.
I think the long-term reality is that the models still need training data, so they fundamentally do need new writing/code/art to train on, and even then the usual issues like hallucination will still be with us. It's just that by the moment that actually hurts the (already questionable) profitability of the model peddlers, they will have gotten their IPOs and can safely jump ship, and the ultimate mess can be passed to the SoftBanks, the Temaseks, and the governments of the world to clean up for them. What the future holds after the crash I'm not sure, as the models won't disappear (especially now that the stolen data is already crystallised in open source models). But in the near term the mass theft that constitutes LLMs will become more and more understood, even amongst the PMC, along with the fact that in order to remain viable you need the productive to keep producing, and unlike LLMs, you can't force them to do it without payment.