AI's Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source (www.quippd.com)

63 points by birdculture 2 months ago | 5 comments

p0w3n3d 2 months ago
Normally people get punished for downloading illegal books. Allegedly someone at Meta downloaded a hella ton of illegal books and trained the LLM on them, and they said "oh it was for his/her private use". You won't get justice here.
- muldvarp 2 months ago
  This to me is the most ridiculous thing about the whole AI situation. Piracy is now apparently just okay as long as you do it on an industrial scale and with the express intention of hurting the economic prospects of the authors of the pirated work.
  Seems completely ridiculous when compared to the trouble I was in that one time I pirated a single book that I was unable to purchase.
  - Llamamoe 2 months ago
    In recent years we've essentially given up on even pretending that corporations are held accountable for their crimes, and I think that's more worrying than anything.
  - lifestyleguru 2 months ago
    Hollywood and media publishers run entire franchises of legal bullies across the developed world to harass individuals, and lobby for laws allowing easy prosecution of the ISP contract owner. Even Google Books was castrated because of IP rights. Now I have a hard time imagining how this IP+AI cartel operates. Nowadays everyone and their cat throws millions at AI, so I imagine IP owners get their share.
  - p0w3n3d 2 months ago
    Recently archive.org got into trouble for lending out one book (or a fixed number of books) at a time across the whole world, like a library does. Sad men from a law office came and made an example of them, but it seems that if they had used those books to train an AI and served the content in a "remembered" way, they would have gotten away with it.
  - pcthrowaway 2 months ago
    > Seems completely ridiculous when compared to the trouble I was in that one time I pirated a single book that I was unable to purchase.
    How would one manage to get in trouble for pirating a book? Unless you mean with your employer for doing it on their network or something?
  - Mathnerd314 2 months ago
    Well, the actual ruling was that use of the books was okay, but only if they were legally obtained. And so the authors could proceed with a lawsuit over illegally downloading the books. But then presumably compensation for torrenting the books was included as part of the out-of-court settlement. So the lesson is something like: AI is fine, but torrenting books is still not acceptable, m'kay, wink wink.
citizenpaul 2 months ago
I'm not sure how this is much different from Amazon, which has basically monetized the entire Apache Software Foundation and donates a pittance back to them, in the single-digit millions, when they are profiting in the trillions.
- y0eswddl 2 months ago
  It's not different.
  There's also a huge problem with for-profit companies building on the work of FOSS without contributing resources or knowledge back.
  - p0w3n3d 2 months ago
    Nor sources
AndrewKemendo 2 months ago
This article could just have been a link to the "Tragedy of the commons" Wikipedia page.
Humans destroying common resources until they are depleted is a feature, not a bug.
- NoraCodes 2 months ago
  This is quite literally the opposite of the tragedy of the commons.
1gn15 2 months ago
This article commits several common and disappointing fallacies:
1. Open weight models exist, guys.
2. It assumes that copyright is stripped when doing essentially Img2Img on code. That's not true. (Also, copyright != attribution.)
3. It assumes that AI is "just rearranging code". That's not true. Speaking about provenance in learning is as nonsensical as asking one to credit the creators of the English alphabet. There's a reason why literally every single copyright-based lawsuit against machine learning has failed so far, around the world.
4. It assumes that the reduction in posts on StackOverflow is due to people no longer wanting to contribute. That's likely not true. It's just that most questions were "homework questions" that didn't really warrant a volunteer's time.
- bicepjai 2 months ago
  I love the LLM tech and use it every day for coding. I don't like calling them AI. We can definitely argue LLMs are not just rearranging code. But let's look at some evidence that shows otherwise. Last year's NYT lawsuit showed that LLMs have memorized most of the news text; you should have seen those examples. A recent, not-yet-peer-reviewed academic paper, "Language Models are Injective and Hence Invertible", shows LLMs just memorized training data. Also, this recent DEF CON 33 talk, https://youtu.be/O7BI4jfEFwA?si=rjAi5KStXfURl65q, shows so many ways you can get training data out. Given all these, it's hard to believe they are intelligently generating code.
- p0w3n3d 2 months ago
  Re: 3, AI is indeed a lossy compression of text. I recommend youtubing "karpathy deep dive LLM" (/7xTGNNLPyMI): he shows that the open texts used in training are regurgitated unchanged when speaking to the raw model. It means that if you say "oh say can you" to the model, it will answer "see by the dawn's early light", or something similar like "by the morning's sun" or whatever. So it is very lossy, but it is still compression, and the output would be something else without the specific text that was used in training.
fithisux 2 months ago
Personally I view the usage of AI as fencing.
- stuaxo 2 months ago
  Thank you for this wonderfully succinct description, I shall steal it.
  - djmips 2 months ago
    without attribution?