GenAI, the snake eating its own tail(www.ybrikman.com)

99 pointsby brikis9817 days ago22 comments

ropable17 days ago
Everyone here throwing shade at Stack Overflow is clearly too young to remember the horror of Experts Exchange and every other technical help site prior to SO. For nearly a decade, it was absolutely transformative as a technical help resource. It certainly had its faults, but it was so far ahead of the other options as to be game-changing.
I believe that the main reason for SO's decline starting around 2018 was that most of the core technical questions had been answered. There was an enormous existing corpus of accepted answers around fundamental topics, and technology just doesn't change fast enough to sustain the site. Then the LLMs digested the site's (beautifully machine-readable) corpus along with the rest of the internet and now the AIs can give users that info directly, resulting in a downward spiral of traffic to SO, fewer new questions, etc.
Vale, Stack Overflow. You helped me solve many tricky problems.
- PeterStuer17 days ago
  SO was great for a while, then went down the toilet, not because 'eveything had been answered', but because it became a playground of power hungry mods vs resume grinding freshmen patronizing and shutting down 90% of 'normal' users.
  Even current AI is a 100x better experience than SO ever was.
  We can all see how post knowledge scarecity and automated contextual niche adaptation reduces exploitation potential for knowledge production (often itself mere regurgitation), but the 'cures' proposed in the article feel very much worse than the disease.
  - thr0waw4yz17 days ago
    The issue with SO is also that the quality of the answers degrades over time. Asking similar questions will get closed as dupes while the referenced 2011 answer is basically useless nowadays.
  - rcxdude17 days ago
    I didn't help that SO had some very quirky expectations of how it should run, and failed to communicate those well, causing a constant friction between moderators and users. Also, there was often friction between the site admins and moderators, causing them to lose a lot of moderators over time as well.
  - 17 days ago
    undefined
- ralph8417 days ago
  Prior to SO we had Usenet, mailing lists, and IRC. They weren't so bad before spammers found them.
  - BrouteMinou17 days ago
    Funny enough, you think that irc is dead, like most people, and the spammers... but let me tell you a little secret here:
- go_photon_go17 days ago
  Stack Overflow should pivot to be an AI agents/local LLM information board and advertise the non-stack overflow stack exchange sites. There is a lot they could do.
  - 17 days ago
    undefined
- raincole16 days ago
  > I believe that the main reason for SO's decline starting around 2018 was that most of the core technical questions had been answered
  I believe that the rise of SO was mostly a miracle. A once-in-an-era thing. The evidence is all the other Stack Exchange sites. They all have the same UI and the same moderation model. If SO has some secret sauce they have it too. But most of them were pretty dead and never became an enormous corpus.
- sharadov17 days ago
  At one point I was in the top 10 in the Experts Exchange Leaderboard! It sucked as a platform, but I did learn a lot helping answer questions.
cthalupa17 days ago
This is an article that I agreed with more reading the headline than I did when I finished reading the article itself.
Stack Overflow peaked in 2014 before beginning it's downward decline. How is that at all related to GenAI? GPT4 is when we really started seeing these things get used to replace SO, etc., and that would be early 2023 - and indeed the drop gets worse there - but after the COVID era spike, SO was already crashing hard.
Tailwind's business model was providing a component library built on top of their framework. It's a business model that relies on the framework being good enough for people to want to use it to begin with, but being bad enough that they'd rather pay for the component library than build it themselves. The more comfortable it is to use, the more productive it is, the worse the value proposition is for the premium upsell. Even other "open core" business models don't have this inherent dichotomy, much less open source on the whole, so it's really weird to try and extrapolate this out.
The thing is, people turn to LLMs to solve problems and answer questions. If they can't turn to the LLM to solve that problem or answer that question, they'll either turn elsewhere, in which case there is still a market for that book or blog post, or they'll drop the problem and question and move on. And if they were willing to drop the problem or question and move on without investigating post-LLM, were they ever invested enough to buy your book, or check more than the first couple of results on google?
- worik17 days ago
  I feel no nostalgia for Stackoverflow.
  I always found it very frustrating that for a person at the start of the learning curve it was "read only"
  Actually asking a naive question there was to get horribly flamed on the site. It, and the people using it, were very keen to explain how stupid you were being
  LLMs on the other hand are sweet and welcoming (to a fault) of the naive newbie
  I have been learning to use Shell script with the help of LLMs, I could not achieve that using SO
  Good riddance
  - apercu16 days ago
    That was sort of my experience as well. I already had more than a decade of development experience, asked maybe two questions in a 6 month window (not easy ones), and instead of answers got negative remarks. Never logged in again.
    I should note that the first time I asked question, I also spent hours reviewing other unanswered requests - and if memory serves me - while I didn't have solutions for any of them, I gave concrete examples of how to debug the problem(s) to narrow down the variables.
    I'd estimate that this was around 2012. Never went back after that.
  - georgefrowny17 days ago
    It was very strange to me how StackOverflow consistently, every time it was mentioned everyone had exactly the same complaints and nothing changed. They can't have been unaware of the reputation or the sinking activity rates.
    shagie17 days ago
    There are multiple groups on Stack Overflow with different (and sometimes conflicting) goals and desires.
    Corporate measured "engagement" and has been trying things to make that number go up.
    The curators of the site... if they could have tools to measure would be measuring the median quality of the questions being asked and the answers being given.
    People asking questions on the site have changed from the "building a library goal" with the question as a prompt to "help me with this problem" - but rarely not sticking around.
    ---
    The sinking activity rates have had alarms going for many years... but remember that engagement was being measured and while that's sinking, comments were engagement so the numbers (ad impressions) at corporate level were getting measured differently.
    The reputation has been something, but there's a disconnect between what "hostile" means and what "toxic" means between the people making the claims and how it's being interpreted.
    That reputation was interpreted (by corporate and to an extent, diamond moderators) as "people are mean in comments" - and that isn't the case. People are not mean in comments. However, the structure of the site being focused on Q&A rather than discussion for someone who wants discussion with the people who are there to provide answers to questions will find the environment innately hostile.
    Without changing the site from a Q&A (and basically starting over - which corporate has tried, but the people who are providing quality answers aren't going there because they don't want discussions - if they wanted discussions they would be commenting on HN or Reddit), that change can't really be done. The attempts to try to change how people are approaching the site run into a "this would reduce 'engagement'" and people asking questions to get help for their problem not accepting the original premise of building a library. ... And that's resulted in conflict and decreasing curation (which are often the people who were the ones providing the expert answers).
    ----
    So while they have been aware, (I believe) corporate has been trying to solve the wrong problems at odds with both the people asking questions ("help me now") and the remaining curators.
    geoduck1417 days ago
    >So while they have been aware, (I believe) corporate has been trying to solve the wrong problems at odds with both the people asking questions ("help me now") and the remaining curators.
    This feels spot on
    pixl9716 days ago
    >They can't have been unaware of the reputation
    Management can be idiots longer than a site can remain solvent.
    direwolf2016 days ago
    For years every single company announcement has been downvoted to -500 and flooded with comments and answers saying it's terrible and what the real problems are. They don't care. Now they are finding out.
  - kamaal17 days ago
    >>Actually asking a naive question there was to get horribly flamed on the site. It, and the people using it, were very keen to explain how stupid you were being
    AI's biggest feature is being able to ask it question and not getting humiliated and judged in the process.
- nomadygnt17 days ago
  I see what you mean, but the problem is that the LLM provider is trying to provide all the value from the book to the user without the user needing to look at the book at all. I agree if the LLM fails to do so then there is a market for the book. But the LLM provider is trying to minimize that as much as possible. And if the LLM succeeds at providing all the value of the book to the user, without providing any value to the book creator, then in the future there is no incentive to create the book at all, at which point the LLM has no value to provide, etc etc etc.
  - cthalupa17 days ago
    Sure - but I think this makes it a self equalizing problem, more than it eating it's own tail.
- randomNumber717 days ago
  Regardles of the graphs, if there would be no LLMs junior devs would still use stackoverflow at least sometimes.
  Before the LLM time there was actually the problem that google often showed SEO spam sites that harvested content from stackoverflow.
nubg17 days ago
Isn't this a self solving problem? If LLMs won't be able to solve a certain problem, arbitrage opportunities open to solve it as a human (by writing a blog post for example), just like in the old days, people typically did this for fame. I think LLMs can be understood as a really good way to search existing solutions, including combining multiple solutions in the fly. Only if something cannot be "found", we go back to the dynamics of before LLMs.
- direwolf2016 days ago
  There's zero fame for writing a blog post that's ingested by LLMs and adds to their knowledge base so they answer the question correctly next time.
- falloutx16 days ago
  There may like 0.001% of the general population who would ever write a blog post, most of the LLM data in future would be fake blog posts used to trick LLMs or just spam.
thestructuralme16 days ago
The “snake eating its own tail” frame is real, but it’s not mystical — it’s incentives + sampling.
If the web gets flooded with LLM output and you train on it naively, you’re effectively training on your own prior. That pushes models toward the mean: less surprise, less specificity, more template-y phrasing. It’s like photocopying a photocopy: the sharp edges disappear.
The fix isn’t “never use synthetic data.” It’s to treat it like a controlled ingredient: tag provenance, keep a high-quality human/grounded core, filter aggressively, and anchor training to things that don’t self-contaminate (code that compiles/tests, math with verifiable proofs, retrieval with citations, real user feedback). Otherwise the easiest path is content volume, and volume is exactly what kills signal.
- iwontberude16 days ago
  LLMs will always be just a little too random or a little too average. There in is the hidden beauty of AI: elevating the trust in peoples diverse experiences.
  Humans are amazing machines that reduce insane amounts of complexity in bespoke combinations of neural processors to synthesize ideas and emotions. Even Ilya Sutskever has said that he wasn't and still isn't clear at a formal level why GPT works at all (e.g. interpretability problem), but GPT was not a random discovery, it was based on work that was an amalgamation of Ilya and others careers and biases.
aogaili17 days ago
Companies will provide incentives for people to generate authentic content. Example X giving $1M reward for the top Articles. Because they can use it for training.
In fact, this might be overall good thing, because finally original content will be highly on demand since those companies now use to train their models. But we are probably just in a transition phase.
The other thing is that new sources of input will come, from LLM usage probably, so they cut the middle layer, users input in the LLM is also a form of input, and a hybrid co-creation between users/AI would generate content at much faster rater, which again would be used to train the model, and that would improve their quality.
- randomNumber717 days ago
  How do you evaluate if content is authentic?
  - asdff17 days ago
    Eventually it won’t matter. People will forget what it was like to have authentic content. They’d look at it like how we might view monks in the middle ages copying the bible a hundred times by hand.
    Even today people don’t even trust themselves to write an email. You see people on HN and reddit openly admitting they used ai to help make their post because they believe they cannot write. The march to illiteracy and ignorance is already taking foot.
    falloutx16 days ago
    > People will forget what it was like to have authentic content.
    This is already true. Most people on the internet are stuck in tiktok loop anyway. Even chatgpt would struggle to get them. By the time LLMs get good enough, a lot of people who were willing to pay for them would have lost their jobs and they would not be paying for something that costed them their job, many will quit the internet fully. We are looking at value being driven out of internet at a grand scale.
    hyperadvanced17 days ago
    This is a plain fact. The amount of obviously GPT’d text out there is so breathtaking. The good news is that all of this shit people are squeezing out of GPT is really reinvigorating my desire to work on novel creative projects.
    In nearly every form of entertainment over the past 2 years, quality has degraded rapidly, to the point where most Reddit threads really aren’t worth reading - they’re all chock full of It’s Not X It’s Y! Most movies are heavily re-explained plots and reboots of old IP, CGI, dialogue glued on after the fact. Pop music has been on life support for 10 or 20 years, and that was before anyone could AI generate whatever derivative sonic slop they wanted to create. Books and publishers are holding strong to avoid GPT, but the desire for Dragon smut will ultimately overwhelm any aesthete’s preference for originality.
    pixl9716 days ago
    >In nearly every form of entertainment over the past 2 years
    I mean, it really was on a huge downward trend before two years ago, and you hit on much of that.
    Social media and the online advertising age had really destroyed a lot of entertainment well before AI was an issue. Bot-like humans would just copy and mish-mash existing media poorly and attempt to gather the ad dollars for themselves.
    Honestly a lot of the enjoyable content I watch these days is from individuals/groups that aren't chasing ad views, but doing more 'donation begging' from their own audience which means they must maintain some level of quality for continued patronage.
    One particular problem we seem to have is that we see AI as 'the problem' and not just one issue in a problematic system. Why doesn't Reddit do anything about crap posts and content? They make money off of convincing advertisers that lots of views happen on their site and they should sell ads. Why doesn't Google improve their services? They are a near monopoly in online advertising and service improvement for the end user doesn't increase their revenue.
    It's just a system where one group is trying to extract wealth from Google/Facebook whereas Google/Facebook have nearly perfected extracting wealth from you.
- falloutx16 days ago
  > Example X giving $1M reward for the top Articles.
  Oh boy I can already see what kind of articles those would be.
mikestorrent17 days ago
> he took a PDF of my book Terraform: Up & Running, uploaded it into a GenAI tool, and asked the tool to follow the guidance in the book to generate Terraform code
This is ridiculous - AI doesn't need to be fed a PDF of a Terraform book to know how to Terraform. Blowing out context with hundreds of OCR'd pages of generic text on how to terraform isn't going to help anything.
The model that is broken is really ultimately going to be "content for hire". That's the industry that is going to be destroyed here because it's simply redundant now. Actual artwork, actual literature, actual music... these things are all safe as long as people actually want to experience the creations of others. Corporate artwork, simple documentation, elevator music.... these things are done; I'm sorry if you made a living making them but you were ultimately performing an artisinal task in a mostly soulless way.
I'm not talking about video game artists, mind you, I'm talking about the people who produced Corporate Memphis and Flat Design here. We'll all be better off if these people find a new calling in life.
- pixl9716 days ago
  >I'm not talking about video game artists
  You are talking about some video game artists, and while not their fault directly, EA pushing out SportsBall 2026 for the 40th year in a row is just a soulless corporate money printing machine.
sharadov17 days ago
Sounds a lot like how the original web was envisioned as a 2 way street, where people were paid for their "attention".
Rather we became the product.
worik17 days ago
The old model of selling eyeballs to advertisers was horrible
I do not know what will replace it, but I will not miss websites trying to monetise my attention
- loudmax17 days ago
  The GenAI providers will certainly explore advertisement revenue. They're not doing much of it yet because they're trying to gain market share while they figure out what what pain threshold of advertising their users will tolerate.
  People today may have a better sense of the downsides of ad-based services than we did when the internet was becoming mainstream. Back then, the minor inconvenience of seeing a few ads seemed worth all the benefits of access all the internet had to offer. And it probably was. But today the public has more experience with the downsides of relentless advertising optimization and audience capture, so there might be more business models based on something other than advertising. Either way, GenAI advertising is certainly coming.
  - apercu16 days ago
    "Back then, the minor inconvenience of seeing a few ads seemed worth all the benefits of access all the internet had to offer"
    Yes and no. I complained about ads to a partner back in 1999. He seemed surprised, and said something to the effect of "well that's how the content you are consuming is paid for".
    My argument (25 years ago now) was that it wasn't the ads as much as it was the ads blocking content, slowing the page load, being intrusive, etc.
    So, even "Back Then" it was an issue. Now, it's on steroids with all the aforementioned behaviours being even worse now (e.g., reload an entire page on mobile in order to load a new ad, pages jumping around as new ads are loaded, more pop-ups, etc.) but exacerbated by the privacy nightmare of weaponized data collection.
    I'm not disagreeing with you, I just think the underlying issues were evident very early in the "www" environment.
  - 3vidence17 days ago
    There has been a very public announcement that OpenAI will start exploring ads in their models https://openai.com/index/our-approach-to-advertising-and-exp...
- randomNumber717 days ago
  It was easier to install adblock than to lobotomize a LLM (for which you likely don't even have the weights), so I don't know about that one.
  - Imustaskforhelp17 days ago
    To be fair yes, but I am finding that its much easier to lobotomize in the sense that atleast open source models like glm 4.7 are really permissive & even some non open source models are permissive as well.
    of course chatgpt might deny it but It is just the tip of the ice berg.
    My worries actually are that we might use these models and think we are private but when in actuality, we are not. We are probably gonna see an open source model which is really good for such purposes while being able to run on normal hardware (macs etc.) without much hassle. I tried liquidfm on my mac and its a 1B model and it has some flaws and isn't completely uncensored but I don't know to me it does feel like an more compact and even uncensored model can be built for very simple purposes.
    immibis16 days ago
    [dead]
pdyc17 days ago
Strange nobody provided counterpoint. I would argue that questions meant for stackoverflow have shifted to genai tools and they have even richer data, questions, background of user and accepted answer so they dont need so data to improve. Side-effect of this is whoever gets most users wins and continues to win leading to monopoly as data is not public
jarjoura17 days ago
This is exactly the sentiment I have been trying to articulate myself.
The ONLY reason we are here today is because OpenAI, and Anthropic, by extension, took it upon themselves to launch chat bots trained on whatever datasources they could get in a short amount of time to quickly productize their investments. Their first versions didn't include any references to the source material, and just acted as if they knew everything.
When CoPilot was built as a better auto-complete engine, trained on opensource projects, it was an interesting idea, because it doing what people already did. They searched GitHub for examples of the solution or nudged them in that direction. However, the biggest difference, using other project code was stable, because it came with a LICENSE.md that you then agreed to, and paid it forward. (i.e. "I used code from this project").
CoPilot initially would just inject snippets for you, without you knowing the source. It was only later, they walked that back and if you did use CoPilot, it shows you the most-likely source of the code it used. This is exactly the direction all of the platforms seem headed.
It's not easy to walk back the free-for-all system (i.e. Napster), but I'm optimistic over time it'll become a more fair, pay to access system.
Imustaskforhelp17 days ago
Ironic that they created this post and I immediately felt like referencing ouroborous even when I hadn't looked at the website but then I opened it and the first thing I see is an AI generated image of Ouroborous
Like the irony is pretty deep with this one about this.
I am not sure if they could've gotten trademark from Inscryption/if they needed it but if they really wanted, I have found inscryption's ouroborous card to look the best and it was honestly how I discovered ouroborous in the first place! (became my favourite card, I love inscryption)
https://static1.thegamerimages.com/wordpress/wp-content/uplo...
Even just searching Ouroborous on internet gave me some genuinely beautiful Ouroborous illustrations (Some stock photos, some not) but even using a stock photo might have made a better idea than using AI generated Ouroboros photo itself?
- tines17 days ago
  Inscryption is the GOAT, beautiful game. Played it when I had Covid.
  - Imustaskforhelp17 days ago
    > Inscryption is the GOAT, beautiful game. Played it when I had Covid.
    Oh Man, Y'know I once beat Inscryption's kaycee's mods challenge without doing any blood sacrifice the whole run after I got bored when I had defeated all kaycee's mods
    The gecko deck was amazing. I even played some mods of Inscryption. It was so lovely.
    The people who play inscryption & people who don't aren't the same lol.
    Played it during covid/end of it too & I still sing its
    Let's discuss some Inscryption, tell me your favourite card combo.
    Mine was ouroborous + the sigil which made it so that whenever a card died you got it back in hand, so now what you got was lets say you have ouroborous and some other card, you get ouroborous card placed lets say on the deck and then you get a squirrel and you have a wolf
    you place the squirrel and then kill the ouroborous and squirrel to get wolf but now ouroborous gets back to your hand with better stats for the whole game, then it becomes so strong that you just need a magpie card or such sigil to then look for ouroborous card to make it game over.
    Combining Ouroborous with the cockroach sigil + the three blood sigil essentially is an unlimited win game if you put the cockroach sigil on the blood goat too.
    Gecko/Stoat are my second best favourite. before doing puzzles you could stack as many sigils as possible and I had the best stoat in the whole world lol, had like 4-5 sigils iirc :)
    What about ya! There were also some really fascinating mechanics in the mods that I played too, definitely worth a play!
    tines16 days ago
    Man it’s been a while so I don’t remember all the details, but I think I created a card that was a combination of the Mantis God and the Ouroboros or something like that, so this card had triple strike and was hitting for like 60 damage a turn lol.
    The aesthetic of the game is just so beautiful too. One of my all time favorites.
aeon_ai17 days ago
GenAI changes the dynamics of information systems so fundamentally that our entire notion of intellectual property is being upended.
Copyright was predicated on the notion that ideas and styles can not be protected, but that explicit expressive works can. For example, a recipe can't be protected, but the story you wrap around it that tells how your grandma used to make it would be.
LLMs are particularly challenging to wrangle with because they perform language alchemy. They can (and do) re-express the core ideas, styles, themes, etc. without violating copyright.
People deem this 'theft' and 'stealing' because they are trying to reconcile the myth of intellectual property with reality, and are also simultaneously sensing the economic ladder being pulled up by elites who are watching and gaming the geopolitical world disorder.
There will be a new system of value capture that content creators need to position for, which is to be seen as a more valuable source of high quality materials than an LLM, serving a specific market, and effectively acquiring attention to owned properties and products.
It will not be pay-per-crawl. Or pay-per-use. It will be an attention game, just like everything in the modern economy.
Attention is the only way you can monetize information.
- 17 days ago
  undefined
- bitwize17 days ago
  No. The idea-expression dichotomy is a common myth about copyright law, right up there with "if I already own the physical cartridge, downloading this game ROM is OK".
  The ONLY things that matter when determining whether copyright was infringed are "access" and "substantial similarity". The first refers to whether the alleged infringer did, or had a reasonable opportunity to, view the copyrighted work. The second is more vague and open-ended. But if these two, alone, can be established in court, then absent a fair use or other defense (for example, all of the ways in which your work is "substantially similar" to the infringed work are public domain), you are infringing. Period. End of story.
  The Tetris Company, for example, owns the idea of falling-tetromino puzzle video games. If you develop and release such a game, they will sue you and they will win. They have won in the past and they can retain Boies-tier lawyers to litigate a small crater where you once stood if need be. In fact, the ruling in the Tetris vs. Xio case means that look-and-feel copyrights, thought dead after Apple v. Microsoft and Lotus v. Borland, are now back on the table.
  It's not like this is even terribly new. Atari, license holders to Pac-Man on game consoles at the time, sued Philips over the release of K.C. Munchkin! on their rival console, the Magnavox Odyssey 2. Munchkin didn't look like Pac-Man. The monsters didn't look like the ghosts from Pac-Man. The mazes and some of the game mechanics were significantly different. Yet, the judge ruled that because it featured an "eater" who ate dots and avoided enemies in a maze, and sometimes had the opportunity to eat the enemies, K.C. Munchkin! infringed on the copyrights to Pac-Man. The ideas used in Pac-Man were novel enough to be eligible for copyright protection.
  - aeon_ai17 days ago
    This is one of those fun "achsully" responses I get the privilege to refute.
    It's a foundational principle of copyright law, codified in 17 U.S.C. § 102(b): "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery"
    Now, we can quibble over what qualifies there, but the dichotomy itself is pretty clear.
    This goes back to Baker v. Selden (1879) and remains bedrock copyright doctrine.
    The Tetris case is overstated. Tetris v. Xio did not establish that The Tetris Company "owns the idea of falling-tetromino puzzle video games." The court explicitly applied the idea-expression dichotomy and found Xio copied specific expressive choices (exact dimensions, specific visual style, particular piece colors). Many Tetris-like games exist legally, and it is the specific expressive elements that were considered in the Xio case.
    K.C. Munchkin is old and criticized. That 1982 ruling predates major developments like Computer Associates v. Altai, which established more rigorous methods for filtering out unprotectable elements. The Munchkin decision continues to be debated.
    "Substantial similarity" analysis itself incorporates idea-expression filtering. Courts use tests specifically designed to separate protectable expression from unprotectable ideas, especially when considering the four factors of fair use (when applied as a defense.)
  - midas8917 days ago
    I think what you'll find is that most aren't happy with the current copyright law anyway (I include myself in that group) or don't understand it or don't agree with it, and thus will just shrug.
    For example, copyright duration is far longer than most people think (life of author plus seventy (or plus ninety-five years if corporation). Corporations treat copyright as a way to create moats for themselves and freeze competitors than as a creative endeavor. Most creative works earn little to nothing anyway, while a tiny minority generate the most revenue. And it's not easy to get a copyright or atleast percieved to be easy, so again it incentivises those that can afford lawyers to navigate the legal environment. Also, enforcement of copyright law requires surviellance and censorship.
    Truthfully I think there will be a time when people will look at current copyright law the same way we now look at guilds in the middleages.
    Peritract16 days ago
    It's incoherent to be anti-copyright because it's used to freeze out competition by corporations and be pro-AI (which is exactly that, at vastly greater scale).
locusofself17 days ago
I feel like the only solution to the problem is democratized RLHF, where whenever we get a bad answer from an LLM, we can immediately tell it what was wrong and it can learn from that.
- schmichael17 days ago
  If you're paying to use the model that means instead of paying content creators you're also now giving more content to the model for free.
  Also just like SEO to game search engines, "democratized RLHF" has big trust issues.
- randomNumber717 days ago
  Maybe what is bad for you would be right for me.
josefritzishere16 days ago
AI solves few problems that humans can't already solve, and creates systemic problems for which there are no solution. It's poison.
- apercu16 days ago
  I have a lot of opinions about LLMs, and most are not positive due to the error rate, the security nightmare, the IP issues, the ecological impact and the hype, hype, hype that stands in direct opposition from my years of experiments (starting in 2019).
  But, I have to admit that a few years ago LLMs became part of my daily workflow for low-risk repetitive tasks, and sometimes even "sound boarding".
turnsout16 days ago
Foundation model sponsors already pay humans to generate authentic content, especially in technical areas that are underrepresented in general internet scrapes. I would imagine that this trend will continue.
Further, the "model collapse" hypothesis of 2020/2021 seems to have failed to materialize. Maybe we're still too early, and we're not yet seeing negative effects of OpenAI training on OpenAI output. But maybe "slop" is not being rewarded as much as human content, and having humans in the loop (even as readers) is preventing a slide into incoherence.
Will LLMs eventually disincentivize people from producing and publishing new original content? If that content is easily replicated by an LLM query, maybe. And maybe it's not the worst thing in the world. 5 years ago I would have bought an "FFmpeg Cookbook" from O'Reilly, but now I would just tell Claude exactly what I'm trying to achieve. As a consumer, I'm better off, and arguably we've saved the author of a hypothetical FFmpeg Cookbook weeks out of their precious life. Weeks they could spend doing something—anything—more valuable than rewording FFmpeg documentation.
mrcwinn17 days ago
Pay per crawl of StackOverflow wouldn't encourage me to post more on StackOverflow. (Not that I was anyway.) Presumably you'd need to pay content creators, but that seems quite inefficient:
1. I pay OpenAI 2. OpenAI rev shares to StackOverflow 3. StackOverflow mostly keeps that money, but shares some with me for posting 4. I get some money back to help pay OpenAI?
This is nonsense. And if the frontier labs are right about simulated data, as Tesla seems to have been right with its FSD simulated visualization stack, does this really matter anyway? The value I get from an LLM far exceeds anything I have ever received from SO or an O'Reilly book (as much as I genuinely enjoy them collecting dust on a shelf).
If the argument is "fairness," I can sympathize but then shrug. If the argument is sustainability of training, I'm skeptical we need these payment models. And if the argument is about total value creation, I just don't buy it at all.
- lbrito17 days ago
  >If the argument is sustainability of training, I'm skeptical we need these payment models.
  That seems to be the argument: LLM adoption leads to drop of organic training data, leading LLMs to eventually plateau, and we'll be left without the user-generated content we relied on for a while (like SO) and with subpar LLM. That's what I'm getting from the article anyway.
  - TeMPOraL17 days ago
    There are so many things wrong with the points this article repeats, but those are soundbites at this point so I'm not sure one can even argue against them anymore.
    Still, for the one about organic data (or "pre-war steel") drying out, it's not a threat to model development at all. People repeating this point don't realize that we already have way more data than we need. We got to where we are by brute-forcing the problem - throwing more data at a simple training process. If new "pristine" data were to stop flowing now, we still a) have decent pre-trained base models, and a dataset that's more than sufficient to train more of them, and b) lots of low-hanging fruits to pick in training approaches, architectures and data curation, that will allow to get more performance out of same base data.
    That, and the fact that synthetic data turned out to be quite effective after all, especially in the latter phases of training. No surprise there, for many classes of problems this is how we learn as well. Anyone who has experience studying math for maturity exam / university entry exams knows this: the best way to learn is to solve lots of variations of the same set of problems. These variations are all synthetic data, until recently generated by hand, but even their trivial nature doesn't make them less effective at teaching.
    pixl9716 days ago
    >We got to where we are by brute-forcing the problem
    This has been a bit of a concern of mine. That we have to do things the hard way for a long time, and in doing so make a massive amount of fast hardware. Then we get some breakthru that massively drops the amount of compute necessary, the surplus we suddenly have may lead to some kind of AI capability explosion.
  - mapontosevenths17 days ago
    The article gets the part about organic data dying off right. Look at Google SERP's for an example. Almost nobody clicks through to the source anymore, so ad revenue is drying up for them and people are publishing less or publishing in places that pay them directly and live behind a paywall like Medium. Which means Google has less data to work with.
    That said, what it misses is that the AI prompts themselves become a giant source of data. None of these companies are promising not to use your data, and even if you don't opt-in the person you sent the document/email/whatever to will because they want it paraphrased or need help understanding it.
    lbrito17 days ago
    >AI prompts themselves become a giant source of data.
    Good point, but can it match the old organic data? I'm skeptical. For one, the LLM environment lacks any truth or consensus mechanism that the old SO-like sites had. 100s of users might have discussed the same/similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad ones, as it (AI) doesn't have the concept of correctness/truth. Also, the old sites were two-sided, with humans asking _and_ answering questions, while they are only on the asking side with AI.
    mapontosevenths17 days ago
    > (AI) doesn't have the concept of correctness/truth
    They kind of do, and it's getting better every day. We already have huge swatches of verifiable facts available to them to ground their statements in truth. They started building Cyc in 1984, and Wikipedia just signed deals with all the major players.
    The problem you're describing isn't intractable, so it's fairly certain that someone will solve it soon. Most of the brightest minds in society are working on AI in some form now. It's starting to sound trite, but today's AI's really are the worst that AI will ever be.
    17 days ago
    undefined
    dailycardriver17 days ago
    “ Most of the brightest minds in society are working on AI in some form now.”
    Source? I haven’t met one intelligent person working on AI. The smartest people are being ground into dust. They’re being replaced by pompous overconfident people such as yourself.
    mapontosevenths16 days ago
    > I haven’t met one intelligent person working on AI.
    I get the impression that you don't meet a lot of people in general.
    cthalupa17 days ago
    > 100s of users might have discussed the same/similar technical problem with an LLM, but there's no way (afaik) for the AI to promote good content and demote bad ones, as it (AI) doesn't have the concept of correctness/truth
    The LLM doesn't but reinforcement does. If someone keeps asking the model how to fix the problem after being given an answer, the answer is likely wrong. If someone deletes the chat after getting the answer, it was probably right.
    _DeadFred_17 days ago
    AI is an entropy machine.
    Those AI prompts that become data for the AI companies is yet another thing that the human creators used to understand what people wanted, topics to explore, feedback on what they hadn't communicated well enough. That 'value' is AI stealing yet more energy from the system resulting in even less/less valuable human creation.
- zzzeek17 days ago
  > If the argument is sustainability of training,
  that is the argument, yes.
  Claude clearly got an enormous amount of its content from Stackoverflow. Which has mostly ceased to be a source of new content. However unlike the author I dont see any way to fix this; stackoverflow was only there because people had technical questions that needed answers.
  Maybe if the LLMs do indeed start going stale as there's not enough training data for new technologies, Q&A sites like Stackoverflow would still have a place, since people would still resort to asking each other questions rather than LLMs that dont have training data for a newer technology.
birdiefm17 days ago
ouroboros can have a little ouroboros (as a treat)
cadamsdotcom17 days ago
These takes are gonna age like milk.
The new world is one where someone can have an LLM assisted insight, post it on their blog for free, have it indexed by every agentic search engine, and it becomes part of the zeitgeist. That’s the new data that’ll feed the new models: a better information diet over time. And guess what else: models are getting better at identifying - at scale - the high quality info that’s worth using as training data.
- nicbou17 days ago
  But what would be the point?
  There is a huge difference between giving eggs to your neighbours and sending them to a breakfast restaurant in another state. A connection is made between you and the neighbors. A community forms around it.
  What would be the point of having a machine write your thoughts so they can shift a model's weights by infinitesimal amounts? How would that compare to building a small following, getting reader mail, and maybe even meeting a few of them?
  My website is used as training data. I get nothing from it. The AI twists words I have carefully selected, misleads people, and strips me of the fruit of my labour. If I lost my entire audience, I would stop doing that work.
  Moreover, there is a cost to producing high quality information. Research alone takes a long time when you're putting information online for the first time. I wouldn't do it for a vague chance of affecting the Zeitgeist.
znsksjjs17 days ago
We’ve seen decades of growing wage gaps and erosion of labors strength. The current elites don’t really care to enrich the people. Why would they care to do anything about this problem? They likely don’t see it as a problem at all.
If they did actually stumble on AGI (assuming it didn’t eat them too) it would be used by a select few to enslave or remove the rest of us.
- alfalfasprout17 days ago
  Not sure why this is being downvoted. It's spot on. You see folks like Dario et al. raising the alarm bells about what they claim is coming... while working as hard as they can to bring that gloomy future to fruition.
  No one in power is going to help unless there's money in it.
  - thatguy090017 days ago
    You can also see all of these people building survival bunkers.
  - iwontberude17 days ago
    According to Trump, "If it was up to Stephen [Miller], there would only be 100 million people in this country — and all of them would look like him."
  - lbrito17 days ago
    Its being downvoted because HN has a very active billionaire-techbro-fanbase.
    Also who's this Dario?
    mapontosevenths17 days ago
    It's being downvoted because it's a ridiculous premise. "The Elites" are human too. This attitude is nonsensical and child-like. Nobody is out here trying to round up the hippies and force them to live in some kind of pods to be harvested for their nutrients or whatever.
    This technology, like every prior technology, will cause some people to lose their jobs and some new jobs to be created. This will annoy people who have to learn new skill instead of coasting until retirement as they planned.
    It is no different than the buggy whip manufacturers being annoyed at Henry Ford. They were right that it was bad for their industry, but wrong about it being the death of... well all the million things they claimed it would be the death of.
    iwontberude17 days ago
    And just like Henry Ford and the automobile, one of many externalities was the destruction of black communities: white flight that drained wealth, eminent domain for highways, and increased asthma incidence and other disease from concentrated pollution.
    mapontosevenths17 days ago
    Yet, overall it was a net positive for society... as almost every technological innovation in history has been.
    Did you know the 2/3rds of the people alive today wouldn't be if it hadn't been for the invention of the Haber-bosch process? Technology isn't just a toy, it's our life support mechanism. The only way our population gets to keep growing is if our technology continues to improve.
    Will there be some unintended consequences? Absolutely. Does that mean we can (or even should) stop it? Hell no. Being pro-human requires you to be pro-technology.
    dTal16 days ago
    I don't think this argument is logically sound. The assertion that this (and every other!!) technological innovation is a "net positive" merely because of our monotonic population growth is both weakly defined and unsubstantiated. Population is not a good proxy for all things we find desirable in society, and even if it were, it is only a single number that cannot possibly distinguish between factors that helped it and factors that hurt it.
    Suppose I invent The Matrix, capable of efficiently sustaining 100b humans provided they are all strapped in with tubes and stuff. Oh and no fancy simulation to keep you entertained either -it's only barely an improvement on death. Economics forces everyone into matrix-hell, but at least there's a lot of us. Net positive for society?
    mapontosevenths16 days ago
    Human fecundity is probably not actually the meaning of life, it's just the best approximation most people can wrap their heads around.
    If you can think of a better one, let me know. Be warned though, you'll be arguing with every biological imperative, religion, and upbringing in the room when you say it.
    treebigtree17 days ago
    "as almost every technological innovation in history has been"
    This is simply false. You really are the king of making unfounded claims.
    16 days ago
    undefined
    mapontosevenths16 days ago
    I don't need to prove anything. You folks are the ones claiming harm. That said, AI is more akin to the invention of antibiotics than it is to the invention of any specific drug. Name any other entire category of technology from which no good has ever come. Just one.
    I doubt you can. Even bioweapons led to breakthroughs in pesticides and chemotherapy. Nukes led to nuclear power, and even harmful AI stuff like deep fakes are being used for image restorations, special effects, and medical imaging.
    You're just flat out wrong, and I think you know it.
    iwontberude16 days ago
    You are speaking in tautology. Yes we know that technology investment often leads to great advancement and benefits for humanity, but it is not sufficient to obviate the need for consciousness and reduction of harm. This technology will be used to disenfranchise people and we need to be willing to say, "no, try again." Not to stop advancement, but to steer it into being more equitable.
    We should be trying to optimize for the best combination of risk and benefit, not taking on unlimited risk in the promise of some non-zero benefit. Your approach is very much take-it-or-leave-it which leaves very little room for regulating the technology.
    The GenAI industry lobbying for a moratorium on regulation is them trying to hand wave any disenfranchisement (e.g. displaced workers, youth mental health, intellectual property rights violated, systemically racist outcomes, etc).
    mapontosevenths16 days ago
    > We should be trying to optimize for the best combination of risk and benefit
    I 100% support this stance, it's good advice for life in general. I object to the ridiculous Luddite's view espoused elsewhere in this thread.
    >The GenAI industry lobbying for a moratorium on regulation is them trying to hand wave any disenfranchisement (e.g. displaced workers, youth mental health, intellectual property rights violated, systemically racist outcomes, etc).
    There must be a balance certainly. We can't "kill it before it's born", but we also need to be practical about the costs. I'm all in on debating exactly where that line should be, but object to the idea that it provides no value at all. That's madness, and dishonesty.
    discreteevent17 days ago
    Henry Ford didn't make his cars out of buggy whips. He made a new industry. He didn't cannibalize an existing one. You cannot make an LLM without digesting the source material.
    cthalupa17 days ago
    > He made a new industry. He didn't cannibalize an existing one.
    I don't see how you can claim the second part is true. Cars directly cannibalized other forms of self transportation.
    apercu16 days ago
    ? Cars don't "eat" horses. I wouldn't equate "making redundant" with "consuming"
    direwolf2016 days ago
    LLMs don't literally eat artists. I think you understood the metaphor.
    cthalupa16 days ago
    Cannibalizing a <product/industry/etc.> is a common phrase to describe the act of a new thing outcompeting an existing thing to another thing to the degree that it significantly harms the market share, sometimes to the point of figurative extinction. Redundancy is a very common reason for this to occur.
    It has nothing to do with literally eating.
    mapontosevenths17 days ago
    Digesting is a weird way to say "learning from." By that logic I've been digesting news, books, movies, songs, and comic books since I was born. My brain is great big 'ole copyright violation.
    What matters here is not the source material, it's the output. Possessing or consuming copyrighted material is not illegal, distributing it is. So what matters here is: Can we say that the output is transformative, and does it work to progress the arts and sciences (the stated purpose of copyright in the US constitution)?
    I would say yes to both things, except in rare cases of bugs or intentional copyright violations. None of the major AI vendors WANT these things to infringe copyright, they just do it from time to time by accident or through the omission of some guardrail that nobody had yet considered. Those issues are generally fixed fairly promptly (a few major screw ups notwithstanding).
    randomNumber717 days ago
    So we have monkeys writing the same code over and over again, until the end of time. Because of "rules".
    hackable_sand17 days ago
    And for those of us living a reality of subjugation and fear, you're a fucking liar.
    iwontberude17 days ago
    It's because people rub shoulders with tech billionaires and they seem normal enough (e.g. kind to wait staff, friends and family). The billionaires, like anyone, protect their immediate relationships to insulate the air of normality and good health they experience personally. Those people who interact with billionaires then bristle at our dissonant point of view when we point at the externalities. Externalities that have been hand waved in the name of modernity.
    Sycophancy is for more than just LLMs.
    tyg1317 days ago
    Presumably Dario Amodei, CEO of Anthropic.
- aogaili17 days ago
  People should vote for more socialist governments pushing for UBI and automation tax on the companies..but which this comment get downvoted because of the capitalism religion.
  - randomNumber717 days ago
    Imo socialism isn't the solution, but we probably can all agree that capitalism looks dangerous at the moment.
    Imustaskforhelp17 days ago
    > Imo socialism isn't the solution, but we probably can all agree that capitalism looks dangerous at the moment.
    I believe some socialism aspects while being georgist and expanding the definition of rent seeking to include large hyperscalers or internet attention farms in some sense too.
    As a young person, I can't afford to buy a house and some of us even wonder if we would be able to afford rent in such a shaky economy. Even migration towards different countries I feel like rent becomes the most major aspect imo.
    Mr beat's video on georgism genuinely changed how I perceive things ngl.
    https://www.youtube.com/watch?v=6c5xjlmLfAw : [I found the last bad way to tax] (talks about georgism)
WagesUpPriceDn17 days ago
[dead]
furyofantares17 days ago
The article feels very confused to me.
Example 1 is bad, StackOverflow had clearly plateaued and was well into the downward freefall by the time ChatGPT was released.
Example 2 is apparently "open source" but it's actually just Tailwind which unfortunately had a very susceptible business model.
And I don't really think the framing here that it's eating its own tail makes sense.
It's also confusing to me why they're trying to solve the problem of it eating its own tail - there's a LOT of money being poured into the AI companies. They can try to solve that problem.
What I mean is - a snake eating its own tail is bad for the snake. It will kill it. But in this case the tail is something we humans valued and don't want eaten, regardless of the health of the snake. And the snake will probably find a way to become independent of the tail after it ate it, rather than die, which sucks for us if we valued the stuff the tail was made of, and of course makes the analogy totally nonsensical.
The actual solutions suggested here are not related to it eating its own tail anyway. They're related to the sentiment that the greed of AI companies needs to be reeled in, they need to give back, and we need solutions to the fact that we're getting spammed with slop.
I guess the last part is the part that ties into it "eating its own tail", but really, why frame it that way? Framing it that way means it's a problem for AI companies. Let's be honest and say it's a problem for us and we want it solved for our own reasons.
- semiquaver17 days ago
  The proposed solution is also pretty confused:
  > For each response, the GenAI tool lists the sources from which it extracted that content, perhaps formatted as a list of links back to the content creators, sorted by relevance, similar to a search engine
  This literally isn’t possible given the architecture of transformer models and there’s no indication it will ever be.
  - keeda17 days ago
    Technically correct, but the workarounds AI search engines use for grounding results could be a close enough approximation. Might not be accurate, but could be better than nothing.
    Also Anthropic is doing interesting work in interpretability, who knows what could come out of that.
    And could be snake oil, but this startup claims to be able to attribute AI outputs to ingested content: https://prorata.ai/
    semiquaver16 days ago
    Not every LLM implementation can use RAG against a Google-sized knowledge base. This proposal essentially says LLMs have to be paired with Google to be legit.
  - busymom017 days ago
    Could you ELI5 why this isn't possible? Google's search result AI summary shows the links for example.
    cthalupa17 days ago
    Those citations come from it searching the web and summarizing, not from it's built in training data. Processes outside of the inference are tracking it.
    If it were to give you a model-only response it could not determine where the information in it was sourced from.
    furyofantares17 days ago
    Any LLM output is a combination of its weights from its training, and its context. Every token is some combination of those two things. The part that is coming from the weights is the part that has no technical means to trace back to its sources.
    But even the part that is coming from the context is only being produced by the weights. As I said, every token is some mathematical combination of the weights and the context.
    So it can produce text that does not correctly summarize the content in its context, on incorrectly reproduce the link, or incorrectly map the link to the part of its context that came from that link, or more generally just make shit up.
    Terr_17 days ago
    OK, I'll try to err towards the "5" with this one.
    1. We built a machine that takes a bunch of words on a piece of paper, and suggests what words fit next.
    2. A lot of people are using it to make stories, where you fill in "User says 'X'", and then the machine adds something like "Bot says 'Y'". You aren't shown the whole thing, a program finds the Y part and sends it to your computer screen.
    3. Suppose the story ends, unfinished, with "User says 'Why did the chicken cross the road?'". We can use the machine to fix up the end, and it suggests "Bot says: 'To get to the other side!'"
    4. Funny! But User character asks where the answer came from, the machine doesn't have a brain to think "Oh, wait that means ME!". Instead, it keeps making things longer in the same way as before, so that you'll see "words that fit" instead of words that are true. The true answer is something unsatisfying, like "it fit the math best".
    5. This means there's no difference between "Bot says 'From the April Newsletter of Jokes Monthly'" versus "Bot says 'I don't feel like answering.'" Both are made-up the same way.
    > Google's search result AI summary shows the links for example.
    That's not the LLM/mad-libs program answering what data flowed into it during training, that's the LLM generating document text like "Bot runs do_web_search(XYZ) and displays the results." A regular normal program is looking for "Bot runs", snips out that text, does a regular web search right away, and then substitutes the results back inside.
    17 days ago
    undefined
- logifail17 days ago
  > They can try to solve that problem
  Well, they could always try actually paying content creators. Unlike - for instance - StackOverflow.
  - shagie17 days ago
    StackOverflow as built back in the days of Web 2.0 where the idea was that user generated content formed in the days of the (relatively) altruistic web.
    There isn't any clean way to do "contributor gets paid" without adding in an entire mess of "ok, where is the money coming from? Paywalls? Advertising? Subscriptions?" and then also get into the mess of international money transfers (how do you pay someone in Iran from the US?)
    And then add in the "ok, now the company is holding payment information of everyone(?) ..." and data breaches and account hacking is now so much more of an issue.
    Once you add money to it, the financial inceptives and gamification collide to make it simply awful.
    direwolf2016 days ago
    Stack Overflow is making money by selling its database to AI companies. It chose not to reimburse the people who built that database.
    shagie16 days ago
    https://archive.org/search?query=creator%3A%22Stack+Exchange...
    You can download the database for free.
    Trying to say "give us your payment and tax information so that we can pay you $0.13 for your contributions" would be even more insulting than not paying anyone.
    Doing renumeration for people in some countries could get legally challenging too.
    direwolf2016 days ago
    Doesn't make a lot of sense, does it? But they adopted it as their new business model nonetheless. Just one more stupid decision on the pile.
- npinsker17 days ago
  “Well, Reddit is growing, which contradicts my point, but I really feel like it’s not”
  - mapontosevenths17 days ago
    Reddit is growing because they introduced automatic machine translation and Indians have been joining at an increasing rate. That content is mixed into the English language content, but is of very low quality and irrelevant to many native English speakers. Similarly they mix the English content in with the Indian content.
    Essentially, Reddit is also eating it's own tail to survive as the flood of low quality irrelevant content is making the platform worse for speakers of all languages but nobody cares because "line go up."
jaredcwhite17 days ago
“We can’t put the genie back in the bottle.”
Actually we can. And we will.
- alfalfasprout17 days ago
  Agreed, it's funny how people have taken unrestrained use of AI as an axiom at this point. There very much is still time to significantly control it + regulate it. Is there enough appetite by those in power (across the political spectrum)? Right now I don't think so.
  - lbrito17 days ago
    >There very much is still time to significantly control it + regulate it.
    There's also huge financial momentum shoving AI through the world's throat. Even if AI was proven to be a failure today, it would still be pushed for many years because of the momentum.
    I just don't see how that can be reversed.
- semiquaver17 days ago
  How?
  - atomic12817 days ago
    Poison Fountain: https://rnsaffn.com/poison2/
    https://www.theregister.com/2026/01/11/industry_insiders_see...
    pixl9716 days ago
    Yea, this will work about as well as those image poisoners... they'll eat up more power, but won't have any effect at the end of the day.
    direwolf2016 days ago
    It only takes 50 poisoned documents to make an LLM training algorithm spit out wrong results on a specific topic, and 250 can make it produce complete gibberish. https://www.anthropic.com/research/small-samples-poison
  - thatguy090017 days ago
    Only way I could see it is if there's enough pushback on them taking everyone's power and water (and computer parts) in a world where power and water are becoming increasingly unstable. But I feel like defeating Ai because there is not enough consistent water and power to give them means there is more pressing issues at hand...
  - mapontosevenths17 days ago
    It was presented without explanation and can be ignored without explanation.
    jaredcwhite17 days ago
    You need an explanation of how people make norms & laws regarding what is acceptable or unacceptable in society and industry?
    mapontosevenths17 days ago
    No such claim was made, therefore no such claim needs to be refuted. If people want to engage in conversation they will have to use their words to do it.
  - asdff17 days ago
    Great Butlerian Jihad
  - happytoexplain17 days ago
    Law or war. Not saying it would happen.