447 points by segmenta 4 hours ago | 58 comments
  • j2kun35 minutes ago
    I spot-checked one of the flagged papers (from Google, co-authored by a colleague of mine).

    The paper was https://openreview.net/forum?id=0ZnXGzLcOg and the problem flagged was "Two authors are omitted and one (Kyle Richardson) is added. This paper was published at ICLR 2024." I.e., for one cited paper, the author list was off and the venue was wrong. And this citation was mentioned in the background section of the paper, and not fundamental to the validity of the paper. So the citation was not fabricated, but it was incorrectly attributed (perhaps via use of an AI autocomplete).

    I think there are some egregious papers in their dataset, and this error does make me pause to wonder how much of the rest of the paper used AI assistance. That said, the "single error" papers in the dataset seem similar to the one I checked: relatively harmless and minor errors (which would be immediately caught by a DOI checker), and so I have to assume some of these were included in the dataset mainly to amplify the author's product pitch. It succeeded.

    • nativeit6 minutes ago
      I see your point, but I don’t see where the author makes any claims about the specifics of the hallucinations, or their impact on the papers’ broader validity. Indeed, I would have found the removal of supposedly “innocuous” examples to be far more deceptive than simply calling a spade a spade and allowing the data to speak for itself.
    • davidguetta8 minutes ago
      Yeah, even with the "John Doe / Jane Smith" examples, my first thought was that it could have been a LaTeX template default value.

      There was dumb stuff like this before the GPT era; it's far from convincing.

  • cogman104 hours ago
    Yuck, this is going to really harm scientific research.

    There is already a problem with papers falsifying data/samples/etc.; LLMs being able to put out plausible papers is just going to make it worse.

    On the bright side, maybe this will get the scientific community and science journalists to finally take reproducibility more seriously. I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

    • vld_chk2 hours ago
      In my mental model, the fundamental problem of reproducibility is that scientists have a very hard time finding a penny to fund such research. No one wants to grant “hey, I need $1m and 2 years to validate the paper from last year which looks suspicious”.

      Until we can change how we fund science on a fundamental level, including how we assign grants, it will indeed be a very hard problem to deal with.

      • parpfish2 hours ago
        In theory, asking grad students and early career folks to run replications would be a great training tool.

        But the problem isn’t just funding, it’s time. Successfully running a replication doesn’t get you a publication to help your career.

        • rtkwe44 minutes ago
          That... still requires funding. Even if your lab happens to have all the equipment required to replicate, you're paying the grad student for their time spent replicating this paper, and you'll need to buy some supplies: chemicals, animal subjects, shared equipment time, etc.
        • goaliecaan hour ago
          Grad students don’t get to publish a thesis on reproduction. Everyone from the undergraduate research assistant to the tenured professor with research chairs is hyper-focused on “publishing” as much “positive result” on “novel” work as possible.
          • soiltype12 minutes ago
            But that seems almost trivially solved. In software it's common to value independent verification - e.g. code review. Someone who is only focused on writing new code instead of careful testing, refactoring, or peer review is widely viewed as a shitty developer by their peers. Of course there's management to consider and that's where incentives are skewed, but we're talking about a different structure. Why wouldn't the following work?

            A single university or even department could make this change: reproduction is the important work, reproduction is what earns a PhD. Or require some split; maybe 20-50% novel work is also expected. Now the incentives are changed. Potentially, this university develops a reputation for reliable research. Others may follow suit.

            Presumably, there's a step in this process where money incentivizes the opposite of my suggestion, and I'm not familiar enough with the process to know which one.

            Is it the university itself which will be starved of resources if it's not pumping out novel (yet unreproducible) research?

          • Kinranyan hour ago
            Publishing a replication could be a prerequisite to getting the degree.

            The question is: how can universities coordinate to add this requirement and gain status from it?

            • ihaveajoban hour ago
              I think Arxiv and similar could contribute positively by listing replications/falsifications, with credit to the validating authors. That would be enough of an incentive for aspiring researchers to start making a dent.
        • eks-reighan hour ago
          You may well know this, but I get the sense that it isn’t necessarily common knowledge, so I want to spell it out anyway:

          In a lot of cases, the salary for a grad student or tech is small potatoes next to the cost of the consumables they use in their work.

          For example, I work for a lab that does a lot of sequencing, and if we’re busy one tech can use $10k worth of reagents in a week.

        • coryrcan hour ago
          Enough people will falsify the replication and pocket the money, taking you back to where you were in the first place and poorer for it. The loss of trust is an existential problem for the USA.
        • iugtmkbdfil8342 hours ago
          Yeah, but doesn't publishing an easily falsified paper end a career?
          • bnchrch2 hours ago
            One, it doesn't damage your reputation as much as one would think.

            But two, and more importantly, no one is checking.

            Tree falls in the forest, no one hears, yadi-yada.

            • iugtmkbdfil834an hour ago
              << no one is checking.

              I think this is the big part of it. There is no incentive to do it even when the study can be reproduced.

          • parpfish2 hours ago
            But the thing is… nobody is doing the replication to falsify it. And if they did, it wouldn’t be published because it’s a null result.
          • Telaneoan hour ago
            Not really, since nobody (for values of) ends up actually falsifying it, and if they do, it's years down the line.
          • wizzwizz42 hours ago
            Not in most fields, unless misconduct is evident. (And what constitutes "misconduct" is cultural: if you have enough influence in a community, you can exert that influence on exactly where that definitional border lies.) Being wrong is not, and should not be, a career-ending move.
            • iugtmkbdfil834an hour ago
              If we are aiming for quality, then being wrong absolutely should be. I would argue that is how it works in real life anyway. What we quibble over is what is the appropriate cutoff.
              • rtkwe31 minutes ago
                There's a big gulf between being wrong because you or a collaborator missed an uncontrolled confounding factor and falsifying or altering results. Science accepts that people sometimes make mistakes in their work because a) they can also be expected to miss something eventually and b) a lot of work is done by people in training in labs you're not directly in control of (collaborators). They already aim for quality, and if you're consistently shown to be sloppy or incorrect when people try to use your work in their own, your reputation suffers.

                The final bit is a thing I think most people miss when they think about replication. A lot of papers don't get replicated directly, but their measurements do when other researchers try to use that data to perform their own experiments; at least in the more physical sciences (this gets tougher the more human-centric the research is). You can't fake or be wrong for long when you're writing papers about the properties of compounds and molecules. Someone is going to come try to base some new idea off your data and find out you're wrong when their experiment doesn't work (or spend months trying to figure out what's wrong and finally double-check the original data).

      • godelski20 minutes ago
        Funding is definitely a problem, but frankly reproduction is common. If you build off someone else's work (as is the norm) you need to reproduce first.

        But with replication not being impactful to your career, and with the pressure to quickly and constantly push new work, a failure to reproduce is generally considered a reason to move on and tackle a different domain. It takes longer to trace the failure and the bar is higher to counter an existing work. It's much more likely you've made a subtle mistake. It's much more likely the other work had a subtle success. It's much more likely the other work simply wasn't written up in a way that allows it to be sufficiently reproduced.

        I speak from experience too. I still remember in grad school I was failing to reproduce a work that was the main competitor to the work I had done (I needed to create comparisons). I emailed the author and got no response. Luckily my advisor knew the author's advisor and we got a meeting set up and I got the code. It didn't do what was claimed in the paper and the code structure wasn't what was described either. The result? My work didn't get published and we moved on. The other work was from a top 10 school and the choice was to burn a bridge and put a black mark on my reputation (from someone with far more merit and prestige) or move on.

        That type of thing won't change with a reproduction system alone; it needs an open system and an open reproduction system as well. Mistakes are common and we shouldn't punish them. The only way to solve these issues is openness.

      • jghnan hour ago
        Partially. There's also the issue that some sciences, like biology, are a lot messier & less predictable than people like to believe.
      • poszlem2 hours ago
        I often think we should move from peer review as "certification" to peer review as "triage", with replication determining how much trust and downstream weight a result earns over time.
    • StableAlkyne3 hours ago
      > I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

      Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.

      Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, government lab, or the private sector. That turns out to be very hard to do.

      If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people write off your work as a heuristic, based on the other people in your cohort.

      So, instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That's quantifying something subjective though, so you might try to use something like citation count as a proxy: if a work is impactful, usually it gets cited a lot. Eventually you may arrive at something like the H-index, which is defined as "The highest number H you can pick, where H is the number of papers you have written with H citations." Now, the trouble with this method is people won't want to "waste" their time on incremental work.
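
      For concreteness, here's a minimal sketch of that definition in Python (the citation counts below are made up purely for illustration):

        def h_index(citation_counts):
            # Largest h such that at least h papers have >= h citations each.
            counts = sorted(citation_counts, reverse=True)
            h = 0
            for rank, cites in enumerate(counts, start=1):
                if cites >= rank:
                    h = rank
                else:
                    break
            return h

        print(h_index([10, 8, 5, 4, 3]))  # -> 4: four papers have at least 4 citations each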

      And that's the struggle here; even if we funded and rewarded people for reproducing results, they will always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your work. In 10 years, they just see the original paper, a few citing works reproducing it, and to save time they'll just cite the original paper only.

      There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very very hard to get there when one's prestige and livelihood is directly tied to discovery rather than reproducibility.

      • soiltype7 minutes ago
        That feels arbitrary as a measure of quality. Why isn't new research simply devalued and replication valued higher?

        "Dr Alice failed to reproduce 20 would-be headline-grabbing papers, preventing them from sucking all the air out of the room in cancer research" is something laudable, but we're not lauding it.

      • gcr2 hours ago
        I'd personally like to see top conferences grow a "reproducibility" track. Each submission would be a short tech report that chooses some other paper to re-implement. Cap 'em at three pages, have a lightweight review process. Maybe there could be artifacts (git repositories, etc) that accompany each submission.

        This would especially help newer grad students learn how to begin to do this sort of research.

        Maybe doing enough reproductions could unlock incentives. Like if you do 5 reproductions then the AC would assign your next paper double the reviewers. Or, more invasively, maybe you can't submit to the conference until you complete some reproduction.

        • azan_2 hours ago
          The problem is that reproducing something is really, really hard! Even if something doesn't reproduce in one experiment, it might be due to slight changes in some variables we don't even think about. There are some ways to circumvent this (e.g. the team being reproduced cooperating with the reproducing team and agreeing on which variables are important for the experiment and which are not), but it's really hard. The solutions you propose will unfortunately incentivize bad reproductions, and we might reject theories that are actually true because of that. I think that one of the best ways to fight the crisis is to actually improve the quality of science: articles where authors refuse to share their data should be automatically rejected. We should also move towards requiring preregistration with strict protocols for almost all studies.
        • dataflowan hour ago
          Is it time for some sort of alternate degree to a PhD beyond a Master's? Showing, essentially, "this person can learn, implement, validate, and analyze the state of the art in this field"?
          • gogopromptlessan hour ago
            That's what we call a Staff-level engineer. Proven ability to learn, implement and validate is basically the "it factor" businesses are looking for.

            If you are thinking about this from an academic angle then sure, it sounds weird to say "Two Staff jobs in a row from the University of LinkedIn" as a degree. But I submit this as basically the certificate you desire.

      • MetaWhirledPeas2 hours ago
        > Eventually you may arrive at something like the H-index, which is defined as "The highest number H you can pick, where H is the number of papers you have written with H citations."

        It's the Google search algorithm all over again. And it's the certificate trust hierarchy all over again. We keep working on the same problems.

        Like the two cases I mentioned, this is a matter of making adjustments until you have the desired result. Never perfect, always improving (well, we hope). This means we need fluidity in the rules and heuristics. How do we best get that?

        • sroussey2 hours ago
          Incentives.

          First X people that reproduce Y get Z percent of patent revenue.

          Or something similar.

          • rtkwe27 minutes ago
            Most papers generate zero patent revenue, or don't lead to patents at all. For major drugs maybe that works, but we already have clinical trials before the drug goes to market that validate the efficacy of the drugs.
          • jltsirenan hour ago
            Patent revenue is mostly irrelevant, as it's too unpredictable and typically decades in the future. Academics rarely do research that can be expected to produce economic value in the next 10–20 years, because industry can easily outspend academia on such topics.
          • wizzwizz42 hours ago
            I'm delighted to inform you that I have reproduced every patent-worthy finding of every major research group active in my field in the past 10 years. You can check my data, which is exactly as theory predicts (subject to experimental noise). I accept payment in cash.
      • maerF0x03 hours ago
        > The challenge is there really isn't a good way to incentivize that work.

        What if we got undergrads (with hopes of graduate studies) to do it? It could be a great way to train them on the skills required for research without the pressure of it also being novel.

        • StableAlkyne3 hours ago
          Those undergrads still need to be advised and they use lab resources.

          If you're a tenure-track academic, your livelihood is much better served by having them try new ideas (that you will be the corresponding author on, increasing your prestige and ability to procure funding) instead of doing incremental work.

          And if you already have tenure, maybe you have the undergrad do just that. But the tenure process heavily filters for ambitious researchers, so it's unlikely this would be a priority.

          If instead you did it as coursework, you could get them to maybe reproduce the work, but if you only have the students for a semester, that's not enough time to write up the paper and make it through peer review (which can take months between iterations).

        • rtkwe22 minutes ago
          Most interesting results are not so simple to recreate that we could reliably expect undergrads to perform the replication, even if we ignore the cost of the equipment and consumables that replication would need and the time/supervision required to walk them through the process.
        • suddenlybananas3 hours ago
          Unfortunately, that might just lead to a bunch of type II errors instead, if an effect requires very precise experimental conditions that undergrads lack the expertise for.
          • retsibsian hour ago
            Could it be useful as a first line of defence? A failed initial reproduction would not be seen as disqualifying, but it would bring the paper to the attention of more senior people who could try to reproduce it themselves. (Maybe they still wouldn't bother, but hopefully they'd at least be more likely to.)
      • poulpy1233 hours ago
        > I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

        But nobody wants to pay for it.

      • geokon3 hours ago
        Usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.

        Sometimes you can just do something new and assume the previous result, but that's more the exception. You're almost always going to at least in part reproduce the previous one. And if issues come up, it's often evident.

        That's why citations work as a good proxy: X number of people have done work based around this finding and nobody has seen a clear problem.

        There's a problem of people fabricating and fudging data and not making their raw data available ("on request", or with not enough metadata to be useful), which wastes everyone's time and almost never leads to negative consequences for the authors.

        • gcr2 hours ago
          It's often quite common to see a citation say "BTW, we weren't able to reproduce X's numbers, but we got fairly close number Y, so Table 1 includes that one next to an asterisk."

          The difficult part is surfacing that information to readers of the original paper. The semantic scholar people are beginning to do some work in this area.

          • geokonan hour ago
            Yeah, that's a good point. The citation might actually be pointing out a problem and not be a point in favor. It's a slog to figure out... but it seems like the exact type of problem an LLM could handle.

            Give it a published paper and it runs through the papers that have cited it and gives you an evaluation.

      • graemep2 hours ago
        > you have to measure productivity somehow,

        No, you do not have to. You give the money to people with the skills and interest in doing research. You need to ensure it's spent correctly, that is all. People will be motivated by wanting to build a reputation and by the intrinsic reward of the work.

      • warkdarrior3 hours ago
        > If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk.

        This is exactly what rewarding replication papers (that reproduce and confirm an existing paper) will lead to.

        • pixl973 hours ago
          And yet if we can't reproduce an existing paper, it's very possible that existing paper is junk itself.

          Catch-22 is a fun game to get caught in.

      • jimbokun3 hours ago
        > The challenge is there really isn't a good way to incentivize that work.

        Ban publication of any research that hasn't been reproduced.

        • wpollock2 hours ago
          > Ban publication of any research that hasn't been reproduced.

          Unless it is published, nobody will know about it and thus nobody will try to reproduce it.

          • sroussey2 hours ago
            Just have a new journal of only papers that have been reproduced, and include the reproduction papers.
        • gcr2 hours ago
          lol, how would the first paper carrying some new discovery get published?
    • mike_hearn3 hours ago
      Reproducibility is overrated and if you could wave a wand to make all papers reproducible tomorrow, it wouldn't fix the problem. It might even make it worse.

      https://blog.plan99.net/replication-studies-cant-fix-science...

      • biophysboy2 hours ago
        ? More samples reduce the variance of a statistic. Obviously it cannot identify systematic bias in a model, or establish causality, or make a "bad" question "good". It's not overrated though -- it would strengthen or weaken the case for many papers.
        • mike_hearnan hour ago
          If you have a strong grip on exactly what it means, sure, but look at any HN thread on the topic of fraud in science. People think replication = validity because it's been described as the replication crisis for the last 15 years. And that's the best case!

          Funding replication studies in the current environment would just lead to lots of invalid papers being promoted as "fully replicated" and people would be fooled even harder than they already are. There's got to be a fix for the underlying quality issues before replication becomes the next best thing to do.

          • biophysboy12 minutes ago
            > look at any HN thread on the topic of fraud in science.

            HN is very tedious/lazy when it comes to science criticism -- very much agree with you on this.

            My only point is replication is necessary to establish validity, even if it is not sufficient. Whether it gives a scientist a false sense of security doesn't change the math of sampling.

            I also agree with you on quality issues. I think alternative investment strategies (other than project grants) would be a useful step for reducing perverse incentives, for example. But there's a lot of things science could do.

          • doctorpangloss43 minutes ago
            While I agree that "reproducibility is overrated", I went ahead and read your Medium post. My feedback to you is, here is my summary of that writing: "mike_hearn's take on policy-adjacent writing conducted by public health officials and published in journals that interacted with mike_hearn's valid and common but nonetheless subjective political dispute about COVID-19."

            I don't know how any of that writing generalizes to other parts of academic research. I mean, I know that you say it does, but I don't think it does. What exactly do you think most academic research institutions and the federal government spend money on? For example, wet lab research. You don't know anything about wet lab research. I think if you took a look at a typical basic-science immunology paper, built on top of mouse models, you would literally lose track of any of its meaning after the first paragraph; you would feed it into ChatGPT, and you would struggle to understand the topic well enough to read another immunology paper; you would have an immense challenge talking about it with a researcher in the field. It would take weeks of reading. You have no medicine background, so you wouldn't understand the long-horizon context of any of it. You wouldn't be able to "chatbot" your way into it; it would be a real education. So after all of that, would you still be able to write the conclusion you wrote in the Medium post? I don't think so, because you would see that by many measures, you cannot generalize a froo-froo policy between "subjective political dispute about COVID-19" writing and wet lab research. You'd gain the wisdom to see that they're different things, and that you lack the background, and you'd be much more narrow in what you'd say.

            It doesn't even have to be in the particulars; it's just about wisdom. That is my feedback. You are at once saying that there is greater wisdom to be had in the organization and conduct of research, and then you go and make the very low-wisdom move of generalizing about all academic research. Which you are obviously doing not because it makes sense to (you're a smart guy), but because you have some unknown beef with "academics" that stems from anger about valid, common but nonetheless subjective political disputes about COVID-19.

            • mike_hearn19 minutes ago
              Thanks for reading it, or scan-reading it maybe. Of the 18 papers discussed in the essay, here's what they're about, in order:

              - Alzheimers

              - Cancer

              - Alzheimers

              - Skin lesions (first paper discussed in the linked blog post)

              - Epidemiology (COVID)

              - Epidemiology (COVID, foot and mouth disease, Zika)

              - Misinformation/bot studies

              - More misinformation/bot studies

              - Archaeology/history

              - PCR testing (in general, discussion opens with testing of whooping cough)

              - Psychology, twice (assuming you count "men would like to be more muscular" as a psych claim)

              - Misinformation studies

              - COVID (the highlighted errors in the paper are objective, not subjective)

              - COVID (the highlighted errors are software bugs, i.e. objective)

              - COVID (a fake replication report that didn't successfully replicate anything)

              - Public health (from 2010)

              - Social science

              I don't agree that your summary of this as being about a “valid and common but subjective political dispute” is accurate. There's no politics involved in any of these discussions or problems, just bad science.

              Immunology has the same issues as most other medical fields. Sure, there's also fraud that requires genuinely deep expertise to find, but there's plenty that doesn't. Here's a random immunology paper from a few days ago identified as having image duplications, Photoshopping of western blots, numerous irrelevant citations and weird sentence breaks all suggestive that the paper might have been entirely faked or at least partly generated by AI: https://pubpeer.com/publications/FE6C57F66429DE2A9B88FD245DD...

              The authors reply, claiming the problems are just rank incompetence, and each time someone finds yet another problem with the paper leading to yet another apology and proclamation of incompetence. It's just another day on PubPeer, nothing special about this paper. I plucked it off the front page. Zero wet lab experience is needed to understand why the exact same image being presented as two different things in two different papers is a problem.

              And as for other fields, they're often extremely shallow. I actually am an expert in bot detection but that doesn't help at all in detecting validity errors in social science papers, because they do things like define a bot as anyone who tweets five times after midnight from a smartphone. A 10 year old could notice that this isn't true.

    • godzillabrennus4 hours ago
      Have they solved the issue where papers that cite already-invalidated research are themselves still being cited?
      • cogman104 hours ago
        AFAIK, no, but I could see there being cause to push citations to also cite the validations. It'd be good if standard practice turned into something like

        Paper A, by bob, bill, brad. Validated by Paper B by carol, clare, charlotte.

        or

        Paper A, by bob, bill, brad. Unvalidated.

        • gcr4 hours ago
          Academics typically use citation count and popularity as a rough proxy for validation. It's certainly not perfect, but it is something that people think about. Semantic Scholar in particular is doing great work in this area, making it easy to see who cites who: https://www.semanticscholar.org/

          Google Scholar's PDF reader extension turns every hyperlinked citation into a popout card that shows citation counts inline in the PDF: https://chromewebstore.google.com/detail/google-scholar-pdf-...

          • rtkwe16 minutes ago
            That is a factor most people miss when thinking about the replication crisis. For the harder physical sciences, a wrong paper will fairly quickly be found out, because as people go to expand on the ideas or use that data and get results that don't match the model informed by paper X, they're going to eventually figure out that X is wrong. There might be issues with the incentives to write and publish that negative result, but each paper where the results of a previous paper are actually used in the new paper is a form of replication.
      • reliabilityguy4 hours ago
        Nope.

        I am still reviewing papers that propose solutions based on a technique X, conveniently ignoring research from two years ago that shows that X cannot be used on its own. Both the paper I reviewed and the research showing X cannot be used are in the same venue!

        • b00ty4breakfast3 hours ago
          Does it seem to be legitimate ignorance, or maybe folks pushing ahead regardless of X being disproved?
          • freedomben3 hours ago
            IMHO, it's mostly ignorance coming from a push/drive to "publish or perish." When the stakes are so high and output is so valued, and when reproducibility isn't required, it disincentivizes thorough work. The system is set up in a way that is making it fail.

            There is also the reality that "one paper" or "one study" can be found contradicting almost anything, so if you just went with "some other paper/study debunks my premise" then you'd end up producing nothing. Plus many insiders know that there's a lot of slop out there that gets published, so they can (sometimes reasonably, IMHO) dismiss that "one paper" even when they do know about it.

            It's (mostly) not fraud or malicious intent or ignorance, it's (mostly) humans existing in the system in which they must live.

          • reliabilityguy2 hours ago
            Poor scholarship.

            However, given the feedback by other reviewers, I was the only one who knew that X doesn’t work. I am not sure how these people mark themselves as “experts” in the field if they are not following the literature themselves.

    • Sparkytean hour ago
      If there is one thing which scientific reports must require, it is not using AI to produce the documentation. AI can be used on the data, but not on the source or anything else. AI is a tool, not a replacement for actual work.
    • f311a4 hours ago
      For ML/AI/Comp sci articles, providing reproducible code is a great option. Basically, PoC or GTFO.
      • StableAlkyne3 hours ago
        The most annoying ones are those which loosely discuss the methodology but then fail to publish the weights or any real algorithms.

        It's like buying a piece of furniture from IKEA, except you just get an Allen key, a hint at what parts to buy, and blurry instructions.

    • andai24 minutes ago
      I heard that most papers in a given field are already not adding any value. (Maybe it depends on the field though.)

      There seems to be a rule in every field that "99% of everything is crap." I guess AI adds a few more nines to the end of that.

      The gems are lost in a sea of slop.

      So I see useless output (e.g. crap on the app store) as having negative value, because it takes up time and space and energy that could have been spent on something good.

      My point with all this is that it's not a new problem. It's always been about curation. But curation doesn't scale. It already didn't. I don't know what the answer to that looks like.

    • lxgr2 hours ago
      > LLMs being able to put out plausible papers is just going to make it worse

      If correct form (LaTeX two-column formatting, quoting the right papers and authors of the year etc.) has been allowing otherwise reject-worthy papers to slip through peer review, academia arguably has bigger problems than LLMs.

      • LPisGoodan hour ago
        Correct form and relevant citations have been, for generations up until a couple of years ago, mighty strong signals that a work is good and done by a serious and reliable author. This is no longer the case and we are worse off for it.
    • lallysinghan hour ago
      On the bright side, an LLM can really help set up a reproduction environment.

      Perhaps repro should become the basis of peer review?

      • mort96an hour ago
        No, it can't. No LLM can purchase the equipment and chemicals and machinery you need to reproduce experiments, nor should you want it to.
    • agumonkey3 hours ago
      I think, or at least I hope, that part of the value of LLMs will be to create their own retirement for specific needs. Instead of asking one to solve any problem, restrict the space to a tool that can help you reach your goal faster, without the statistical nature of LLMs.
    • benob2 hours ago
      Maybe it will also change the whole model of publication as the evaluation of science.
    • colechristensenan hour ago
      Reading the article, this is about CITATIONS which are trivially verifiable.

      This is just article publishers not doing the most basic verification and failing to notice that the citations in the article don't exist.
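
      As a rough sketch of how easy a basic check could be, here's one way to ask Crossref's public REST API whether a free-text citation resolves to a real work (this assumes the Python requests library; the title-overlap heuristic and threshold are made up for illustration, and a real checker would need to be far more careful):

        import requests

        def citation_looks_real(citation_text, min_overlap=0.6):
            # Ask Crossref for the best bibliographic match to the citation string.
            resp = requests.get(
                "https://api.crossref.org/works",
                params={"query.bibliographic": citation_text, "rows": 1},
                timeout=10,
            )
            items = resp.json()["message"]["items"]
            if not items:
                return False
            # Crude heuristic: how much of the best hit's title appears in the citation?
            hit_title = (items[0].get("title") or [""])[0].lower()
            hit_words = set(hit_title.split())
            if not hit_words:
                return False
            overlap = len(hit_words & set(citation_text.lower().split()))
            return overlap / len(hit_words) >= min_overlap

        # Should print True for a well-known, Crossref-registered paper.
        print(citation_looks_real("He et al. Deep Residual Learning for Image Recognition. CVPR 2016."))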

      What this should trigger is a black mark for all of the authors and their institutions, both of which should receive significant reputational repercussions for publishing fake information. If they fake the easiest-to-verify information (does the cited work exist?), what else are they faking?

    • j453 hours ago
      It will better expose the behaviour of false scientists.
    • godelski32 minutes ago

        > to finally take reproducibility more seriously
      
      I've long argued for this, as reproduction is the cornerstone of science. There are a lot of potential ways to do this, but one that I like is linking reproduction efforts to the original work. Suppose you're looking at the OpenReview page and it has a link for "reproduction efforts", with at minimum an annotation for confirmation or failure.

      This is incredibly helpful to the community as a whole. Reproduction failures can be incredibly helpful even when the original work has no fraud. In those cases a reproduction failure reveals important information about the necessary conditions that the original work relies on.

      But honestly, we'll never get this until we drop the entire notion of "novel" or "impact" and "publish or perish". Novelty is in the eye of the reviewer, and the lower the reviewer's expertise the less novel a work seems (nothing is novel at a high enough level). Impact can almost never be determined a priori, and when it can, you already have people chasing those directions because why the fuck would they not? But publish or perish is the biggest sin. It's one of those ideas that looks nice on paper, like you are meaningfully determining who is working hard and who is hardly working. But the truth is that you can't tell without being in the weeds. The real result is that this stifles creativity, novelty, and impact, as it forces researchers to chase lower-hanging fruit: things you're certain will work and can get published. It creates a negative feedback loop as we compete: "X publishes 5 papers a year, why can't you?" I've heard these words even when X had far fewer citations (each of my works had "more impact").

      Frankly, I believe fraud would dramatically decrease were researchers not risking job security. The fraud is incentivized by the cutthroat system where you're constantly trying to defend your job, your work, and your grants. There will always be some fraud, but (with a few exceptions) researchers aren't rockstar millionaires. It takes a lot of work to get to the point where fraud even works, so there's a natural filter.

      I have the same advice as Mervin Kelly, former director of Bell Labs:

        How do you manage genius?
        You don't
    • CamperBob22 hours ago
      I'd need to see the same scrutiny applied to pre-AI papers. If a field has a poor replication rate, meaning there's a good chance that a given published paper is just so much junk science, is that better or worse than letting AI hallucinate the data in the first place?
  • pacbardan hour ago
    The ironic part about these hallucinations is that a research paper includes a literature review because the goal of the research is to be in dialogue with prior work, to show a gap in the existing literature, and to further the knowledge that this prior work has built.

    By using an LLM to fabricate citations, authors are moving away from this noble pursuit of knowledge built on the "shoulders of giants" and showing that, behind the curtain, output volume is what really matters in modern US research communities.

    • andy_xor_andrewan hour ago
      I guess that makes this "standing on the shoulders of fabrications"
      • stogot38 minutes ago
        Fabrication should be an immediate academic ban for life.
  • gcr4 hours ago
    NeurIPS leadership doesn’t think hallucinated references are necessarily disqualifying; see the full article from Fortune for a statement from them: https://archive.ph/yizHN

    > When reached for comment, the NeurIPS board shared the following statement: “The usage of LLMs in papers at AI conferences is rapidly evolving, and NeurIPS is actively monitoring developments. In previous years, we piloted policies regarding the use of LLMs, and in 2025, reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that significantly more effort is required to determine the implications. Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference). As always, NeurIPS is committed to evolving the review and authorship process to best ensure scientific rigor and to identify ways that LLMs can be used to enhance author and reviewer capabilities.”

    • jklinger4103 hours ago
      > the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference)

      Maybe I'm overreacting, but this feels like an insanely biased response. They found the one potentially innocuous reason and latched onto that as a way to hand-wave the entire problem away.

      Science already had a reproducibility problem, and it now has a hallucination problem. Considering the massive influence the private sector has on both the work and the institutions themselves, the future of open science is looking bleak.

      • orbital-decay3 hours ago
        The wording is not hand-wavy. They said "not necessarily invalidated", which could refer to that innocuous reason and nothing more.
        • mikkupikku2 hours ago
          Even if some of those innocuous mistakes happen, we'll all be better off if we accept people making those mistakes as acceptable casualties in an unforgiving campaign against academic fraudsters.

          It's like arguing against strict liability for drunk driving because maybe somebody accidentally let their grape juice sit too long and they didn't know it was fermented... I can conceive of such a thing, but that doesn't mean we should go easy on drunk driving.

        • jklinger4102 hours ago
          I really think it is. The primary function of these publications is to validate science. When we find invalid citations, it shows they're not doing their job. When they get called on that, they cite the volume of work their publication puts out and point to the only potentially non-disqualifying explanation.

          Seems like CYA, seems like hand wave. Seems like excuses.

      • paulmist3 hours ago
        Isn't disqualifying X months of potentially great research due to a malformed, but existing, reference harsh? I don't think they'd be okay with references that are actually made up.
        • jklinger4102 hours ago
          When your entire job is confirming that science is valid, I expect a little more humility when it turns out you've missed a critical aspect.

          How did these 100 sources even get through the validation process?

          > Isn't disqualifying X months of potentially great research due to a misformed, but existing reference harsh?

          It will serve as a reminder not to cut any corners.

          • paulmistan hour ago
            > When your entire job is confirming that science is valid, I expect a little more humility when it turns out you've missed a critical aspect.

            I wouldn't call a malformed reference a critical issue; it happens. That's why we have peer review. I would contend that drawing superficially valid conclusions from studies through use of AI is a much more burning problem, one that speaks more to the integrity of the author.

            > It will serve as a reminder not to cut any corners.

            Or yet another reason to ditch academic work for industry. I doubt the rise of scientific AI tools like AlphaXiv [1], whether you consider them beneficial or detrimental, can be avoided - calling for a level of pragmatism.

        • zipy1242 hours ago
          Science relies on trust... a lot. So things which show dishonesty are penalised greatly. If we were to remove trust, then peer reviewing a paper might take months of work or even years.
          • paulmistan hour ago
            And that timeline only grows with the complexity of the field in question. I think this is inherently a function of the complexity of the study, and rather than harshly penalizing such shortcomings we should develop tools that address them and improve productivity. AI can speed up the verification of requirements like proper citations, both on the author's and reviewer's side.
        • suddenlybananas3 hours ago
          It's a sign of dishonesty, not a perfect one, but an indicator.
    • derf_3 hours ago
      This will continue to happen as long as it is effectively unpunished. Even retracting the paper would do little good, as odds are it would not have been written if the author could not have used an LLM, so they are no worse off for having tried. Scientific publications are mostly a numbers game at this point. It is just one more example of a situation where behaving badly is much cheaper than policing bad behavior, and until incentives are changed to account for that, it will only get worse.
    • mlmonkey3 hours ago
      Why not run every submitted paper through GPTZero (before sending to reviewers) and summarily reject any paper with a hallucination?
      • gcr2 hours ago
        That's how GPTZero wants to situate themselves.

        Who would pay them? Conference organizers are already unpaid and understaffed, and most conferences aren't profitable.

        I think rejections shouldn't be automatic. Sometimes there are just typos. Sometimes authors don't understand BibTeX. This needs to be done in a way that reduces the workload for reviewers.

        One way of doing this would be for GPTZero to annotate each paper during the review step. If reviewers could review a version of each paper with yellow-highlighted "likely-hallucinated" references in the bibliography, then they'd bring it up in their review and they'd know to be on their guard for other probable LLM-isms. If there are only a couple of likely typos in the references, then reviewers could understand that, and if they care about it, they'd bring it up in their reviews and the author would have the usual opportunity to rebut.

        I don't know if GPTZero is willing to provide this service "for free" to the academic community, but if they are, it's probably worth bringing up at the next PAMI-TC meeting for CVPR.

        • zipy1242 hours ago
          Most publication venues already pay for a plagiarism detection service; it seems it would be trivial to add this on as a cost. Especially given APCs for journals are several thousand dollars, what's a few dollars more per paper?
    • Aurornis3 hours ago
      > Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated.

      This statement isn’t wrong, as the rest of the paper could still be correct.

      However, when I see a blatant falsification somewhere in a paper I’m immediately suspicious of everything else. Authors who take lazy shortcuts when convenient usually don’t just do it once, they do it wherever they think they can get away with it. It’s a slippery slope from letting an LLM handle citations to letting the LLM write things for you to letting the LLM interpret the data. The latter opens the door to hallucinated results and statistics, as anyone who has experimented with LLMs for data analysis will discover eventually.

    • empath753 hours ago
      I think a _single_ instance of an LLM hallucination should be enough to retract the whole paper and ban further submissions.
      • gcr3 hours ago
        Going through a retraction and blacklisting process is also a lot of work -- collecting evidence, giving authors a chance to respond and mediate discussion, etc.

        Labor is the bottleneck. There aren't enough academics who volunteer to help organize conferences.

        (If a reader of this comment is qualified to review papers and wants to step up to the plate and help do some work in this area, please email the program chairs of your favorite conference and let them know. They'll eagerly put you to work.)

        • pessimizer3 hours ago
          That's exactly why the inclusion of a hallucinated reference is actually a blessing. Instead of going back and forth with the fraudster, just tell them to find the paper. If they can't, case closed. Massive amount of time and money saved.
          • gcr3 hours ago
            Isn't telling them to find the paper just "going back and forth with a fraudster"?

            One "simple" way of doing this would be to automate it. Have authors step through a lint step when their camera-ready paper is uploaded. Authors would be asked to confirm each reference and link it to a google scholar citation. Maybe the easy references could be auto-populated. Non-public references could be resolved by uploading a signed statement or something.
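
            Just to make the flavor concrete, here's a rough sketch of the automatable part. It only checks DOI fields against the public doi.org handle API (an assumption on my part, since the lint I described above would go through Scholar), and the regexes are crude placeholders; a real lint step would use a proper BibTeX parser and handle DOI-less references too:

              import re
              import requests

              def lint_bib(path):
                  # Flag BibTeX entries whose DOI doesn't resolve; warn on entries with no DOI at all.
                  bib = open(path).read()
                  for key, body in re.findall(r"@\w+\{([^,]+),(.*?)\n\}", bib, flags=re.S):
                      m = re.search(r'doi\s*=\s*[{"]([^}"]+)[}"]', body, flags=re.I)
                      if not m:
                          print(f"[warn] {key}: no DOI to check")
                          continue
                      doi = m.group(1)
                      r = requests.get(f"https://doi.org/api/handles/{doi}", timeout=10)
                      print(f"[{'ok' if r.status_code == 200 else 'MISSING'}] {key}: {doi}")

              lint_bib("references.bib")  # "references.bib" is a hypothetical filename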

            There's no current way of using this metadata, but it could be nice for future systems.

            Even the Scholar team within Google is woefully understaffed.

            My gut tells me that it's probably more efficient to just drag authors who do this into some public execution or twitter mob after-the-fact. CVPR does this every so often for authors who submit the same paper to multiple venues. You don't need a lot of samples for deterrence to take effect. That's kind of what this article is doing, in a sense.

      • andy993 hours ago

           For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex
        
        This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.
        • burkaman3 hours ago
          If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."

          Here's a random one I picked as an example.

          Paper: https://openreview.net/pdf?id=IiEtQPGVyV

          Reference: Asma Issa, George Mohler, and John Johnson. Paraphrase identification using deep contextual- ized representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 517–526, 2018.

          Asma Issa and John Johnson don't appear to exist. George Mohler does, but it doesn't look like he works in this area (https://www.georgemohler.com/). No paper with that title exists. There are some with sort of similar titles (https://arxiv.org/html/2212.06933v2 for example), but none that really make sense as a citation in this context. EMNLP 2018 exists (https://aclanthology.org/D18-1.pdf), but that page range is not a single paper. There are papers in there that contain the phrases "paraphrase identification" and "deep contextualized representations", so you can see how an LLM might have come up with this title.

      • wing-_-nuts3 hours ago
        I dunno about banning them; humans without LLMs make mistakes all the time. But I would definitely place them under much harder scrutiny in the future.
        • pessimizer3 hours ago
          Hallucinations aren't mistakes, they're fabrications. The two are probably referred to by the same word in some languages.

          Institutions can choose an arbitrary approach to mistakes; maybe they don't mind a lot of them because they want to take risks and be on the bleeding edge. But any flexible attitude towards fabrications is simply corruption. The connected in-crowd will get mercy and the outgroup will get the hammer. Anybody criticizing the differential treatment will be accused of supporting the outgroup fraudsters.

          • gcr3 hours ago
            Fabrications carry intent to deceive. I don't think hallucinations necessarily do. If anything, they're a matter of negligence, not deception.

            Think of it this way: if I wanted to commit pure academic fraud maliciously, I wouldn't make up a fake reference. Instead, I'd find an existing related paper and merely misrepresent it to support my own claims. That way, the deception is much harder to discover and I'd have plausible deniability -- "oh I just misunderstood what they were saying."

            I think most academic fraud happens in the figures, not the citations. Researchers are more likely to be successful at making up data points than making up references, because it's impossible to know without the data files.

            • direwolf202 hours ago
              Generating a paper with an LLM is already academic fraud. You, the fraudster, are trying to optimize your fraud-to-effort ratio which is why you don't bother to look for existing papers to mis-cite.
    • Analemma_4 hours ago
      Kinda gives the whole game away, doesn’t it? “It doesn’t actually matter if the citations are hallucinated.”

      In fairness, NeurIPS is just saying out loud what everyone already knows. Most citations in published science are useless junk: it’s either mutual back-scratching to juice h-index, or it’s the embedded and pointless practice of overcitation, like “Human beings need clean water to survive (Franz, 2002)”.

      Really, hallucinated citations are just forcing a reckoning which has been overdue for a while now.

      • fc417fc8023 hours ago
        > Most citations in published science are useless junk:

        Can't say that matches my experience at all. Once I've found a useful paper on a topic, thereafter I primarily navigate the literature by traveling up and down the citation graph. It's extremely effective in practice and it's continued to get easier to do as the digitization of metadata has improved over the years.

      • jacquesm3 hours ago
        There should be a way to drop any kind of circular citation ring from the indexes.
        • gcr3 hours ago
          It's tough because some great citations are still hard to find/procure. I sometimes refer to papers that aren't on the Internet (e.g. old wonderful books / journals).
          • jacquesm3 hours ago
            But that actually strengthens those citations. The "I scratch your back, you scratch mine" ones are the ones I'm getting at, and that is quite hard to do with old and wonderful stuff; the authors there are probably not in a position to reciprocate, by virtue of observing the grass from the other side.
            • gcr3 hours ago
              I think it's a hard problem. The semanticscholar folks are doing the sort of work that would allow them to track this; I wonder if they've thought about it.

              A somewhat-related parable: I once worked in a larger lab with several subteams submitting to the same conference. Sometimes the work we did was related, so we both cited each other's paper which was also under review at the same venue. (These were flavor citations in the "related work" section for completeness, not material to our arguments.) In the review copy, the reference lists the other paper as written by "anonymous (also under review at XXXX2025)," also emphasized by a footnote to explain the situation to reviewers. When it came time to submit the camera-ready copy, we either removed the anonymization or replaced it with an arxiv link if the other team's paper got rejected. :-) I doubt this practice improved either paper's chances of getting accepted.

              Are these the sorts of citation rings you're talking about? If authors misrepresented the work as if it were accepted, or pretended it was published last year or something, I'd agree with you, but it's not too uncommon in my area for well-connected authors to cite manuscripts in process. I don't think it's a problem as long as they don't lean on them.

              • jacquesm3 hours ago
                No, I'm talking about the ones where the citation itself is almost or even completely irrelevant and is used as a way to inflate the citation count of the authors. You could find those by checking whether or not the value as a reference (i.e., whether it contributes to the understanding of the paper you are reading) is exceeded by the value of the linkage itself.
              • zipy1242 hours ago
                The flavour citations in related work are the best place to launder citations.
  • currymj38 minutes ago
    Especially for your first NeurIPS paper as a PhD student, getting one published is extremely lucrative.

    Most big tech PhD intern job postings have NeurIPS/ICML/ICLR/etc. first author paper as a de facto requirement to be considered. It's like getting your SAG card.

    If you get one of these internships, it effectively doubles or triples your salary that year right away. You will make more in that summer than your PhD stipend. Plus you can now apply in future summers and the jobs will be easier to get. And it sets your career on a good path.

    A conservative estimate of the discounted cash value of a student's first NeurIPS paper would certainly be five figures. It's potentially much higher depending on how you think about it, considering potential path dependent impacts on future career opportunities.

    We should not be surprised to see cheating. Nonetheless, it's really bad for science that these attempts get through. I also expect some people did make legitimate mistakes letting AI touch their .bib.

  • direwolf204 hours ago
    Wow! They're literally submitting references to papers by Firstname Lastname, John Doe and Jane Smith and nobody is noticing or punishing them.
    • emil-lp4 hours ago
      They might (I hope) still be punished after discovery.
    • an0malous4 hours ago
      It’s the way of the future
    • heliumtera3 hours ago
      [flagged]
      • sigbottle2 hours ago
        I'm a Feyerabend sympathizer, but even he wouldn't have gone this far.

        He was against establishment dogma, not in favor of anti-intellectualism.

      • azan_3 hours ago
        Yes, it only led to all advancements in the history of humanity, what a joke!
        • heliumtera2 hours ago
          I am sure all advancements in the history of humanity were properly peer reviewed!

          Including coca cola and Linux!

          • azan_2 hours ago
            If you wanted to attack peer review you should've attacked peer review, not science as a whole. And if "muh science" was some kind of code for peer review then it's not my fault that you are awful at articulating your point. It's still not clear what the hell you mean.
            • heliumteraan hour ago
              Science funding is strictly tied to publication in "respected journals", which is strictly tied to this horrendous gatekeeping process.

              I won't deny I am terrible at articulating my point, but I will maintain it. We can undeniably say that science, scientific institutions, scientific journals, funding and any other financial instrument constructed to promote scientific advancement are rotten by design and should be abandoned immediately. This joke serves no good.

              "But what about muh scientific method?" Yeah yeah yeah, whoever thinks modern science honors logic and reason is part of the problem and has being played, and forever will be

      • biophysboy2 hours ago
        [flagged]
      • Sharlin3 hours ago
        [flagged]
  • gcr4 hours ago
    I was getting completely AI-generated reviews for a WACV publication back in 2024. The area chairs are so overworked that authors don't have much recourse, which sucks but is also really hard to handle unless more volunteers step up to the plate to help organize the conference.

    (If you're qualified to review papers, please email the program chair of your favorite conference and let them know -- they really need the help!)

    As for my review, the review form has a textbox for a summary, a textbox for strengths, a textbox for weaknesses, and a textbox for overall thoughts. The review I received included one complete set of summary/strengths/weaknesses/closing thoughts in the summary text box, another distinct set of summary/strengths/weaknesses/closing thoughts in the strengths, another complete and distinct review in the weaknesses, and a fourth complete review in the closing thoughts. Each of these four reviews was slightly different, and they contradicted each other.

    The reviewer put my paper down as a weak reject, but also said "the pros greatly outweigh the cons."

    They listed "innovative use of synthetic data" as a strength, and "reliance on synthetic data" as a weakness.

  • rfreyan hour ago
    There's a lot of good arguments in this thread about incentives: extremely convincing about why current incentives lead to exactly this behaviour, and also why creating better incentives is a very hard problem.

    If we grant that good carrots are hard to grow, what's the argument against leaning into the stick? Change university policies and processes so that getting caught fabricating data or submitting a paper with LLM hallucinations is a career ending event. Tip the expected value of unethical behaviours in favour of avoiding them. Maybe we can't change the odds of getting caught but we certainly can change the impact.

    This would not be easy, but maybe it's more tractable than changing positive incentives.

  • ctothan hour ago
    The innumeracy is load-bearing for the entire media ecosystem. If readers could do basic proportional reasoning, half of health journalism and most tech panic coverage would collapse overnight.

    GPTZero of course knows this. "100 hallucinations across 53 papers at prestigious conference" hits different than "0.07% of citations had issues, compared to unknown baseline, in papers whose actual findings remain valid."

    • MeetingsBrowseran hour ago
      I’m not sure that’s fair in this context.

      In the past, a single paper with questionable or falsified results at a top tier conference was big news.

      Something that casts doubt on the validity of 53 papers at a top AI conference is at least notable.

      > whose actual findings remain valid

      Remain valid according to who? The same group that missed hundreds of hallucinated citations?

      • ctoth31 minutes ago
        Which of these papers had falsified results and not bad citations?

        What is the base rate of bad citations pre-AI?

        And finally, yes. Peer review does not mean clicking every link in the footnotes to make sure the original paper didn't mislink, though I'm sure after this brouhaha this too will be automated.

  • smallpipe4 hours ago
    Could you run a similar analysis for pre-2020 papers? It'd be interesting to know how prevalent making up sources was before LLMs.
    • tasuki3 hours ago
      Also, it'd be interesting to see how many pre-2020 papers their "AI detector" marks as AI-generated. I distrust LLMs somewhat, but I distrust AI detectors even more.
    • theptip3 hours ago
      Yeah, it’s kind of meaningless to attribute this to AI without measuring the base rate.

      It’s for sure plausible that it’s increasing, but I’m certain this kind of thing happened with humans too.

  • neom2 hours ago
    I wrote before about my embarrassing time with ChatGPT during a period (https://news.ycombinator.com/item?id=44767601) - I decided to go back through those old 4o chats with 5.2 pro extended thinking, and the reply was pretty funny because it first slightly ridiculed me, heh - but what it showed was: basically I would say "what 5 research papers from any area of science talk to these ideas" and it would find 1 and invent 4 if it didn't know 4 others, and not tell me, and then I'd keep working with it and it would invent what it thought might be in the papers along the way, making up new papers in its own work to cite to make its own work valid, lol. Anyway, I'm a moron, sure, and no real harm came of it for me, just still slightly shook I let that happen to me.
  • mat_b9 minutes ago
    > we discovered 100s of hallucinated citations missed by the 3+ reviewers who evaluated each paper.

    This says just as much about the humans involved.

  • doug_durham3 hours ago
    Getting papers published is now more about embellishing your CV than about a sincere desire to present new research. I see this everywhere at every level. Getting a paper published anywhere is a checkbox in completing your resume. As an industry we need to stop taking this into consideration when reviewing candidates or deciding pay. In some sense it has become an anti-signal.
    • biophysboy2 hours ago
      I think it's fairer to say that perverse incentives have added more noise to the publishing signal. Publishing 0 times is not better than publishing 100 times, even if 90% of those are Nth-author formality/politeness publications.
    • autoexec2 hours ago
      It'd be nice if there were a dedicated journal for papers published just because you have to publish for your CV or to get your degree. That way people can keep publishing for the sake of publishing, but you could see at a glance what the deal was.
    • londons_explore2 hours ago
      I'd like to see a financial approach to deciding pay by giving researchers a small and perhaps nonlinear or time bounded share of any profits that arise from their research.

      Then people's CVs could say "My inventions have led to $1M in licensing revenue" rather than "I presented a useless idea at a decent conference because I managed to make it sound exciting enough to get accepted".

      • autoexec2 hours ago
        A lot of good research isn't ever going to make anyone a single dime, but that doesn't mean it doesn't matter.
      • direwolf202 hours ago
        That's what patents do.
  • cyber_kinetist27 minutes ago
    The reviewing process at top AI conferences has been broken as hell for several years now, due to too many submissions and too few reviewers (to the point that Master's students are reviewing the papers). It was only a matter of time before these conferences were filled with AI-written papers.
  • Nevermark2 hours ago
    With regard to confabulating (hallucinating) sources, or anything else, it is worth noting this is a first class training requirement imposed on models. Not models simply picking up the habit from humans.

    When training a student, normally we expect a lack of knowledge early, and reward self-awareness, self-evaluation and self-disclosure of that.

    But from the very first epoch of a model training run, when the model has all the ignorance of a dropped plate of spaghetti, we optimize the network to respond as anything from a typical human to an expert, without any base of understanding.

    So the training practice for models is inherently extreme enforced “fake it until you make it”, to a degree far beyond any human context or culture.

    (Regardless, humans need to verify, not to mention read, the sources they cite. But it will be nice when models can be trusted to accurately assess what they know/don't-know too.)

  • gtirlonian hour ago
    Why focus on hallucinations/LLMs and not on the authors? There are rules for submitting papers.

    If I drop a loaded gun and it fires, killing someone, we don't go after the gun's manufacturer in most cases.

    • phyzome34 minutes ago
      This isn't directly to your point, but: A civil suit for such an incident would generally name both the weapon owner (for negligence, etc.) and the manufacturer (for dangerous design).
  • leggerss3 hours ago
    I don't understand: why aren't there automated tools to verify citations' existence? A citation follows a structured style (APA, MLA, Chicago), and paper metadata is available via e.g. a web search, even if the paper contents are not.

    I guess GPTZero has such a tool. I'm confused why it isn't used more widely by paper authors and reviewers
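
    Even a crude existence check is only a few lines. A rough sketch, assuming the public Crossref search API and Python's requests (the helper name and fuzzy-match threshold here are mine, not anyone's actual tool):

        # Checks whether *something* plausibly matching a reference exists in
        # Crossref. Says nothing about whether the citation supports the claim.
        import requests
        from difflib import SequenceMatcher

        def citation_probably_exists(title, first_author, threshold=0.85):
            resp = requests.get(
                "https://api.crossref.org/works",
                params={"query.bibliographic": f"{title} {first_author}", "rows": 3},
                timeout=10,
            )
            resp.raise_for_status()
            for item in resp.json()["message"]["items"]:
                candidate = (item.get("title") or [""])[0]
                score = SequenceMatcher(None, title.lower(), candidate.lower()).ratio()
                if score >= threshold:
                    return True
            return False

        print(citation_probably_exists("Attention Is All You Need", "Vaswani"))

    Of course that only catches references matching nothing at all; wrong authors or venues would need the returned metadata compared field by field, which is where it gets fiddly.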

    • gh02t3 hours ago
      Citations are too open ended and prone to variation, and there are legitimate minor mistakes that wouldn't bother a human verifier but would break automated tools, so they're hard to verify in their current form. DOI was supposed to solve some of the literal mechanical question of whether a source exists, but journal paywalls and limited adoption mean that it is not a universal solution. Plus a DOI still doesn't easily verify the factual accuracy of a citation, like "does the source say what the citation says it does," which is the most important part.

      In my experience you will see considerable variation in citation formats, even in journals that strictly define them and require using BibTeX. And lots of journals leave their citation format rules very vague. It's a problem that runs deep.

      • leggerss2 hours ago
        Thanks for the thoughtful reply!
    • eichin3 hours ago
      Looks like GPTZero Source Finder was only released a year ago - if anything, I'm surprised slop-writers aren't using it preemptively, since they're "ahead of the curve" relative to reviewers on this sort of thing...
  • Molitor59014 hours ago
    AI might just extinguish the entire paradigm of publish or perish. The sheer volume of papers makes it nearly impossible to properly decide which papers have merit, which are non-replicable and suspect, and which are just a desperate rush to publish. The entire practice needs to end.
    • SJC_Hacker2 hours ago
      It's not publish or perish so much as get grant money or perish.

      Publishing is just the way to get grants.

      A PI explained it to me once, something like this

      Idea(s) -> Grant -> Experiments -> Data -> Paper(s) -> Publication(s) -> Idea(s) -> Grant(s)

      That's the current cycle ... remove any step and it's a dead end

    • shermantanktop3 hours ago
      But how could we possibly evaluate faculty and researcher quality without counting widgets on an assembly line? /s

      It’s a problem. The previous regime prior to publishing-mania was essentially a clubby game of reputation amongst peers based on cocktail party socialization.

      The publication metrics came out of the harder sciences, I believe, and then spread to the softest of humanities. It was always easy to game a bit if you wanted to try, but now it’s trivial to defeat.

  • CGMthrowaway4 hours ago
    Which is worse:

    a) p-hacking and suppressing null results

    b) hallucinations

    c) falsifying data

    Would be cool to see an analysis of this

    • amitav12 hours ago
      I'm doing some research, and this is something I'm unsure of. I see that "suppressing null results" is a bad thing, and I sort of agree, but for me personally, a lot of the null results are just the result of my own incompetence and don't contain any novel insights.
    • Proziam4 hours ago
      All 3 of these should be categorized as fraud, and punished criminally.
      • internetter4 hours ago
        criminally feels excessive?
        • jacquesm3 hours ago
          You could make a good case for a white collar crime here, fraud for instance.
        • Proziam4 hours ago
          If I steal hundreds of thousands of dollars (salary, plus research grants and other funds) and produce fake output, what do you think is appropriate?

          To me, it's no different than stealing a car or tricking an old lady into handing over her fidelity account. You are stealing, and society says stealing is a criminal act.

          • WarmWash3 hours ago
            We have a civil court system to handle stuff like this already.
            • wat100003 hours ago
              We also have a criminal court system to handle stuff like this.
              • WarmWash3 hours ago
                No we don't. I've never seen a private contract dispute go to criminal court, probably because it's a civil matter.

                If they actually committed theft, well then that already is illegal too.

                But right now, doing "shitty research" isn't illegal and it's unlikely it ever will be.

                • wat100002 hours ago
                  The claim is that this would qualify as fraud, which is also illegal.

                  If you do a search for "contractor imprisoned for fraud" you'll find plenty of cases where a private contract dispute resulted in criminal convictions for people who took money and then didn't do the work.

                  I don't know if taking money and then merely pretending to do the research would rise to the level of criminal fraud, but it doesn't seem completely outlandish.

            • Proziam3 hours ago
              Stealing more than a few thousand dollars is a felony, and felonies are handled in criminal court, not civil.

              EDIT - The threshold amount varies. Sometimes it's as low as a few hundred dollars. However, the point stands on its own, because there's no universe where the sum in question is in misdemeanor territory.

              • WarmWash3 hours ago
                It would fall under the domain of contract law, because maybe the contract of the grant doesn't prohibit what the researcher did. The way to determine that would be in court - civil court.

                Most institutions aren't very chill with grant money being misused, so we already don't need to burden the state with getting Johnny municipal prosecutor to try and figure out if gamma crystallization imaging sources were incorrect.

                • Proziam2 hours ago
                  Fraud implies intent, either intent to deceive or intentional negligence.

                  If you're taking public funds (directly or otherwise) with the intent to either:

                  A) Do little to no real work, and pass off the work of an AI as your own, or

                  B) Knowingly publish falsified data

                  Then you are, without a single shred of doubt, in criminal fraud territory. Further, the structural damage you inflict when you do the above is orders of magnitude greater than the initial fraud itself. That part is a matter for civil courts ("Our company based its development on fraudulent data X; it cost us Y in damages").

                  Whether or not charges are pressed is going to be decided way after all the internal reviews have demonstrated the person being charged has gone beyond the "honest mistake" threshold. It's like Walmart not bothering to call the cops until you're into felony territory; there's no point in doing so.

  • armcat3 hours ago
    This is awful but hardly surprising. Someone mentioned reproducible code with the papers - but there is a high likelihood of the code being partially or fully AI generated as well. I.e. AI generated hypothesis -> AI produces code to implement and execute the hypothesis -> AI generates paper based on the hypothesis and the code.

    Also: there were 15,000 submissions that were rejected at NeurIPS; it would be very interesting to see what % of those rejected were partially or fully AI generated/hallucinated. Are the ratios comparable?

    • blackbear_3 hours ago
      Whether the code is AI generated or not is not important, what matters is that it really works.

      Sharing code enables others to validate the method on a different dataset.

      Even before LLMs came around there were lots of methods that looked good on paper but turned out not to work outside of accepted benchmarks

  • 3 hours ago
    undefined
  • theptip3 hours ago
    This is mostly an ad for their product. But I bet you can get pretty good results with a Claude Code agent using a couple simple skills.

    Should be extremely easy for AI to successfully detect hallucinated references as they are semi-structured data with an easily verifiable ground truth.

  • mt_3 hours ago
    It would be ironic if the very detection of hallucinations contained hallucinations of its own.
  • 4 hours ago
    undefined
  • londons_explore2 hours ago
    And this is the tip of the iceberg, because these are the easy to check/validate things.

    I'm sure plenty of more nuanced facts are also entirely without basis.

  • alcasa38 minutes ago
    Didn't know the L in Samuel L Jackson was for LeCun.
  • yobbo3 hours ago
    As long as these sorts of papers serve more important purposes for the careers of the authors than anything related to science or discovery of knowledge, then of course this happens and continues.

    The best possible outcome is that these two purposes are disconflated, with follow-on consequences for the conferences and journals.

  • rabbitlord2 hours ago
    You will find out that top CS conferences are never really scientific if you actually go to the papers' GitHub repos and run their code.
  • teekert2 hours ago
    We have the h-index and such; can we have something similar that goes down when you pull stunts like these? Preferably linked to people's ORCID iDs.
  • dtartarotti4 hours ago
    It is very concerning that these hallucinations passed through peer review. It's not like peer review is a fool-proof method or anything, but the fact that reviewers did not check the references and notice the clearly bogus ones is alarming and could be a sign that the article authors weren't the only ones using LLMs in the process...
    • amanaplanacanal4 hours ago
      Is it common for peer reviewers to check references? Somehow I thought they mostly focused on whether the experiment looked reasonable and the conclusions followed.
      • emil-lp4 hours ago
        In journal publications it is, but without DOIs it's difficult.

        In conference publications, it's less common.

        Conference publications (like NeurIPS) are treated as announcements of results, not verified work.

        • empiko3 hours ago
          Nobody in ML or AI is verifying all your references. Reviewers will point out if you miss a super related work, but that's it. This is especially true with the recent (last two decades?) inflation in citation counts. You regularly have papers with 50+ references for all kinds of claims and random semirelated work. The citation culture is really uninspiring.
  • abktowaan hour ago
    Implicitly this makes sense but the amount cited in this article is still hard for me to grasp. Wow.
  • nospice3 hours ago
    We've been talking about a "crisis of reproducibility" for years, and about the incentive to crank out high volumes of low-quality research. We now have a tool that brings the cost of producing plausible-looking research down to zero. So of course we're going to see that tool abused on a galactic scale.

    But here's the thing: let's say you're an university or a research institution that wants to curtail it. You catch someone producing LLM slop, and you confirm it by analyzing their work and conducting internal interviews. You fire them. The fired researcher goes public saying that they were doing nothing of the sort and that this is a witch hunt. Their blog post makes it to the front page of HN, garnering tons of sympathy and prompting many angry calls to their ex-employer. It gets picked up by some mainstream outlets, too. It happened a bunch of times.

    In contrast, there are basically no consequences to institutions that let it slide. No one is angrily calling the employers of the authors of these 100 NeurIPS papers, right? If anything, there's the plausible deniability of "oh, I only asked ChatGPT to reformat the citations, the rest of the paper is 100% legit, my bad".

  • not2b43 minutes ago
    This is going to be a huge problem for conferences. While journals have a longer time to get things right, as a conference reviewer (for IEEE conferences) I was often asked to review 20+ papers in a short time to determine who gets a full paper, who gets to present just a poster, etc. There was normally a second round, but often these would just look at submissions near the cutoff margin in the rankings. Obvious slop can be quickly rejected, but it will be easier to sneak things in.
    • cyber_kinetist16 minutes ago
      AI conferences are already fucked. Students who are doing their Master's degrees are reviewing those top-tier papers, since there are just too many submissions for existing reviewers.
  • bonsai_spool4 hours ago
    This suggests that nobody was screening these papers in the first place, so is it actually significant that people are using LLMs in a setting without meaningful oversight?

    These clearly aren't being peer-reviewed, so there's no natural check on LLM usage (which is different than what we see in work published in journals).

    • emil-lp4 hours ago
      As someone who reviews 20+ papers per year: we don't have time to verify each reference.

      We verify: is the stuff correct, and is it worthy of publication (in the given venue) given that it is correct.

      There is still some trust in the authors not to submit made-up stuff, though it is diminishing.

      • paulmist3 hours ago
        I'm surprised the conference doesn't provide tooling to validate all references automatically.
        • Sharlin3 hours ago
          How would you do that? Even in cases where there's a standard format, a DOI on every reference, and some giant online library of publication metadata, including everything that only exists in dead tree format, that just lets you check whether the cited work exists, not whether it's actually a relevant thing to cite in the context.
      • its_ethan2 hours ago
        Sorry, but if someone makes a claim and cites a reference, how do you verify "is the stuff correct" without checking that reference?
        • emil-lp2 hours ago
          Those are typically things you are familiar with or can easily check.

          Fake references are more common in the introduction where you list relevant material to strengthen your results. They often don't change the validity of the claim, but the potential impact or value.

    • alain940403 hours ago
      When I was reviewing such papers, I didn't bother checking that 30+ citations were correctly indexed. I focused on the article itself, and maybe 1 or 2 citations that are important. That's it. For most citations, they are next to an argument that I know is correct, so why would I bother checking. What else do you expect? My job was to figure out if the article ideas are novel and interesting, not if they got all their citations right.
    • gcr3 hours ago
      Academic venues don't have enough reviewers. This problem isn't new, and as publication volumes increase, it's getting sharply worse.

      Consider the unit economics. Suppose NeurIPS gets 20,000 papers in one year. Suppose each author should expect three good reviews, so area chairs assign five reviewers per paper. In total, 100,000 reviews need to be written. It's a lot of work, even before factoring emergency reviewers in.

      NeurIPS is one venue alongside CVPR, [IE]CCV, COLM, ICML, EMNLP, and so on. Not all of these conferences are as large as NeurIPS, but the field is smaller than you'd expect. I'd guess there are 300k-1m people in the world who are qualified to review AI papers.

      • khuey3 hours ago
        Seems like using tooling like this to identify papers with fake citations and auto-rejecting them before they ever get in front of a reviewer would kill two birds with one stone.
        • gcr3 hours ago
          It's not always possible to distinguish between fake citations and citations that are simply hard to find (e.g. wonderful old books that aren't on the Internet).

          Another problem is that conferences move slowly and it's hard to adjust the publication workflow in such an invasive way. CVPR only recently moved from Microsoft's CMT to OpenReview to accept author submissions, for example.

          There's a lot of opportunity for innovation in this space, but it's hard when everyone involved would need to agree to switch to a different workflow.

          (Not shooting you down. It's just complicated because the people who would benefit are far away from the people who would need to do the work to support it...)

          • khuey2 hours ago
            Sure, I agree that it's far from trivial to implement.
  • 3 hours ago
    undefined
  • trash_cat2 hours ago
    Clearly there is some demand for those papers, and research, to exist. Good opportunity to fill the gaps.
  • dev_l1x_be2 hours ago
    I am wondering if we are going to reach hallucination collapse sooner than we reach AGI.
  • ctoth3 hours ago
    How you know it's really real is that they clearly state the FPR and compare against a pre-LLM baseline.

    But I saw it in Apple News, so MISSION ACCOMPLISHED!

  • 4 hours ago
    undefined
  • geremiiah4 hours ago
    A lot of research in AI/ML seems to me to be "fake it and never make it". Literally it's all about optics, posturing, connections, publicity. Lots of bullshit and little substance. This was true before AI slop, too. But the fact that AI slop can make it pass the review really showcases how much a paper's acceptance hinges on things, other than the substance and results of the paper.

    I even know PIs who got fame and funding based on some research direction that was supposedly going to be revolutionary. Except all they had were preliminary results that, from one angle, if you squint, you could see as pointing toward some good result. But then the result never came. That's why I say, "fake it, and never make it".

  • nerdjon3 hours ago
    The downstream effects of this are extremely concerning. We have already seen the damage caused by human-written research that was later retracted, like the "research" on vaccines causing autism.

    As we get more and more papers that may be citing information that was originally hallucinated, we have a major reliability issue here. What is worse, people who did not use AI in the first place will be caught in the crossfire, since they will be referencing incorrect information.

    There needs to be a serious amount of education done on what these tools can and cannot do and importantly where they fail. Too many people see these tools as magic since that is what the big companies are pushing them as.

    Other than that we need to put in actual repercussions for publishing work created by an LLM without validating it (or just say you can’t in the first place but I guess that ship has sailed) or it will just keep happening. We can’t just ignore it and hope it won’t be a problem.

    And yes, humans can make mistakes too. The difference is accountability and the ability to actually be unsure about something so you question yourself to validate.

  • godelskian hour ago
    Given that many of these detections are being made from references, I don't understand why we're not using automatic citation checkers.

    Just ask authors to submit their bib file so we don't need to do OCR on the PDF. Flag the unknown citations and ask reviewers to verify their existence. Then contact authors and ban if they can't produce the cited work.

    This is low hanging fruit here!
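
    A rough version of that first pass, assuming the bibtexparser package and the doi.org resolver (the field handling here is illustrative; a real checker would also fall back to a title search for entries without DOIs):

        # Flags .bib entries whose DOI doesn't resolve, plus entries with no DOI at all.
        import bibtexparser
        import requests

        def flag_suspect_entries(bib_text):
            suspects = []
            for entry in bibtexparser.loads(bib_text).entries:
                key = entry.get("ID", "?")
                doi = entry.get("doi")
                if not doi:
                    suspects.append(f"{key}: no DOI, needs a manual or title-based check")
                    continue
                # doi.org redirects for registered DOIs and returns 404 otherwise
                resp = requests.head(f"https://doi.org/{doi}",
                                     allow_redirects=False, timeout=10)
                if resp.status_code == 404:
                    suspects.append(f"{key}: DOI does not resolve")
            return suspects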

    Detecting slop where the authors vet citations is much harder. The big problem with all the review rules is they have no teeth. If it were up to me we'd review in the open, or at least like ICLR. Publish the list of known bad actors and let us look at the network. The current system is too protective of egregious errors like plagiarism. Authors can get detected in one conference, pull, and submit to another, rolling the dice. We can't allow that to happen and we should discourage people from associating with these con artists.

    AI is certainly a problem in the world of science review, but it's far from the only one and I'm not even convinced it's the biggest. The biggest is just that reviewers are lazy and/or not qualified to review the works they're assigned. It takes at least an hour to properly review a paper in your niche, much more when it's outside it. We're overworked as is, with 5+ works to review, not to mention all the time we have to spend reworking our own works that were rejected due to the slot machine. We could do much better if we dropped this notion of conference/journal prestige and focused on the quality of the works and reviews.

    Addressing those issues also addresses the AI issues because, frankly, *it doesn't matter if the whole work was done by AI, what matters is if the work is real.*

  • fulafel4 hours ago
    Is there a comparison to rate of reference errors in other forums?
  • meindnoch3 hours ago
    Jamie, bring up their nationalities.
  • brador3 hours ago
    The problem isn’t scale.

    The problem is consequences (lack of).

    Doing this should get you barred from research. It won’t.

  • captainbland2 hours ago
    What's wild is so many of these are from prestigious universities. MIT, Princeton, Oxford and Cambridge are all on there. It must be a terrible time to be an academic who's getting outcompeted by this slop because somebody from an institution with a better name submitted it.
    • cflewis2 hours ago
      I'm going to be charitable and say that the papers from prestigious universities were honest mistakes rather than paper mill university fabrications.

      One thing that has bothered me for a very long time is that computer science (and I assume other scientific fields) has long since decided that English is the lingua franca, and if you don't speak it you can't be part of it. Can you imagine being told that you could only do your research if you were able to write technical papers in a language you didn't speak, maybe even using glyphs you didn't know? It's crazy when you think about it even a little bit, but we ask it of so many. And that's not even counting the fact that 90% of the English-speaking population couldn't crank out a paper at the required vocabulary level anyway.

      A very legitimate, not trying to cheat, use for LLMs is translation. While it would be an extremely broad and dangerous brush to paint with, I wonder if there is a correlation between English-as-a-Second (or even third)-Language authors and the hallucinations. That would indicate that they were trying to use LLMs to help craft the paper to the expected writing level. The only problem being that it sometimes mangles citations, and if you've done good work and got 25+ citations, it's easy for those errors to slip through.

  • poulpy1233 hours ago
    All papers proven to have used an LLM beyond writing improvement should be automatically retracted.
  • pandemic_region3 hours ago
    What if they only accepted handwritten papers? Basically the current system is beyond repair, so we may as well go back to receiving 20 decent papers instead of 20k hallucinated ones.
  • techIA2 hours ago
    They will turn it into a party drug.
  • CrzyLngPwd3 hours ago
    This is not the AI future we dreamed of, or feared.
  • yepyeaisntityea3 hours ago
    No surprises. Machine learning has, at least since 2012, been the go-to field for scammers and grifters. Machine learning, and technology in general, is basically a few real ideas, a small number of honest hard workers, and then millions of fad chasers and scammers.
  • qwertox4 hours ago
    It would be great if those scientists who use AI without disclosing it get fucked for life.
    • bwfan1233 hours ago
      > It would be great if those scientists who use AI without disclosing it get fucked for life.

      There need to be dis-incentives for sloppy work. There is a tension between quality and quantity in almost every product. Unfortunately academia has become a numbers-game with paper-mills.

    • direwolf204 hours ago
      "scientists" FYI. Making shit up isn't science.
    • oofbey4 hours ago
      Harsh sentiment. Pretty soon every knowledge worker will use AI every day. Should people disclose spellcheckers powered by AI? Disclosing is not useful. Being careful in how you use it and checking work is what matters.
      • geremiiah4 hours ago
        What they are doing is plain cheating the system to get their 3 conference papers so they can get their $150k+ job at FAANG. It's plain cheating with no value.
        • WarmWash3 hours ago
          We are only looking at one side of the equation here, in this whole thread.

          This feels a bit like the "LED stoplights shouldn't be used because they don't melt snow" argument.

          • mikkupikku2 hours ago
            Confront the culprit and ask for their side; you'll just get some sob story about how busy they are and how they were only using the AI to check their grammar and they just don't know how the whole thing ended up fabricated... Waste of time. Just blacklist these people, they're no better than any other scammer.
        • barbazoo4 hours ago
          People that cheat with AI now probably found ways to cheat before as well.
        • shermantanktop3 hours ago
          Cheating by people in high status positions should get the hammer. But it gets the hand-wringing what-have-we-come-to treatment instead.
      • ambicapter4 hours ago
        > Should people disclose spellcheckers powered by AI?

        Thank you for that perfect example of a strawman argument! No, spellcheckers that use AI are not the main concern behind disclosing the use of AI in generating scientific papers, government reports, or any large block of nonfiction text that you paid for and that is supposed to make sense.

      • fisf4 hours ago
        People are accountable for the results they produce using AI. So a scientist is responsible for made up sources in their paper, which is plain fraud.
        • eichin3 hours ago
          "responsible for made up sources" leads to the hilarious idea that if you cite a paper that doesn't exist, you're now obliged to write that paper (getting it retroactively published might be a challenge though)
        • oofbey4 hours ago
          I completely agree. But “disclosing the use of AI” doesn’t solve that one bit.
          • barbazoo3 hours ago
            I don’t disclose what keyboard I use to write my code or if I applied spellcheck afterward. The result is 100% theirs.
      • Sharlin3 hours ago
        In general we're pretty good at drawing a line between purely editorial stuff like using a spellchecker, or even the services of a professional editor (no need to acknowledge), and independent intellectual contribution (must be acknowledged). There's no slippery slope.
      • duskdozer3 hours ago
        >Pretty soon every knowledge worker will use AI every day.

        Maybe? There's certainly a push to force the perception of inevitability.

      • Proziam4 hours ago
        False equivalence. This isn't about "using AI" it's about having an AI pretend to do your job.

        What people are pissed about is the fact their tax dollars fund fake research. It's just fraud, pure and simple. And fraud should be punished brutally, especially in these cases, because the long tail of negative effects produces enormous damage.

        • freedomben3 hours ago
          I was originally thinking you were being way too harsh with your "punish criminally" take, but I must admit, you're winning me over. I think we would need to be careful to ensure we never (or realistically, very rarely) convict an innocent person, but this is in many cases outright theft/fraud when someone is making money or being "compensated" for producing work that is fraudulent.

          For people who think this is too harsh, just remember we aren't talking about undergrads who cheat on a course paper here. We're talking about people who were given money (often from taxpayers) that committed fraud. This is textbook white collar crime, not some kid being lazy. At a minimum we should be taking all that money back from them and barring them from ever receiving grant money again. In some cases I think fines exceeding the money they received would be appropriate.

      • vimda4 hours ago
        "Pretty soon every knowledge worker will use AI every day" is a wild statement considering the reporting that most companies deploying AI solutions are seeing little to no benefit, but also, there's a pretty obvious gap between spell checkers and tools that generate large parts of the document for you
      • PunchyHamster3 hours ago
        nice job moving the goalpost from "hallucinated the research/data" to "spellchecker error"
      • jsksdkldld4 hours ago
        [dead]
    • pandemic_region3 hours ago
      Instead of publishing their papers in the prestigious zines - which is what they're after - we will publish them in "AI Slop Weekly" with name and picture. Up the submission risk a bit.
    • yesitcan4 hours ago
      One fuck seems appropriate.
  • Tom13804 hours ago
    No ETH Zurich, let's go
  • jordanpg4 hours ago
    If these are so easy to identify, why not just incorporate some kind of screening into the early stages of peer review?
    • tossandthrow4 hours ago
      What makes you believe they are easy to identify?
      • emil-lp4 hours ago
        One could require DOIs for each reference. That's both realistic to achieve and easy to verify.

        Although then why not just cite existing papers for bogus reasons?
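
          The "easy to verify" part really is easy, at least mechanically. A sketch, assuming the Crossref works endpoint (the names and the deliberately crude matching are mine):

              # Does the cited DOI actually point at the claimed title and author?
              import requests

              def doi_matches_reference(doi, cited_title, cited_first_author):
                  resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
                  if resp.status_code != 200:
                      return False  # the DOI isn't registered at all
                  work = resp.json()["message"]
                  real_title = (work.get("title") or [""])[0].lower()
                  real_authors = {a.get("family", "").lower() for a in work.get("author", [])}
                  return (cited_title.lower() in real_title
                          and cited_first_author.lower() in real_authors)

          Something like this catches misattributed authors as well as fully invented references, which is the part a bare existence check misses.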

      • jordanpgan hour ago
        Isn't that what GPTZero does?
    • DetectDefect4 hours ago
      Because real work takes time and effort, and there is no real incentive for it here.
  • MORPHOICES3 hours ago
    [dead]
  • GrowingSideways3 hours ago
    [dead]
  • TAULIC153 hours ago
    [flagged]
  • depressionalt3 hours ago
    This is nice and all, but what repercussion does GPTZero get when their bullshit AI detection hallucinates a student using AI? And when that student receives academic discipline because of it?

    Many such cases of this. More than 100!

    They claim to have custom detection for GPT-5, Gemini, and Claude. They're making that up!

    • freedomben3 hours ago
      Indeed. My son has been accused by bullshit AI detection of having used AI, and it has devastated his work quality. After being "disciplined" for using AI (when he didn't), he now intentionally tries to "dumb down" his writing so that it doesn't sound so much like AI. The result is he writes much worse. What a shitty, shitty outcome. I've even found myself leaving typos and things in (even on sites like HN) because if you write too well, inevitably some comment replier will call you out as being an LLM even when you aren't. I'm as annoyed by the LLM posts as everybody else, but the answer surely is not to dumb us down into Idiocracy.
      • Sharlin3 hours ago
        It's almost as if this whole LLM stuff wasn't a net benefit to the society after all.