53 points by neilv 7 hours ago | 8 comments
  • TomasBM 2 hours ago
    The reasons listed in TFA - "confidentiality, sensitive data and compromising authors’ intellectual property" - make sense to discourage reviewers from using cloud-based LLMs.

    There are also reasons for discouraging the use of LLMs in peer review at all: it defeats the purpose of the "peer" in peer review; hallucinations; criticism not relevant to the community; and so on.

    However, I think it's high time to reconsider what scientific review is supposed to be. Is it really important to have so-called peers as gatekeepers? Are there automated checks we can introduce to verify claims or ensure quality (like CI/CD for scientific articles), and leave content interpretation to the humans?

    Let's make the benefits and costs explicit: what would we be gaining or losing if we just switched to LLM-based review, and left the interpretation of content to the community? The journal and conference organizers certainly have the data to do that study; and if not, tool providers like EasyChair do.

    • i_am_proteus an hour ago
      Yes, there are often strong reasons to have peers as gatekeepers. Scientific writing is extremely information-dense. Consider a niche technical task that you work on -- now consider summarizing a day's worth of work in one or two sentences, designed to be read by someone else with similar expertise. In most scientific fields, the niches are pretty small. The context necessary to parse that dense scientific writing into a meaningful picture of the research methods is often years/decades of work in the field. Only peers are going to have that context.

      There are also strong reasons why the peers-as-gatekeepers model is detrimental to the pursuit of knowledge, such as researchers forming semi-closed communities that bestow local political power on senior people in the field, creating social barriers to entry or critique. This is especially pernicious given the financial incentives (competition for a limited pool of grant money; award of grant money based on publication output) that researchers are exposed to.

      • godelski 31 minutes ago
        I think if you leave authors alone they will be more likely to write in the first category rather than the second. After all, papers are mainly written to communicate your findings to your direct peers. So information-dense writing isn't bad, because the target audience understands it.

        Of course that makes it harder for people outside to penetrate but this also depends on the culture of the specific domain and there's usually people writing summaries and surveys. Great task for grad students tbh (you read a ton of papers, summarize, and by that point you should have a good understanding of what needs to be worked on in the field and not just dragged through by your advisor)

    • godelski an hour ago
      > However, I think it's high time to reconsider what scientific review is supposed to be

      I've been arguing for years we should publish to platforms like OpenReview and that basically we check for plagiarism and obvious errors but otherwise publish.

      In the old days the bottleneck was the physical sending out of papers. Now that's cheap. So make comments public. We're all on the same side. The people that will leave reviews are more likely to actually be invested in the topic rather than doing review as purely a service. It's not perfect but no system will be, and we currently waste lots of time chasing reviewers.

    • pastage an hour ago
      When you measure something people will start to game it. CI/CD needs a firm hand to work.
    • conartist6 an hour ago
      Better to completely drop review, don't you think?

      LLMs simply don't do science. They have no integrity.

      There is no "bullshit me" step in the scientific process

  • kachapopopow 5 hours ago
    I think it's interesting that AI is probably unintuitively good at spotting fraud in papers due to its ability to hold more context than the majority of humans. I wish someone explored this to see if it can spot academic fraud that isn't in its training data already.
    • BDPW 3 hours ago
      LLMs still routinely make stuff up about things like this, so no, there's no way this is a reliable method.
      • kachapopopow 3 hours ago
        It doesn't have to be reliable! It just has to flag things: "hey these graphs look like they were generated using (formula)" or "these graphs do not seem to represent realistic values / real world entropy" - it just has to be a tool that stops very advanced fraud from slipping through when it already passed human peer review.

        The only reason why this is helpful is because humans have natural biases and/or inverse of AI biases which allow them to find patterns that might just be the same graph being scaled up 5 to 10 times.

        • BDPW 42 minutes ago
          I hope I'm wrong but I haven't seen anything like this in practice. I would imagine we have the same problem as before where we could use it as an extra filter but the amount of shit that comes out makes the process not actually any more accurate, just faster.

          Having seen from close-up how these reviews go, I get why people use tools like this, unfortunately. It doesn't make me very hopeful for the near future of reviewing.

      • ratg13 3 hours ago
        Nobody should be using AI as the final arbiter of anything.

        It is a tool, and there always needs to be a user that can validate the output.

    • conartist6 an hour ago
      It sounds like it's better at putting the fraud into science than at getting it out
  • D-Machine 6 hours ago
    • croes 6 hours ago
      6 points no comments vs 18 points and 2 comments.

      Faster isn’t the metric here

      • D-Machine 5 hours ago
        Am I missing something here? I am new to posting at HN, despite being a long-time reader.

        I get that HN has a policy to allow duplicates so that duplicates that were missed for arbitrary timing reasons can still gain traction at later times. I've seen plenty of "[Duplicate]" tagged posts, and have just seen this as a sort of useful thing for readers (duplicates may have interesting info, or seeing that the dupe did or did not gain traction also gives me info). But maybe I am missing something here, particularly etiquette-wise?

        • layer8 5 hours ago
          It’s certainly okay to link to a previous discussion, but “duplicate” implies that you think the present submission shouldn’t exist, and the previous submission doesn’t actually provide any discussion.

          The fact that a previous submission didn’t gain traction isn’t usually interesting, because it can be pretty random whether something gains traction or not, depending on time of day and audience that happens to be online.

          • D-Machine 4 hours ago
            Okay, I don't in general see "duplicate" as implying this, but I take your point, and was wondering if that might be the etiquette here.

            I also think, on reflection, that you are right in this particular case (given there are no comments on the previous duplicate) so, thank you also for clarifying.

            I suppose in the future a tag such as "[Previous discussion]" would be more appropriate, provided comments were made; otherwise, just say nothing and leave it to HN.

        • kachapopopow 5 hours ago
          a better title is most often the reason for it; looking at it, the em-dash probably caused people to dismiss it as an AI bot.
          • D-Machine 5 hours ago
            If that's the simplistic heuristic people here are using...
            • kachapopopow 5 hours ago
              if you turn on "show dead" you will see a lot of spam / AI comments
              • D-Machine 5 hours ago
                Yeah I have that on so see those all the time, I was more wondering why I got a strange comment about tagging a duplicate, and was wondering if I was breaching some kind of etiquette.
                • kachapopopow 5 hours ago
                  people with 100k+ karma often breach the etiquette they preach so I wouldn't worry too much about it; worst case you get downvoted to -5 and it'll become dead.
                  • D-Machine 5 hours ago
                    Ok, figured basically that, but very much appreciate the confirmation. Thanks!
  • D-Machine 5 hours ago
    Guidance needs to be more specific. Failing to use AI for search often means you are wasting a huge amount of time. ChatGPT 5.2 Extended Thinking with search enabled speeds up research obscenely, and I'd be more concerned if reviewers were NOT making use of such tools in reviews.

    Seeing the high percentage of usage of AI for composing reviews is concerning, but, also, peer review is an unpaid racket which seems basically random anyway (https://academia.stackexchange.com/q/115231), and probably needs to die given alternatives like arXiv, OpenPeerReview, etc. I'm not sure how much I care about AI slop contaminating an area that already might be mostly human slop in the first place.

    • jltsiren 5 hours ago
      That's a wrong way of using AI in peer review. A key part of reviewing a paper is reading it without preconceptions. After you have done the initial pass, AI can be useful for a second opinion, or for finding something you may have missed.

      But of course, you are often not allowed to do that. Review copies are confidential documents, and you are not allowed to upload them to random third-party services.

      Peer review has random elements, but that's true of all other situations (such as job interviews) where the final decision is made using subjective judgment. There is nothing wrong with that.

      • D-Machine 4 hours ago
        > A key part of reviewing a paper is reading it without preconceptions

        I get where you are coming from here, but, in my opinion, no, this is not part of peer review (where expertise implies preconceptions), nor really of anything humans do. If you ignore your preconceptions and/or priors (which are formed from your accumulated knowledge and experience), you aren't thinking.

        A good example in peer review (which I have done) would be: I see a paper where I have some expertise of the technical / statistical methods used in a paper, but not of the very particular subject domain. I can use AI search to help me find papers in the subject domain faster than I can on my own, and then I can more quickly see if my usual preconceptions about the statistical methods are relevant on this paper I have to review. I still have to check things, but, previously, this took a lot more time and clever crafting of search queries.

        Failing to use AI for search in this way harms peer review, because, in practice, you do less searching and checking than AI does (since you simply don't have the time, peer review being essentially free slave labor).

        • jltsiren 3 hours ago
          By "without preconceptions", I mean that your initial review should not be influenced by anyone else's opinions. In CS, conference management software often makes this explicit by requiring you to upload your review before you can see other reviews. (You can of course revise your review afterwards.)

          You are also supposed to review the paper and not just check it for correctness. If the presentation is unclear, or if earlier sections mislead the reader before later sections clarify the situation, you are supposed to point that out. But if you have seen an AI summary of the paper before reading it, you can no longer do that part. (And if a summary helps to interpret the paper correctly, that summary should be a part of the paper.)

          If you don't have sufficient expertise to review every aspect of the paper, you can always point that out in the review. Reading papers in unfamiliar fields is risky, because it's easy to misinterpret them. Each field has its own way of thinking that can only be learned by exposure. If you are not familiar with the way of thinking, you can read the words but fail to understand the message. If you work in a multidisciplinary field (such as bioinformatics), you often get daily reminders of that.

    • hurturue 5 hours ago
      Researchers use it to write the papers themselves: https://www.science.org/content/article/far-more-authors-use...
    • Animats 3 hours ago
      Then on top of that there's the slop that comes from the university's PR department, where they turn "New possibly-interesting lab result in surface chemistry" into "Trillion dollar battery technology launched".

      (Now that I think about it, I haven't seen much battery hype lately. The battery hype people may have pivoted to AI. Lots of stuff is going on in batteries, but mostly by billion-dollar companies in China quietly building plants and mostly shutting up about what's going on inside.)

  • baalimago 5 hours ago
    They should do a study on this.
  • zeofig 3 hours ago
    This is because peer review has become a bullshit mill and AI is good at churning through/out bullshit.
  • bpodgursky 5 hours ago
    Journals need to find a way to give guidance on what is and isn't appropriate and to let reviewers explain how they used AI tools... because like, you aren't going to nag people out of using AI to do UNPAID work 90% faster and produce results that are 90th+ percentile of review quality (let's be real, there are a lot of bad flesh and blood reviewers).
  • N_Lens 6 hours ago
    News: Half of researchers lied on this survey