I think this is the fundamental problem of LLMs in general. Some of the time the output looks just right enough to seem legitimate. Luckily, the rest of the time it doesn't.
But all of its responses definitely seem convincing (as it has been trained to do)
I feel like I'm watching a tsunami about to hit while literally already drowning from a different tsunami.
Everything looks right but misses the underlying details that actually matter.
There is a larger problem: I think we like to pretend that everything is so simple you don't need expertise. This is especially bad in our CS communities, where there's a tendency to think intelligence in one domain cleanly transfers to others. In this respect I generally advise people to first ask LLMs about things they are experts in, not things they don't know. That way they can properly evaluate the responses. Lest we all fall for Gell-Mann amnesia lol
Any large enough organization gathers them en masse to cloud real development work with "compliance."
But of course producing fake ones is far easier and cheaper.
As I just commented in the other AI trust thread on the front page, this dynamic is funnily enough what any woman using online dating services has always been very familiar with. With the exact same tragedy of the commons that results. Except for the important difference that terrible profiles and intro messages have traditionally usually been very short and easily red-flagged. But that is, of course, now also changing or already changed due to LLMs.
(Someone I follow on a certain social media platform just remarked that she got no less than fifty messages within a single hour of marking herself as "single". And she's just some average person, not a "star" of any sort.)
Referral systems are very efficient at filtering noise.
There is also the possibility that in trying to get someone to refer me I give enough details that the trusted person can submit instead of me and claim credit.
> When you're volunteering out of love in a market society, you're setting yourself up to be exploited.
I sound like a broken record, but there are unifying causes to most issues I observe in the world.
None of the proposed solutions address the cause (and they can't of course): public scrutiny doesn't do anything if account creation is zero-effort; monetary penalization will kill the submissions entirely.
In a perfect world OSS maintainers would get paid properly. But, we've been doing this since the 90s, and all that's happened is OSS got deployed by private companies, concentrating the wealth and the economic benefits. When every hour is paid labour, you pick the AWS Kafka over spinning up your own cluster, or you run Linux in the cloud instead of your own metal. This will always keep happening so long as the incentives are what they are and survival hinges on capital. That people still put in their free time speaks to the beautiful nature of humans, but it's in spite of the current systems.
- Primarily relies on a single piece of evidence from the curl project, and expands it into multiple paragraphs
- "But here's the gut punch:", "You're not building ... You're addressing ...", "This is the fundamental problem:" and so many other instances of Linkedin-esque writing.
- The listicle under "What Might Actually Work"
In point of fact, I had not.
After the security reporting issue, the next problem on the list is "trust in other people's writing".
This has additional layers to it as well. For example, I actively avoid using em dash or anything that resembles it right now. If I had no exposure to the drama around AI, I wouldn't even be thinking about this. I am constraining my writing simply to avoid the implication.
I'm still using bullet lists sometimes, as they have their place, and I'm hoping LLMs don't totally nuke them.
You don't know whose style the LLM would pick for that particular prompt and project. You might end up with Carmack, or maybe that buggy, test-failing piece of junk project on GitHub.
There's no "LLM style". There's "human style mimicked by LLMs". If they default to a specific style, then that's on the human user who chooses to go with it, or, likely, doesn't care. They could just as well make it output text in the style of Shakespeare or a pirate, eschew emojis and bulleted lists, etc.
If you're finding yourself influenced by LLMs—don't be. Here's why:
• It doesn't matter.
• Keep whatever style you had before LLMs.
:tada:
There is a "default LLM style", which is why I call it that. Or technically, one per LLM, but they seem to have converged pretty hard since they're all convergently evolving in the same environment.
It's trivial to prompt it out of that style. Word about how to do it and that you should do it has gotten around in the academic world where the incentives to not be caught are high. So I don't call it "the LLM style". But if you don't prompt for anything in particular, yes, there is a very very strong "default LLM style".
https://news.ycombinator.com/item?id=44072922
It's sad because people that are ok with AI art are still enjoying the human art just the same. Somehow their visceral hate of AI-art managed to ruin human art for themselves as well.
But instead we had a 'non-profit' called 'Open'AI that irresponsibly unleashed this technology on the world and lied about its capabilities with no care of how it would affect the average person.
AI outputs mimicking art rob audiences of the ability to appreciate art on its own in the wild, without further markers of authenticity. That steals joy from a whole generation of digital artists who have grown up sharing their creativity with each other.
If you lack the empathy to understand why AI art-like outputs are abhorrent, I hope someone wastes a significant portion of your near future with generated meaningless material presented to you as something that is valuable and was time consuming to make, and you gain nothing from it, so that you can understand the problem for yourself first hand.
HN discussed it here https://news.ycombinator.com/item?id=44384610
The responses were a surprisingly mixed bag. What I thought was a very common sense observation had some heavy detractors in those threads.
It's better to stay neutral and say you suspect it may be AI generated.
And for everyone else, responsible disclosure of using AI tools to write stuff would be appreciated.
(this comment did not involve AI. I don't know how to write an emdash)
Literally the first two sentences on the linked article:
> Disclosure: Certain sections of this content were grammatically refined/updated using AI assistance, as English is not my first language. Quite ironic, I know, given the subject being discussed.
Personally, I've read enough AI generated SEO spam that anything with the AI voice comes off as being inauthentic and spammy. I would much rather read something with the mistakes a non-native English speaker would make than something AI written/edited.
> Certain sections of this content were grammatically refined/updated using AI assistance
> I don't know how to write an emdash
Same here, and at this point I don’t think I will ever learn
Between this and the flip side of AI-slop it's getting really frustrating out here online.
> Disclosure: Certain sections of this content were grammatically refined/updated using AI assistance

Today there are scams that look just like real companies trying to get you to buy from them instead. Who knows what happens if you put your money down. (Scams were of course always a problem, but there is much less cost to create a scam)
It's good for the site collecting the fee, it's good for the projects being reported on and it doesn't negatively affect valid reports.
It does exactly what we want by disincentivizing bad reports, whether AI generated or not.
What do other countries do for their equivalent of this?
If this isn't already a requirement, I'm not sure I understand what even non-AI-generated reports look like. Isn't the bare minimum of CVE reporting a minimally reproducible example? Even if you find some function that, for example, doesn't do bounds-checking on some array, you can trivially write some unit-testing code that's able to break it.
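To make that concrete, here's a toy sketch in Python (purely illustrative; parse_header and the short input are made up, not taken from any real project) of the kind of minimal reproducer you'd expect a credible report to attach:

    import struct

    # Hypothetical target: unpacks a 4-byte length field with no size check.
    def parse_header(buf: bytes) -> int:
        return struct.unpack(">I", buf[:4])[0]

    # The minimal reproducer: a concrete input plus the observed failure,
    # not just a pointer at a function that "looks" unsafe.
    def test_short_input_crashes():
        try:
            parse_header(b"\x01")  # 1 byte where 4 are assumed
        except struct.error as exc:
            print("reproduced:", exc)

    if __name__ == "__main__":
        test_short_input_crashes()

Something that small already separates "I ran the code and it breaks" from "the AI says this function seems dangerous".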
You sort of want to reject them all, but occasionally a gem gets submitted, which makes you reluctant.
For example, years ago I was responsible for triaging bug bounty reports at a SaaS company I worked at at the time. One of the most interesting reports was from someone who had found a way to bypass our OAuth flow using a bug in Safari that let them get past most OAuth forms. The report was barely understandable, written in broken English. The impression I got was that they had tried to send it to Apple, but Apple ignored them. We ended up rewriting the report and submitting it to Apple on their behalf (we made sure the reporter got all the credit).
If we had ignored poorly written reports, we would have missed that. Is it worth it, though? I don't know.
So Safari was not following the web browser specs in a way that compromised OAuth in a common mode of implementation.
Regex exploitation is the forever example to bring up here, as it's generally the main reason that "autofail the CI system the moment an auditing command fails" doesn't work on certain codebases. It happens because it's trivial to craft a string that wastes significant resources in a regex match, so the moment you have a function that accepts a user-supplied regex pattern, that's suddenly an exploit... which gets a CVE. A lot of projects then have CVEs filed against them because internal functions take regex patterns as arguments, even if they're in code the user is flat-out never going to be able to interact with (i.e. several dozen layers deep in framework soup there's a regex call somewhere, in a way the user won't be able to access unless a developer several layers up starts breaking the framework they're using in really weird ways on purpose).
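For anyone who hasn't seen catastrophic backtracking in action, a minimal sketch with Python's re module (the pattern and inputs are just the textbook demo, not from any particular project):

    import re
    import time

    # Nested quantifiers over the same characters: the classic ReDoS shape.
    # Any code path that feeds user input into a pattern like this is a DoS vector.
    pattern = re.compile(r"^(a+)+$")

    for n in (18, 22, 26):
        s = "a" * n + "b"  # almost matches, forcing exhaustive backtracking
        start = time.perf_counter()
        pattern.match(s)
        # runtime roughly doubles with each extra 'a'
        print(n, f"{time.perf_counter() - start:.3f}s")

Whether that deserves a CVE depends entirely on whether an attacker can actually reach the call, which is exactly the context the drive-by reports leave out.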
The CVE system is just completely broken and barely serves as an indicator of much of anything. From what I can tell, the approval process favors acceptance over rejection, since the people reviewing the initial CVE filing aren't the same people who actively investigate whether the CVE is bogus, and the incentive for the CVE system is literally to encourage companies to give a shit about software security (a fact that is also often exploited to create beg bounties). CVEs have been filed against software for what amounts to "a computer allows a user to do things on it" even before AI slop made everything worse; the system was questionable in quality 7 years ago at the very least, and is even worse these days.
The only indicator it really gives is that a real security exploit can feel more legitimate if it gets a CVE assigned to it.
> A security report lands in your inbox. It claims there's a buffer overflow in a specific function. The report is well-formatted, includes CVE-style nomenclature, and uses appropriate technical language.
Given how easy it is to generate a POC these days, I wonder if HackerOne needs to be pivoting hard into scaffolding to help bug hunters prove their vulns.
- Claude skills/MCP for OSS projects
- Attested logging/monitoring for API investigations (e.g. hosted Burp)
I know that this poses new problems (some people can't afford to spend this money), but it would be better than just wasting people's time.
Would this be different if the underlying code had a viral license? If Google's infrastructure was built on a GPL'ed libcurl [0], would they have investment in the code/a team with resources to evaluate security reports (slop or otherwise)? Ditto for libxml.
Does GPL help the Linux kernel get investment from its corporate users?
[0] Perhaps an impossible hypothetical. Would google have skipped over the imaginary GPL'ed libcurl or libxml for a more permissively licensed library? And even if they didn't, would a big company's involvement in an openly developed ecosystem create asymmetric funding/goals, a la XMPP or Nix?
> Does GPL help the Linux kernel get investment from its corporate users?
GPL has helped "Linux kernel the project" greatly, but companies invest in it out of their own self-interest. They want to benefit from upstream improvements, and playing nicely by upstreaming changes is just much cheaper than maintaining their own kernel fork.
On the other side you have companies like Sony that have used BSD code in their game consoles for decades and contributed shit.
So... Two unrelated things.
> I would have thought supporting libcurl and libxml would also be in a company's self-interest.
Unfortunately, the majority of companies don't have something special they really need to add to cURL. They're okay using it as is, so they have no reason to pay a salary to cURL developers regardless of licensing. Yes, they want it to be secure, but as always nobody except a few very large orgs cares about security for real.
> Is that companies do this for GPL'ed linux kernel but not BSD evidence that strong copyleft licensing limits the extent to which OSS projects are exploited/under-resourced?
It certainly helped with the "under-resourced" part. Whether you consider it "exploited" is up for discussion; from the project's perspective, copyleft licensing of course benefited the project. Linus Torvalds ended up with a good amount of publicity and is now reasonably well off, but almost all other kernel developers live in obscurity earning somewhat average salaries. I'm pretty sure we can all agree that the Linux kernel made a massive positive impact on humanity as a whole, and compared to that, the payoff to stakeholders is rather small IMO.
Most people's initial contributions are going to be more concrete exploits.
Different models perform differently when it comes to catching/fixing security vulnerabilities.
Even more so when there is a bounty payout.
Refundable if the PR/report is accepted.
I’m not saying that AI hasn’t already given us useful things, but this is a symptom of one very negative change that’s drowned a lot of the positive out for many people: the competence gap used to be an automatic barrier for many things. For example, to get hired as a freelance developer, you had to have at least cargo-culted something together once or twice, and even if you were way overconfident in your capability, you probably knew you weren’t the real thing. However, the AI tools industry essentially markets free competence, and out of context for any given topic, that little disclaimer is meaningless. It’s essentially given people climbing the Dunning-Kruger Mt. Stupid the agency to produce garbage at a damaging volume that’s too plausible looking for them (or laypeople, for that matter) to realize it’s garbage. I also think somewhat nihilist people prone to get-rich-quick schemes (e.g. drop-shipping, NFTs) play these workflows like lottery tickets while remaining deliberately ignorant of their dubious value.
This is such an important problem to solve, and it feels soluble. Perhaps a layer with heavily biased weights, trained on carefully curated definitional data. If we could train in a sense of truth - even a small one - many of the hallucinatory patterns would disappear.
Hats off to the curl maintainers. You are the xkcd jenga block at the base.
Even if problems feel soluble, they often aren't. You might have to invent an entirely new paradigm of text generation to solve the hallucination problem. Or it could be the Collatz Conjecture of LLMs: it "feels" so possible, but you never really get there.
- dictionary definitions
- stable APIs for specific versions of software
- mathematical proofs
- anything else that is true by definition rather than evidence-based
(I realize that some of these are not actually as stable over time as they might seem, but they ought to hold up well enough given the pace at which we train new models.)
If you even just had an MoE component whose only job was verifying validity against this dataset during chain-of-thought, I bet you'd get some mileage out of it.
How ironic, considering that every time I've reported a complicated issue to a program on HackerOne, the triagers have completely rejected it because they do not understand the complicated codebase they are triaging for.
Also, the curl examples given in TFA completely ignore recent developments, where curl's maintainers welcomed and fixed literally hundreds of AI-found bugs: https://www.theregister.com/2025/10/02/curl_project_swamped_...
Welcome to the Internet.
> The downside is that it makes it harder for new researchers to enter the field, and it risks creating an insider club.
I also think this concern can be largely mitigated or reduced to a nonissue. New researchers would have a trust score of zero for example, but people who consistently submit AI slop will have a very low score and can be filtered out fairly easily.
It doesn't have to make the final judgement, just some sort of filter that automatically flags things like function calls that don't exist in the code.
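As a rough sketch of what that pre-triage filter could look like (assumptions: the report is plain text, the project is C sources on disk, and a crude "identifier followed by a paren" regex is good enough; the names here are illustrative, not any real tool):

    import re
    import sys
    from pathlib import Path

    # Flag function names a report mentions that don't appear anywhere in the sources.
    IDENT = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")  # matches "foo(" -> "foo"

    def identifiers(text: str) -> set[str]:
        return set(IDENT.findall(text))

    def main(report_path: str, source_dir: str) -> None:
        report_ids = identifiers(Path(report_path).read_text(errors="ignore"))
        source_text = "\n".join(
            p.read_text(errors="ignore") for p in Path(source_dir).rglob("*.[ch]")
        )
        for name in sorted(n for n in report_ids if n not in source_text):
            print(f"FLAG: report mentions '{name}()' but it is not in the sources")

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])

It would flag plenty of noise too (libc calls, macros), so it only works as a signal for a human triager, never as an auto-reject.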
> Certain sections of this content were grammatically refined/updated using AI assistance, as English is not my first language.
OP: I sympathize, but I would much rather read your original text, with typos and grammatical errors. By feeding it through the LLM you fix issues that are not really important, but you remove your own voice and get bland slop identical to 90% of these slopblogs (which yours isn't!)

As much as I'd like to see Russia, China and India disconnected from the wider Internet until they clean up shop with abusive actors, the Hacktoberfest stuff you're likely referring to doesn't have anything to do with your implication - that was just a chance at a free t-shirt [1] that caused all the noise.
In ye olde times, you'd need to take care how you behaved in public because pulling off a stunt like that could reasonably lead to your company going out of business - but even a "small" company like DO is too big to fail from FAFO, much less ultra large corporations like Google that just run on sheer moat. IMHO, that is where we have to start - break up the giants, maybe that's enough of a warning signal to also alert "smaller" large companies to behave like citizens again.
It's a cargo cult. Maybe the airplanes will land and bring the goodies!
fake games by fake studios played by fake players are still a thing