I haven't tested this again on the latest models though, so not sure if there's been an improvement.
A lot of people don't realize this because the work they're having the AI do doesn't need to be either true or false. It just has to output media that seems like it fits. The system probably took many shortcuts to keep resource use low while outputting something plausible but false.
And frankly this is sort of fine as long as you know what it's doing and what the limitations are. Hypothetically if you broke up the task into multiple steps that the system can actually ingest properly it might reduce the time that the task took overall, maybe even significantly, but not down to one prompt.
(I'm not saying "you're holding it wrong"; I'm asking "how were you holding it?"
Did you tell it to pull in the sources, did it do so automatically, or were you working from just the base weights? )
Just as an aside jumping off this sentence from the article, I am far less tolerant of the practice of naming countries of origin or general locales rather than specific organizations in headlines and stories.
Name the organization, and, if you want, name in the body where they're from/located/operating as it pertains to the organization. For that matter, if you can offer information on the specific locale (Sweden is a big place, after all), you should do that too, unless it really is something more national/international.
"The US did X" The president? The senate? A federal, or municipal body? etc..
But there are arguments against: if "The US bans automatic rifles," then to some extent it's clear which part of the US did it; to another extent, it doesn't matter; and to yet another extent, the part of the country that did the thing represents the whole country, whether through its corporate structure or through democratic mandate.
In history it's very common to say a country did a thing: "Germany invaded Poland," "Argentina signed the Roca-Runciman pact," and so on. Possibly because (in addition to the reasons stated above) information needs to be compressed more for the past; we have less space and priority for details of the past than we do for the present, a kind of cold/hot storage mechanism.
In addition, most mainstream[1] journalists cite sources more liberally than a scientist would, so the source might not say what the journalist reports. The Atlantic has a bit on Waymo's poor detection of minorities[2], e.g.
0: https://wiki.roshangeorge.dev/w/Blog/2026-01-17/Citogenesis
1: Some independent reporters like Matt Yglesias are more rigorous, though their direct reporting can still be bogus
What was/is being done about it?
Just this week I read a "study" because someone claimed on social media that it was made by (public, famous) Unis A, B, and C, and that it reported, as its headline effect, a 30% increase in revenue for the companies that participated in the experiment.
The "study" was commissioned by an interest group (bad sign). It was conducted by people associated with said unis (I didn't check their credentials), and it did report in its headline the 30% revenue increase.
Said study was about an experiment that ran for a few months. During those months, revenue was flat (which could be considered good enough for the cause). The 30% was the revenue of this period measured against the same period the previous year. So somehow the experiment affected the companies retroactively! Not to mention that the researchers were able to find a group of companies that were, on average, growing 30% YoY. Surprising indeed.
So even if you check your sources, it may still be bullshit science or bullshit reporting from well-credentialed sources.
"Find me research on code reviews, their size, and quality" would give you more than enough reading. Yet, if you start with a claim, like "Longer PRs mean worse defect detection," the relevant data points fall to few enough for AI to start hallucinating.
You get "something, something, PR length, defect detection, IDK, I don't read research papers." Such output is fine as long as the author cares to validate it.
Skip the second step, and you might be good if you ask about something generic, like "What's the Slack story?" or "How did Blockbuster go bust?" Ask about some specific details, though, and you're bound to end up with made-up stuff that sounds just about right, while it's actually wrong.
"Follow each link in this document. Read each link's contents against the contents in this document. Create a report: for each link list a working hyperlink, whether it exists, what claim it supports, whether it supports or fails to support it, and why"
If it returns a report claiming all correct? That's promising, but human verification is important. You've got a list of hyperlinks, and a list of claims; so you can click each with middle-mouse, Ctrl-F 'till you find the point, and close the tab when you do.
If you find any discrepancies? Your initial prompt was malformed and/or you picked the wrong LLM, the wrong human, or possibly all three. Whatever the case, the results are built on quicksand; you'll need to start over.
If no sources are provided? Well now: "If there ain't no sources it never happened."
Compare double-entry bookkeeping: it all needs to add up. If you're 1 cent off, that means something is broken. Likewise, if a single reference is off, it has polluted the context. (This works for human-generated and hybrid documents too. Polluted reasoning is polluted reasoning. The process is what counts.)
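For the deterministic half of that check you don't even need the model. A minimal sketch in Python (the regex and the HEAD-request approach are my own assumptions, not anyone's production tooling):

    # smoke_check.py - the mechanical half of "check your sources":
    # pull every URL out of a document and confirm each one resolves.
    # Whether a page actually supports a claim still needs a human
    # (or an LLM whose report you then verify, as above).
    import re
    import sys
    import urllib.request

    def extract_urls(text):
        # Good enough for prose and markdown; not a full URL grammar.
        return re.findall(r"https?://[^\s)\"'>\]]+", text)

    def resolves(url, timeout=10):
        # Some servers reject HEAD, so treat failure as "check this
        # one by hand", not as proof that the link is dead.
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "smoke-check/0.1"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status < 400
        except Exception:
            return False

    if __name__ == "__main__":
        text = open(sys.argv[1], encoding="utf-8").read()
        for url in extract_urls(text):
            print(("ok     " if resolves(url) else "BROKEN ") + url)

A single BROKEN line is the one-cent discrepancy: it doesn't tell you what's wrong, only that something is.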
Gemini: The article focuses on the environmental and human labor costs of scaling Artificial Intelligence, specifically focusing on water usage, electricity, and "ghost work."
Which is hilarious, since the article doesn't even mention the words "water" or "electricity." Gemini remains unfazed, citing links that are not in the article (some don't exist at all) to make its final ruling: "The Tech Trenches document is highly accurate in its citations."
Now, I know. Had I used Claude Code with relevant skills, it would have done better. But would it be good?
* https://gemini.google.com/share/6bd33176b27c
Right, so https://techtrenches.dev/p/the-human-cost-of-10x-how-ai-is-p... is actually a Substack; Gemini is blocked from accessing it, and it's bouncing off and hallucinating instead. OK, that's an actual bug: blocked access should not lead to the model starting to hallucinate. IMO the correct response would have been to fail loudly, which would have been a verification signal of its own.
ps: See also: https://news.ycombinator.com/item?id=48087485 ... I'm starting to think of it as "English is a new scripting language". Clearly the downside is that certain "runtime environments" are not compatible. %-/
And now it errors out on gemini.google.com. This is like early-days Unix scripting: I didn't add the equivalent of "#!/bin/bash -euo pipefail", and I didn't catch it because most systems already include something like it in their ".bashrc" (system prompt or weights) anyway.
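In script terms, the bug above is a silent fallback. A fail-loudly sketch of the fetch step, in Python (the shell-flag analogy is mine):

    import urllib.request

    def fetch_or_die(url, timeout=10):
        # The "set -euo pipefail" version of a fetch: urlopen raises on
        # HTTP errors and timeouts, so a blocked page halts the run
        # instead of letting the caller improvise a summary from memory.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

The prompt-level equivalent is one line up front: "If you cannot fetch a URL, say so and stop; do not answer from memory."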
This is so frustrating. I'm sorry. It's like the 1980s 8-bit era again: some systems actually work, others are terrible, and I didn't realize it can be like this for some folks. You could come away with the conclusion that this whole "computer" thing is just a fad that'll never amount to anything. (Meanwhile, the program works perfectly on my own machine, right over here, of course %-) )
It's more like a small script, and it's supposed to extract URLs and generate a table.
Here's my result in Claude Web for comparison:
https://claude.ai/public/artifacts/d76936f2-c97b-4bff-9205-2...
Claude web finds a number of small discrepancies in the sources, which I manually cross-checked; they seem consistent with a human mixing things up slightly.
+ I also tested in Gemini 3 Flash preview, which generates an actual table (twice). It doesn't flag any discrepancies, which is consistent with it being a weaker model. But the URLs and claims are listed and line up, so you've got your verification table to work with. (It's a semantic formatting task, so that part would be hard to mess up.)
+ Gemini 3.1 Pro yields a fairly aggressive report. https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
+ ChatGPT free (specific model not listed) needed 2 tries and didn't properly follow the prompt even then. I guess I got what I paid for, and I needed to download the files: https://vps.kimbruning.nl/productivity/Ai%20Productivity%20A... (pdf), https://vps.kimbruning.nl/productivity/ai_productivity_artic... (md)
+ Kimi K2.6 instant: https://www.kimi.com/share/19e2cc40-d012-89bf-8000-00006267f...
+ Summary of results: https://claude.ai/public/artifacts/10a42111-a0ee-42f3-b6d2-a... All of the models extracted the URLs into a table just fine, and that part at least is a lot easier than writing a perl script used to be in the '90s ;-). The first part is the important bit, so you as a human can "check your fucking sources". The second part the models handle variously; each does find discrepancies. None of them finds all of them, but that makes sense: this is a fairly polished piece, and ideally it shouldn't have discrepancies at all to begin with.
So: it worked as a smoke check just fine in the above. Doing more than a quick smoke check obviously requires a somewhat more involved procedure.
So, LLMs are inherently bad at citing sources. A lot of effort has been put in to improve this behavior, but it's compensating for an inherent flaw.
If they weren't in your case, ignore what I say next, and please tell me what else was going wrong (and with which models and harnesses!).
Model weights are like Wikipedia: a nice starting point, but they should never be referenced directly. You need to have your agent actually go out onto the internet and do the research. That way the actual references end up in your agent's context (its working memory), and it would at least be rather more surprising if it then failed to cite them correctly.
I do realize there are still corner cases even in the best setups, though; so a final cross-check sweep is never not a good idea.
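A minimal sketch of that grounding loop in Python; `ask_llm` is a stand-in for whatever model API you're using, and the prompt wording is just an assumption:

    import urllib.request

    def fetch(url, timeout=10):
        # Pull the actual page so the reference lives in context,
        # not just in the weights. Fails loudly on a dead link.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def grounded_answer(question, urls, ask_llm):
        # ask_llm: hypothetical callable, (prompt: str) -> str.
        sources = {url: fetch(url) for url in urls}
        prompt = "Answer using ONLY the sources below, and cite by URL.\n\n"
        for url, text in sources.items():
            prompt += "SOURCE " + url + ":\n" + text[:4000] + "\n\n"
        prompt += "QUESTION: " + question
        return ask_llm(prompt)

Grounding doesn't make the model honest, but it makes its citations checkable: every URL it can cite is one you already hold.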
AI is quite good when grounded in a source.
I asked it a question I knew the answer to. It searched the web, and told me the opposite of the truth. (Not nonsense, but a logical inversion of the actual fact. A common failure mode with earlier LLMs.)
Puzzled, I checked the sources. It cited two. Both AI SEO slop.
Bizarrely, I Googled it myself and couldn't even find those pages on Google. Maybe it was using a different search engine? ;)
BTW, as critical as I can be of AI, arguing that something didn't work 3 years ago, so it must be crap, doesn't work in this context. 3 years ago, AI could barely generate several lines of consistent code. Now it generates working apps from a prompt (how good the code is, is another discussion, but still).
I guess 3 years ago, Gemini couldn't tell how many r's are in the word refrigerator.
Same for research. At some point, I switched from ChatGPT and Gemini to Perplexity as it promised AI-powered search. It worked visibly better. Until it didn't, as GPT and Gemini models made a leap.
Back to the point, as long as we understand that, for now, it's all just a probabilistic machine generating the most likely output, no one should expect bulletproof answers. Search was/is way more deterministic than LLMs.
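For what "probabilistic machine" means mechanically, here's a toy next-token step (the candidate words and their probabilities are made up):

    import random

    # A model only ever holds a probability distribution over
    # continuations; there is no fact-lookup step anywhere.
    candidates = {"supports": 0.55, "refutes": 0.30, "mentions": 0.15}
    token = random.choices(list(candidates),
                           weights=list(candidates.values()))[0]
    print(token)  # the plausible one wins most runs, true or not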
They never did!
You'd assume an outgoing link from a CNN website has more credibility than one from an anonymous blog. That is, I reckon, still true. Although the credibility either link conveys is degrading. Again, it has been so since we started playing the game of SEO, yet AI-generated content in this context is basically a weapon of mass destruction. The deterioration has sped up dramatically.
I have found the single best way to avoid being pissed off by this shit is to just avoid Facebook. It dramatically cuts down on the amount I am exposed to.
I also run with adblockers, and consume news via brutalist.report, which also helps. (I avoid the Fox News section at the bottom)
I would say save your time and energy, and invest that into something else - forget all this social media.
The only obvious tell is the eyes don't track right. But once they fix that, it's really going to be hard to know.
All the comments are how great her advice is, etc.
Every video has a link to a book she "wrote" on amazon. I didn't waste my time trying to figure out what the scam is.
...If your feed wasn't full of goth girls selling the same exact items with the same exact story. It's a drop-shipping scam. The same kind as before, but now with AI goth girls.
YouTube Shorts also seems OK for me, but the amount of this stuff is definitely elevated compared to the regular videos recommended to me.
Lastly:
> I would say save your time and energy, and invest that into something else - forget all this social media.
Agree. The promise of social media hasn't worked out. It was nice during the early Netflix streaming days, but has gotten progressively worse since then.
Actually I checked some sources, and I found some for three-legged crows:
https://en.wikipedia.org/wiki/Kojiki#The_Nakatsumaki_(%E4%B8...
https://en.wikipedia.org/wiki/Three-legged_crow#/media/File:...
https://en.wikipedia.org/wiki/File:Douze_emblemes_des_rites_...
https://en.wikipedia.org/wiki/File:Chengdu_2007_341.jpg
And by refuting this article, I thereby prove that which it sought to refute.
Which is to say: pretty good, in their case. For the future? Who knows. But they've done well up to now, at least.
Most of the 2010s section is about some drama with managers/hosters. The only thing that's even remotely applicable is their fact-checking of a satirical website and needing to add a "Labeled Satire" tag to clear up confusion around the intentions of the linked site (as opposed to combating people who use the article as an argument without labeling it as such).
It was far more than drama about managers / hosters.
The only thing in the Wikipedia section you linked that's actually about the content on the Snopes website is the thing where they had to create a label for "Satire" after people got mad that a right-wing satire site (literal, actual, intentional "fake news" but for comedy purposes) had its knowingly-false stories labeled as "false".
(don't come at me with "it was bias"; I lived in the right-wing evangelical bubble through my whole childhood and young adulthood all the way through to the early 2010s; I know the boy-who-cried-persecution complex that lives there, and I also know what the Babylon Bee both was and is quite well; they were never trying to be a real news source, so getting mad that their comedic fiction was labeled "false" is really a stretch).
You haven't exactly shaken my faith in their ability to do the thing they do: find primary sources, present them, and give a verdict based on those primary sources.
What's amazing is that people think Snopes or other fact-checkers are automatically wrong. I assume this comes from people who make a habit of believing bullshit and can't handle being corrected.
https://fair.org/home/the-digital-media-oligarchy-who-owns-o...
https://foodbabe.com/do-you-trust-snopes-you-wont-after-read...
There's a plethora of examples on the internet of Snopes engaging in this type of behavior, if you're actually interested in learning about their problematic approach to their work.
That example seems extremely weak. Is that all there is?
I could give plenty of examples, but you'd likely turn around and visit your favorite re-affirming search engine or fact-checker to refute them. You're claiming that there is some arbiter of truth out there that is immune to bias, which is completely nonsensical. Bias creeps in everywhere because at the end of the day someone has to pay the salaries of these "fact-checkers" and the people paying them want to see a certain narrative upheld. Pretending that isn't the case is absurd.
The internet isn't some place where all perspectives on an issue are weighed against one another and the truthful ones are the ones that prevail and are returned by search engines. It was developed by DARPA and is effectively controlled by corporations like Google and Meta that partner with / receive funding from intelligence agencies and the military industrial complex.
I could share sources with you like these -
https://en.wikipedia.org/wiki/PolitiFact#Funding
https://en.wikipedia.org/wiki/FactCheck.org
https://en.wikipedia.org/wiki/Snopes#Funding
Which demonstrate that these organizations receive funding from the organizations and people whose claims they are propping up as truth, but you'd readily dismiss that connection. You're not here to debate in good faith, and that's quite obvious.
I already shared a link that proves the founder of Snopes is a liar and fraudster and engaged in rampant plagiarism. You're still going to trust the company he founded to tell you the truth about the world. Engaging with you in this back and forth is asinine. You're obviously not after the truth, otherwise you'd do your own research into Snopes and other fact-checking organizations, instead of asking me to do it for you. Have a nice day.
This is just epistemological nihilism.
Maybe it should've been clear from your username, but it doesn't seem like you believe in the concept of truth itself in any useful way.
Consider: perhaps this is the product of your own biases? What then? Does that invalidate or prove your theory of the world? Or is it impossible to tell once you've adopted the notion that nothing can be verified (because that includes the claim that nothing can be verified)?
In any case, I'm sorry. That sounds like a really stressful way to live a life.
I don't need your pity - I'd much rather be a discerning individual who questions mainstream narratives than one who blindly accepts what some "fact-checker" tells me is true because they've been deemed reputable by the organizations that pay their salaries and provide them the propaganda that will reaffirm the narrative they want perceived as true.
As Gerald Massey said -
They must find it difficult... those who have taken authority as the truth, rather than the truth as authority.
So far what you've provided me has zero informational content. You've given me links, but you've also said that none of that stuff is believable, so all that's left is a puddle of goo.
Meanwhile, all I said is that these sites have a good track record and that they give you enough information to do your own research to check their conclusions. But apparently you’re opposed to that when I’m doing it and not reaching the conclusions you want me to reach.
This is a common, infuriating practice: it provides a veneer of authoritativeness and credibility to newspaper articles, and who is ever going to click on the links that support those very cogent claims? Nobody, of course. So they just link to another article with more vague claims, and with each level deeper, your willingness to verify the information evaporates at the same rate as the information itself.
But hey, in the meanwhile the author has managed to sneak in that "scientists have found" and that if you don't believe it you must be anti-science.
Incidentally, highlighting this abuse (together with a bunch of other quality and fact-checking work) would be a great use of AI in online news publishing.
Also, thorough validation would cost a ton in tokens, so it would be expensive both in tech (AI bills) and in labor. Now, in whose interest would it be to fund such a product? I don't see too many takers...
People just have a fundamentally misguided idea of what they should be doing. You just quote. That is all. Nobody needs your originality as a writer. They just want the quotes, the sources, and, optionally, a synthesis, conclusion, and summary. That is the "work" you need to do, and doing just this is already enough value, even if it looks like plagiarism to people who don't know what the word plagiarism means.
Everyone knows information came from somewhere. Where did you read it? We all know you didn't just wake up one day and remember it from a past life or something like that. Why are you trying to pretend you just know it? Where are the links? Where are the screenshots?
If you are giving people the sources that you should have been giving all along, in the correct way, then you don't need to "check your sources." Because "your sources" are literally where you got that information from in first place, so you have already checked them.
Hence, if ChatGPT gave you a source, then your source isn't the source that ChatGPT gave you, because you didn't read it. Your source is, literally, ChatGPT. You should be writing "Smith, 2015, as cited by ChatGPT." Because you didn't read Smith, 2015. You read ChatGPT!
I read “Hillbilly Elegy” and wondered why it wasn't in there. Snopes cleared it up in a matter of minutes. Whether he sues people into oblivion is his prerogative, but it's a fascinating case study showing that we are, indeed, living in a Post-Truth environment.
And then, one day, the politicians started saying it...
https://youtu.be/NtRPLCso0Sw?t=14m09s
Makes me believe that you're really not commenting in good faith here.
Unless he's repeating Trump's lies, then 77M people apparently believe it.
"Rachel Maddow Wins in 9th Circuit; OAN Loses Appeal in Defamation Case"
https://timesofsandiego.com/business/2021/08/17/rachel-maddo...
-----
> Maddow’s show is different than a typical news segment where anchors inform viewers about the daily news. The point of Maddow’s show is for her to provide the news but also to offer her opinions as to that news. Therefore, the Court finds that the medium of the alleged defamatory statement makes it more likely that a reasonable viewer would not conclude that the contested statement implies an assertion of objective fact.
https://timesofsandiego.com/wp-content/uploads/2020/05/MADDO...
-----
The statement in question: "[...T]he most obsequiously pro-Trump right wing news outlet in America really literally is paid Russian propaganda."
Glad to learn it, but that's zero percent of the reason anyone shouldn't like the guy, zero percent of the reason he's a bad person, etc.
Who cares if someone fucks couches? Apparently that kind of stuff doesn't end your political career anymore anyway.
It's not like he shot a puppy!
I don't think the right answer to widespread disinformation campaigns is retaliatory disinformation campaigns (even if they're couched – pun not intended – in a just-barely-thin-enough veil of "wink wink we know this is a joke").
The right answer is to create systems and measures that actually limit disinformation.