https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Another one that seems impossible for LLMs to avoid: breaking an article title into a title and a subtitle, separated by a colon. Even if you explicitly tell it not to, it'll do it.
If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?
[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. Working on extending this to more recent models and other grammatical features now
Interestingly, because perplexity is the optimization objective, the pretrained models should reflect the least surprising outputs of all.
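For concreteness, here is the standard definition being invoked (a generic textbook formula, not anything specific to the linked paper):

```latex
% Perplexity of a model p_\theta over tokens w_1, ..., w_N:
% the exponentiated average negative log-likelihood.
% Pretraining minimizes the cross-entropy inside the exp(...),
% so a lower value means the model is less "surprised" by the data.
\mathrm{PPL}(w_{1:N}) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(w_i \mid w_{<i}\right) \right)
```

RLHF then optimizes a learned reward rather than this likelihood, which is one candidate mechanism for the style drift the question above asks about.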
> Why is the instruction tuning pushing such a noticeable style shift?
Gwern Branwen has been covering this: https://gwern.net/doc/reinforcement-learning/preference-lear....
Wonder how they can avoid the trope without censoring themselves out.
> Honestly? We should address X first. It's a genuine issue and we've found a real bug here.
Honorable mention: "no <thing you told me not to do>". I guess this is meant to reassure you it's adhering to the prompt? I see that one all the time in vibe coded PRs.
But I feel like I’ve noticed an uptick in people using the adverb “genuinely” in what I genuinely believe not to be AI-generated comments, articles, etc. Maybe it’s just me; I got similar vibes about the word “efficacy” a few years ago, before the ascent of GenAI (but after the pandemic — again, maybe just me).
I see this so often. Sometimes it’s just “no react hooks”, other times it gets literal and extra unnatural, like: “here’s <your thing>, no unnecessary long text explanation”. Perhaps we’re past AGI and this is passive aggressiveness ;)
It makes a tremendous difference. Almost everything on this list is the emotional fluff ChatGPT injects to simulate a personality.
This one hit home... the first time I ever saw Claude do it I really liked it. It's amazing how quickly it became the #1 most aggravating thing it does just through sheer overuse. And of course now it's rampant in writing everywhere.
"No rough handling. No struggles to accelerate. Just pure performance. The new Toyota GT. It's not just a car—it's a revolution."
Most of the tropes listed on this page give text a more "car ad" (or sometimes "movie trailer") quality. I wonder if magazine scans and press releases unduly weighted the training set.
With this I am able to get all my favorite subs onto my actual hard drive, with some extra awesome features as a result: I vibe coded a little helper app that lets me query the transcript of a video and ask questions about what they say, using cheap Haiku queries. I can also get my subs onto my Jellyfin server and view them there on any device. Even the comments get downloaded.
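For the curious, a minimal sketch of the transcript-querying idea, assuming yt-dlp for the subtitle download and the anthropic SDK for the Haiku call. The model name, file paths, and helper names are illustrative, not the commenter's actual app:

```python
# Sketch: grab a video's auto-generated subtitles with yt-dlp,
# then ask a cheap Haiku model questions about the transcript.
import glob
import re

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY
import yt_dlp     # pip install yt-dlp


def fetch_transcript(url: str) -> str:
    """Write English auto-subs as .vtt without downloading the video."""
    opts = {
        "skip_download": True,
        "writeautomaticsub": True,
        "subtitleslangs": ["en"],
        "subtitlesformat": "vtt",
        "outtmpl": "%(id)s.%(ext)s",
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
    vtt = glob.glob(f"{info['id']}*.vtt")[0]  # sketch: assumes subs exist
    with open(vtt, encoding="utf-8") as f:
        lines = f.read().splitlines()
    # Keep only cue text: drop headers, timestamp lines, and inline tags.
    text = [re.sub(r"<[^>]+>", "", l) for l in lines
            if l and "-->" not in l
            and not l.startswith(("WEBVTT", "Kind:", "Language:"))]
    # Auto-subs repeat rolling captions; de-dupe while preserving order.
    return " ".join(dict.fromkeys(text))


def ask(transcript: str, question: str) -> str:
    # Long videos may exceed the context window; truncate as needed.
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-3-haiku-20240307",  # assumed cheap model; swap as needed
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Transcript:\n{transcript}\n\nQuestion: {question}",
        }],
    )
    return msg.content[0].text


if __name__ == "__main__":
    t = fetch_transcript("https://www.youtube.com/watch?v=EXAMPLE")
    print(ask(t, "What do they say about pricing?"))
```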
All these streamers have gone too far trying to maximize engagement and have broken the social contract, so I see this as totally fair game.
One I've seen Gemini using a lot is the "I'll shoot straight with you" preamble (or similar phrasing), when it's about to tell me it can't answer the question.
There don't seem to be lots of people perversely inclined to write a story with all these tropes and words in it, but surely there must be some, because if you made something that beat the LLM (by being creatively good) while using all the crap the LLM uses, it would be some sort of John Henry triumph (discounting the final end of John Henry, of course, which is a real downer).
If AI finally gets rid of the thing that drove me nuts for years ("leverage" as a verb meaning roughly "to use") when no human intervention seemed to work, then I shall be over-the-moon happy. I once worked at a place where this particular word was lever... er, used all the damn time, and I'd never encountered something so NPC-ish. I felt like I was in The Twilight Zone. I could've told you way back then that you sounded like a bot doing that; now people might actually believe me, and thank god.
I will stick by the em dashes, however. And I might just start using arrows too. Compose, -, > gives → (right arrow). Not even difficult.
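For anyone who wants the same setup, sequences like these ship in a typical X11 en_US.UTF-8 Compose file; exact contents vary by distribution, so treat these lines as representative rather than authoritative:

```
<Multi_key> <minus> <greater>        : "→"   U2192  # RIGHTWARDS ARROW
<Multi_key> <minus> <minus> <minus>  : "—"   U2014  # EM DASH
<Multi_key> <minus> <minus> <period> : "–"   U2013  # EN DASH
```

You can also define your own sequences in ~/.XCompose.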
I hadn't noticed this - great point. To be fair the "home cooked meal" metaphor comes from 2020, predating genAI coding[1]. But even then, CPUs themselves are so normalised that we just kind of... forget how vertiginously complex the entire supply chain is.
I mean, "tapestry" is a great word for something that is interconnected. Why not use it?
Negative parallelism is a staple of briefs. "This case is not about free speech. It is about fraud." It does real work when you're contesting the other side's framing.
Tricolons and anaphora are used as persuasion techniques for closing arguments and appellate briefs.
Short punchy fragments help in persuasive briefs where judges are skimming. "The statute is unambiguous."
As with the em dash - let's not throw the baby out with the bath water.
No thanks, I hate this large scale social experiment
I understand the sentiment. Meaning I think I understand some of the underlying frustration. But I don't care for the tone or the framing or the depth of analysis (for there isn't much there; I've seen the "if you didn't write it, why should I read it" cliché before *, and it ain't the only argument in town). Now for my detailed responses:
1. In the same way the author wants people to respect other people, I want the author to respect the complexity of the universe. I'm not seeing that.
2. If someone says "I wrote this without any LLM assistance" but did use one anyway, THAT is clearly deceptive.
3. If you read a page that was created with LLM assistance, it isn't reasonable for you to say the creator was being deceptive just because you assumed otherwise. It takes two to achieve deception: both the sender and the receiver.
4. If you read a page on the internet, it is increasingly likely there was no human in the loop for the article at all. Good luck tracing the provenance of who made the call to make it happen. It might well be downstream of someone's job. (Yes, we can talk about diffusion of responsibility, etc., that's fair game -- but if you want to get into the realm of moral judgments, this isn't going to be a quick and tidy conversation)
5. I think the above comment puts too much of an "oh, the halcyon days!" spin on this. Throughout history, most humans, much of the time, have largely been repackaging things they heard before. Unfortunately (or just "in reality") more of us are catching on to just how memetically driven people are. We are both individuals and cogs. It is an uncomfortable truth. That brainwashed uncle you have is almost certainly a less reliable source of information than Claude.
6. The web has crappy incentives. It sucks. Yes, I want people to behave better. That would be nice, but I can't realistically expect people to behave better on the web unless there are incentives and consequences that align with what I want. The Web is a dumpster fire, not because of bad individuals, but because of system dynamics. Incentives. Feedback.
7. If people communicate more clearly, with fewer errors, that's at least a narrow win. One has to at least factor this in.
8. People accusing other people of being LLMs has a cost, especially when they do it overconfidently or in a crude or mean manner. I've been on the receiving end. Why? Because my writing style sometimes triggers people: it resembles how LLMs write.
* I want to read high quality things. I actually care less if you wrote it as bullet points, with the help of an LLM, on a napkin, on a posterboard ... my goal is to learn from something suited to some purpose. I'm happy reading a computer-generated chart. I don't need a human to do that by hand.
The previous paragraph attempts to gesture at some of the conceptual holes in the common arguments behind "if you want a human to read it, a human should write it": they are neither systematically nor rigorously "wargamed" or "thought-experimented"; they are mostly just "knee-jerked".
I am quite interested in many things, including: (1) connecting with real people; (2) connecting with real people that don't merely regurgitate an information source they just ingested; (3) having an intelligent process generating the things I read. As an example of the third, I want "intelligent" organizations that synthesize contributions from their constituent parts. I want "intelligent" algorithms to help me focus on what matters to me. &c.
If a machine does that well, I'm not intrinsically bothered. If a human collaborates with an LLM to do that, fine. Whatever. We have bigger problems! Much bigger ones.
Yes, I want to live in a world where humans are valued for what they write and their intrinsic qualities, even as machines encroach on what used to be our biggest differentiator: intelligence itself. But wanting this and morally shaming people for not doing it doesn't seem like a good way to actually make it happen. Getting to that world, to my eye, requires public sense-making, grappling with the reality of how the world works, forming coalitions, organizing society, and passing laws.
Yes, I understand that HN has a policy that people write their own stuff, and I do. (See #8 above as well as my about page.)
Thank you to the approximately zero or maybe one person who made it this far. I owe you a beer. You can easily find me. I'm serious. But then we have to find a way to have a discussion while enjoying a beer on a video call. Alas.
I expect better from people -- and unfortunately a lot of people's output is lower quality than what I get from Claude. THIS is what pisses me off: that a machine-curated output is actually more useful to me than the vast majority of what people say, at least when I have particular questions to ask. This is one of many uncomfortable realities I would like people not to flinch away from. As far as intelligent output is concerned, humans are losing a lot of ground, and fast. Don't shoot the messenger. If you don't recognize this, you might have a rather myopic view of intelligence that assumes it must somehow be biological, or you just keep moving the goalposts. Or that somehow (but how?) humans "have it" but machines can't.