> In the end, shaming people for writing that gets flagged as AI can lead people to sidestep structures the model has learned from us: structures that are effective tools for argumentation. We take the tools of critical thinking out of the kit at the time we most need them.
Now we’re going to stop using effective rhetorical methods because they imply AI, even if we know we’re not using AI?
It reminds me of, as a teenager, asking my dad if he ever saw Led Zeppelin live. He hadn’t, because he didn’t really like fans of Led Zeppelin and didn’t want to be associated with them.
As an ashamed fan of certain bands, I get this instinct, but I also promised myself when I heard this that I would do my best not to let other people influence how I thought about things I enjoyed.
On the same note I’m trying to be “braver” about things like em-dashes, though my personal style has always been to use them as I did in this comment- like this, which I guess distinguishes me, until an LLM picks that up too…
That's really unfortunate though. It's like Michael Bolton from Office Space: "No way! Why should I change? He's the one who sucks."
And you can pry my em dashes from my cold, dead hands.
The "AI Detection" tools employed by schools also regularly flag writing from people with autism or ADHD, and from non-native English speakers, as being AI generated.
So, naturally, I can't stand the phrase "write like AI" when these things come up, because no, there are no humans that "write like AI." It's the models that have stolen the literary devices from us and have now poisoned them.
For better or worse (and pretty much for worse), these usages have become AI idioms. Language evolves over time: things that used to be harmless become offensive, and certain terms end up meaning the complete opposite of what they originally meant. We are watching certain language patterns and idioms become watermarks for AI, and while it sucks, that doesn't make it false.
"We create a culture of self-censorship and AI-detector-pressured rewriting and paraphrasing as people strive to avoid these witch hunts. That is the opposite of protecting human expression. We should resist normalizing a trust in any machine's ability to determine matters of guilt. If using AI to write is, at its worst, an industrialization of the mind, then AI detection, at its worst, becomes a surveillance system for thought."
And, I'm sorry (I'm not), but I am not going to just roll over and shrug and say "welp, guess we all need to dumb our writing down to keep well-meaning idiots from screeching 'AI! AI! AI! WHOOP! WHOOP WHOOP WHOOP!' at us." That isn't the evolution of language. It's Idiocracy.
Now I'll have to find something else to overuse: maybe sentence structures around colons, or the use of Japanese 「hook brackets」.
This is honestly both terrifying and well articulated.
High praise to the blog author.
I agree, but it’s worth noting that that has been done since long before LLMs. Fifteen years ago, I used to teach a graduate course on academic writing pedagogy. The students and I would read research papers on the teaching of academic writing; we also analyzed textbooks and course syllabuses to get an idea about what was actually being done in classrooms. While phrases like “critical thinking” did come up, the overall focus was clearly on language patterns: sentence and paragraph structure, the use of transition words, vocabulary for hedging and boosting (i.e., making assertions seem weaker or stronger), etc.
In a university context, it can be very difficult to evaluate student writing based on its content. In humanities-focused and creative writing, what the student decides to say can be seen as an extension of the student’s personality, identity, and individual experience; if a teacher evaluates the content, including the reasoning, it can seem that the teacher is evaluating the student as a person. And if the students are in the sciences, especially at the graduate level, the writing teacher often won’t even understand what the students write because it is too technical. Teaching and evaluating language patterns, not content, is often the only option.
I think that the current models are still like over-achieving savants rather than true human level because the largest model is only 1/10th the complexity of the human brain. I've recently become fairly convinced that new hardware paradigms (like types of CIM) are about to move from research into real-world development and scaling. So I believe within a few years, the model sizes will increase by another 10 times.
Compared to upcoming 100 trillion parameter models, humans will obviously be _much_ dumber/slower than AI in all fields. Already with the 10T models, some LLMs beat 99.9% of humans in competitive programming.
The AI hatred from many may actually continue to increase, but in cases where the bottom line matters, we are rapidly approaching the point where writing or work product that looks like it is human-authored will be suspect just on that basis. In other words, for some people it will be the reverse -- "this work looks like it was created by a human" could be devastating for your business's credibility at that point.
Also, the Greeks were worried about rhetoric and, in my opinion, rightly so. The skill of arguing a point well is different from the skills needed to be correct. Becoming a skilled rhetorician was viewed as dangerous (and right now AIs are only moderately good... though they are improving fast).
This feels like an easy enough hypothesis to verify, for anyone in the business of training LLMs - does the not-X-but-Y rate increase after RLVR?
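For anyone who wants to try this on their own samples, here is a rough sketch of how the rate could be measured. The regex is my own crude approximation of the "not X but Y" / "It's not X. It's Y" construction, and the function names are made up for illustration; it will miss variants and produce false positives, so treat the numbers as a starting point rather than a measurement.

```python
import re

# Hypothetical, deliberately crude pattern (an assumption, not a standard):
#   1. "not (just|only|merely) X but Y" within a single sentence
#   2. "It's not X. It's Y" across two sentences
NOT_X_BUT_Y = re.compile(
    r"\bnot\s+(?:just\s+|only\s+|merely\s+)?[^.!?;]{1,60}?\bbut\b"
    r"|\b(?:it['’]s|it is)\s+not\s+[^.!?]{1,60}[.!?]\s+(?:it['’]s|it is)\b",
    re.IGNORECASE,
)


def count_hits(text):
    """Count non-overlapping matches of the crude pattern."""
    return len(NOT_X_BUT_Y.findall(text))


def rate_per_1000_words(text):
    """Return matches per 1,000 whitespace-separated words."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return 1000.0 * count_hits(text) / words
```

Running `rate_per_1000_words` over outputs sampled before and after a training stage, using the same prompts, would give two numbers to compare.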
I don't know if it was written that way to show trust in the reader's intelligence, show disregard for reaching a wide audience, show a demonstration of skill, or was an artifact of someone just thinking at that level.
In the interest of not occupying significant page/screen height with LLM output, example prompts+responses here: https://dpaste.com/H9DXKNYQH.txt
> I don't know if it was written that way to show trust in the reader's intelligence, show disregard for reaching a wide audience, show a demonstration of skill, or was an artifact of someone just thinking at that level.
This is merely the end-state of industrialization, which is efficient and soulless.
It's interesting why LLMs generate constructions like this more frequently than they presumably exist in the training set. I wonder if this is some sort of mode collapse caused by post training, and/or maybe because they are training on synthetic data so these things become self-perpetuating and self-amplifying (a feedback loop)?
The lesson for humans worried about being falsely identified as AI is just learn to write better! It doesn't matter where your repertoire of phrasing comes from (copying AI or not), but one of the basic rules of writing is not to repeat yourself unless you are doing so deliberately for a purpose. Go ahead and use "It's not just X. It's Y" if you want to, but if you use it multiple times in the same short piece of writing, then you may deserve to be called out for poor style, if not for being an AI.
If LLMs generated text based on training data frequency they'd likely be some of the most vulgar and hostile things ever created. The internet is full of insults, profanity, and low effort content. The repeated phrases are a side effect of reward optimization rather than some kind of model collapse.
It is bad writing.
Sometimes it’s not just about the Ys but also the Qs.
How is this different from humans? When I went to high school, my teachers extorted me too. Especially in subjects like English where, unlike Math, evaluation is 100% subjective.
Until 2006, the national Finnish/Swedish (as a native language) exam at the end of high school in Finland consisted of two essays. One was based on materials that were provided, and another on a topic that was given. I believe there were a few options to choose from. Both essays were scored independently, and your score was the better of the two. If you had learned to write essays, it was effectively an intelligence test and a good predictor of future academic success across many fields.
Including CS, as my department found out. In particular, the ability to write essays was a better predictor of success in CS studies than the scores in mathematics and natural sciences. Probably because there has always been a large subset of CS students who are otherwise good at CS but can't handle anything resembling mathematics.
- "No X, No Y, No Z." pattern
- "Here is X - it makes Y"
The worst and most obvious one is the constant overuse of emoji ticks and crosses.
*actually a hyphen but it's functioning as an em dash.
This reminds me of another em dash+AI related topic: I've noticed LLMs have an extreme bias towards spaces around the dash while people can go either way with it.
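If you want to check this against your own text samples, a minimal sketch is below. The function name and the three categories are my own invention for illustration: "spaced" means whitespace on both sides of the em dash, "tight" means none on either side, and "mixed" covers everything else.

```python
EM_DASH = "\u2014"


def dash_spacing_counts(text):
    """Classify each em dash by whether whitespace surrounds it."""
    counts = {"spaced": 0, "tight": 0, "mixed": 0}
    for i, ch in enumerate(text):
        if ch != EM_DASH:
            continue
        # A dash at the very start or end of the text has no neighbor
        # on that side, which is treated as "no space" here.
        before = i > 0 and text[i - 1].isspace()
        after = i + 1 < len(text) and text[i + 1].isspace()
        if before and after:
            counts["spaced"] += 1
        elif not before and not after:
            counts["tight"] += 1
        else:
            counts["mixed"] += 1
    return counts
```

Comparing the counts for a pile of LLM output against a pile of known human writing would show whether the bias holds for the models you care about.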
I put it to you: if you see a trope in AI writing, it's because that trope appeared in the training corpus. Therefore, sure, being prejudiced against it lets you catch some AI, but you'll also flag human output. I think that may not be worth it in the end.
I used one to help me plan a sales route, and it kept fucking it up. Every time I corrected it, it tried that hand wringing vizier sort of ass kissing. It's very off-putting, but I can see how someone struggling with social interaction could be sucked into that nonsense.
And that's why everyone on the receiving end of the AI slop deluge is so paranoid.