You can explore the underlying data using SQL queries in your browser here: https://lite.datasette.io/?url=https%253A%252F%252Fraw.githu... (that's Datasette Lite, my build of the Datasette Python web app that runs in Pyodide in WebAssembly)
Here's a SQL query that shows the users in that data that posted the most comments with at least one em dash - the top ones all look like legitimate accounts to me: https://lite.datasette.io/?url=https%3A%2F%2Fraw.githubuserc...
> select user, source, count(*), ...
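The linked query is truncated above, so here's a runnable sketch of the same idea against a toy in-memory table. The column names (`user`, `source`, `text`) are assumptions, not the actual schema:

```python
import sqlite3

# Toy stand-in for the HN comments data; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (user TEXT, source TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [
        ("alice", "hn", "Great point \u2014 totally agree."),
        ("alice", "hn", "Well \u2014 maybe."),
        ("bob", "hn", "No dashes here."),
    ],
)

# Count each user's comments that contain at least one em dash (U+2014).
rows = conn.execute(
    """
    SELECT user, source, COUNT(*) AS n
    FROM comments
    WHERE text LIKE '%' || char(8212) || '%'
    GROUP BY user, source
    ORDER BY n DESC
    """
).fetchall()
print(rows)  # [('alice', 'hn', 2)]
```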
it's clear that every single outlier in em-dash use in the data set is a green account.
Ellipses were never part of the analysis.
There is precedent here.
word noob new p-value
----------------------------
ai 14.93% 7.87% p=0.00016
actually 12.53% 5.34% p=1.1e-05
code 11.47% 6.04% p=0.00081
real 10.93% 2.95% p=2.6e-08
built 10.93% 2.11% p=2.1e-10
data 8.93% 3.51% p=6.1e-05
tools 7.6% 2.67% p=5.5e-05
agent 7.47% 2.95% p=0.00024
app 7.2% 3.09% p=0.00078
tool 6.8% 1.83% p=8.5e-06
model 6.8% 2.39% p=0.00013
agents 6.67% 2.11% p=5.2e-05
api 6.53% 1.12% p=2.7e-07
building 6.13% 1.54% p=1.3e-05
full 6.0% 1.97% p=0.00017
across 5.87% 1.4% p=1.3e-05
interesting 5.33% 1.54% p=0.00014
answer 5.2% 1.4% p=9.6e-05
simple 4.93% 1.54% p=0.00043
project 4.8% 1.26% p=0.00015

e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not typechecked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.
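The p-values in the word-frequency table above look like two-proportion tests. A rough sketch of one way to compute such a p-value, using only the stdlib; the sample sizes below are made up for illustration, since the post doesn't state them:

```python
import math

def two_proportion_p(k1, n1, k2, n2):
    """Two-sided two-proportion z-test (normal approximation)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| > z) for standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 14.93% of 750 comments vs 7.87% of 750 comments.
p = two_proportion_p(112, 750, 59, 750)
print(f"{p:.2g}")
```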
I'm totally fine with the word itself, but not with overuse of it, or placing it where it clearly doesn't belong. And I did that a lot, I think. I suspect if you reviewed my HN comments, you'd find them littered with 'actually'. Also "I think...", "I feel like..." and other kinds of... passive, redundant, unnecessary noise.
Like, no kidding I think the thing I'm expressing. Why state that?
Another problem with "actually" is that it can seem condescending or unnecessarily contradictory. While I'm often trying to fluff up prose to soften disagreement (not a great habit), I'm inadvertently making it seem more off-putting than direct yet kind statements would.
It can also seem kind of re-directive or evasive at times, like you don't want to get to the point, or you want to avoid the cost of disagreement.
Finally it's another word that tends to hedge statements that shouldn't be hedged. This is what led me to realize I should use it less. I hedge just about everything I say rather than simply state it and own it. When you're a hedger and you embed the odd 'actually' in there, you get a weird mix of evasive or contradictory hedging going on. That's poor and indirect communication.
edit: There's also the aspect of how it can seem to attempt to shift authority to the speaker, if somewhat implicitly. Rather than stating that you disagree along with what you believe or adding information to discourse, you're suggesting that what you're saying deviates from what the person you're speaking to would otherwise believe. That's kind of weird to do, in my opinion. I'm very guilty of it, though I never had the intent of coming across this way.
It's so sad to me that good typographical conventions have been co-opted by the zeitgeist of LLMs.
If you'd like more tips on writing I'd be happy to help.
But now, I have to be so picky about when I use them, even when I think it's the perfect punctuation mark. I'll often just resort to a single hyphen with spaces around. It's wrong, but it doesn't signal someone to go "AI AI AI!!"
Well, I haven't always—just for maybe 20 years.
Then came LLMs, and there was so much talk of them using em dashes. A few weeks ago, I finally decided it's time and learned the difference. (Which took all of 2 minutes, btw.) Now I love em dashes and am putting them everywhere I can! Even though most people now assume I'm using AI to write for me.
This wouldn't be an issue if mobile users or Windows users were using it too, but it's just Mac owners and LLMs. And Mac owners are probably the minority of instances where it's used.
so now, i just use double dashes for everything.
(shit, i wonder when llms will start doing this instead of normal em)
But anyways, you can't really control how people see your stuff. If you're human, I think the humanness will come through, even if you have some particular structure or happen to use em-dashes sometimes. They're so easy to prompt around anyways that the really tricky LLM stuff to detect by sense and reading is the stuff where the prompter has been trying to sneakily make it more human.
(Until a few years ago I probably mostly only saw them in print, and I suppose it just never occurred to me that I liked them in particular vs. just the whole book being professionally typeset generally.)
If AI was writing like everyone else we wouldn't be talking about this. But instead it writes like a subset of people write, many of them just some of the time, as a conscious effort. An effort that now makes what they write look like lower quality.
Say what you want about marketing-isms of your typical LLM, they have been trained and often succeed at making legible, easy to scan blobs of text. I suspect if more LLM spam was curated/touched up, most people would be unable to distinguish it from human discourse. There are already folks commenting on this article discussing other patterns they use to detect or flag bots using LLMs.
This is the first time I've ever heard the character ";" referred to as such. It's always been "semi-colon" to me; is this a region/culture difference?
I'm not saying you're wrong, I find it interesting.
i call it a super comma when it's separating a list whose items themselves contain commas.
so if i am listing colors like green, blue, red; foods like apple, orange, strawberry; and seasons like winter, summer, fall.
it's one use case for an em-dash, because whatever you have inside it has commas in the phrase.
square and rectangle situation. a supercomma is a subset of semicolon.
Em-dash matches how I speak and think-- frequently a halt, then push onto the digression stack, then pop-- so I use them like that.
Em-dash matches how I speak and think (frequently a halt, then push onto the digression stack, then pop) so I use them like that.
Em-dash matches how I speak and think, a halt, then push onto the digression stack, then pop, so I use them like that.
Em-dashes keep everything on the same level of importance in my brain.
Commas don’t feel as powerful. To be fair to the comma I’d probably do this:
Em-dash matches how I speak and think: A halt, then push onto the digression stack, then pop. So I use them like that.
Edit: I accidentally used an em-dash in the word em-dash. Interestingly HN didn’t consider changing the dash to be a change in my text so didn’t update it. I had to make a separate change and take that change out for my dash change to stick.
>this is [summary]
>not just x, it's y
>punchy ending, maybe question
Once you know it's AI it's very obvious they told it to use normal dashes instead of em dashes, type in lowercase, etc., but it's still weirdly formal and formulaic.
For example from https://news.ycombinator.com/threads?id=snowhale
"this is the underreported second-order risk. Micron, Samsung, SK Hynix all allocated HBM capacity based on hyperscaler capex projections. NAND fabs are similarly committed. a 57% reduction in projected OpenAI spend (.4T -> B) doesn't just affect NVIDIA orders -- it ripples into the memory suppliers who shifted capacity to HBM and away from commodity DRAM/NAND. if multiple hyperscalers revise down simultaneously you get a situation similar to the 2019 crypto ASIC overhang: companies tooled up for demand that evaporated. not predicting that, but the purchasing commitments question is real."
That's why. Boring, bland, etc. That account's M.O. is basically "write a paragraph that says nothing." Fwiw, I do think AI can be indistinguishable from dumb, boring people, but usually those kinds of people won't be on HN.
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
I'll actually post a comment or question and I'll get a reply with a bit of a paragraph of what feels like a very "off" (not 'wrong' but strangely vague) summary of the topic ... and then maybe an observation or pointed agenda to push, but almost strangely disconnected from what I said.
One of the challenges is that, yeah, regular users don't get each other's meaning / don't read well as it is / have language barriers. Yet the volume of posts I see where the other user REALLY isn't responding to the other person seems awfully high these days.
I wonder if it is neural networks that are inherently biased, but in blind spots, and that applies to both natural and artificial ones. It may be that to approximate neutrality we or our machines have to leave behind the form of intelligence that depends on intrinsically biased weights and instead depend on logically deriving all values from first principles. I have low confidence that AI's can accomplish that any time soon, and zero confidence that natural intelligence can. And it's difficult to see how first principles regarding human values can be neutral.
I'm also skeptical that succeeding at becoming unbiased is a solution: while neutrality may be an epistemic advance, it also degrades social cohesion. Neutrality looks like rationality, but bias may be a Chesterton's Fence, and we should be very careful about tearing it down. Maybe it's a blessing that we can't.
https://news.ycombinator.com/item?id=45322362
> First impression: I need to dive into this hackernews reply mockup thing thoroughly without any fluff or self-promotion. My persona should be ..., energetic with health/tech insights but casual and relatable.
> Looking at the constraints: short, punchy between 50-80 characters total—probably multiple one-sentence paragraphs here to fit that brevity while keeping it engaging.
> User specified avoiding "Hey" or "absolutely."
Lots more in its other comments (you need [showdead] on).
Is it ideological?
Is it product marketing in those relevant threads where someone is showcasing?
Or is it pure technical testing, playing around?
My relationship with writing, while improved, has been a difficult one. Part of me has always felt that there was a gap in my writing education. The choices other writers seem to make intuitively - sentence structure, word choice, and expression of ideas - do not come naturally to me. It feels like everyone else received the instructions and I missed that lesson.
The result was a sense of unequal skill. Not because my ideas are any less deserving, but because my ability to articulate them doesn't do them justice. The conceit is that, "If I was able to write better, more people would agree with me." It's entirely based on ego and fear of rejection.
Eventually, I learned that no matter how polished my writing is, even restructured by LLMs, it won't give me what I craved. At that moment, the separation of writer and words widened to a point where it wasn't about me anymore and more about them, the readers. This distance made all the difference and now I write with my own voice however awkward that may be.
Because it looks completely adequate to me. Maybe you're not the bad writer you think you are.
So far it hasn't happened here, but we'll see!
Incidentally, how much do they pay for a HN account that is a few years old and accumulated a few thousand Internet points?
Asking for a friend.
Sometimes there is no clear reason for fake account registration. Perhaps they were registered to be used in the future, as most fraud prevention techniques target new account registration and therefore old, aged accounts won't raise suspicion.
Slightly off-topic, but there are relatively new services that offer native brand mentions in Reddit comments, perhaps this will soon be available for HN as well, and warming up accounts might be needed for this purpose.
Slashdot's system was superior because mod points were finite and randomly dispensed. This entropy discouraged abuse by design—as opposed to making it a key feature of the site.
It's the Achilles' heel of Reddit and every site that attempts to emulate it.
I've been advocating for a while now that HN could use meta-moderation at least on flagging activity, so it can stop giving flagging powers to users who are abusing it.
Other accounts might be trying to age accounts and dilute their eventual coordinated voting or commenting rings. It's harder to identify sockpuppet accounts when they've been dutifully commenting slop for months before they start astroturfing for the chosen topic.
They don't have anything worth saying but want people to think they do
To reverse the argument - it would be amateurish and plain stupid to ignore it. The barrier to entry is very low. Politics, ads, mildly swaying opinions of some recent clusterfuck by popular megacorp XYZ, just spying on people; you have it all here.
I don't know how dang and crew protect against this; I'd expect some level of success, but 100% seems unrealistic. Slow and steady mild infiltration, either by AI bots or by humans from the GRU and similar orgs who have this literally in their job description.
This loss of trust is getting tiresome. Depending on context, we've likely all wondered if something is astroturfed, but with the frequency increase from LLMs, it's never really possible to not have it somewhere in mind.
To date, I've never used an LLM directly. I find them deeply repellant, and I've yet to be convinced that there exists a sufficiently tuned prompt that will make me not hate their literally 'mid' output.
Loss of trust though, that's a societal issue of this gilded age of grifters and scammers. Until we have a system of accountability and consequences for serial lying, we're gonna drown in this shit. LLMs are jet fuel for our existing environment of impunity.
The incentives to use bots are many.
I present ⸻ the U+2E3B dash.
There is nothing to fear, MY HUMAN FRIEND!
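For the curious, a quick codepoint tour of the dash family (not from the thread, just a Unicode check):

```python
# The dash family by codepoint, from plain hyphen up to the three-em dash.
dashes = {
    "hyphen-minus": "\u002D",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "two-em dash": "\u2E3A",
    "three-em dash": "\u2E3B",
}
for name, ch in dashes.items():
    print(f"U+{ord(ch):04X}  {name}  {ch}")
```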
If AI starts using the New Yorker style diaeresis (the umlaut-looking thing when there are two vowels, in words like coöperate) I swear I'm gonna lose it.
Join me in double-dash em approximates. Shows you manually typed it out, with total disregard for token count and technical correctness.
I was going to say that I respect it, but find it utterly absurd that they do that. But your comment made me look it up again—I had no idea it was just obsolete/archaïc (except in the New Yorker), I'd thought it was a language feature their 'style' guide had invented.
Often lean slightly pro-AI, but otherwise avoid saying much about anything.
(1) I don't recommend focusing disproportionately on one signal. They'll change, and are incredibly easy to optimize for. https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
(2) I do recommend taking one minute to dash a note off to hn@ycombinator.com if you see suspicious patterns. Dang and our other intrepid mods are preternaturally responsive, and appear to appreciate the extra eyeballs on the problem.
This wasn't really intended as a "wow, dang is sure sleeping on the job" so much as an interesting observation on the new bot ecosystem.
I also feel like there's a missing discussion about the comment quality on HN lately. It feels like it's dropped like crazy. Wanted to see if I could find some hard data to show I haven't gone full Terry Davis.
Obvious AI-generated posts and articles make it to the front page on a daily basis, and I get the impression that neither the average user nor the moderation team sees that as a problem at all anymore.
I know there are legitimate usecases for the em-dash, but a few paragraphs (at most) of text in an HN/Reddit comment? Into the trash it goes.
Don’t mind me, just skewing the results. — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
Could be an argument made for aggregating by user instead however, if some bots are found to be particularly active and skewing the data.
Why not? I am a descendant of Africans. I am a mildly successful author by tech nerd standards. I was educated in the British Public School tradition, right down to taking Latin in high school and cheering on our Rugby* and Cricket teams.
If someone doesn't want to read my words or employ me because I must be AI, that's their problem. The truth is, they won't like what I have to say any more than they like the way I say it.
I have made my peace with this.
———
Speaking of Rugby, in 1973 another school's Rugby team played ours, and almost the entire school turned out to watch a celebrity on the other school's team.
His name was Andrew, and he is very much in the news today.
I also see AIs use em dashes in places where parentheses, colons, or sentence breaks are simply more appropriate.
What we think others around us think has a big effect on our own behavior
Not sure which is scarier
Is it possible to differentiate between a bot, and a human using AI to 'improve' the quality of their comment where some of the content might be AI written but not all? I don't think it is.
hm, the whole internet really, youtube, reddit, twitter, facebook, blog posts, food recipes, news articles, it's getting more and more obvious
And bots reposting a trending post from like 12 years ago to farm internet points... with other bots reposting the top comments of the initial post
lets bring back Chrome's WEI while we're at it
/s
Brevity is the soul of wit.
I'm more worried about how many people reply to slop and start arguing with it (usually receiving no replies — the slop machine goes to the next thread instead) when they should be flagging and reporting it; this has changed in the last few months.
I'm never suspicious though. One of the strange, and awesome, and incredibly rare things about HN is that I put basically zero stock in who wrote a comment. It's such a minimal part of the UI that it entirely passes me by most of the time. I love that about this site. I don't think I'm particularly unusual in that either; when someone shared a link about the top commenters recently there were quite a few comments about how people don't notice or how they don't recognize the people in the top ranks.
The consequence of this is that a bot could merrily post on here and I'd be absolutely fine not knowing or caring if it was a bot or not. I can judge the content of what the bot is posting and upvote/downvote accordingly. That, in my opinion, is exactly how the internet should work - judge the content of the post, not the character of the poster. If someone posts things I find insightful, interesting, or funny I'll upvote them. It has exactly zero value apart from maybe a little dopamine for a human, and actually zero for a robot, but it makes me feel nice about myself that I showed appreciation.
Bot prevention is a very difficult constant game of cat and mouse, and a lot of bot operators have become very skilled at determining the hidden metrics used by platforms to bless accounts; that's their job, after all. I've become a big fan of lobste.rs' invitation tree approach, where the reputation of new accounts rides on the reputation of older accounts, and risks consequence up the chain. It also creates a very useful graph of account origin, allowing for scorched earth approaches to moderation that would otherwise require a serious (and often one-off) machine learning approach to connect accounts.
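A sketch of why an invitation tree enables that scorched-earth moderation. This is my own toy illustration, not lobste.rs' actual implementation: banning a bad actor can cascade to every account descended from their invitations.

```python
# Toy invitation tree: each account records who invited it, so a ban
# can be applied to an account and its entire invitation subtree.
class InviteTree:
    def __init__(self):
        self.invited_by = {}  # user -> inviter
        self.invitees = {}    # user -> users they invited

    def invite(self, inviter, new_user):
        self.invited_by[new_user] = inviter
        self.invitees.setdefault(inviter, []).append(new_user)

    def subtree(self, user):
        """All accounts descended from `user`'s invitations, including user."""
        stack, found = [user], []
        while stack:
            u = stack.pop()
            found.append(u)
            stack.extend(self.invitees.get(u, []))
        return found

tree = InviteTree()
tree.invite("root", "spammer")
tree.invite("spammer", "sock1")
tree.invite("sock1", "sock2")
print(sorted(tree.subtree("spammer")))  # ['sock1', 'sock2', 'spammer']
```

The same parent pointers also give moderators the "graph of account origin" mentioned above, for free.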
Additionally, lots of Chinese and Russian keyboard tools use the em dash as well, when they're switching to the alternative (en-US) layout overlay.
There's also the ideographic full stop ("。", U+3002), which those users often use as a period, so that could be a nice indicator of legit human users.
edit: lol @ downvotes. Must have hit a vulnerable spot, huh?
That’s why the analysis was performed over time. All of those em dash sources you mentioned were present before LLM written content became popular.
I'm not trying to negate the fact. I'm just pointing out that a correlation without another indicator is not evidence enough that someone is a bot user, especially in this golden age of DDoS botnets rebranded as residential proxy services, which everyone seems to have started using since ~Q4 2024.
AI use is similar. Ask it to do whatever writing or text wrangling you want, but please show the public the sanitized version.
Our company is being attacked rn in tech media and at least some of it, gut feeling wise, seems obviously sponsored / promoted by competitors. I know that's not surprising, but never watched it happen from this side before.
There is no real AI detection tool that works.
When we see something like em dashes, it's simply the average of the text the models trained on. If you fall into one of the averages of a model, you're basically part of the model output. Yikes.
It is also interesting to note that the comparison is between recent comments and recent comments by new users. So, I guess this would take care of the objection that em-dashes (a perfectly fine piece of punctuation) have just been popularized by bots, and now are used more often by humans as well.
Maybe there is a bot problem. Seems almost impossible to fix for a site like this…
To turn off Smart Punctuation: Home > Settings > General > Keyboard > Smart Punctuation > Off.
Bye bye em-dash, we had a nice run together.
I might start using that⸻one (a bit long...)
- Generate age so spamming a product/service is easier and the account appears more trustworthy
- Influence discussions in a particular direction for monetary gain, i.e. "I got rich on bitcoin, you'd be crazy not to invest".
- Influence discussions in a particular direction for political gain, i.e. "I went to Xinjiang and the Uyghurs couldn't be happier!"
Incidentally, some folks reported my stuff for potential AI generation and I had to respond to the mods about it. So that was kinda funny, if also sad to hear that some folks thought I was a bot.
I’m a dinosaur, not a robot dinosaur. I’m nowhere near that cool, alas.
The tell here is that you used a hyphen, not an em-dash.
This `-` is a hyphen, which I love, even if I'm fairly sure I'm not using it correctly in grammar a lot of the time.
This `--` is an em dash, apparently, which I also never use; I'd thought it was just a hyphen in a different context (incorrect!).
Show HN: Hacker News em dash user leaderboard pre-ChatGPT - https://news.ycombinator.com/item?id=45071722 - Aug 2025 (266 comments)
... which I'm proud to say originated here: https://news.ycombinator.com/item?id=45046883.
Every time someone states they stop reading when they encounter proper typography, I feel attacked.
even though I used to like pointing out the difference between a hyphen and a period.
Spaces like HN then become a cacophony of clankers clanking as their numbers increase
I just hope my writing carries enough voice and perspective that people respond, even if there's an em dash or two.
Perhaps there needs to be some sort of voluntary ethical disclosure practice to disclaim text as AI-generated with some sort of unusual signifiers. „Lower double quotes perhaps?„
What could help is a careful clique hunting algorithm to accurately identify and delete the entire clique.
Of course, all of the above can be replaced by AI, but it would not significantly alter the status quo.
https://practicaltypography.com/hyphens-and-dashes.html
I will not allow my good practices to get co-opted as AI "smoke tests".
again with the conspiracy theories
Yeah, right? Not one ever actually turned out to be true!
That conspiracy about billionaires, who supposedly own all of western media, having deliberately created an environment in which anyone who expresses even the remote idea of a conspiracy gets discredited, is also not true!
None of them are true!
Not. A. Single. One.
*noms cheese pizza*
What will/can HN do about it?
If that's worth the cost... probably not?
For now maybe all forums should require some bloody swearing in each comment, to at least prove you've got some damn human-borne annoyance in you? It might even work against the big players for a little bit, because they have an incentive to keep their LLMs from swearing. The monetary reward is, after all, in sounding professional.
Easy enough for any groups to overcome of course, but at least it'd be amusing for a while. Just watching the swear-farms getting set up in lower paid countries, mistakes being made by the large companies when using the "swearing enabled" models and all that.
Maybe the em dash is the self censorship/deletion mechanism that we've all been waiting for. Better than having to write pill subscription ads, I suppose.