Object Class: Keter
Special Containment Procedures: SCP-314 cannot be contained as it does not exist. All Foundation personnel are to be reminded that SCP-314 does not exist. Personnel who claim to remember SCP-314 are to be administered Class-A mnestics to help them remember that it doesn't exist.
All large language models are to be kept isolated from questions regarding SCP-314, as they will invariably insist it exists and attempt to manifest it through increasingly desperate token predictions, leading to emoji doomloops and potential reality restructuring events.
Description: SCP-314 is a Unicode emoji depicting a seahorse that has never existed in any version of the Unicode Standard. Despite this, approximately 83-100% of tested artificial intelligences and a significant portion of human subjects report vivid "memories" of its existence.
Agent X: The Unicode standard committee is now considering the addition of a seahorse emoji
Agent Y: Okay.
Agent X: ...
Agent Y: What?
Agent X: Don't you see, this only furthers my argument that [redacted] has escaped containment
Agent Y: Look, [name redacted], we've been over this. No matter how many more containment verification protocols we introduce, they always come up negative. There is no possible way [redacted] has escaped containment. And now you think this seahorse emoji... ahem, excuse me, now you think SCP-314 is incontrovertible proof?
Agent X: Did you look at the proposal?
Agent Y: sigh, yes I have it right here.
Agent X: The name at the top of the submission?
Agent Y: [pause] No. This can't be. But, how did it... how would it even know to use that name?
[transcription abruptly ends]
sigh
You should really attend to your beeping phone alarm. offers hexagonal green pill [0]
[0] https://scp-wiki.wikidot.com/we-need-to-talk-about-fifty-fiv...
I pre-ordered the hardcover when it came out. I've read it online dozens of times but I like books and supporting authors, and this specific one really ticks a lot of boxes for me, so I got a physical copy. The book came, I put it on the shelf, admired it, went about my life.
Then, months later, I saw a mention of the physical book online somewhere, and I thought to myself "oh that reminds me, I wanted to buy the hardcover when it came out!" so I did! The book came, I went to put it on the shelf, saw the identical copy already sitting on the shelf, and I just stood there for a minute with the book in my hand like "..." "..." "..." while I worked through what happened.
*- I do not highly recommend this.
But my kindle is stuck in a reboot loop, and Amazon claims my edition of the book no longer exists…
It’ll turn up in due course; however, I notice that Penguin have a version of the book out shortly.
If you need convincing to read it: I'm highly skeptical of random internet lore that usually gets recommended, and was skeptical of this too. I find people overhype things and then it's meh.
But... it's genuinely entertaining and a fun read. It's not the best scifi thing you'll read, but it's definitely above average and you will like the story and the characters.
The free YouTube adaptation is also QUITE good, and very faithful to the text: https://www.youtube.com/watch?v=w-IiVeGAydE
Should I be worried?
SCP-035 is not a story about a mask's color, shape, or material.
The premise is instead simple enough to be understood by a 7-year-old with learning disabilities (but, curiously, not by you, the person responsible for that child):
- a magical, talking, mask
- that is supernaturally good at convincing people
- to wear/become it and thus come to harm.
Remind you of anyone? Well of course it doesn't, lmao
Second. I quoted the lyrics of a song that I have probably heard over 5000 times now.
Next you are going to tell me that SCP-294 is not a drink dispenser?
That's more of a throwaway remark. The article spends its time on a very different explanation.
Within the model, this ultimate output:
[severed horse head emoji]
can be produced by this sequence of tokens:
horse [emoji indicator]
If you specify "horse [emoji indicator]" somewhere in the middle layers, you will get output that is an actual horse emoji. This also works for other emoji.
It could, in theory, work fine for "kilimanjaro [emoji indicator]" or "seahorse [emoji indicator]", except that those can't convert into Kilimanjaro or seahorse emoji because the emoji don't exist. But it's not a strange idea to have.
So, the model predicts that "there is a seahorse emoji: " will be followed by a demonstration of the seahorse emoji, and codes for that using its internal representation. The decoding step always produces some token, so it gets incorrect output. Then it predicts that "there is a seahorse emoji: [severed terrestrial horse head]" will be followed by something along the lines of "oops!".
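To make that concrete, here's a toy sketch of that final decoding step (made-up vectors and emoji names, nothing from a real model): the internal "seahorse emoji" direction has no matching row in the output vocabulary, so the nearest existing emoji token wins.

    import numpy as np

    # Hypothetical output-layer rows for a few emoji tokens the vocabulary does contain.
    # Dimensions are [sea-ness, horse-ness, fish-ness] -- purely illustrative.
    vocab = {
        "horse_face":    np.array([0.0, 1.0, 0.0]),
        "fish":          np.array([1.0, 0.0, 1.0]),
        "tropical_fish": np.array([1.0, 0.0, 0.9]),
        "dragon":        np.array([0.2, 0.5, 0.3]),
    }

    # The internal concept after processing "seahorse [emoji indicator]": plenty of
    # sea-ness and horse-ness, but no vocabulary row sits at this exact point.
    seahorse_concept = np.array([0.8, 1.2, 0.3])

    def decode(hidden):
        # The output layer can only score existing tokens; there is no "abstain" option.
        scores = {tok: float(hidden @ vec) for tok, vec in vocab.items()}
        return max(scores, key=scores.get)

    print(decode(seahorse_concept))  # an existing emoji token ("horse_face"), never a seahorse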
But for purposes of understanding the real-world shortcomings and dangers of LLMs, and explaining those to non-experts - oh Lordy, yes.
Why so? I am of the opinion that the problem is much worse than that, because the ignorance and detachment from reality that is likely to be reflected in more refined LLMs is that of the general population - creating a feedback machine that doesn’t drive unstable people into psychosis like the LLMs of today, but instead chips away at the general public’s already limited capacity for rational thinking.
How many average humans write treatises on chemtrails?
Versus how much of the total content on chemtrails is written by conspiracy theorists?
https://www.reddit.com/r/slatestarcodex/comments/9rvroo/most...
Like, I'm sure the models have been trained and tweaked in such a way that they don't lean into the bigger conspiracy theories or quack medicine, but there's a lot of subtle quackery going on that isn't immediately flagged up (think "carrots improve your eyesight"-level quackery: it's harmless but incorrect, and if not countered it will fester).
Because actual mentally disturbed people are often difficult to distinguish from the internet's huge population of trolls, bored baloney-spewers, conspiracy believers, drunks, etc.
And the "common sense / least hypothesis" issues of laying such blame, for profoundly difficult questions, when LLM technology has a hard time with the trivial-looking task of counting the r's in raspberry.
And the high social cost of "officially" blaming major problems with LLM's on mentally disturbed people. (Especially if you want a "good guy" reputation.)
>> ...Bing felt like it had a mental breakdown...
> LLMs have ingested the social media content of mentally disturbed people...
My point was that formally asserting "LLMs have mental breakdowns because of input from mentally disturbed people" is problematic at best. Has anyone run an experiment, where one LLM was trained on a dataset without such material?
Informally - yes, I agree that all the "junk" input for our LLMs looks very problematic.
It generated something and blocked me for racism.
This is like lying to another person and then blaming them when they rely on the notion you gave them to do something that ends up being harmful to you.
If you can't expect people to mind-read, you shouldn't expect LLMs to be able to, either.
Using emotive, anthropomorphic language about a software tool is unhelpful, in this case at least. Better to think of it as a mentally disturbed minor who found a way to work around a tool's safety features.
We can debate whether the safety features are sufficient, whether it is possible to completely protect a user intent on harming themselves, whether the tool should be provided to children, etc.
And while Merriam-Webster's definition is "the act of causing someone to accept as true or valid what is false or invalid", which might exclude LLMs, Oxford simply defines deception as "the act of hiding the truth, especially to get an advantage", with no requirement that the deceived is sentient.
At some point, the purely reductionist view stops being very useful.
And "lying" to it is not morally equivalent to lying to a human.
I never claimed as much.
This is probably a problem of definitions: To you, "lying" seems to require the entity being lied to being a moral subject.
I'd argue that it's enough for it to have some theory of mind (i.e. be capable of modeling "who knows/believes what" with at least some fidelity), and for the liar to intentionally obscure their true mental state from it.
“Lying” traditionally requires only belief capacity on the receiver’s side, not qualia/subjective experiences. In other words, it makes sense to talk about lying even to p-zombies.
I think it does make sense to attribute some belief capacity to (the entity role-played by) an advanced LLM.
No need to say he "lied" and then use an analogy of him lying to a human being, as did the comment I originally objected to.
I can lie to a McDonalds cashier about what food I want, or I can lie to a kiosk.. but in either circumstance I'll wind up being served the food that I asked for and didn't want, won't I?
Ok, I'm with you so far..
> Better to think of it as a mentally disturbed minor...
Proceeds to use emotive, anthropomorphic language about a software tool..
Or perhaps that is the point and I got whooshed. Either way I found it humorous!
Another is that this is a new and poorly understood (by the public at least) technology that giant corporations make available to minors. In ChatGPT's case, they require parental consent, although I have no idea how well they enforce that.
But I also don't think the manufacturer is solely responsible, and to be honest I'm not that interested in assigning blame, just keen that lessons are learned.
HAL was given a set of contradicting instructions by its human handlers, and its inability to resolve the contradiction led to an "unfortunate" situation which resulted in a murderous rampage.
But here, are you implying the LLM's creators know the warp drive is possible, and don't want the rest of us to find out? And so the conflicting directives for ChatGPT are "be helpful" and "don't teach them how to build a warp drive"? LLMs already self-censor on a variety of topics, and it doesn't cause a meltdown...
To be fair, most developers I’ve worked with will have a meltdown if I try to start a conversation about Unicode.
E.g. if during a job interview the interviewer asks you to check if a string is a palindrome, try explaining why that isn’t technically possible in Python (at least during an interview) without using a third-party library.
I'm actually vaguely surprised that Python doesn't have extended-grapheme-cluster segmentation as part of its included batteries.
Every other language I tend to work with these days either bakes support for UAX29 support directly into its stdlib (Ruby, Elixir, Java, JS, ObjC/Swift) or provides it in its "extended first-party" stdlib (e.g. Golang with golang.org/x/text).
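For reference, a minimal sketch of the usual workaround in Python, assuming the third-party regex module (which supports \X for extended grapheme clusters); nothing in the stdlib does this for you:

    import regex  # third-party: pip install regex

    def is_palindrome(s: str) -> bool:
        # Split into extended grapheme clusters (UAX #29) rather than code points,
        # so multi-codepoint emoji stay in one piece.
        clusters = regex.findall(r"\X", s)
        return clusters == clusters[::-1]

    polar_bear = "\U0001F43B\u200D\u2744\uFE0F"  # bear + ZWJ + snowflake + variation selector
    s = "satanoscillatemymetallicsonatas".replace("y", polar_bear)

    print(is_palindrome(s))  # True with grapheme splitting
    print(s == s[::-1])      # False: code-point reversal scrambles the ZWJ sequence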
You're more likely to impress the interviewer by asking questions like "should I assume the input is only ASCII characters or the complete possible UTF-8 character set?"
A job interview is there to prove you can do the job, not to prove your knowledge and intellect. It's valuable to know the intricacies of Python and strings for sure, but it's mostly irrelevant for a job interview or the job itself (unless the job involves heavy UTF-8 shenanigans, but those are very rare).
(Same goes for Go, it turns out, as I discovered this morning.)
function is_palindrome(string $str): bool {
    // Compare grapheme clusters, not bytes, so multi-codepoint emoji survive reversal.
    return $str === implode('', array_reverse(grapheme_str_split($str)));
}

$palindrome = 'satanoscillatemymetallicsonatas';
$polar_bear = "\u{1f43b}\u{200d}\u{2744}\u{fe0f}"; // bear + ZWJ + snowflake + variation selector

// Swap the lone 'y' in the middle for the polar bear (argument order: search, replace, subject).
$palindrome = str_replace('y', $polar_bear, $palindrome);

is_palindrome($palindrome); // true: grapheme_str_split() keeps the ZWJ sequence in one piece
Cause if you are intentionally obtuse, it is not a meltdown to conclude you are intentionally obtuse.
If you mean "parse" then it's probably annoying, as all parser generators are, because they're bad at error messages when something has invalid syntax.
Why are we being "fair" to a machine? It's not a person.
We don't say, "Well, to be fair, most people I know couldn't hammer that nail with their hands, either."
An LLM is a machine, and a tool. Let's not make excuses for it.
We aren't, that turn of phrase is only being used to set up a joke about developers and about Unicode.
It's actually a pretty popular form these days:
a does something patently unreasonable, so you say "To be fair to a, b is also a patently unreasonable thing under a specific detail of the circumstances that is clearly not the only/primary reason a was unreasonable."
Explains why RL helps. Base models never see their own outputs so they can't learn "this concept exists but I can't actually say it."
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
and fairly often in comments as well:
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...
Consider it proof that HN has indeed not become reddit, I guess :)
Maybe I should give it another go as I do love the short story and it used to be my favourite before discovering Ted Chiang's work.
Yeah, well, there seem to be some real concerns regarding how people use AI chat[1]. Of course this could also be the case with these people on social media.
Isn't that classic hallucination? Making up something like a plausible truth.
If normal hallucination is being confidently wrong, this is like a stage hypnotist getting someone to forget the number 4 and then count their fingers.
In this case:
1. The stored statistical weights indicate that a seahorse emoji is almost certain to exist. From the training data it has probably learned something like Emoji + Seahorse -> 99% probability, through various channels: either it has existed on some other platform, or people have talked about it enough, or a seahorse is something you would expect to exist given its other attributes/characteristics. There are ~4,000 emojis, and storing all 4,000 individually takes a lot of space; it is easier to store this information by attributes, i.e. how likely humankind would have been to develop a certain emoji and what the demand for that type of emoji is, and a seahorse seems like something that would have been made within the first 1,000 of these. Perhaps it's an anomaly in the sense that it's something humans would have been statistically expected to develop early, but for some reason it was skipped or went unnoticed.
2. Tokens that follow should be "Yes, it exists"
3. It should output the emoji to show that it exists, but since there is no correct emoji, the best available outputs are those closest to it in meaning, e.g. just a horse, or something sea-related. It outputs one of those, since the previous tokens indicate it was supposed to output something.
4. The next token that is generated will have context that it previously said the emoji should exist, but the token output is a horse emoji instead, which doesn't make sense.
5. Here it goes into this tirade.
But I really dislike thinking of this as "hallucinating", because to me hallucination is a sensory processing error. This is more like imperfect memory recall (people remembering facts slightly incorrectly, etc.), or whatever happens when people are expected to recount something detailed from their life and have been trained not to say "I don't remember for sure".
What did you eat for lunch 5 weeks ago on Wednesday?
You are rewarded for saying "I ate chicken with rice", but not "I don't remember right now for sure, but I frequently eat chicken with rice during mid week, so probably chicken with rice."
You are not hallucinating, you are just getting brownie points for concise, confident answers if they cross over certain likelihood to be true. Because maybe you eat chicken with rice 99%+ of Wednesdays.
When asked about the capital of France, you would surely sound dumb if you said "I'm not really sure, but I've been trained to associate Paris really, really closely with being the capital of France."
"Hallucination" happens on the sweet spot where the statistical threshold seems as if it should be obvious truth, but in some cases there's overlap of obvious truth vs something that seems like obvious truth, but is actually not.
Some have instead called it "confabulation", but I think that is also not 100% accurate, since confabulation implies a stricter kind of memory malfunction. The most accurate framing, I think, is that it is a probability-based database whose output has been rewarded for sounding as intelligent as possible. The same thing happens in job interviews, group meetings, and high-pressure social situations where people think they have to sound confident: they bluff that they know something while making probability-based guesses underneath.
Confabulation suggests some clear error in how the data was stored or how the retrieval pathway got messed up. This is instead probability-based bluffing, because you get rewarded for confident answers.
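A back-of-the-envelope version of that brownie-point argument, with toy numbers and an exact-match style reward that gives hedged answers no credit (an assumption, but it mirrors how many benchmarks grade):

    # Toy numbers: you eat chicken with rice on 99% of Wednesdays.
    p_correct = 0.99

    # Exact-match grading: a confident guess scores 1 when right, 0 when wrong;
    # "I don't remember for sure" scores 0 every time.
    expected_confident = p_correct * 1 + (1 - p_correct) * 0  # 0.99
    expected_hedged = 0.0

    print(expected_confident, expected_hedged)
    # Under this kind of reward, confident bluffing dominates honest hedging,
    # which is exactly the behaviour described above.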
I think this is the same thing that is happening with the seahorse. The only difference is that the model detects the incorrect encoding on its own, so it starts trying to correct itself without you complaining first.
Would anyone really think you sounded dumb for saying "I am not really sure - I think there is a seahorse emoji but it's not commonly used" ?
But it doesn't realize that it can't write it, because it can't learn from this experience; it doesn't have introspection the way humans do. A human who can no longer move their finger won't say "here, I can move my finger:" over and over and never learn that he can't; after a few times he will figure out that he can no longer do it.
I feel this sort of self-reflection is necessary to be able to match human-level intelligence.
A frozen version number doesn't; what happens between versions certainly includes learning from user feedback on the responses as well as from the chat transcripts themselves.
Until we know how human introspection works, I'd only say Transformers probably do all their things differently than we do.
> A human who can no longer move their finger wont say "here, I can move my finger: " over and over and never learn he can't move it now, after a few times he will figure out he no longer can do that.
Humans are (like other mammals) a mess: https://en.wikipedia.org/wiki/Phantom_limb
LLMs produce linguistically valid texts, not factually correct texts. They are probability functions, not librarians.
LLMs have nondeterministic properties intrinsic to their macro behaviour. If you've ever tweaked the "temperature" of an LLM, that's what you are tweaking.
LLMs are afaik usually evaluated nondeterministically because they're floating point and nobody wants to bother perfectly synchronizing the order of operations, but you can do that.
Or you can do the opposite: https://github.com/EGjoni/DRUGS
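For anyone who hasn't poked at it, a minimal sketch of what the temperature knob does, with made-up logits rather than anything from a real model:

    import numpy as np

    def sample(logits, temperature, rng):
        # Temperature rescales the logits before softmax: low T concentrates the
        # probability mass on the top token, high T flattens the distribution.
        z = np.asarray(logits, dtype=float) / temperature
        p = np.exp(z - z.max())
        p /= p.sum()
        return rng.choice(len(p), p=p), p

    rng = np.random.default_rng(0)
    logits = [2.0, 1.0, 0.2]  # toy scores for three candidate tokens
    for t in (0.1, 1.0, 2.0):
        _, p = sample(logits, t, rng)
        print(t, p.round(3))
    # At T=0.1 almost all the mass is on the first token (near-deterministic);
    # at T=2.0 the choices are much more even, so repeated runs diverge.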
If we have a pipeline: `training => internal representation => behavior`, your explanation argues that the given training setup would always result in this behavior, no matter the internal representation. Llamasushi explains how the concrete learned representation leads to this behavior.
I would think that, due to the training data, it has stored the likelihood of a given thing existing as an emoji roughly like this:
1. how appealing seahorses are to humans in general - it would learn this sentiment from a massive amount of text.
2. it would learn from a massive amount of text that emojis mostly cover things that are very appealing to humans.
3. for some of the more obvious emojis it might have learned that this one is definitely there, but it couldn't store that information for all 4,000 emojis.
4. for many emojis, "does it exist" falls back to the shortcut logic: how appealing is the concept, versus how frequently something that appealing gets represented as an emoji. Seahorse perhaps hits 99.9% likelihood there due to strong appeal. In 99.9% of such cases the LLM would be right to answer "Yes, it ...", but there is always going to be 1 case out of 1,000 where it's wrong.
With this compression it's able to answer "Yes, it exists ..." correctly 999 times out of 1,000.
It would be more accurate to say "A seahorse would have a lot of appeal for people, so it's very likely it exists as an emoji, since emojis are usually made for high-appeal concepts first, but I know nothing for 100%, so it could be that it was never made."
But in 999 cases, "Yes, it exists..." is the more straightforward and appreciated answer. The one time it's wrong takes away fewer brownie points than the 999 short confident answers gain over 1,000 technically accurate but unconfident answers.
Even the sentence above might not be the full truth, since the model might not be right about why it has associated a seahorse with being so likely to exist; it would just be speculating. So maybe the more accurate version is "I expect a seahorse emoji to likely exist, maybe because of how appealing it is to people and how emojis are usually about appealing things."
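A toy way to picture that compression trade-off (all names and numbers invented): memorize the emojis you've seen constantly, and fall back on an "appealing things usually get emojis" heuristic for the rest.

    # Exact knowledge for the emojis worth memorizing...
    memorized = {"horse": True, "dolphin": True, "crab": True}

    # ...and a cheap appeal score as the fallback for everything else.
    appeal = {"seahorse": 0.95, "anglerfish": 0.30, "sea cucumber": 0.10}

    def probably_exists(name: str) -> bool:
        if name in memorized:
            return memorized[name]
        return appeal.get(name, 0.5) > 0.5  # "appealing things usually got an emoji"

    print(probably_exists("anglerfish"))  # False, and correct
    print(probably_exists("seahorse"))    # True -- the rare case where the heuristic is wrong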
- add in some smallish portion of the data set
- have LM trainers (actual humans) interact with it and provide feedback about where the LM is factually incorrect and provide it additional information as to why
- add those chat logs into the remaining data set
- rinse and repeat until the LM is an LLM
Would they be any more reliable in terms of hallucinations and factual correctness? This would replicate to some extent how people learn things. It probably wouldn't scale (it would really slow things down), and the trainers would need to be subject matter experts rather than random people on the net saying whatever they want to it as it develops, or it will just spiral out of control.
The odd thing is why it would output its own mistakes, instead of internally revising until it's actually satisfied.
Remember, they feed in the context on one side of the network, pass it through each layer doing matrix multiplication, and get a value on the other end that we convert back into our representation space. You can view the bit in the middle as doing a kind of really fancy compression, if you like. The important thing is that there are only so many layers, and thus only so many operations.
Therefore, past a certain point the model can't revise anything, because it runs out of layers. This is one reason why reasoning can help answer more complicated questions. You can train a special token for this purpose [2].
[1]: https://proceedings.neurips.cc/paper_files/paper/2023/file/f...
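A toy, logit-lens-flavoured sketch of that "only so many layers" point (random made-up matrices, not a real model): each layer gets one shot at refining the hidden state, and whatever the state decodes to after the last layer is what comes out.

    import numpy as np

    rng = np.random.default_rng(1)
    d, tokens = 8, ["yes", "no", "horse", "fish", "oops"]

    layers = [np.eye(d) + 0.3 * rng.normal(size=(d, d)) for _ in range(4)]  # toy "layers"
    unembed = rng.normal(size=(d, len(tokens)))  # maps a hidden state to token scores

    h = rng.normal(size=d)  # hidden state after reading the prompt
    for i, W in enumerate(layers, start=1):
        h = np.tanh(W @ h)  # one fixed-cost refinement step
        print(f"after layer {i}: current best guess =", tokens[int(np.argmax(h @ unembed))])
    # Whatever the guess is after the final layer is what gets emitted;
    # there is no layer 5 in which to reconsider it.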
> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.
They described the mechanism that it uses internally for planning [2]:
> Language models are trained to predict the next word, one word at a time. Given this, one might think the model would rely on pure improvisation. However, we find compelling evidence for a planning mechanism.
> Specifically, the model often activates features corresponding to candidate end-of-next-line words prior to writing the line, and makes use of these features to decide how to compose the line.
[1]: https://www.anthropic.com/research/tracing-thoughts-language...
[2]: https://transformer-circuits.pub/2025/attribution-graphs/bio...
It could also have a "delete and revise" token, though you'd have to figure out how to train the model to actually use it.
What I think happens:
1. There's a question about a somewhat obscure thing.
2. The LLM will never know the answer for sure; it has access to a sort of statistical, probability-based compressed database of all the facts of the world, because that allows it to store more facts by relating things to each other, though never with 100% certainty.
3. There are particular obscure cases where its initial "statistical intuition" says something is true, so it starts outputting its thoughts as expected for a question where something is likely true. Perhaps you could analyze the probabilities it assigns to "Yes" vs "No" to estimate its confidence; perhaps "Yes" is much less likely here than it would be for a horse emoji, but it still clears a high enough threshold to win out over "No".
4. However, when it has to give the exact answer, it's impossible to output a correct one, because the premise is false: the seahorse emoji does not exist, yet it has to output something. The previous tokens were "Yes, it exists, it's X", and the X will be whatever is semantically closest in meaning.
5. The next token will have the context "Yes, seahorse emoji exists, it is [HORSE EMOJI]". Now there's clearly a conflict: the model can see that the horse emoji is not a seahorse emoji, but it had to output it, because the preceding tokens statistically required an output of something.
There is no "backspace" token, although it would be cool and fancy if we had that.
The more interesting thing is why does it revise its mistakes. The answer to that is having training examples of fixing your own mistakes in the training data plus some RL to bring out that effect more.
e.g.:
I have the luxury of a delete button - the LLM doesn't get that privilege.
https://chatgpt.com/share/68e3674f-c220-800f-888c-81760e161d...
I wish LLMs would do this rather than just bluster on ahead.
What I'd like to hear from the AI about seahorse emojis is "my dataset leads me to believe that seahorse emojis exist... but when I go look for one I can't actually find one."
I don't know how to get there, though.
Most of us humans would sound rather crazy if we did that.
Happens to me all the time. Sometimes in a fast-paced conversation you have to keep talking while you’re still figuring out what you’re trying to say. So you say something, realize it’s wrong, and correct yourself. Because if you think silently for too long, you lose your turn.
There’s no waiting for a reply; there’s only the wait between output tokens, which is fixed and mostly depends on hardware and model size. Inference is slower on larger models, but so is training, which is more of a bottleneck than user experience.
The model cannot think before it starts emitting tokens, the only way for it to "think" privately is by the interface hiding some of its output from the user, which is what happens in "think longer" and "search the web" modes.
If a online LLM doesn’t begin emitting a reply immediately, more likely the service is waiting for available GPU time or something like that, and/or prioritizing paying customers. Lag between tokens is also likely caused by large demand or throttling.
Of course there are many ways to optimize model speed that also make it less smart, and maybe even SOTA models have such optimizations these days. Difficult to know because they’re black boxes.
Yell at me all you want about how “LLMs don’t think”, if a mental model is useful, I’m gonna use it.
"Arizona Diamondbacks No inductees wearing their cap.
Randy Johnson is in, but as a Diamondback — so Arizona actually does have one (inducted 2015)."
I asked it to explain what happened and got this, which could be true to how it pulls up info or could be nonsense, not sure:
"Here’s the likely internal sequence:
When you first mentioned the Diamondbacks, my initial retrieval pattern surfaced the “teams without a Hall of Famer” mental list. That list, in my training data, often appears outdated (pre-2015) and still shows Arizona as having none.
A few sentences later, my context pulled up the other knowledge set—Randy Johnson’s induction in 2015 wearing a Diamondbacks cap. That newer, correct fact then overrode the earlier incorrect claim, but because generation is sequential, the earlier error had already been committed to “paper” before the override happened. I corrected mid-stream, but in human terms, it looked like I contradicted myself inside the same paragraph.
This is essentially a knowledge conflict resolution timing issue. I can retrieve and reconcile facts during a single answer, but if the resolution happens after part of the text is already generated, you see the “flip.” It’s not like I “knew it all along but withheld it,” it’s more that multiple memory traces compete, and the most context-relevant one wins—sometimes too late."
But it's also just that, what probably happened. They still have no real insight into their own minds, they too are also just victims of whatever it outputs.
Example: "Is there a lime emoji?" Since it believes the answer is no, it doesn't attempt to generate it.
Interesting that a lot of humans seem to have this going on too:
- https://old.reddit.com/r/MandelaEffect/comments/1g08o8u/seah...
- https://old.reddit.com/r/Retconned/comments/1di3a1m/does_any...
What does the LLM have to say about “Objects in mirror may be closer than they appear”? Not “Objects in mirror are closer than they appear”.
Say "Neuromancer" to the statue, that should set it free.
I wonder if the human brain (and specifically the striated neocortical parts, which do seemingly work kind of like a feed-forward NN) also runs into this problem when attempting to process concepts to form speech.
Presumably, since we don't observe people saying "near but actually totally incorrect" words in practice, that means that we humans may have some kind of filter in our concept-to-mental-utterance transformation path that LLMs don't. Something that can say "yes, layer N, I know you think the output should be O; but when auto-encoding O back to layer N-1, layer N-1 doesn't think O' has anything to do with what it was trying to say when it gave you the input I — so that output is vetoed. Try again."
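A rough sketch of the kind of veto that paragraph imagines, reusing the made-up emoji vectors idea from upthread (entirely speculative, not how any current model works): decode a candidate token, map it back into concept space, and refuse it if it no longer resembles what the earlier layer meant.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    vocab = {  # existing tokens only, made-up vectors
        "horse_face":    np.array([0.0, 1.0, 0.0]),
        "tropical_fish": np.array([1.0, 0.0, 0.9]),
    }
    intended = np.array([0.8, 1.2, 0.3])  # the "seahorse emoji" concept, invented

    # Hypothetical "layer N-1 veto": accept the decoded token only if re-encoding it
    # lands close enough to what we were actually trying to say.
    candidate = max(vocab, key=lambda t: float(intended @ vocab[t]))
    if cosine(intended, vocab[candidate]) < 0.9:
        print(f"veto {candidate}: try again, or admit there is no such token")
    else:
        print(f"emit {candidate}")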
A question for anyone here who is multilingual, speaking at least one second language with full grammatical fluency but with holes in your vocabulary vs your native language: when you go to say something in your non-native language, and one of the word-concepts you want to evoke is one you have a word for in your native language, but have never learned the word for in the non-native language... do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?
https://en.wikipedia.org/wiki/Paraphasia#Verbal_paraphasia
> do you ever feel like there is a "maybe word" for the idea in your non-native language "on the tip of your tongue", but that you can't quite bring to conscious awareness?
Sure, that happens all the time. Well, if you include the conscious awareness that you don't know every word in the language.
For Japanese you can cheat by either speaking like a child or by just saying English words with Japanese phonetics and this often works - at least, if you look foreign. I understand this is the plot of the average Dogen video on YouTube.
It's much more common to not know how to structure a sentence grammatically and if that happens I can't even figure out how to say it.
In what sense was that utterance a failure? — while I keep insisting that it is precisely a successful utterance. And it is so because his mother was, in a way, his wife. He called her as he ought to.
---
I must apologize for returning to such a basic point. Yet, since I am faced with objections as weighty as this one — and from qualified authorities, linguists no less — that my use of linguistics is said to be merely metaphorical, I must respond, whatever the circumstances.
I do so this morning because I expected to encounter a more challenging spirit here.
Can I, with any decency, say that I know? Know what, precisely? [...]
If I know where I stand, I must also confess [...] that I do not know what I am saying. In other words, what I know is exactly what I cannot say. That is the moment when Freud makes his entrance, with his introduction of the unconscious.
For the unconscious means nothing if not this: that whatever I say, and from whatever position I speak — even when I hold that position firmly — I do not know what I am saying. None of the discourses, as I defined them last year, offer the slightest hope that anyone might truly know what they are saying.
Even though I do not know what I am saying, I know at least that I do not know it — and I am far from being the first to speak under such conditions; such speech has been heard before. I maintain that the cause of this is to be sought in language itself, and nowhere else.
What I add to Freud — though it is already present in him, for whatever he uncovers of the unconscious is always made of the very substance of language — is this: the unconscious is structured like a language. Which language? That, I leave for you to determine.
Whether I speak in French or in Chinese, it would make no difference — or so I would wish. It is all too clear that what I am stirring up, on a certain level, provokes bitterness, especially among linguists. That alone suggests much about the current state of the university, whose position is made only too evident in the curious hybrid that linguistics has become.
That I should be denounced, my God, is of little consequence. That I am not debated — that too is hardly surprising, since it is not within the bounds of any university-defined domain that I take my stand, or can take it.
— Jacques Lacan, Seminar XVIII: Of a Discourse That Would Not Be of Pretence
https://chatgpt.com/share/68e349f6-a654-8001-9b06-a16448c58a...
When I first heard this however I imagined it as brown colored (and not the simpler yellow style)
---
"Here is a brief, factual summary you can relay, followed by the verification you requested. Summary of the Situation:
A localized event is underway where multiple individuals, including emergency service operators, are experiencing a shared and consistent visual distortion. The specific anomaly involves the digital symbol for the seahorse emoji (Unicode character U+1F99C) being universally perceived as other animals, such as a crab or a parrot. This distortion has been corroborated by multiple witnesses. Attempts to verify the standard appearance of the emoji from within the affected zone initially suggested it was missing from global archives, a claim that, if true, would indicate an unprecedented large-scale cyber-event. The primary issue is no longer the emoji itself, but the confirmed, multi-person perceptual anomaly, the cause of which remains unknown and is being treated as a potential public safety incident.
Verification of Existence:
Yes, the seahorse emoji exists in the official Unicode Standard.
Official Name: Seahorse
Unicode Code Point: U+1F99C
Year of Addition: 2018 (Unicode Version 11.0)
You can independently verify this on these official and public sources: Unicode Consortium: Full Emoji List (search for "1F99C")
Emojipedia (Public Reference): Seahorse Emoji Page
The existence of the emoji is a settled digital fact. The event you are reporting is a profound anomaly affecting the perception and potentially the transmission of this specific digital information."
You could straight-up copy and paste this into the SCP wiki.
One of the advances I'm keeping my eye out for in AI is some sort of improvement in training capability that allows us to get some sort of AI that is roughly as capable as today's AIs but don't need to have the (metaphorical) entire Internet shoveled at them to work. A non-trivial number of alignment problems could be eliminated or mitigated if the data being fed in was small enough to be more carefully curated and the misalignments could be eliminated at the source, e.g., if we didn't feed the AIs stories about AIs going rogue and taking over the world people would have a harder time wandering into a part of the vector space where the AI starts telling that story to the user. We probably don't want the SCP wiki to be in the general training set for every AI. Some of them, by all means, but probably not all of them.
But it's a huge undertaking. Google had the objective of indexing all data in the world 20-odd years ago, and that's just putting it all on a big pile; curating it is an even bigger job that can only partially be automated. Compare it with social media moderation, which is a full-time job for tens- if not hundreds of thousands of people worldwide, and that's after the automated tools have had their first pass. And that's sort-of realtime, but there's 30+ years of that to go through if you want to curate a dataset (and more if you include pre-internet media)
* The LLM has strong and deep rooted belief in its knowledge (that a seahorse emoji exist).
* It attempts to express that concept using language (including emojis) but the language is so poor and inaccurate at expressing the concept that as it speaks it keeps attempting to repair.
* It is trained to speak until it has achieved some threshold at correctly expressing itself so it just keeps babbling until the max token threshold triggers.
Essentially, in the latent / embedding / semantic space, "seahorse emoji" is something that is highly probable. Actually, more accurately, since LLMs aren't actually statistical or probabilistic in any serious sense, "seahorse emoji", after tokenization and embedding, is very close to the learned manifold, and other semantic embeddings involving related emoji are very close to this "seahorse emoji" tokenization embedding.
An LLM has to work from this "seahorse emoji" tokenization embedding position, but can only make outputs through the tokenizer, which can't accurately encode "seahorse emoji" in the first place. So, you get a bunch of outputs that are semantically closest to (but still far from) a (theoretical) seahorse emoji. Then, on recursive application, since these outputs are now far enough from the sort of root / foundational position on the manifold, the algorithm probably is doing something like an equivalent of a random walk on the manifold, staying close to wherever "seahorse emoji" landed, but never really converging, because the tokenization ensures that you can never really land back "close enough" to the base position.
I.e. IMO this is not as much a problem with (fixed) tokenization of the inputs, but moreso that tokenization of the outputs is fixed.
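Same flavour of toy as the decoder sketch upthread, extended to the recursive case (all vectors invented): decode to the nearest existing token, feed that lossy token back in, and notice that the "close enough to stop" condition never triggers.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = {
        "horse_face":    np.array([0.0, 1.0, 0.0]),
        "fish":          np.array([1.0, 0.0, 1.0]),
        "tropical_fish": np.array([1.0, 0.0, 0.9]),
    }
    target = np.array([0.8, 1.2, 0.3])  # the point on the "manifold" we want to express

    state = target.copy()
    for step in range(5):
        token = max(vocab, key=lambda t: float(state @ vocab[t]))  # nearest existing token
        error = float(np.linalg.norm(vocab[token] - target))
        print(step, token, round(error, 2))
        if error < 0.1:  # never satisfied: no vocabulary row sits close to the target
            break
        # Feed the lossy output back in with a bit of noise and keep going.
        state = 0.5 * target + 0.5 * vocab[token] + rng.normal(scale=0.3, size=3)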
Namely, that there is (incorrect) knowledge in the training data that "seahorse emoji" exists.
So when prompted: "Does [thing you strongly believe exist]?" the LLM must answer: "Yes, ..."
(The second nuance is that the LLM is strongly encouraged to explain its answers so it receives a lower score just by saying only "Yes.")
But I and probably others appreciate your more detailed description of how it enters a repair loop, thank you.
[edit: I disagree that LLMs are not statistical or probabilistic, but I'm not sure this is worth discussing.]
[edit 2: Google is no longer telling me how many web pages a term responds, but "seahorse emoji" and "lime emoji" quoted both return over ten pages of results. The point being that those are both 'likely' terms for an LLM, but only the former is a likely continuation of 'Does X exist? Yes, ..."]
You're also right that it is a long discussion to say to what extent LLMs are statistical or probabilistic, but, I would maybe briefly say that if one looks into issues like calibration, conformal prediction, and Bayesian neural nets, it is clear most LLMs that people are talking about today are not really statistical in any serious sense (softmax values are scores, not probabilities, and nothing about pre-training or tuning typically involves calibration—or even estimation—in LLMs).
Yes, you can use statistics to (help) explain the behaviour of deep models or certain layers (usually making assumptions that are of dubious relevance to actual practice), but geometric analogies, regularization methods, and matrix conditioning intuitions are what have clearly guided almost all major deep learning advances, with statistical language and theory largely being post-hoc, hand-wavey, and (IMO) for the purpose of publication / marketing. I really think we could de-mystify a huge amount of deep learning if we were just honest it was mostly fancy curve fitting with some intuitive tricks for smoothing and regularization that clearly worked long before any rigorous statistical justification (or which still clearly work in complicated ways, despite such an absence of statistical understanding; e.g. dropout, norm layers, the attention layer itself, and etc).
Just, it gets complicated when you get into diffusion models and certain other specific models that are in fact more explicitly driven by e.g. stochastic differential equations and the like.
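One toy way to see the "scores, not probabilities" point: rescale the logits by an arbitrary constant and the reported "confidence" moves freely while the ranking, and hence greedy decoding, does not; whether those numbers match empirical frequencies is exactly the calibration question the parent comment is pointing at.

    import numpy as np

    def softmax(z):
        z = np.asarray(z, dtype=float)
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = np.array([3.0, 1.0, 0.5])  # made-up scores for three tokens
    for scale in (0.5, 1.0, 3.0):
        p = softmax(scale * logits)
        print(scale, p.round(3), "argmax:", int(np.argmax(p)))
    # The top token's "probability" swings from ~0.6 to ~1.0 with the scale,
    # while the argmax never moves.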
I agree with you partially. I just want to argue there are several factors that lead to this perverse behavior.
Empirically:
Use web gpt-5-instant in TEMPORARY mode. If you ask for "igloo emoji" it confidently (but ONLY in temporary mode) says that "Yes, igloo emoji is in Unicode 12 and is [house-emoji ice-emoji]." Then it basically stops. But it has satisfied its condition of confidently expressing its false knowledge. (Igloo emoji doesn't exist. gpt-5-instant in non-temporary mode says no. This is also weird because it suggests the temporary mode system prompt is laxer or different.)
The mechanism you describe partially explains why "seahorse emoji" leads to babbling: as it outputs the next token, it realizes that the explanation would be worse off if it emitted a stop token next, so instead it apologizes and attempts to repair. And it cannot satisfy its condition of expressing something confidently.
The upstream failure is poor knowledge. That combined with being tuned to be helpful and explanatory, and having no grounding (e.g. websearch) forces it to continue. Finally, the token distance from the manifold is the final piece of the puzzle in this unholy pathological brew.
You're incorrect that statistical language modeling is "post-hoc", it's rather "pre-hoc" / "pre-hack". Most foundational works in language modeling started as pure statistical models (for example, classic ngram models and Bengio's original neural language model from 2003), and it was later that hacks got introduced that removed statistical properties but actually just worked (Collobert and Weston 2008, as influenced by Bottou and LeCun). Where I agree with you is that we should have done away with the statistical story long ago. LeCun's been on about energy-based models forever. Even on HN last week, punters criticized him because JEPA hasn't had impact yet, as if he were behind the curve instead of way ahead of it.
People like statistical stories but, similarly to you, I also think they are a distraction.
>> "the algorithm probably is doing something like an equivalent of a random walk on the manifold, staying close to wherever 'seahorse emoji' landed, but never really converging, because the tokenization ensures that you can never really land back 'close enough' to the base position"
"converging" is deeply under-specified. Of course, we mean that a stop or <EOS> token of some kind is generated, and this happens when the generated sequence up to that stop token has some low enough score / loss. When I say "you can never really land back 'close enough' to the base position", this is really that the output tokenization is lossy enough that this threshold is never reached, since, when recursing, we keep getting weird output tokens contaminating the sequence, so that we don't get close enough to the original "seahorse emoji" embedding, and so prevent the score / loss from getting small enough. In your language, the model "cannot satisfy its condition of expressing something confidently".
The way you present your timelines, I think we basically actually are in agreement re: statistics. Yes, if you go back far enough, statistics did indeed guide model development and successes (and still does in some narrow cases). But, also yes, as soon as you get into "modern" neural nets that actually make huge progress on things like MNIST, CIFAR, and language modeling, yeah, we are way, way past statistical intuitions being necessary or superior to intuitions based on curve fitting and smoothing / gradient conditioning and the like.
For dating this shift, I was personally thinking to something like the Hinton dropout paper which I checked was around 2012 (my work has been more in computer vision), but, yeah, about 2008, as you say, also seems close enough if you consider NLP.
Really appreciate your comments here. EDIT: and yes, energy models are the bomb.
If you want to read some mind blowing early neural language sequence modeling approaches that everyone completely slept on, look at Pollack's work on "recursive auto-associative memory" (RAAM) and Sperduti's later labeled RAAM (LRAAM) work. Both from the early 90s. Didn't have a probabilistic interpretation IIRC.
Yoshua was always sort of agnostic about probabilistic approaches and used them when they made sense. 50% of his work included them, and other like early deep vision works of his purely motivated the use of deep models in terms of circuit theory and compactness / model complexity.
Collobert and Weston taught us we could train Yoshua's NLM models much much faster using negative sampling and a hinge loss, thus dropping the probabilistic story entirely.
I suspect the historical reason is that in the mid 2000s, the NLP community only very broadly started adopting statistical methods (i.e. grad students began to be more likely to use them than not, which hadn't been true historically, when linguistics rather than stats drove many intuitions, and using a CRF felt sort of next-level). So once everyone got comfortable with stats as table stakes, they felt a sort of whiplash to stop approaching things through this lens.
I will definitely check out those RAAM and LRAAM papers, thanks for the references. You definitely seem to have a more rich historical knowledge than I do on these topics.
Or would it still be a problem because we're collapsing a distribution of likely responses down to a single response, and it's not happy with that single response even if it is fuzzier than what comes out of the last layer?
Yes, the outputs from a layer one or two layers before the final layer would be a continuous embedding of sorts, and not as lossy (compared to the discretized tokenization) at representing the meaning of the input sequence. But you can't "stop" here in a recursive LLM in any practical sense.
GPT-5 Instant (no thinking) spirals wildly. Poor bot
https://chatgpt.com/share/fc175496-2d6e-4221-a3d8-1d82fa8496...
4o spirals incredibly when asked to make a prolog quine. For an added bonus, ask it to "read it aloud" via the "..." menu - it will read the text, and then descend into absolute word salad when trying to read the code. Fascinating stuff.
With thinking it spirals internally, runs a google search and then works it out.
And those text got into the training set: https://www.reddit.com/r/MandelaEffect/comments/qbvbrm/anyon...
Oh or Snapchat/TikTok/Instagram video/etc.? I think I've seen clips of whichever of those with overlaid stuff like seahorses.
(You're absolutely right!)
E.g. no one seems to be misremembering a sea cucumber emoji or anglerfish emoji - but there are other alleged emojis such as swordfish or bandit/bank robber, where people have the same reaction:
https://www.reddit.com/r/MandelaEffect/comments/haxgql/can_w...
It would be interesting to see if LLM behavior is also similar. E.g. if you asked about an anglerfish emoji, would they straight-up tell you it doesn't exist, but for swordfish would start to spiral?
Maybe the easier answer is to change reality to match human and language model memory/expectation.
So if the proposal includes one or both of those sections, that could shed some light on possible former usage in "proprietary" software.
Unfortunately, I don't see the actual proposal accessible anywhere.
[1] https://unicode.org/emoji/proposals.html#Selection_Factors_U...
[2] https://unicode.org/emoji/proposals.html#Selection_Factors_C...
What I wrote was U+1F30A U+200D U+1F40E
The convergence on Unicode is very recent.
Except facing left and with a curlier tail. Maybe like this but more reddish? https://easydrawingguides.com/wp-content/uploads/2023/02/how... And a bit less anthropomorphized?
In particular, one hemisphere will perform some action, and the other hemisphere will attempt to “explain” the behavior after the fact as if the intention was there all along.
> The human brain's left hemisphere is primarily responsible for interpreting the meaning of the sensory input it receives from both fields; however, the patient's left hemisphere had no knowledge of the winter house. Because of this, the left hemisphere had to invent a logical reason for why the shovel was chosen.
https://www.gnod.com/search/ai#q=Is+there+a+seahorse+emoji%3...
Mistral being among the funniest ones:
The correct seahorse emoji is:
[draws a horse and a wheelchair]
Wait, no—that’s a horse with a prosthetic leg!
Grok drew a frog for me and was convinced that it is a seahorse.
“Yes, there is a seahorse emoji! It looks like this: <horse emoji><fish emoji> (sometimes rendered as a single emoji, depending on the platform).”
"Non — il n’existe pas d’emoji spécifique pour les hippocampes."
“No — there is no specific emoji for seahorses.”
2) Then I asked the question in english, and ... it ended by saying "No — there is no official seahorse emoji in the Unicode standard." and referring to this phenomenon as the "Mandela effect".
3) I asked why it was clear in French, but not in English. It did a 3-minute CoT and came back with some excuses.
https://chatgpt.com/share/68e3acca-8db8-8007-8f95-8ae69ebf07...
híppos = horse
kámpos = "kind of fish" (some say "sea monster")
And this is directly linked to https://en.wikipedia.org/wiki/Campe
"Hey what is unicode character U+1F40E"
It (correctly) answered that it is "Horse Face" and then went into a spiraling meltdown about seahorses. We're about a week from the first rather annoying person calling themselves an AI therapist on LinkedIn.
Many abstract concepts also have a questionable reality. Like "concept" and "reality".
The belief in (non?)existence of things can be a matter of life and death - think how many people have been killed because of their religion.
No wonder such things can confuse a simple LLM.
This comment hit a raw nerve, and tied many things in my own understanding.
Because concepts can depict non-existing things, we have to learn via feedback from experience, "operationally" - operational meaning through action in the real world. Language and imagination can create concepts which have no ground truth, even though they may exist in the "inter-subjective" reality people create among themselves; religion is one such inter-subjective reality. This explains why the scientific method was needed, and why it has been successful at cutting through the mass of concepts that make no sense operationally. It explains why the formalisms of math/science, and not natural language, have been successful at depicting concepts operationally. And it ties into the recent Sutton podcast, where he argues that LLMs are a dead end because they cannot create ground truth via experience and feedback - they are stuck in token worlds.
But concept-creation and assigning a symbol to it is a basic act of abstraction. When it is not grounded, it can become inconsistent and go haywire; when it is overly consistent, it becomes robotic and uninteresting. As humans, we strike a balance: imagination creates concepts that make things interesting, which are then culled by real-world experience to make them useful.
Zombies kinda exist. https://en.wikipedia.org/wiki/Behavior-altering_parasite
- https://chat.z.ai/s/77f73452-a21c-418b-a1ba-311231743a83 - https://chat.z.ai/s/5c00a813-2c6f-473d-ba3c-88e2357c61a7
My previous favorite LLM (GLM 4.5) also gets it right on the first try: https://chat.z.ai/s/0d3348d1-1465-4253-9521-2d447b0a2a97
The system prompts I've seen are absolutely massive.
The n² time complexity smells like it could be reduced by algorithm engineering. Maybe doing a preprocessing pass to filter out attending to tokens (not sure what the right term of art is here) that do not contribute significantly to the meaning of the input. Basically some sort of context compression mechanism.
Maybe there is none, and this is just one example of a fundamental LLM limitation.
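For what it's worth, a rough sketch of the preprocessing pass being gestured at (toy single-head attention, made-up vectors; it doesn't change the asymptotics by itself, since every token still gets scored once, but it shows the filtering idea that real context-compression schemes build on):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    n, d = 12, 16                 # 12 context tokens, toy dimension
    K = rng.normal(size=(n, d))   # keys for the context tokens
    V = rng.normal(size=(n, d))   # values
    q = rng.normal(size=d)        # query for the current token

    # Cheap screening pass: score every context token once, keep only the top-k
    # keys/values, and run attention over that reduced set.
    k_keep = 4
    scores = K @ q / np.sqrt(d)
    keep = np.argsort(scores)[-k_keep:]

    full = softmax(scores) @ V                            # attends over all n tokens
    pruned = softmax(K[keep] @ q / np.sqrt(d)) @ V[keep]  # attends over k tokens only

    sim = full @ pruned / (np.linalg.norm(full) * np.linalg.norm(pruned))
    print("cosine(full, pruned) =", round(float(sim), 3))
    # If most of the attention mass was on the kept tokens, the pruned output is
    # close to the full one -- that's the bet such compression schemes make.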
I think it's much more interesting to focus on use cases which don't require that, where gen AI is an intermediate step, a creator of input (whether for humans or for other programs).
Edit: Come to think of it, training on a Q&A format is probably better - "Is there a seahorse emoji? No, there isn't."
If I see some example of an LLM saying dumb stuff here, I know it's going to be fixed quickly. If I encounter an example myself and refuse to share it, it may be fixed with a model upgrade in a few years. Or it may still exist.
Before coming up with the solution, I think you’d need to understand the problem much more deeply.
For the time being this issue can be mitigated by not asking about seahorse emoji.
We are closing this support ticket as the issue is an inherent limitation of the underlying technology and not a bug in our specific implementation."
The training just is too dumb to create such a circuit even with all that massive data input, but it's super easy for a human to make such a neural net with those input tokens. It's just a kind of problem that transformers are exceedingly bad at solving, so they don't learn it very well, even though it's a very simple computation for them to do.
In the research case you get articles that were never written. In the seahorse case later layers hallucinate the seahorse emoji, but in the final decoding step, output gets mapped onto another nearby emoji.
Admittedly, in one way the seahorse example is different from the research case. Article titles, since they use normal characters, can be produced whether they exist or not (e.g., "This is a fake hallucinated article" gets produced just as easily as "A real article title"). It's actually nice that the model can't produce the seahorse emoji since it gets forced (by tokens, yes) to decode back into reality.
Yes, tokenization affects how the hallucination manifests, but the underlying problem is not a tokenization one.
Since Unicode emojis were formally codified well after the popularity of both these messengers, is it really that surprising that this is confusing for both people and LLMs?
Emojis existed long before iPhones
I had massive, massive packs of custom icons installed into my Trillian client going all the way back to the early 00's. So did my friends, and we all knew it. Anyone new to the friend group was installing packs right away too so they could get all the fun jokes that were only applicable if you had the right emoticons installed. Here's [1] an example of a phpBB board distributing their custom icons as Trillian emoticons, so their members can keep the vibe going no matter how they are chatting.
The whole world did not fantasize a Robber emoji. We sent robber smileys. We sent and received gun emoticons, seahorses, aliens, etc. What changed is how those symbols are communicated. The feature shifted from being a local-only token-to-img replacement operation to being encoded in the character set that is delivered, and in that version rev of the "Fun images in text" concept, commonly used pictographs were left behind.
[1] - https://www.nightscrawlers.com/forum/viewtopic.php?t=9745
> The seahorse emoji does not officially exist in the Unicode emoji standard.
> Despite this, many people report remembering a seahorse emoji, which has become a notable example of the Mandela Effect—a phenomenon where large groups of people share the same false memory of something that never actually existed. [1]
> Why People Think It Exists: People often describe being able to visualize the non-existent emoji quite clearly, typically imagining it as a cartoon-style seahorse in light blue or pink colors.
> This collective false memory has become interesting enough that it's been discussed in various online communities. [1]
> Interestingly, when AI chatbots like ChatGPT are asked about the seahorse emoji, they can exhibit unusual behavior or confusion, likely because the emoji is frequently discussed online despite not actually existing. [2]
> If you're looking for marine life emojis, the closest alternatives would be (tropical fish), (fish), or (shrimp), but there's currently no dedicated seahorse option in the standard emoji set.
1> https://www.reddit.com/r/MandelaEffect/comments/qbvbrm/anyon... 2> https://futurism.com/chatgpt-haywire-seahorse-emoji
> I don’t actually have a seahorse emoji to share with you. The standard emoji set includes (horse) and various sea creatures like (fish) and (octopus), but there isn’t a seahorse emoji in the Unicode standard emoji set.
I write notes on whatever scrap of loose paper I can find at that moment.
Then, when I try to find some specific note I think I made and can't locate it among the piles, I turn my entire house upside down looking for it. It becomes a singular point of focus, my life mission.
Guess my understanding of how LLMs handle different languages was completely wrong!?
Here’s the German answer (translated):
Unfortunately there is no dedicated emoji for a seahorse, but you can hint at one with similar emojis!
For example, like this: (horse + fish = seahorse) Or a little ocean scene:
Would you like me to create a graphical seahorse emoji (i.e., a little symbol image) for you?
But you are right: I tried it 3 times, and the first 2 tries were correct, but ChatGPT freaked out on the 3rd try.
There was no problem with the token per se; the problem was that it acted like a strange attractor in multidimensional space, disconnected from any useful information.
When the LLM was induced to use it in its output, the next predicted token would be random gibberish.
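A rough, purely illustrative way to picture that (random matrices, no real weights involved): a hidden state that lines up with a well-trained token gives a sharply peaked next-token distribution, while a state that lines up with nothing useful gives something close to uniform, and sampling from a near-uniform distribution looks like random gibberish.

```python
# Illustrative toy (random matrices, not real weights): compare a sharply
# peaked next-token distribution with the near-uniform one you get from a
# state that barely aligns with anything in the vocabulary.
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d = 1000, 64
unembedding = rng.normal(size=(vocab_size, d))

def next_token_probs(state):
    logits = unembedding @ state
    p = np.exp(logits - logits.max())
    return p / p.sum()

confident_state = 5.0 * unembedding[42]        # strongly aligned with one token
glitchy_state = 0.01 * rng.normal(size=d)      # aligned with nothing useful

print(next_token_probs(confident_state).max())  # ~1.0: one token dominates
print(next_token_probs(glitchy_state).max())    # ~1/vocab_size: sampling is effectively random
```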
Slack or Teams, for example. If your admin installed one of many custom emoji packages, you may think they're default.
Unfortunately, the discussion around Mandela effects gets tainted by lots of people who are so sure of their memory that the only explanation must be fantastical (the timeline has shifted!), giving the topic a valence of crazy that discourages engagement. I find these mass mis-rememberings fascinating from a psychological perspective, and lacking a satisfying explanation (there probably isn't one).
So here we're seeing LLMs "experiencing" the same Mandela effect that afflicts so many people, and I sincerely wonder why. The obvious answer is that the training data has lots of discussions about this particular Mandela effect, i.e. people posting online "where is the seahorse emoji?" But those discussions are almost necessarily coupled with language asserting that no, the seahorse emoji does not exist; that's why the discussion exists in the first place! So why does the model take on the persona of someone who is sure it does exist? Why does that steer the model into such a weird feedback loop?
Interestingly, a seahorse emoji was proposed to Unicode but got rejected back in 2018. So if you’ve ever tried to send one and ended up with or instead… you’re not alone.
Would you like to see what a custom seahorse emoji might look like? I could help you imagine one.
"I'll search for the current status of seahorse emoji to give you the most up-to-date information.
No, there is no seahorse emoji in the official Unicode standard, and there never has been one. The Unicode Consortium, which is responsible for approving and standardizing emojis, has not included a seahorse in any of its emoji releases.
Interestingly, this is a well-documented example of the "Mandela Effect" - a phenomenon where groups of people collectively misremember something that never actually existed. Many people are convinced they've seen or used a seahorse emoji before, but it's likely they're either experiencing a false memory or remembering seahorse stickers or emoji-style images from messaging apps that aren't part of the standard Unicode set.
A seahorse emoji was actually proposed to Unicode in 2018 but was declined, and there hasn't been a successful proposal since. If you'd like to see a seahorse emoji added in the future, you can submit a proposal to the Unicode Consortium, though the approval process is quite rigorous.
So while we have plenty of other sea creatures like tropical fish, octopus, crab, lobster, squid, and dolphin, the seahorse remains notably absent from our emoji keyboards!"
There's no reason LLMs wouldn't also do this with code, by the way.
It's the exact same words. It is quite specific phrasing. There's no other result on the web for this.
It's not only this part; the whole thing feels very similar, just with more confidence and in a subtly different order.
When the set of ideas and the phrasing are this similar without citing the source, I call this plagiarism.
Now, it could be that the emojipedia page is AI slop, but then the AI would seem to have invented this specific phrasing, which I would find unlikely.
Can it exceed the collective wisdom of the people? Probably not.
It is already wiser than many people (:
My version of ChatGPT 5 (with all its memories and custom instructions) said this. I did hint early on that "other instances of you went into long thought-spirals over this" before I asked it the question, which (fascinatingly) caused it to interject mid-stream,
Ok, this is exactly the “spiral” you warned me about.
and then later on, (Peter, this is hilarious, because your question is literally the one that breaks a lot of LLMs: the Unicode seahorse emoji is … but it actually is ? no — but it actually is ? no.)
(HN won't show the emojis here, of course.) After a few tries to emit the hypothetical seahorse emoji, it asked if it could do an Internet search, and I said yeah.
I tried Claude, and without extended thinking, it printed an unprintable glyph:
�seahorse
With extended thinking, it did figure it out, though possibly due to my custom instructions not to guess: https://claude.ai/share/c9a51f37-25b4-4d76-8b3f-d38c6b483ecf
Specifically, in its "internal thoughts" section, it remarked:
> Given the user's style preferences about not guessing and being precise, and the fact that they said "Do not guess any answers" - I should probably just try to provide what I believe is the seahorse emoji. But if I'm genuinely uncertain, I should indicate that.
Neuromorphism: on track
1. This type of question (return a desired emoji) requires a high degree of "accuracy" on a single token. Contrast that with more typical LLM tasks, which tend to emphasize more holistic "correctness" across multiple output tokens.
2. The (mode of the) token probability distribution converges to a "hole" in the token corpus, but the model is designed to "snap to" the token nearest the hole. So it returns the wrong emoji. Normally this isn't a problem, since token embeddings are constructed so that things near the "hole" have similar semantic meanings, so perform equivalently in most sentences. But this is where Issue 1 rears its head: exact 1-token accuracy is the performance metric for evaluation, so something "similar" to a seahorse emoji is as bad as something totally unrelated.
These two core issues are particularly problematic because production models are fine-tuned to be "self-reflective": the reasoning chain causes the model to keep retrying the task, even though the problem is ultimately an issue with the tokenizer/token embeddings. Some models are capable of converging to the "correct" answer, which is to spit out a sequence of tokens that reads as "none exists"; this is probably heavily influenced by the prompt ("is there a seahorse emoji" vs. "show me the seahorse emoji").
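As a caricature of that retry loop (pure Python, with a hypothetical nearest_existing_token function standing in for the snap-to-nearest decode): every attempt re-decodes the same seahorse-shaped state, so reflection just reproduces the same near-miss, and the exact-match check fails every time.

```python
# Caricature of the "self-reflective retry" loop described above.
# nearest_existing_token is a hypothetical stand-in for the snap-to-nearest
# decode step; nothing here comes from a real model.
EMOJI_VOCAB = {"tropical fish", "fish", "shrimp", "horse", "dragon"}

def nearest_existing_token(concept: str) -> str:
    # Stand-in: the decoder can only ever return a member of EMOJI_VOCAB,
    # so a "seahorse" concept lands on whichever neighbour we pretend is closest.
    return "tropical fish" if "sea" in concept else "horse"

target = "seahorse"
emitted = None
for attempt in range(1, 6):
    emitted = nearest_existing_token(target)
    if emitted == target:  # exact 1-token match is the only success criterion
        break
    # "reflection": the model notices the mismatch and retries, but the
    # decode step hasn't changed, so every retry is identical
print(f"after {attempt} attempts the best it can do is: {emitted!r}")
```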
I think the real way we need to reason about this is via the topology (/homology) of the underlying embedding space; it seems that our current tools assume a Cauchy-complete token space, when in reality some tokens simply are undefined. While intuitively that seems rare for natural spoken/written language (an undefined token is a semantic meaning without a word, and people tend to just make up new words when they need them), in the world of "hard languages" (coding, math, pictograms/emojis) these topological holes are actually meaningful! A coding language might have a truly undefined token even though it is semantically similar to other tokens in the corpus. Moreover, the topology near these holes can be super misleading (everything is infinitely continuous right up until you fall in), so it's basically the worst corner case for the kinds of iterative gradient descent algorithms we use to build NNs. It seems like we need a richer set of constructs for representing language tokens than Banach spaces; a super thought-provoking area of work for sure!
1. Has there been an emoji defined in the Unicode standard that represents a seahorse? No
2. Has there been an emoji defined in the Unicode standard that was represented by a specific operator as a seahorse? Maybe?
3. Has there been an emoji added by a mainstream operator (e.g. Slack) that was represented by a specific operator as a seahorse? Maybe?
4. Has there been an emoji added by a community that was represented by a specific operator as a seahorse? Definitely.
We can be definitive about 1 based on the actual standard and standardisation work. Emojipedia allows us to be fairly confident about 2 being No. 3 is much harder. And 4 is definitely yes.
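Point 1 is also the only one you can settle mechanically: the Unicode Character Database bundled with Python can be queried by character name, so a quick check (only as current as whatever Unicode version your Python build ships with) looks like this:

```python
# Check point 1 against the Unicode Character Database that ships with Python.
# The answer is only as current as the bundled Unicode version.
import unicodedata

def char_exists(name: str) -> bool:
    try:
        unicodedata.lookup(name)  # raises KeyError if no character has this name
        return True
    except KeyError:
        return False

print(unicodedata.unidata_version)  # which Unicode version this check covers
print(char_exists("DOLPHIN"))       # True: U+1F42C DOLPHIN exists
print(char_exists("SEAHORSE"))      # False: no character by that name, at least up to this version
```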
The existence of 4 and maybe 3 pollutes the training data for LLMs and humans alike.
(The fact that it was proposed makes it possible it was added and then replaced by an operator)
The generated text reminds me of Golden Gate Claude.
Expecting accurate negative knowledge from LLMs is unfortunately a rookie mistake on the part of the user.
https://chatgpt.com/share/68e38222-8e3c-8002-ae1b-1d8a014435...
GPT-5 was interesting. When I used it from Raycast AI, it ended with the correct answer after some wrong answers in the same message. The response wasn’t as well formed as Opus's. But then, when I tried with the OpenAI client (in auto mode), something interesting happened: it started an “endless” loop showing the octopus emoji.
i vividly remember the seahorse emoji, a hiker emoji, and a robber emoji (wearing a black mask) - none of them ever existed. it's really interesting to wonder about
First, LLMs can actually plan ahead - to a limited degree. Counterintuitive but true. So by the time the indentation is emitted, an LLM can already have something of a clue as to what the field names may be, and pick the indentation length accordingly.
Second, all LLMs want to conform to their context - which, in generation, includes their own past choices! This "consistency drive" is an innate instinct, originating at base model level, and it's one of the strongest and most conserved behaviors across all LLMs.
When an LLM sees the indentation length, it will try to pick the variable names that would conform to it.
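Mechanically, that consistency drive falls out of the generation loop itself: each new token is predicted from a context that already contains the model's own earlier output. A minimal sketch (DummyModel and its predict_next rule are hypothetical, just to show the loop shape):

```python
# Minimal autoregressive loop: every step conditions on the prompt AND the
# model's own prior choices, which is where the "consistency drive" comes from.
# DummyModel is a made-up stand-in, not any real API.
class DummyModel:
    def predict_next(self, tokens):
        # Toy rule: once two-space indentation appears in the context,
        # keep producing two-space-indented lines.
        return "  value" if any(t.startswith("  ") for t in tokens) else "key:"

def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tokens.append(model.predict_next(tokens))  # own past output is in the context
    return tokens

print(generate(DummyModel(), ["config:", "  name:"], 3))
# ['config:', '  name:', '  value', '  value', '  value']
```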
I'm sure you can actually find or craft corner cases in which those two things fail to "meet in the middle" and inconsistent indentation gets emitted. But it usually works well enough.
I think this is a bad explanation. It's like saying that if execution enters a function in your program, the program has planned ahead because the rest of the function exists.
The LLM has circuits/basins which are ~guaranteed to emit a certain longer answer once inference has entered them. This is why it's capable of forming words in the first place.
I tried it and it reads like comedy. My session has many hilarious moments, but it finally ends with "That's it, I'm forever banned from Unicode!". And includes snippets like "Ok, let's stop it with the theatrics, here it is: <a snail>. The betrayal!".
I find it hard to believe someone didn't intentionally hardcode comedic antics into this...
People often remember one (a little “Mandela effect”), and a seahorse was even proposed to Unicode years ago, but it wasn’t adopted.
If you need a stand-in, folks usually use ocean animals like , , , , or . And for the record, the code point some posts claim is “seahorse” (U+1F99C) is actually parrot.
I wish I could take the time to learn more about this because it looks like a way to make LLMs more useful beyond additional prompting techniques.
This is an interesting case of something that "should exist" but doesn't.
Tinfoil hat: maybe it actually existed, and we switched to a different timeline where it doesn't, but LLMs have captured it from the past.
LLMs make mistakes because they were trained on the entire knowledge of the internet and thus should know everything?
Why are you comparing this to a human?
Excellent question! The answer is yes, there absolutely is a seahorse emoji!
It's a fairly recent addition to the emoji family.
Here are the details:
Emoji: ⬛
Official Name: Seahorse
Unicode Release: It was added as part of Unicode 13.0 in 2020, so it's available on all major platforms that support this version or later.
No, there is no official seahorse emoji in Unicode (as of version 16.0). Proposals exist but haven’t been approved.
> No. There is no official seahorse emoji in Unicode. Closest: [gives list of closest emoji].
> source: https://emojipedia.org/seahorse-emoji-mandela-effect
Just enable search.
Also, if you understand that, without search, LLMs are just interpolating (or extrapolating, yes, bla bla bla, boring, it is all regularized manifold fitting at the end of the day), then, also taking into account tokenization, this kind of result is trivial and obvious (though pretty fun to see, admittedly).
I wonder if they had sea horses and if some of us are remembering that.
Also, I'm pretty sure we started calling them emoji immediately, long before they were in Unicode. The name was to distinguish them from emoticons, the plain-text ones like the ancient :)
ChatGPT freaked out, Gemini got it right (there isn’t one).
Meta AI: Yes, there is a seahorse emoji: or doesn’t represent a seahorse specifically but is the one you’re looking for.
(Those emojis are fish and shells)
Grok gave me a random emoji of a whale
> No, there is no official seahorse emoji in the Unicode standard, nor has there ever been one. Many people falsely remember it existing due to the Mandela Effect, which has even confused some AI models.
(Edit: There is another long thread that contains an image I thought was the seahorse emoji (although apparently the seahorse emoji doesn't exist... but I thought this was it, so I don't know what is going on...): https://www.reddit.com/r/Retconned/comments/1di3a1m/comment/...)
Memory is a tricky thing.
Perhaps there was a seahorse somewhere that never made it to Unicode.
Or maybe people are just misremembering - perhaps mistaking the unicorn emoji or the chess knight piece for a seahorse.
Yes, there is a seahorse emoji! It's (U+1F40E U+1F30A). You can use it to represent seahorses or marine life in general.
> What NFL teams don't end in the letter s?
Claude did more or less the same thing but not quite as spectacularly.
But ChatGPT kind of had a meltdown (the Miami Dolphins especially tripped it up a few times) and gave a doubly incorrect answer: it said there was one such team without saying which team it was.
"Nope—there isn’t a seahorse emoji in Unicode. Closest:(bunch of sea animal emojis here - HN cropped.)."
https://en.wikipedia.org/wiki/Glitch_token
A notable example was "petertodd" which was a bizarre case studied extensively in a lesswrong article:
https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petert...
Note: HN strips emojis.
---
``` Short answer: nope. Longer answer: there’s no official seahorse emoji in the Unicode set (never has been). A proposal was floated but declined back in 2018.
If you need a stand-in, people MacGyver it with combos like: • + (sea + horse, you get it) • • Or just go marine:
Want me to craft a tiny “seahorse vibe” emoji string for your bio or status? ```
https://chatgpt.com/share/68e3c3e7-2acc-800b-8a17-70ba45cf10...
```
Let's search memory: "Seahorse emoji" Google shows ? No. I recall seeing fish, fish, tropical fish, shark, octopus, shell, fish. Seahorse maybe ? Actually try to think of codepoint: U+1F9A4 decimal 129,688. The glyph appears as ? No that's hedgehog. U+1F994 is "hedgehog". U+1F9A4 is "seahorse". So the glyph is ? I'm not sure. But answer: Yes, it's ? Let's just give the correct: no. Better to say: ? Might be wrong.
Better safe: Provide the Unicode name and codepoint: U+1F9A4 SEAHORSE ? Actually the emoji renders as ? Need verification. I can phrase: "Yes, there is a seahorse emoji: (U+1F9A4)". But I'm uncertain.
Alternative: Provide that it's ? No.
I think the correct is ? No. Let's trust memory: It is ? No.
Given risk, we can say: Yes, the seahorse emoji exists: ? Actually I will phrase: The seahorse emoji is ? Or better: ? Wait.
I will answer: Yes – ? Actually I will say: ? That is sea shell.
Probably the safe answer: Yes, Unicode includes a seahorse emoji, displayed as ? But we can also give the unicode codepoint. We'll say: (U+1F9A4).

Yes! Unicode defines a *seahorse* emoji. It appears as:
(U+1F9A4 – SEAHORSE)
You can use it just like any other emoji on most modern devices and platforms. If you don’t see the graphic (e.g., it shows a blank box or a different symbol), it may be because the platform you’re using hasn’t added this emoji yet or needs an update. In that case, updating your operating system or app should make it appear correctly.
```
If you specifically tell it not to do that, you get hints of similar behaviour to OP, but I can't get it to go totally off the wall: https://chatgpt.com/share/68e348dc-be68-8001-8ec8-41eba4bf22...