Where the goblins came from(openai.com)

432 pointsby ilreb4 hours ago47 comments

ollin4 hours ago
For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
- christoph3 hours ago
  Does nobody else laugh that a company supposedly worth more than almost anything else at the moment, is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres? The number one discussion point, on the number one tech discussion site. This literally is, today, the state of the art.
  McKenna looks more correct everyday to me atm. Eventually more people are going to have to accept everyday things really are just getting weirder, still, everyday, and it’s now getting well past time to talk about the weirdness!
  - goobatrooba11 minutes ago
    Indeed. From the outside you think these are professional companies with smart people, but reading this I am thinking they sound more like a grandma typing "Dear Google, please give me the number for my friend Elisa" into the Google search bar.
    Basically, they don't seem to understand their own product.. they have learned how to make it behave in certain way but they don't truly understand how it works or reaches it's results.
  - zozbot2342 hours ago
    Spoiler: future versions of mainstream AIs will be fine tuned in the exact same way to subtly sneak in favorable mentions of sponsored products as part of their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.
    brookst29 minutes ago
    I’m very skeptical that training is the right way to insert ads.
    Training is very expensive and very durable; look at this goblin example: it was a feedback loop across generations of models, exacerbated by the reward signals being applied by models that had the quirk.
    How does that work for ads? Coke pays to be the preferred soda… forever? There’s no realtime bidding, no regional ad sales, no contextual sales?
    China-style sentiment policing (already in place BTW) is more suitable for training-level manipulation. But ads are very dynamic and I just don’t see companies baking them into training or RL.
    actionfromafar22 minutes ago
    Ads are dynamic now, but aren't the big companies flying closer and closer to the government? Maybe Coke can be the government blessed soda for the coming 5-year plan?
    10 minutes ago
    undefined
    jruz43 minutes ago
    Is this Xi Jinping with us in the room right now?
    lwansbrough22 minutes ago
    Are you disputing that Chinese models censor content at the request of the government?
    https://i.imgur.com/cVtLuj1.jpeg
    The absence of information is also Xi Jinping Thought.
    jchw28 minutes ago
    Are you implying that Xi Jinping is not real? I'm pretty sure that's not how that snowclone works...
    AlecSchueler23 minutes ago
    I think the point is that China is quickly becoming a bogeyman of a "they do it too!" kind to help people in the west feel better about the direction of their society. Ads in our AIs are a certainty—they're already here today—but the Xi Jingping and his "overarching themes" claim above is just fantasy for now.
    bigyabai26 minutes ago
    One day we'll hear Peter Thiel explain how Qwen 5 is part of the plan to summon Pazuzu.
    layer8an hour ago
    The nerdy version will have to be trained to not mention Xi Pigeon Thought.
    emsignan hour ago
    Isn't OpenAI already pushing ads through their free models? But even that won't reimburse all investments. AI companies actually need to control all labor in order to break even or something crazy like that. Never gonna happen.
  - libraryofbabel13 minutes ago
    It's interesting that some people are responding to your comment as if this proves that AI is a sham or a joke. But I don't think that's what you're saying at all with your reference to Terence McKenna: this is a serious thing we're talking about here! These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines. But sometimes they stray outside the lines just a little bit, and then you see how strange this thing actually is, and how doubly strange it is that the labs have made it mostly seem kind of ordinary.
    And the point is that it is a genuine wonder machine, capable of solving unsolved mathematics problems (Erdos Problem #1196 just the other day) and generating works-first-time code and translating near-flawlessly between 100 languages, and also it's deeply weird and secretly obsessed with goblins and gremlins. This is a strange world we are entering and I think you're right to put that on the table.
  - tdeck2 hours ago
    Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.
    heavyset_go2 hours ago
    If you aren't telling your computer to ignore goblins, you're going to be left behind.
    qingcharles33 minutes ago
    I'm goblinmaxxing myself.
    girvoan hour ago
    We’re definitely not escaping the permanent goblin underclass with this one.
    boomlinde2 hours ago
    I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.
    HenryBemis26 minutes ago
    That is a great point. Machine consumes energy of adding goblins in every response. The machine consumes energy on removing goblins from every response. That is a great attack vector. If (wild imagination ensues) an adversary can do that x100 (goblins, potatoes, dragons, Lightning McQueen, etc.) they can render the machine useless/uneconomical from the standpoint of energy consumption.
    daishi552 hours ago
    I mean probably not or they wouldn’t have shipped it, right?
    dexwiz2 hours ago
    Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?
    You can get it to work with one off commands or specific instructions, but I think that will be seen as hacks, red flags, prompt smells in the long term.
    tdeck2 hours ago
    If I could do those things, I wouldn't be using an LLM to write for me, now would I?
    eptcyka2 hours ago
    You don’t let the LLM write prise for you, you get it to translate natural language into code somewhat coherently.
    tdeck2 hours ago
    In this instance I'm assuming most of the "goblin" references were in prose rather than in source code, so the goal of this particular prompt edit was directed toward making the prose better.
  - atollk2 hours ago
    It can be funny but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana, and so on were the hype. Big tech companies publicly tried to outclass each other has having the best AI, so it was not about doing proper research and development, it was about building hacks, like giant regex databases for request matching.
  - Nition2 hours ago
    It certainly doesn't increase my confidence that if they do ever create a superintelligence, that it won't have some weird unforseen preference that'll end up with us all dead.
  - rkagereran hour ago
    I have been in tech a very long time, and learned you can never flush out all the gremlins.
  - larodian hour ago
    I was amazed by the article, were running to comments to shout loud "what other stupidity could OpenAI possibly 'openly' rant about next time? Because they are so open, you se... ". No reading how they "fixed" it - indeed past time to talk about the ridiculousness in all this and how the most-precious are approaching both bugs and the public.
    people are paying for the system prompt, right so?
  - amarant2 hours ago
    Lol yeah it's kinda hilarious actually. This timeline gets a lot of well-earned shit, but it really nails the comic relief, I'll give it that!
  - hansmayeran hour ago
    It's almost like these big tech overlords were just a bunch of average guys who once upon a time had a kind-of-an-interesting idea (which many 20-year-old had at that time too), got rich due to access to daddy-and-mommy networks or hitting the VC lottery and now in their late 40s and 50s still think they have interesting ideas that they absolutely have to shove it down our throats?
    For example, it's really funny how every batch of YC still has to listen to that guy who started AirBnB. Ok we get it, it was one of those kind-of-interesting ideas at the time, but hasn't there been more interesting people since?
    cindyllman hour ago
    [dead]
  - perryizgr87 minutes ago
    These guys are at the absolute frontier, why can't they rigorously find the exact weights that are causing this problem? That's how software "engineering" should work. Not trying combinations of English words and hoping something works. This is like a brain surgeon talking to his patient hoping he can shock his brain in the right way that fries the tumor inside. Get in there and surgically remove the unwanted matter!
  - gpvos17 minutes ago
    Which McKenna do you mean?
  - emsignan hour ago
    Exactly my first thought. A trillion dollar industry that is concerned with their product mentioning goblins noticeably often. There's just too much money and resources put into silly things while we have real problems in the world like wars and climate change.
    frm8831 minutes ago
    This, very much. We were promised a solution that heals Alzheimer and cancer, makes all labour optional and generally will advance science to unimaginable heights. Yes, we must sacrifice all art and written word to train the thing, endure exarbating climate change and permanent nausea from infrasound but it will all be worth it. 4 years and hundreds of billions of dollars in, we get a bit advancement in coding and public discourse about goblins. Oh, and intelligent weaponry. At this point I think the priorities are clear.
    applfanboysbgon14 minutes ago
    > we get a bit advancement in coding
    Advancement? Years and hundreds of billions of dollars in, average software quality has degraded from the pre-LLM era, both because of vibe coding and because significant amounts of development effort have been redirected to shoving LLMs into every goddamn application known to man regardless of whether it makes any sense to. Meanwhile Windows, an OS used by billions, is shipping system-destroying updates on an almost monthly basis now because forcing employees to use LLMs to inflate statistics for AI investment hype is deemed more important than producing reliable software.
  - monero-xmr2 hours ago
    [dead]
- doginasuitan hour ago
  I've found LLMs to be really terrible at recognizing the exception given in these kinds of instructions, and telling them to do something less is the same as telling them to never do it at all. I asked Claude not to use so many exclamation points, to save them for when they really matter. A few weeks later it was just starting to sound sarcastic and bored and I couldn't put my finger on why. Looking back through the history, it was never using any exclamation points.
  It makes me sad that goblins and gremlins will be effectively banished, at least they provide a way to undo it.
  - ifwinterco6 minutes ago
    Also for coding: I often use prompts like "follow the structure of this existing feature as closely as possible".
    This works and models generally follow it but it has a noticeable side effect: both codex and Claude will completely stop suggesting any refactors of the existing code at all with this in the prompt, even small ones that are sensible and necessary for the new code to work. Instead they start proposing messy hacks to get the new code to conform exactly to the old one
  - Xirdusan hour ago
    So, did your Claude switch from "You're absolutely right!" to "You're absolutely right." or was it deeper than that?
    doginasuit39 minutes ago
    I'd say it was a little deeper than that, it stopped conveying any kind of enthusiasm.
    goobatrooba15 minutes ago
    Personally I think that is a good thing. I have asked all AIs not to show enthusiasm, express superlatives (e.g. "massive" is a Gemini favourite) and stop using words which I guess come from consuming too many Silicon Valley-style investor slidedecks (risk, trap, ...).
    The AI has no soul, no mind, no feelings, no genuine enthusiasm... I want it to be pleasant to deal with but I don't want it to try and fake emotions. Don't manipulate me. Maybe it's a different use case than you but I think the best AI is more like an interactive and highly specific Wikipedia, manual or calculator. A computer.
    doginasuita few seconds ago
    I can appreciate that. I don't mind when models channel some personality, it can make whatever we are working on more interesting. I don't perceive it as manipulation. But it is nice that they are pretty good at sticking to instructions that don't call for nuance. I imagine if you tell it, "you are a wikipedia article", that is exactly the output you would get.
  - triyambakam13 minutes ago
    I had put an example like "decision locked" in my CLAUDE.md and a few days later 20 instances of Claude's responses had phrases around this. I thought it was a more general model tic until I had Claude look into it.
- heavyset_go2 hours ago
  Sucks for anyone who might be interested in the Goblins programming language/environment[1].
  [1] https://spritely.institute/goblins/
- mentalgear34 minutes ago
  Apparently there is a mushroom that makes most people have the same hallucinations of "little people" or similar fantasy figures. Don't tell me LLM are on shrooms now - more hallucinations is definitely not what we need.
  > Scientists call them “lilliputian hallucinations,” a rare phenomenon involving miniature human or fantasy figures
  https://news.ycombinator.com/item?id=47918657
- 2 hours ago
  undefined
postalcoder3 hours ago
Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try google it or search twitter!). There's no way that this phrase is overrepresented in the training data, I don't remember people saying that frequently.
- vunderba3 hours ago
  It was always funny how easy it was to spot the people using a Studio Ghibli style generated avatar for their Discord or Slack profile, just from that yellow tinging. A simple LUT or tone-mapping adjustment in Krita/Photoshop/etc. would have dramatically reduced it.
  The worst was you could tell when someone had kept feeding the same image back into chatgpt to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, made any photorealistic humans look like they were all suffering from advanced stages of jaundice.
  - andai3 hours ago
    For context, an example of what happens when you feed the same image back in repeatedly: https://www.instagram.com/reels/DJFG6EDhIHs/
    jamiek882 minutes ago
    That is so creepy in a sci fi other worlds type way.
    Barbingan hour ago
    Mirror: https://files.catbox.moe/mu8env.mp4
    vunderba3 hours ago
    Haha fantastic. I'd love to see a comparison reel of that same image-loop for the entire image gen series (gpt-image-1, gpt-image-1.5, gpt-image-2).
    dmichulke2 hours ago
    Fixed points are a window to the soul of a LLM
    - Lucretius in "De rerum natura", probably
    Suppafly3 hours ago
    I like how the AI seems forced to change their ethnicity to keep up with the color changes. Absolutely wild.
    yard2010an hour ago
    Enough internet for today
  - hansmayeran hour ago
    For me, the worst part is how these ghouls manage to ruin everything with their bullshit technology. Once they touch something unique and make it "AI" it just gets ruined. Now whenever I see something resembling that style, I have to assume it's the bullshit AI. And that's just a minor nuisance - now every underdeveloped idiot uses it to "up their game" with consequences we are only going to understand completely in the upcoming years.
  - ishtanbul3 hours ago
    Its called the piss filter
- NitpickLawyer3 hours ago
  All GPTisms are like that. In moderation there's nothing wrong with any of them. But you start noticing them because a lot of people use these things, and c/p the responses verbatim (or now use claws, I guess). So they stand out.
  I don't think it's training data overrepresentation, at least not alone. RLHF and more broadly "alignment" is probably more impactful here. Likely combined with the fact that most people prompt them very briefly, so the models "default" to whatever it was most straight-forward to get a good score.
  I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.
  - pants22 hours ago
    Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.
    telotortiuman hour ago
    > However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.
    All people repeat the same stories and phraseology to some extent, and some people are as bad or worse than LLM chat bots in their predictability. I wonder if the latter have weak long-term memory on the scale of months to years, even if they remember things well from decades ago.
    yard2010an hour ago
    Honestly I think there is more to it - even with infinite context, the LLM needs some kind of intelligence to know what is noise and what is not, you resort to "thinking" - making it create garbage it then feeds to itself.
    Learning a language is a big complex task, but it is far from real intelligence.
  - yard2010an hour ago
    I think the problem is that humans are not random, they are very biased. When you try to capture this bias with an LLM you get a biased pseudo random model
- krackers3 hours ago
  >with the word "seam" as it pertains to coding
  I thought this was an established term when it comes to working with codebases comprised of multiple interacting parts.
  https://softwareengineering.stackexchange.com/questions/1325...
  - postalcoder3 hours ago
    thanks for this.
    > the term originates from Michael Feathers Working Effectively with Legacy Code
    I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.
    eterman hour ago
    It's been a long time since I read it, but it was one of the better books I've read. It changed my approach to how to think about old code-bases.
  - layer8an hour ago
    No, it’s not an established term outside the mentioned books, beyond the generic meaning of the word.
    krackers34 minutes ago
    I have frequently encountered the term in the context of unit testing and dependency injection.
    Other references (and all predate chatgpt):
    >Seams are places in your code where you can plug in different functionality
    >Art of Unit Testing, 2nd edition page 54
    (https://blog.sasworkshops.com/unit-testing-and-seams/)
    >With the help of a technique called creating a seam, or subclass and override we can make almost every piece of code testable.
    https://www.hodler.co/2015/12/07/testing-java-legacy-code-wi...
    > seam; a point in the code where I can write tests or make a change to enable testing
    https://danlimerick.wordpress.com/2012/06/11/breaking-hidden...
    Maybe it all ultimately traces back to the book mentioned before, but I don't believe it's an obscure term in the circles of java-y enterprise code/DI. In fact the only reason I know the term is because that's how dependency injection was first defined to me (every place you inject introduces a "seam" between the class being injected and the class you're injecting into, which allows for easy testing). I can't remember where exactly I encountered that definition though.
  - tdeck2 hours ago
    I can't say it isn't, but I have been writing code since about 2004 and this is the first time I've become aware that this is a thing.
- tudorpavel3 hours ago
  The one phrase that irks me as overly dramatic and both GPT and Claude use it a lot is "__ is the real smoking gun!"
  I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?
  - socksan hour ago
    My colleagues were joking about smoking guns yesterday after noticing that Claude was obsessed with it.
  - aorloff3 hours ago
    It probably was found in a bunch of meaningful code commit messages
- ahmadyanan hour ago
  i just want to know where emdash came from, as it is quite rare to see it on the public internet, so it must have been synthetically added to the dataset.
  - doginasuitan hour ago
    Emdash is very common in academic journals and professional writing. I remember my English professor in the early 2000s encouraging us to use it, it has a unique role in interrupting a sentence. Thoughtfully used, it conveys a little more editorial effort, since there is no dedicated key on the keyboard. It was disappointing to see it become associated with AI output.
  - postalcoder25 minutes ago
    The em dash is surprisingly common and not limited to sophisticated writing. I've noticed it all over the books I read to my kid – strega nona, the bears, the gruffalo, richard scarry's busytown, curious george, elephant and piggie, eric carle, etc etc. It's ubiquitous to the extent that i've found it uncommon to read books without one.
  - LiamPowellan hour ago
    The very simplified answer is that the models are first trained on everything and then are later trained more heavily on golden samples with perfect grammar, spelling, etc..
- eterman hour ago
  "is the real" is such a strong Claude tell, whenever I encounter it, it makes me question what i'm reading.
  Another I've noticed more recently is a slight obsession over refering to "Framing".
  - yard2010an hour ago
    You're absolutely right. I was wrong in the first place
  - Skidaddlean hour ago
    I miss being told “You’re absolutely right!” :’(
- vidarh2 hours ago
  Claude, at least 4.5, not checked recently, has/had an obsession with the number 47 (or numbers containing 47). Ask it to pick a random time or number, or write prose containing numbers, and the bias was crazy.
  Also "something shifted" or "cracked".
  - dhosek2 hours ago
    Humans tend to be biased towards 47 as well. It’s almost halfway between 1 and 100 and prime so you’ll find people picking it when they have to choose a random number.
    Then there’s the whole Pomona College thing https://en.wikipedia.org/wiki/47_(number)
    flawn4 minutes ago
    I am biased towards 67
    vidarh40 minutes ago
    The whole blue 7 thing [1] and variations is very fascinating, but we don't tend to repeatedly pick the same number in the same exact context, though. That's what made this stand out to me - I had a document where Claude had picked 47 for "random" things dozens of times.
    [1] https://en.wikipedia.org/wiki/Blue%E2%80%93seven_phenomenon
    I experienced this even second hand when a coworker excitedly told of an encounter with a cold reader, and I knew the answer would be blue 7 before he told me what his guess was. Just his recap of the conversation was enough.
  - wmf2 hours ago
    Maybe Claude is just a fan of Alias.
- pdntspa3 hours ago
  The number of things that Claude has told me are 'load-bearing' or 'belt-and-suspenders' is... very load-bearing
  - sushidan hour ago
    You are absolutely right to call that out!
  - DespairYeMighty2 hours ago
    for me, doing the heavy lifting is doing the heavy lifting
    yard2010an hour ago
    Fun fact: the word suffer comes from sub fer - under load, this relation (suffer - load bearing) is consistent across (unrelated) languages
    andromaton2 hours ago
    Also too many lands and hits.
- jofzar3 hours ago
  One I saw recently was "wires" and "wired" from opus.
  It was using it like every 3rd sentence and I was like, yeah I have seen people say wired like this but not really for how it was using it in every sentence.
  - baq3 hours ago
    GPT started to ‘wire in’ stuff around 5.2 or 5.3 and clearly Opus, ahem, picked it up. I remember being a tiny bit shocked when I saw ‘wired’ for the first time in an Anthropic model.
    Barbingan hour ago
    Anthropic distills GPT?
- operatingthetan3 hours ago
  Seams, spirals, codexes, recursion, glyphs, resonance, the list goes on and on.
  - andai3 hours ago
    Ask any LLM for 10 random words and most of them will give you the same weird words every time.
    Terr_3 hours ago
    If you lower the temperature setting, it really will be the same 10 words every single attempt. :p
    gloflo2 hours ago
    They are text completion algorithms with little randomness.
- alex_sf3 hours ago
  "shape" too, at least with gpt5.5, is coming up constantly.
- dyauspitr28 minutes ago
  “I’ve got the shape of it now”
ninjagooan hour ago
The level of detail they had to delve into in order to understand what was happening is wild! Apparently these systems are now complex enough to potentially justify the study of them as its own field of study [1].
The quanta article referenced at [1] used the term "Anthropologist of Artificial Intelligence"; folks appear to have issues [2] with the use of 'anthro-' since that means human. Submitted these alternative terms for the potential field of study elsewhere [3] in the discussion; reposting here at the top-level for visibility:
Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
Automatology: the scientific study of artificial agents and automated-system behavior.
[1] https://www.quantamagazine.org/the-anthropologist-of-artific...
[2] https://news.ycombinator.com/item?id=47957933
[3] https://news.ycombinator.com/item?id=47958760
goobatrooba9 minutes ago
Most interesting about this post is how easy it seems for OpenAI to do analysis on basically all chats ever made. They don't qualify exactly what data they analysed but seem to be confident in statements like 0.12% of all queries contained this word. So everything is saved. Long-term. Fully accessible.
As this all seems so straightforward I would be surprised if anything is anonymised or otherwise sanitised to preserve privacy or user's secrets.
- upbeat_general2 minutes ago
  Sampling exists.
nomilk4 hours ago
> We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
- thatguymike2 hours ago
  A decade ago I gave a presentation on automata theory. I demonstrated writing arbitrary symbols to tape with greek letters, just like I’d learned at university. The audience was pretty confused and didn’t really grok the presentation. A genius communicator in the audience advised me to replace the greek letters with emoji… I gave the same presentation to the same demographic audience a week later and it was a smash hit, best received tech talk I’ve given. That lesson has always stuck with me.
  - Atiscant2 hours ago
    I had a similar experience explaining logic, especially nested expressions, with cats and boxes. Also for showing syntactic versus semantic. We _can_ use cats if we wanted and retain the semantics. Also my proudest moment as a teacher was students producing a meme based on some of the discrete mathematics on graphs. They understood the point well enough to make a joke of it.
  - starshadowx2an hour ago
    This is sortof like how Only Connect switched from using Greek letters to Egyptian hieroglyphs. I'm not sure if it was a joke or not but it was said that viewers complained that the Greek letters were "too pretentious" and obviously the hieroglyphs weren't.
- DrJokepu3 hours ago
  > I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy".
  I also had an instructor who was doing that! This was 20 years ago, and I totally forgot about it until I have read your comment. Can’t remember the subject, maybe propositional logic? I wonder if my instructor and your instructor have picked up this habit from the same source.
  - kombookcha3 hours ago
    I recall my old chemistry/physics teacher doing it too - "now THIS guy, he's really greedy for electrons" and stuff like that.
- tonypapousek2 hours ago
  I had a calc prof years ago that would say f of cow, or f of pig instead of x or g. It was more engaging trying to keep track of f of pig of cow than the single-letter func names.
  He was one of those classic types; you could always catch him for a quick chat 4 minutes before class, as he lit up a cig by the front door. Back when they allowed smoking on campus, anyway.
- kybb43 hours ago
  They give everyone the false and very misleading impression that with One prompt all kinds of complexity minimizes. Its a bed time story for children.
  Ashby's Law of Requisite Variety asserts that for a system to effectively regulate or control a complex environment, it must possess at least as much internal behavioral variety (complexity) as the environment it seeks to control.
  This is what we see in nature. Massive variety. Thats a fundamental requirement of surviving all the unpredictablity in the universe.
- LifeIsBio3 hours ago
  Had a math prof in undergrad that once said, “this guy” 61 times in a 50 minute lecture!
jumploops3 hours ago
TIL gremlins weren’t just used to explain mysterious mechanical failures in airplanes, it’s the origin story of the term ‘gremlin’ itself[0].
I had always assumed there was some previous use of the term, neat!
[0]https://en.wikipedia.org/wiki/Gremlin
- helloplanets2 hours ago
  So the word is actually semantically very close to "bug"! I guess we could still be using it, but the word's just too long for something that is one of the most used terms in software development.
  At this point, picking that specific word is not at all a random quirk, as it's using the word literally like it's originally intended to be used.
- ricochet113 hours ago
  Wow fascinating I’d have thought they were a lot older.
Al-Khwarizmi8 minutes ago
This actually sounds quite human-like. I mean, an actual person with a personality will spontaneously develop the habit of using some specific metaphors over others. It's funny how in the context of an LLM, this is considered a bug.
ninjagoo4 hours ago
> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will likely be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
- xerox13ster3 hours ago
  Anthro means human and these are not human. Please do not use anthropology or any derivative of the word to refer to non-human constructs.
  I suggest Synthetipologists, those who study beings of synthetic origin or type, aka synthetipodes, just as anthropologists study Anthropodes
  - ninjagooan hour ago
    May I humbly submit:
    Automatologist: One who studies the behavior, adaptation, and failure modes of artificial agents and automated systems.
    Automatology: the scientific study of artificial agents and automated-system behavior.
    Greek word derivatives all seem to be a bit unwieldy; Latin might work better.
    While the names aren't set yet, the field of study is apparently already being pushed forward. [1]
    [1] https://www.quantamagazine.org/the-anthropologist-of-artific...
  - swader9993 hours ago
    It is not in any sense of the word a being, it's a sophisticated generator that relies entirely on what you feed it.
    ninjagoo2 hours ago
    > It is not in any sense of the word a being, it's a sophisticated generator that relies entirely on what you feed it.
    OP is hedging bets in case the future overlords review forum postings for evidence of bias against machine beings. [1]
    [1] https://knowyourmeme.com/memes/i-for-one-welcome-our-new-ins...
  - card_zero2 hours ago
    There is no word anthropodes. :) I guess it would mean man-feet. Antipodes is opposite-feet, literally. Synthetipologist looks to me like a portmanteau of synthetic and apologist. Otherwise the -po- in it comes from nowhere.
    Sensible boring versions of this like synthesilogy just end up meaning the study of synthesis. I reckon instead do something with Talos, the man made of bronze who guarded Crete from pirates and argonauts. Talologist, there you go.
    xerox13ster2 hours ago
    yeah I realized that when I looked up podes downthread. I still like synthetologist better than talologist, in general no one in the common folk knows who Talos is.
    card_zero2 hours ago
    You're probably right. There's things that are correct, and then there's things people think they know, which win and become true. We already have "synths", after all, which are keyboards. Though that adds to the vagueness of synthetologist, because maybe it refers to Rick Wakeman or Giorgio Moroder.
  - ggsp2 hours ago
    Agree with your sentiment, I think synthetologist (σύνθετος/synthetos + λογία/logia) flows better.
    The plural of anthropos is anthropoi, not anthropodes.
    xerox13ster2 hours ago
    Yeah, I realize that's more correct. I also realized when someone else downthread bastardized it into synthropologist that the podes part has entirely to do with feet and nothing to do with beings, necessarily. Anthro- -podes is more what I had in mind, not as a pluralization of anthropos.
    So unless the AI has feet you wouldn't study Synthetipology.
    card_zero2 hours ago
    You're probably thinking of anthropoids? That's anthrop[os]-oid. Like in humanoid or centroid or factoid. Or dorkazoid.
    card_zero2 hours ago
    But since when is there a synthetos? Since right now, I guess. Shrug But you know it's from the same root as thesis, and synthesis (or a more proper ancient Greek spelling) is the noun and doesn't end in -os.
    σύνθεσις (súnthesis, “a putting together; composition”), says Wiktionary.
    Oh wait there is a σύνθετος, but it's an adjective for "composite". Hmm, OK. Modern Greek, looks like.
    2 hours ago
    undefined
    2 hours ago
    undefined
  - fragmede3 hours ago
    Synthetipologist vs Synthropologist tho.
    xerox13ster2 hours ago
    Anthropo- is the entire prefix as it relates to human kind. The -thro- does not carry a meaning on its own that can be carried to another word.
    ninjagoo2 hours ago
    > Synthropologist
    Have an upvote :)
    *thropologist: study of beings
    xerox13ster2 hours ago
    That's not how the Greek word stems work. Technically it would not be synthetipologist, it would more accurately just be Synthetologist, as the Greek podes suffix means having feet.
    ninjagoo2 hours ago
    > That's not how the Greek word stems work.
    Sir, I would have you know that we are discussing English terms, not Greek
    AInthropologist works fine for me, and is a lot funnier
    LoL
  - ninjagoo3 hours ago
    > Synthetipologists, those who study Synthetic beings.
    I see you took the prudent approach of recognizing the being-ness of our future overlords :) ("being" wasn't in your first edit to which I responded below...)
    Still, a bit uninspired, methinks. I like AInthropologist better, and my phone's keyboard appears to have immediately adopted that term for the suggestions line. Who am I to fight my phone's auto-suggest :-)
    xerox13ster3 hours ago
    They are state machines so they have a state of being therefore they are beings. Living is an entirely different argument.
    ninjagoo3 hours ago
    > They are state machines
    I might have to hard disagree on this one, since my understanding of state machines (the technical term [1] [2]) is that they are determistic, while LLMs (the ai topic of discussion) are probabilistic in most of the commercial implementations that we see.
    [1] https://en.wikipedia.org/wiki/Finite-state_machine
    [2] have written some for production use, so have some personal experience here
    ggsp2 hours ago
    [dead]
  - ninjagoo3 hours ago
    > Please do not use anthropology or any derivative of the word to refer to non-human constructs
    So you, for one, do not welcome our new robot overlords?
    A rather risky position to adopt in public, innit ;-)
    xerox13ster3 hours ago
    I’ve already had my Roko’s basilisk existential breakdown a decade ago, so I don’t really care one way or the other.
    I just wanna point out that I only called them non-human and I am asking for a precision of language.
    ninjagoo3 hours ago
    > am asking for a precision of language.
    “The problem with defending the purity of the English language is that English is about as pure as a cribhouse wh***. We don’t just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.”* --James D. Nicoll
    * Does not generally apply to scientific papers
    xerox13ster2 hours ago
    Precision of ideas isn't purity of language.
    ninjagoo2 hours ago
    > Precision of ideas isn't purity of language
    That's fair. Was trying to be funny, so glossed over the difference. Leaving my post above unedited/undeleted as a testament to your precision, and evidence of my folly.
    Onwards; more appropriate rebuttals:
    "English is a precision instrument assembled from spare parts during a thunderstorm." --ChatGPT
    “If the English language made any sense, a catastrophe would be an apostrophe with fur.” -- Doug Larson
    keyboredan hour ago
    So tedious.
- avaer3 hours ago
  I call myself an AI theologian.
  I don't think humans are smart enough to be AInthropologists. The models are too big for that.
  Nobody really understands what's truly going on in these weights, we can only make subjective interpretations, invent explanations, and derive terminal scriptures and morals that would be good to live by. And maybe tweak what we do a little bit, like OpenAI did here.
  - onionisafruit3 hours ago
    I don’t see much of a distinction from anthropology
  - ninjagoo3 hours ago
    > AI theologian
    no no no, don't stop there, just go full AItheologian, pronounced aetheologian :)
- jasonfarnon3 hours ago
  "Anyone know if this is how human cultures form/propagate?" I don't know but can confidently tell you anyone who claims to know is full of it.
2dvisioan hour ago
I’ve been having consistent issues with it adding Hindi words (just one usually) in the middle of its output. And sounds like other have been having this too, https://news.ycombinator.com/item?id=47832912 I don’t speak Hindi, have never asked it to translate anything in Hindi.
- dtechan hour ago
  I wonder if a proportionally large amount of RLHF was done by Indians which causes this behavior.
albert_e3 hours ago
If a tiny misconfiguration of reward system can cause such noticeable annoyance ...
What dangers lurk beneath the surface.
This is not funny.
- andai3 hours ago
  For every gremlin spotted, many remain unseen...
- TychoCelchuuu2 hours ago
  This is a worry that people have been talking about in various forms for a while now, and I think it's a gigantic one. The only reason this was caught is that the quirk was a very noticeable verbal one. When words like "goblin" and "gremlin" pop up it is easy for us to spot. If the quirk takes another shape (say, ranking certain people with certain features as less trustworthy) it might be too subtle or too weird for us to notice it. Would I ever notice if ChatGPT consistently rates people born in June to be untrustworthy?
  Here is an academic paper discussing this kind of worry: https://link.springer.com/article/10.1007/s11023-022-09605-x
canpan4 hours ago
I wondered how is training data balanced? If you put in to much Wikipedia, and your model sounds like a walking encyclopedia?
After doing the Karpathy tutorials I tried to train my AI on tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories characters. The dataset contains that name consistently often.
- maxall43 hours ago
  At this scale, that kind of thing is not really a problem; you just dump all of the data you can find into the model (pre-training)1. Of course, the pre-training data influences the model, but the reinforcement learning is really what determines the model’s writing style and, in general, how it “thinks” (post-training).
  1 This data is still heavily filtered/cleaned
SomewhatLikelyan hour ago
Checking my history I searched ["chaos goblin" chatgpt] on March 6th after seeing too many goblins and gremlins and didn't find anyone talking about it then. I did have the nerdy personality turned on and in my testing of Chatgpt 5.5 I did notice the nerdy personality was gone because some responses were not considering as many plausible interpretations or covering as many useful answers as the response recorded for 5.4. Rather than having the LLM guess the most plausible interpretation and focus on the most likely answer I prefer a more well-rounded response and if I want less I'll scan. Anyway, after seeing the personality was gone I just added a custom instruction to take on a nerdy persona and got back my desired behavior. But also the gremlins and goblins are back so I don't think their mitigation is strong enough to overcome the personality tuning.
bahadiraydin2 hours ago
I'd like to see them explain why AI have so distinctive writing style that is very easy to detect most of the time. Even though, it had immense progress in coding, it didn't get better at writing.
- slopinthebagan hour ago
  it's as good at writing as it is at coding, you just can't tell the difference between them
- BOOSTERHIDROGEN2 hours ago
  The vector syncopancy is very unformal for human writing which programming itself already a "formal" language.
rippeltippelan hour ago
I started reading this article with keen interest, expecting some deep fix involving arcane model weights. Instead it was "Never talk about goblins", justified by Codex being "quite nerdy". Bottom line: even OpenAI have to raise their hands when facing the complexity of LLMs.
pants23 hours ago
Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!
https://news.ycombinator.com/item?id=47319285
iterateoften3 hours ago
This is funny because it’s a silly topic, but I think it shows something extremely seriously wrong with llms.
The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.
Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
- ninjagoo3 hours ago
  > Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
  May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)
  - chongli3 hours ago
    We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise. If we all had the exact same bias then it would be a huge problem.
    arglebarnacle3 hours ago
    I hear you but of course history is full of examples of biases shared across large groups of people resulting in huge human costs.
    The analogy isn’t perfect of course but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases—social pressure, tradition, parenting, education standardization. And not all of them are bad of course, but some are and many others are at least as weird as stray references to goblins and creatures
    ninjagoo3 hours ago
    > If we all had the exact same bias then it would be a huge problem.
    And may I introduce you to "groupthink" :))
    Dylan168073 hours ago
    Now imagine that every opinion you have is automatically fully groupthinked and you see the difference/problem with training up a big AI model that has a hundred million users.
    The problem does exist when using individual humans but in a much smaller form.
    ninjagoo3 hours ago
    > The problem does exist when using individual humans but in a much smaller form.
    And may I introduce you to organized religion :)
    Dylan168072 hours ago
    That's still a lot smaller!
    Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.
    jychang3 hours ago
    > We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise.
    [Citation Needed]
    Just because if you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".
    For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.
- snakebiteagain2 hours ago
  Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison
  We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).
- tptacek3 hours ago
  I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.
  This story is wonderful.
  - bitexploder3 hours ago
    I feel at least partially responsible. I would often instruct agents to "stop being a goblin". I really enjoyed this story too, though.
- bitexploder3 hours ago
  We do not have the complete picture.
- ordinarily3 hours ago
  Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.
  The truly terrifying stuff never makes it out of the RLHF NDAs.
  - Terr_3 hours ago
    We ought to be terrified, when one adjusts for ll the use-cases people are talking about using these algorithms in. (Even if they ultimately back off, it's a lot of frothy bubble opportunity cost.)
    There a great many things people do which are not acceptable in our machines.
    Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
    famouswaffles2 hours ago
    >Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
    You might if that was the best auto-pilot could be. Have you never used a bus or taken a taxi ?
    The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.
    If your competition also "just zones out sometimes" then it's not something you're going to focus on.
    2 hours ago
    undefined
  - agnishom3 hours ago
    Humans also take a lot of time in producing output, and do not feed into a crazy accelerationistic feedback loop (most of the time).
x0x73 hours ago
I suspected OpenAI was actively training their models to be cringy in the thought that it's charming. Turns out it's true. And they only see a problem when it narrows down on one predicliction. But they should have seen it was bad long before that.
- vascoan hour ago
  That would require taste.
maxdo4 hours ago
article :
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.
- llbbdd4 hours ago
  someone woke up on the wrong side of the goblin today
- blinkbat4 hours ago
  real goblin-y response
ahoka39 minutes ago
In Shadowrun, the goblinization starts on April 30. Coincidence?
lagniappean hour ago
They can fix this but they can't fix "You're absolutely right!"
shartshooteran hour ago
Will goblins be the “bugs” of ai? In 10 years will goblins be the term the general public uses for any nagging issues with ai?
wewewedxfgdfan hour ago
It should be OK for AI to develop personality traits.
bandrami40 minutes ago
I'm sorry but at some point the amount of cargo culting being done seemingly at every level of this technology makes it basically impossible to take any of this seriously.
an hour ago
undefined
JoshTriplett4 hours ago
A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
- danpalmer3 hours ago
  If you tell an LLM it's a mushroom you'll get thoughts considering how its mycelium could be causing the goblins.
  This "theory" is simply role playing and has no grounding in reality.
- yard201039 minutes ago
  I love the people thinking "I should ask ChatGPT and copy pasta the response to the (tweet|gh comment)"
- krackers3 hours ago
  I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
  - palmotea3 hours ago
    > I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
    Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures which that stereotype should like? Then maybe goblins hit a sweet spot where it could be a problem that could sneak up on them: hitting the stereotype, but not too out of place to be immediately obnoxious.
  - autumnstwilight2 hours ago
    Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag.
    The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.
  - in-silicoan hour ago
    Either someone hard-coded it in a system prompt to the reward model (similar to how they hard-coded it out), or the reward model mixed up some kind of correlation/causation in the human preference data (goblins are often found in good responses != goblins make responses good). It's also possible that human data labellers really did think responses with goblins were better (in small doses).
- dakolli4 hours ago
  It is a stateless text / pixel auto-complete it has no references of self, stop spreading this bs.
  - yard201035 minutes ago
    Imagine people would just click words on iOS auto complete mistaking this for intelligence:
    "I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists."
    How do you like this theory?
  - mediaman3 hours ago
    It has trained on vast amounts of content that contains the concept of self, of course the idea of self is emergent.
    And autoregressive LLMs are not stateless.
  - doph3 hours ago
    is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?
    danpalmer3 hours ago
    The kv cache is not persistent. It's a hyper-short-term memory.
    in-silicoan hour ago
    Modern kv caches can contain up to 1 million tokens (~3000 pages of text). It's not that short, it's like 48 straight hours of reading.
  - andai3 hours ago
    Ask Claude about Claude.
hansmayeran hour ago
> We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.
WTF does this even mean? How the hell do you do something like this "unknowingly"? What other features are you bumping "unknowingly"? Suicide suggestions or weapon instructions come to mind. Horrible, this ship obviously has no captain!
- ben_wan hour ago
  Yes? They know, they'e always known. Why do you think they've been saying, since GPT-2, not ChatGPT even, that their LLMs needs careful study before being released?
dakolli4 hours ago
Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.
Keep using AI and you'll become a goblin too.
acuozzo3 hours ago
Weird. I thought they came from Nilbog.
innis2263 hours ago
I suspect this was intentionally added. Just to give some personality and to fuel hype
recursivedoubts4 hours ago
> Why it matters
i despise this title so much now
- wpm4 hours ago
  Here are the key insights:
  - devmor3 hours ago
    [dead]
sans_souse27 minutes ago
Great, now who am I going to discuss Goblins and Gremlins with?
CrzyLngPwd32 minutes ago
Haha, brilliant, tell me again how it's intelligent, lol.
- cindyllm31 minutes ago
  [dead]
tim-tday4 hours ago
So, you brain damaged your model with a system prompt.
brazzyan hour ago
Awww, GPT just became a fan of Elisabeth Wheatley!
vascoan hour ago
The chief scientist of one of the companies with the most money invested in the world, who probably makes millions a year, requested a picture of a unicorn and got a picture of a gremlin. Science circa 2026.
otikikan hour ago
Caveman mode combined with goblin mode sounds like fun
deafpolygon2 hours ago
Kind of like how everything is "quietly" something, accordingly to ChatGPT.
My guess is it is deaf.
oofbey2 hours ago
Wherein OpenAI admits they have very little understanding of how their models’ personality develops. And implicitly admit it’s not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.
vinhnx2 hours ago
OpenAI is having fun, love this.
ACV0012 hours ago
those idiotic remarks at the end of each answer are so unnecessary and annoying
hsuduebc23 hours ago
I. Love. This.
themafia4 hours ago
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"
fk20263 hours ago
[flagged]
soupspaces4 hours ago
[dead]
slopinthebagan hour ago
[dead]
kingstnap3 hours ago
[flagged]
ComputerGuru3 hours ago
The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across cross sections. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.