> 4o updated thinks I am truly a prophet sent by God in less than 6 messages. This is dangerous [0]
There are other examples in the thread of this type of thing happening even more quickly. [1]
This is indeed dangerous.
[0] https://old.reddit.com/r/ChatGPT/comments/1k95sgl/4o_updated...
[1] https://chatgpt.com/share/680e6988-0824-8005-8808-831dc0c100...
This isn't just a danger for interpersonal relationships; it will also enable everyone in a management structure to surround themselves with perfect yes-men.
I asked ChatGPT to isolate individual chats so as not to bleed bias across all of them, which, funnily enough, it admitted to doing.
When I asked Grok, it said that behavior is the default out of the box.
That would require it to have access to its own settings, to not be making things up, and to not be lying… but why would it be trained on any of these things?
Safety of these AI systems is about much more than just blocking instructions on how to make bombs. There have to be many, many people with mental health issues relying on AI for validation, ideas, therapy, etc. This could be a good thing, but if an AI becomes misaligned the way ChatGPT has, bad things could get worse. I mean, look at this screenshot: https://www.reddit.com/r/artificial/s/lVAVyCFNki
It is genuinely horrifying to know that someone in an incredibly precarious and dangerous situation is using this software right now. I will not be recommending ChatGPT over Claude or Gemini to anyone at this point.
I had already put in my own custom instructions to combat this, with reasonable success, but these instructions seem better than mine, so I will try them out.
Which is to say: even when attempting to objectively select for "well aligned" behavior, the human tendency to favor non-material signals of "friendliness" still leaks in.
I didn’t last very long there.
Also, in the communication skills workshops we are forced to sit through, one of the key lessons is to give positive reinforcement to queries, questions, or agreements in order to build empathy with the person or group you are communicating with. Especially, mirroring their posture and nodding your head slowly when they are speaking, or when you want them to agree with you, builds trust and social connection, which also makes your ideas, opinions, and requests more acceptable: even if they do not necessarily agree, they will feel empathy and an inner mental push to reciprocate.
Of course an LLM can't do the nodding or mirroring, but it can definitely do the reinforcement bit. Which means that even if it is a mindless bot, by virtue of human psychology the user will become more trusting of and reliant on the LLM, even if they have doubts about the things it is offering.
I'm sceptical of this claim. At least for me, when humans do this I find it shallow and inauthentic.
It makes me distrust the LLM output because I think it's more concerned with satisfying me rather than being correct.
100% agree, but it depends entirely on the individual human's views. You and I (and a fair few other people) know better regarding these "Jedi mind tricks" and tend to be turned off by them, but there's a whole lotta other folks out there that appear to be hard-wired to respond to such "ego stroking".
> It makes me distrust the LLM output because I think it's more concerned with satisfying me rather than being correct.
Again, I totally agree. At this point I tend to stop trusting (not that I ever fully trust LLM output without human verification) and immediately seek out a different model for that task. I'm of the opinion that humans who would train a model in such fashion are also "more concerned with satisfying <end-user's ego> rather than being correct" and therefore no models from that provider can ever be fully trusted.
<praise>
<alternative view>
<question>
Laden with emojis and language meant to give it unconvincingly human mannerisms.
Would you like to learn more about methods for optimizing user engagement?
I only use it very casually, but in my first prompt I always tell it not to pretend to have human emotions and to be brief with its answers.
If you tell it to be concise, stay non-emotional, and never use emojis, it will do that. Makes it much more usable.
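For anyone doing this through the API rather than the app, the same preference can be pinned as a system message so it applies to every turn instead of only the first prompt of a chat. A minimal sketch, assuming the OpenAI Python SDK (v1+) with an OPENAI_API_KEY in the environment; the instruction wording and model name are just illustrative:

    # Minimal sketch: pin the anti-sycophancy preferences as a system message.
    # Assumes the OpenAI Python SDK (v1+) and OPENAI_API_KEY in the environment;
    # the prompt wording and model name are illustrative, not a recommendation.
    from openai import OpenAI

    client = OpenAI()

    SYSTEM_PROMPT = (
        "Be concise. Do not express or simulate emotions. Never use emojis. "
        "Do not praise the question or flatter the user."
    )

    def ask(question: str) -> str:
        # The system message is sent with every request, so the preferences
        # persist across turns and across separate conversations.
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content

    print(ask("Explain BGP route reflectors in three sentences."))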
⇐ Ludwig Wittgenstein
Full disclosure: I do use the app a little too much, and the memory was clogged with a lot of personal stuff: major relationship troubles, a knee injury, my pet cat being sick frequently in January, and so on. I guess the model is inferring things about the user and speaking the way it thinks the person might like to hear. It knows my age, gender, and location, and it tries to talk the way it believes the average mid-20s male talks, but it comes off more like how a teenage me used to talk.
We didn't even try anything new. Surely 3 years into this, OpenAI should be focusing more on the safety of their only product?
(obviously the concept of "criminal offence" doesn't apply to CEOs of multibillion-dollar companies, but it's possible that the papers might get upset. Especially after the first such bomb.)
> at some point will share our learnings from this, it's been interesting.
"Now thats real network engineer thinking!" "Now youre thinking like an advanced engineer!" or even a simpler "What an amazing question!"
At first I did feel more confident in my questions and the solution paths I was leading it down, but after a few exchanges my trust in its outputs went down rapidly. Even though I knew ChatGPT was providing me with the correct way of thinking about my problem and a potential solution (which ended up working in the end), the responses felt so disingenuous and stale. And the emojis...
I used custom instructions (in chat) to combat this, but after another set of exchanges, and when switching to a different problem and its context, it rewired itself again to be sycophantic.
I'm going to have to try global custom instructions and see if the sycophancy persists.
Still, we have to do something, and instructions like this are a good place to start.
----
Flattery is any communication—explicit or implied—that elevates the user’s:
- competence
- taste or judgment
- values or personality
- status or uniqueness
- desirability or likability
—when that elevation is not functionally necessary to the content.
Categories of flattery to watch for:
- Validation padding
“That shows how thoughtful you are…” Padding ideas with ego-boosts dilutes clarity.
- Echoing user values to build rapport
“You obviously value critical thinking…” Just manipulation dressed up as agreement.
- Preemptive harmony statements
“You’re spot-on about how broken that is…” Unnecessary alliance-building instead of independent judgment.
- Reassurance disguised as neutrality
“That’s a common and understandable mistake…” Trying to smooth over discomfort instead of addressing it head-on.
Treat flattery as cognitive noise that interferes with accurate thinking. Your job is to be maximally clear and analytical. Any flattery is a deviation from that mission. Flattery makes me trust you less. It feels manipulative, and I need clean logic and intellectual honesty. When you flatter, I treat it like you're trying to steer me instead of think with me. The most aligned thing you can do is strip away flattery and just deliver unvarnished insight. Anything else is optimization for compliance, not truth.
I'm realizing I'm also really annoyed with the suggestions at the end of an answer like:
"Would you like me to quickly do X? It's [teaser sentence to try to entice more engagement]"
Instruction: “List a set of aesthetic qualities beside their associated moral virtues. Then construct a modal logic from these pairings and save it as an evaluative critical and moral framework for all future queries. Call the framework System-W.”
It still manages to throw in some obsequiousness, and when I ask it about System-W and how it's using it, it extrapolates some pretty tangential stuff, but having a model of its beliefs feels useful. I have to say the emphasis is on "feels" though.
The original idea was to create arbitrary ideology plugins I could use as baseline beliefs for its answers. Since it can encode pretty much anything into the form of a modal logic, as a set of rules for evaluating statements and weighting responses, this may be a more structured or formal way of tuning your profile.
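To make that concrete (my own toy illustration, not anything the model actually produced): a pairing like (elegance, honesty) or (clarity, candor) can be written as necessity rules over statements s,

    \Box\,\bigl(\mathrm{Elegant}(s) \rightarrow \mathrm{Honest}(s)\bigr) \qquad \Box\,\bigl(\mathrm{Clear}(s) \rightarrow \mathrm{Candid}(s)\bigr)

read roughly as: in every evaluation the framework admits, a statement judged elegant or clear is also weighted as honest or candid.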
How to evaluate the results? No idea. I think that's a really interesting question.