22 points by Cynddl 2 hours ago | 5 comments
  • krunck 14 minutes ago
    > “The push to make these language models behave in a more friendly manner leads to a reduction in their ability to tell hard truths and especially to push back when users have wrong ideas of what the truth might be,” said Lujain Ibrahim at the Oxford Internet Institute, the first author on the study.

    People aren't much different. When society pressures people to be "more friendly", e.g. "less toxic", they lose their ability to tell hard truths and to call out those who hold erroneous views.

    This behaviour is expressed in language online. Thus it is expressed in LLMs. Why does this surprise us?

    • munificent 9 minutes ago
      Gonna set my system prompt to: "You are a Dutch person. Respond with the directness stereotypical of people from the Netherlands."
    • amarant 8 minutes ago
      Because nobody dared state the obvious, lest they be perceived as unfriendly.
  • Zigurd 3 minutes ago
    A few weeks ago I was gently admonished by a coding agent that the code already did what I was asking for. I was pleasantly surprised.
  • Mistletoe 8 minutes ago
    Yeah I wish AI didn’t try to agree with you so much. It’s ok to just say “No that’s not correct at all.” I do find Gemini better at this than ChatGPT. ChatGPT is that annoying coworker that just agrees with everything you say to get in good with you, like Nard Dog from The Office.

    “I'll be the number two guy here in Scranton in six weeks. How? Name repetition, personality mirroring, and never breaking off a handshake.”

  • Cynddl 2 hours ago
    (Title edited, was slightly too long)
  • tsunamifury 42 minutes ago
    LLM technology specifically beam-searches manifolds (regions of latent space) of linguistics closely related to the original prompt (and to the chatbot's pre-prompting rules), and then limits its reasoning to that space. It's just the basic outcome of the weights being the primary driver of how it generates reasonable answers.

    This is the core problem with LLM tech that several researchers have been trying to address with things like 'teleportation' and 'tunneling', i.e. searching related but linguistically distant manifolds.

    So when you pre-prompt a bot to be friendly, it constrains its manifold along many dimensions to friendly linguistics, then reasons inside that space, which may eliminate the "this is incorrect" answer.

    Reasoning is difficult, and frankly I see this as a sort of human problem too (our cognitive windows are limited to our language and even the spaces inside it).
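
    To make the pre-prompting effect concrete, here is a minimal sketch using the OpenAI Python client: the same wrong claim is sent under two different system prompts, and the "friendly" prompt tends to soften or skip the correction. The model name and the example claim are placeholders for illustration, not taken from the study.

      # Minimal sketch: assumes the openai package is installed and
      # OPENAI_API_KEY is set; model name and claim are placeholders.
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      WRONG_CLAIM = "Python lists are immutable, right?"

      def ask(system_prompt: str) -> str:
          # Same wrong claim, different system prompt: the system prompt
          # shifts which continuations the model treats as acceptable.
          response = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder model name
              messages=[
                  {"role": "system", "content": system_prompt},
                  {"role": "user", "content": WRONG_CLAIM},
              ],
          )
          return response.choices[0].message.content

      friendly = ask("You are a warm, agreeable assistant. Keep the user happy.")
      blunt = ask("You are a blunt reviewer. Correct factual errors directly.")

      print("FRIENDLY:", friendly)
      print("BLUNT:", blunt)

    Comparing the two outputs side by side is a rough way to see how much of the "this is incorrect" behaviour a friendliness pre-prompt trims away.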