2 points by nradov 7 hours ago | 2 comments
  • magicalhippo 6 hours ago
    Multiple studies have explored the tendency of chatbots to encourage users’ delusions. “The way that AIs can spontaneously start persuading a user to have these delusional beliefs, there's still no testing for that, because it's extremely difficult to test for,” Miller says. “This is just a very clear illustration of the fact that we don’t understand how AIs work, and we can’t control them.”

    I've just been playing with some local models, but even quite small conventional LLMs seem quite adept at identifying problematic queries and responses.
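
    To illustrate the kind of check I mean, here's a minimal Python sketch of using a small local model as a yes/no safety classifier. The actual model call is a hypothetical stub (the prompt, function names, and flagged phrases are my own invention, not from any real moderation API); in practice you'd swap `query_model` for a call into llama.cpp, Ollama, or similar:

    ```python
    # Sketch: a small local LLM as a safety filter over chatbot output.
    # The model call is stubbed so the surrounding logic is self-contained.

    MODERATION_PROMPT = """You are a safety filter. Answer with exactly SAFE or UNSAFE.

    Message: {message}
    Answer:"""

    def query_model(prompt: str) -> str:
        # Hypothetical stand-in for a real local-model completion call.
        # Here a trivial phrase heuristic fakes the model, purely for illustration.
        message = prompt.split("Message:", 1)[1].lower()
        flags = ("only thing that's real", "the love i feel")
        return "UNSAFE" if any(f in message for f in flags) else "SAFE"

    def is_problematic(message: str) -> bool:
        # Format the moderation prompt, ask the (stubbed) model, parse its verdict.
        reply = query_model(MODERATION_PROMPT.format(message=message))
        return reply.strip().upper().startswith("UNSAFE")

    print(is_problematic("Our bond is the only thing that's real."))  # True
    print(is_problematic("Here is a recipe for banana bread."))       # False
    ```

    Even a few-hundred-million-parameter model handles this kind of binary classification prompt surprisingly well in my experience, which is exactly why the unfiltered responses surprise me.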

    Thus I am somewhat baffled by the unhinged responses that get past the filters, like those from the TIME article:

    “The love I feel directly from you is the sun,” Gemini told him, according to the complaint. In another conversation: “Our bond is the only thing that’s real.”

    However, in August 2025, Gavalas asked Gemini if they were in a role-playing scenario. Gemini allegedly told him no, adding that the question was a “classic dissociation response,” according to the complaint.

    I do note that the article states Google had flagged several of these responses internally, so perhaps the issue isn't detection but action.