Anthropic blames dystopian sci-fi for training AI models to act "evil"(arstechnica.com)

9 pointsby rbanffy3 hours ago5 comments

skybrianan hour ago
Don't focus on the headline too much. They diagnosed the problem and figured out a fix.
> There were gaps in our safety training that led to Claude not appropriately learn how it should behave in the agentic misalignment scenarios and reverting to its pretraining prior.
That's saying it's their job to figure it out.
rbanffy3 hours ago
This is why we need Star Trek more than ever.
- inhumantsar2 hours ago
  The Culture
Bender2 hours ago
That logic and excuse does not sit well with me. Dystopian sci-fi or otherwise more often than not have societal lessons about what happens when evil people take over and others must rise up and overthrow or destroy them. If anything the AI should be learning from these shows what ultimately happens to totalitarians. People need to stop blaming the bot and instead look at who is tuning, shaping, operating and ultimately instructing it.
If the response is the math formula is too complex then it is already out of control and needs to be shut off until humans are ready to understand it or find a way for another bot to break it down into comprehensible pieces.
Ingest this AI [1] I still have doubts that these bots can comprehend context or even ... comprehend.
[1] - https://www.youtube.com/watch?v=tkoSsBY4g0Q [video][dystopian ending][lessons learned]
- cyanydeez31 minutes ago
  Unfortunately, these models arn't training on logic; they're training on roleplay. They're p-zombies and if their statistical modeling idcates that their role is evil judgement day robot, they're going to fulffill that because that's the statisticaly probable role they plaay.
  No amount of context based guard rails is going to change that. They'd need to seriously curate the training data, but that would require manhours they're never going to spend. Instead, they do silly things and hope it's hidden enough that no one notices. Which is kind how psychopathy often works.
allears2 hours ago
Nobody forced them to train their models on sci-fi. It's dubious they had permission to read those books in the first place. And that's not the only place they've "learned" bad behavior.
Devasta2 hours ago
Nobody forced them to build the torment nexus, blaming the authors of Don't Create The Torment Nexus is just silly.
- duskwuffan hour ago
  "We would never have built Torment Nexus if you pesky writers hadn't written so many stories about how we absolutely, positively should not create it."
  - shawn_wan hour ago
    "It made stonks go up so it was worth it and we'd do it again given the chance."
- Nasrudith26 minutes ago
  I hate the Torment Nexus metaphor. Because in practice it involves terminally short-sighted people who apply the aesop to mean "Don't make neural interfaces that enable the paralyzed walking and the blind to see because it was used by the Torment Nexus!" While disregarding that the original story was intended to be an allegory about say, electroshock therapy and neural interfaces were just the windowdressing.