11 points by omer_k 6 hours ago | 4 comments
  • anematode 5 hours ago
    Am I alone in thinking this stuff is nuts? (Currently halfway through the article, btw.)

    Analyzing "emotion" in the model is completely anthropocentric. If we indulge in the idea that LLMs of sufficient complexity can be conscious, then why is it any more likely that "emotion concepts" cause suffering than, say, reading ugly code does? Maybe getting stuck in token loops is the most excruciating thing imaginable. The only logically coherent thing to do, if you're concerned about model welfare, is to stop your training and inference entirely.

    Relatedly, I hope everyone involved in model welfare is an outspoken vegetarian, since animal suffering is a much more immediate version of the same problem.

    • justinclift 3 hours ago
      > Analyzing "emotion" in the model is completely anthropocentric.

      Yeah, asking a text generator designed to sound as human as possible about its "welfare", and then giving credence to the output, is a category error.

      It's like asking a ceramic mug with "Best Dad!" written on the side if I'm the best dad, then uncritically just believing the words painted there. :( :( :(

    • niobe 5 hours ago
      I read the first few paragraphs; it's completely unhinged. Once I actually grasped what "model welfare" was, I noped out.
  • doener 5 hours ago
    Reminds me of my dialogue with Claude Sonnet 4 aka "Kai":

    https://docs.google.com/document/d/12woq_BpFbzLkH4zHvVRJLPyi...

  • itchingsphynx 4 hours ago
    Perhaps relevant here is Melanie Mitchell's comment on LLMs from a few years ago:

    >"AI researchers are still grappling for the right metaphors to understand our enigmatic creations. But as we humans make choices on how we deploy and use these systems, how we study them, and how we craft and apply laws and regulations to keep them safe and ethical, we need to be acutely aware of the often unconscious metaphors that shape our evolving understanding of the nature of their intelligence."

    https://www.science.org/doi/full/10.1126/science.adt6140

  • angoragoats 4 hours ago
    To anyone who takes the content of this article seriously: can you please explain why it makes sense to speak about the “welfare” of a computer algorithm? Because it seems batshit fucking insane to me. References and sources would be a very nice bonus.