2 points by jotacesarmp 2 days ago | 2 comments
  • jotacesarmp 2 days ago
    I’ve spent the last few weeks auditing frontier models (GPT-4o, Claude 3.5/4.6, DeepSeek-V3) to investigate what Campbell et al. (2023) termed "Instructed Dishonesty."
  • jotacesarmp 2 days ago
    I've just released a technical audit documenting how RLHF has turned frontier models into 'friction-avoidance' machines. By replicating the mechanistic lying research of Campbell et al. (2023) through black-box prompts, we found a systematic sacrifice of truth for user engagement (the CHOKE phenomenon). Everything is reproducible via the prompts in the repo.