1 point by brandonb 6 hours ago | 1 comment
  • brandonb 6 hours ago
    This is the first new paper from Alec Radford since leaving OpenAI. Token-level data filtering is kind of a simple idea, but so are many effective ideas in LLMs.
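    As a rough sketch of what token-level filtering could look like in the pretraining loss (my own illustration, not the paper's actual code; the `flagged` mask standing in for whatever upstream classifier marks unwanted tokens is assumed): flagged tokens simply contribute zero gradient, so the model never gets a learning signal on that content.

      import torch
      import torch.nn.functional as F

      def masked_lm_loss(logits, targets, flagged):
          """logits: (B, T, V); targets: (B, T); flagged: (B, T) bool mask
          of tokens to drop from the training signal (hypothetical)."""
          per_token = F.cross_entropy(
              logits.reshape(-1, logits.size(-1)),
              targets.reshape(-1),
              reduction="none",
          ).reshape(targets.shape)
          keep = (~flagged).float()
          # Average the loss only over the tokens that were kept.
          return (per_token * keep).sum() / keep.sum().clamp(min=1.0)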

    One advantage is that this type of safety guardrail can't be undone by an adversary in post-training, so it's a good fit for open source models.

    The experiments all focus on preventing models from acquiring medical capabilities while preserving related capabilities, e.g., biology.