4 points by falsework 5 hours ago | 1 comment
  • thenaturalist 5 hours ago
    Adversarial work (be it agent or human).

    The one difference between "can do" and "should be trusted to do" is the ability to systematically prove that "can do" holds up across close to 100% of task instances and under adversarial conditions.

    Hacking and pentesting are already scaling fully autonomously - and systematically.

    For now, lower-level targets aren't yet attractive, as such scale requires sophisticated (state) actors, but that is going to change.

    So building systems that white-hat prove your code is not only functional but robust is going to be critical if it's not to be ripped apart by black hats later on.
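
    As a toy sketch of what that kind of white-hat proving can look like in the small (the sanitize_path routine and the properties checked are purely illustrative, not taken from any particular tool), here's a fuzz loop that asserts invariants over every adversarial input rather than a handful of happy-path cases:

        import random
        import string

        def sanitize_path(user_input: str) -> str:
            # Illustrative routine under test: strip NUL bytes, collapse ".."
            # sequences until none remain, then drop leading slashes.
            # Ordering matters here: stripping NUL bytes *after* the ".." pass
            # would let an input like ".\x00." reassemble a traversal sequence.
            cleaned = user_input.replace("\x00", "")
            while ".." in cleaned:
                cleaned = cleaned.replace("..", "")
            return cleaned.lstrip("/")

        def fuzz_sanitize(iterations: int = 10_000) -> None:
            # Hammer the routine with random hostile-ish inputs and assert
            # properties that must hold on every instance, not just typical ones.
            alphabet = string.printable + "\x00"
            for _ in range(iterations):
                candidate = "".join(random.choices(alphabet, k=random.randint(0, 64)))
                result = sanitize_path(candidate)
                assert ".." not in result, f"traversal survived: {candidate!r}"
                assert "\x00" not in result, f"NUL byte survived: {candidate!r}"
                assert not result.startswith("/"), f"absolute path survived: {candidate!r}"

        if __name__ == "__main__":
            fuzz_sanitize()
            print("all adversarial checks held")

    Real coverage-guided fuzzers and property-based testing libraries do this far more systematically, but the shape is the same: the code either survives hostile inputs or it doesn't.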

    One example that applies this quite nicely is roborev [0] by the legendary Wes McKinney.

    0: https://github.com/roborev-dev/roborev

    • falsework 5 hours ago
      This is a good point. You're right that adversarial testing provides one form of validation that doesn't depend on credentials: if the system holds up under systematic attack, that's evidence of competence regardless of who built it.

      But I think there's a distinction worth making between technical robustness (does the code have vulnerabilities?) and epistemic legitimacy (should we trust the analysis/conclusions?).

      Pentesting and formal verification can tell us whether a system is secure or functions correctly. That's increasingly automatable and credential-independent because the code either survives adversarial conditions or it doesn't.

      But what about domains where validation is murkier? Cross-domain analysis, research synthesis, strategic thinking, design decisions? These require judgment calls where "correct" isn't binary. The work can be rigorous and well-reasoned without being formally provable.

      The roborev example is interesting because code review is somewhat amenable to systematic validation. But we're also seeing AI collaboration extend into domains where adversarial testing isn't cleanly applicable—policy analysis, theoretical frameworks, creative work with analytical components.

      I wonder if we need different validation frameworks for different types of work. Technical systems: adversarial testing and formal verification. Analytical/intellectual work: something else entirely. But what?

      The deeper question: when the barrier to producing superficially plausible work drops to near-zero, how do we distinguish genuinely rigorous thinking from sophisticated-sounding nonsense? Credentials were a (flawed) heuristic for that. What replaces them in domains where adversarial testing doesn't apply?