4 pointsby carlosamg3 hours ago3 comments
  • kurenn3 hours ago
    hi, I'm Kuri part of the core development team, any suggestion or feedback will be highly appreciated.
    • hipvlady2 hours ago
      Saying "agents lying" isn't usually deception. It's the model telling a story about a gap that the runtime couldn't fill. This gap could be a rule or a state that the runtime was supposed to respect, but wasn't actually enforced. This means that the contract is completed as text. It is best to use a harness that enforces the SOP out of the model's reach. This is because if you leave anything as prose in context, it will lead to one compaction or one subagent spawn from evaporating. The most difficult case to test would be an SOP that depends on a state that changes under the agent. If step 3 says "use the latest config" and another process updated it after the agent read it at step 1, will the harness re-check, or enforce against the snapshot the agent loaded?
      • carlosamgan hour ago
        those are good points, the way we thought about the "use the latest config" issue was to instead of using references into somewhere else, if the SOP is critical, we ensure it loads those configs in the step, as a deterministic process, that way we know for sure that they were loaded, if something was not loaded the SOP fails loudly and produces an audit log of the failure, so it can be picked up and fixed.

        about the snapshot, we are using versioned SOPs so we can keep track and iterate on them, right now if an agent picks a SOP and runs it, it runs the current version, if we improve the SOP the agent should pick up the new one. So the SOP gets loaded as a snapshot, runs once, produces the audit log and ends the run. So the harness won't recheck.

        A retry if failed a specific step would be interesting though.

        Thank you for your comments!

  • reactance0083an hour ago
    [flagged]
  • ricalanis42 minutes ago
    [flagged]