About the snapshot: we use versioned SOPs so we can keep track of them and iterate on them. Right now, if an agent picks an SOP and runs it, it runs the current version; if we improve the SOP, the agent should pick up the new one. So the SOP gets loaded as a snapshot, runs once, produces the audit log, and ends the run. That means the harness won't recheck it during the run.
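To make the snapshot idea concrete, here is a rough sketch of how it could look. Everything in it (`SOPRegistry`, `SOPSnapshot`, `run_once`, the shape of the audit log entries) is a hypothetical name made up for illustration, not our actual harness code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SOPSnapshot:
    """Immutable copy of one SOP version (hypothetical structure)."""
    name: str
    version: int
    steps: Tuple[Tuple[str, Callable[[], object]], ...]


class SOPRegistry:
    """Hypothetical in-memory store that keeps every published version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[SOPSnapshot]] = {}

    def publish(self, name: str, steps) -> SOPSnapshot:
        # Each publish appends a new version; older ones stay untouched.
        versions = self._versions.setdefault(name, [])
        snapshot = SOPSnapshot(name, len(versions) + 1, tuple(steps))
        versions.append(snapshot)
        return snapshot

    def load_current(self, name: str) -> SOPSnapshot:
        # The agent always gets the latest version at load time.
        return self._versions[name][-1]


def run_once(snapshot: SOPSnapshot) -> List[dict]:
    """Run every step of the snapshot once and return the audit log."""
    audit_log = []
    for step_name, action in snapshot.steps:
        audit_log.append({
            "sop": snapshot.name,
            "version": snapshot.version,
            "step": step_name,
            "result": action(),
        })
    return audit_log
```

With this shape, a run that already loaded version 1 keeps executing version 1 even if version 2 is published in the meantime, and the next `load_current` call returns version 2.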
Retrying a specific step when it fails would be interesting, though.
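A per-step retry could be as small as the sketch below. This is only an idea of what it might look like, not something we have built: `run_step_with_retry` and `max_attempts` are hypothetical names, and it assumes a failed step signals failure by raising an exception.

```python
from typing import Callable, List, Tuple


def run_step_with_retry(
    action: Callable[[], object], max_attempts: int = 3
) -> Tuple[bool, List[dict]]:
    """Run one step, retrying on exceptions; every attempt is recorded."""
    attempts: List[dict] = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except Exception as exc:
            # Record the failure so it can end up in the audit log.
            attempts.append({"attempt": attempt, "ok": False, "error": repr(exc)})
        else:
            attempts.append({"attempt": attempt, "ok": True, "result": result})
            return True, attempts
    # All attempts failed; the caller decides whether to end the run.
    return False, attempts
```

Recording every attempt keeps the audit log complete: a step that passed on the third try still shows the two failures before it.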
Thank you for your comments!