In our experience, three things matter most:

- deterministic reruns (same input, same output),
- traceable claims (path/hash/line),
- explicit unknowns instead of hidden blind spots.
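To make "traceable claims" concrete, here is a minimal sketch of what such a claim record could look like. All names here (`Claim`, `make_claim`, the example file) are hypothetical illustrations, not an existing tool's schema; the point is that hashing the exact input bytes makes reruns deterministic.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    # Hypothetical record shape: each finding points at exact evidence.
    path: str     # repo-relative file path
    line: int     # 1-indexed line number of the evidence
    sha256: str   # hash of the file bytes the claim was made against
    text: str     # the claim itself

def make_claim(path: str, contents: bytes, line: int, text: str) -> Claim:
    # Hashing the exact bytes gives deterministic reruns:
    # same input -> same hash -> byte-identical claim record.
    return Claim(path, line, hashlib.sha256(contents).hexdigest(), text)

src = b"def handler(req):\n    eval(req.body)\n"
c1 = make_claim("app/views.py", src, 2, "eval() on untrusted input")
c2 = make_claim("app/views.py", src, 2, "eval() on untrusted input")
assert c1 == c2  # rerun on identical input produces an identical record
```

A reviewer can then re-verify any claim independently: fetch the file, check the hash, read the line.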
For teams that cannot share repo contents, we're also exploring offline/local execution models.
If you've run this process in practice:

1) What output format made reviewers trust the result?
2) What made reports stall or get ignored?
3) If you required local/offline execution, what constraints mattered most?
Context: I'm building in this space, so I'm looking for concrete operator feedback, not generic opinions.