1 point by mbradber 4 hours ago | 2 comments
  • mbradber 4 hours ago
    This is a basic demo of an idea I've had for open-loop log-replay regression testing. The idea is to take recorded logs of incidents that occurred in the field and use them as regression tests against your autonomy stack on every future deploy.

    Since this is just a "bag replay," it is inherently restricted to open-loop testing: the recorded inputs are fixed, so nothing the new stack outputs can change what it sees next. The ROS bags of previous scenarios are replayed as input to your new autonomy stack (presumably with the issue resolved), and the outputs of that candidate stack are recorded. This is primarily targeted at teams shipping perception, localization, or prediction code, where it's useful to play back pre-recorded inputs with known expected outputs and compare the results.
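    A minimal sketch of the replay step in Python. To stay self-contained it does not read a real ROS bag: a hypothetical in-memory list of `Message` records stands in for the bag, and `candidate_stack` is a made-up placeholder for the stack under test. It shows what makes this open-loop: the candidate's outputs are only recorded, so the next input is always the next recorded message no matter what the stack did.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Message:
    """One recorded input message (hypothetical stand-in for a bag entry)."""
    stamp: float  # seconds since the start of the recording
    topic: str
    data: Any


def replay_open_loop(
    recorded: List[Message],
    stack: Callable[[Message], Any],
) -> List[Tuple[float, Any]]:
    """Feed recorded inputs to the stack in timestamp order and record its outputs.

    Open loop: outputs are only stored, never fed back, so they cannot
    change which input arrives next.
    """
    outputs: List[Tuple[float, Any]] = []
    for msg in sorted(recorded, key=lambda m: m.stamp):
        result = stack(msg)
        if result is not None:  # the stack may produce nothing for some inputs
            outputs.append((msg.stamp, result))
    return outputs


def candidate_stack(msg: Message) -> Any:
    """Hypothetical stack: reports an obstacle when any range reading is < 1 m."""
    if msg.topic == "/scan":
        return {"obstacle": min(msg.data) < 1.0}
    return None


if __name__ == "__main__":
    bag = [
        Message(0.0, "/scan", [2.5, 3.0, 2.8]),
        Message(0.1, "/odom", (0.0, 0.0)),
        Message(0.2, "/scan", [0.6, 3.0, 2.9]),
    ]
    print(replay_open_loop(bag, candidate_stack))
    # [(0.0, {'obstacle': False}), (0.2, {'obstacle': True})]
```

    In a real setup the inputs would come from the bag itself (for example `ros2 bag play` driving the live stack while `ros2 bag record` captures its output topics); the sketch only makes the data flow explicit.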

    Regression analysis is then done on the recorded outputs. Regressions can be flagged either by comparing the new outputs against an expected "correct" baseline (particularly useful for localization testing) or by checking them against a config of declarative rules describing the expected outputs during specific time windows of the replayed scenario.
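    A sketch of the two kinds of checks, assuming the recorded outputs have already been extracted into plain Python values. Everything here is hypothetical and for illustration only: the function names, the (x, y) pose format, the tolerance, and the rule fields (`start`, `end`, `key`, `expected`). The baseline check pairs poses by index and fails if the worst position error exceeds a tolerance; the rule check flags every output inside a time window whose value differs from what the rule expects.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Pose = Tuple[float, float]  # (x, y) in metres; hypothetical 2D pose format


def max_position_error(baseline: List[Pose], candidate: List[Pose]) -> float:
    """Largest Euclidean distance between index-paired baseline/candidate poses."""
    if len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must have the same length")
    return max((math.dist(b, c) for b, c in zip(baseline, candidate)), default=0.0)


def baseline_check(baseline: List[Pose], candidate: List[Pose], tol_m: float) -> bool:
    """Pass if the candidate never drifts more than tol_m from the baseline."""
    return max_position_error(baseline, candidate) <= tol_m


@dataclass
class WindowRule:
    """Declarative rule: during [start, end], outputs[key] must equal expected."""
    start: float
    end: float
    key: str
    expected: Any


def rule_violations(
    outputs: List[Tuple[float, Dict[str, Any]]], rules: List[WindowRule]
) -> List[str]:
    """Describe every output that breaks a rule; an empty list means pass."""
    violations = []
    for rule in rules:
        for stamp, out in outputs:
            if rule.start <= stamp <= rule.end and out.get(rule.key) != rule.expected:
                violations.append(
                    f"t={stamp}: {rule.key}={out.get(rule.key)!r}, expected {rule.expected!r}"
                )
    return violations


if __name__ == "__main__":
    baseline = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    candidate = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.4)]
    print(baseline_check(baseline, candidate, tol_m=0.25))  # False: 0.4 m at the last pose

    outputs = [(0.0, {"obstacle": False}), (0.2, {"obstacle": True}), (0.5, {"obstacle": False})]
    rules = [WindowRule(start=0.15, end=0.4, key="obstacle", expected=True)]
    print(rule_violations(outputs, rules))  # []
```

    Pairing poses by index assumes both runs emitted outputs at the same timestamps; a real localization check would likely align or interpolate by timestamp first.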

    Over time, teams could build a regression suite composed of previously recorded real-world failure scenarios.

    Example workflow:

    1. Robot encounters a real-world failure in production (e.g., reflective pallet wrap, localization drift, a perception miss, a repeated recovery loop)

    2. Team saves that incident log

    3. Engineer makes changes to the autonomy stack

    4. CI runs replay tests against previously recorded failure scenarios

    5. Tool verifies whether known failures were reintroduced

    6. Engineer gets pass/fail results before deployment
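    Steps 4 to 6 could be wired into CI with a small runner like the one below. It is a sketch under assumptions: each scenario is a hypothetical zero-argument callable that replays one saved incident and returns a list of violation messages (empty means pass), and a scenario that crashes during replay counts as a failure. The nonzero return code is what would let a CI job block the deploy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScenarioResult:
    name: str
    passed: bool
    detail: str = ""


def run_suite(scenarios: Dict[str, Callable[[], List[str]]]) -> List[ScenarioResult]:
    """Run every recorded-failure scenario; each returns its list of violations."""
    results = []
    for name, run in sorted(scenarios.items()):
        try:
            violations = run()
        except Exception as exc:  # a crash during replay counts as a failure
            violations = [f"error: {exc}"]
        results.append(ScenarioResult(name, not violations, "; ".join(violations)))
    return results


def report(results: List[ScenarioResult]) -> int:
    """Print one line per scenario and return a CI-style exit code (0 = all pass)."""
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"{status} {r.name}" + (f" ({r.detail})" if r.detail else ""))
    return 0 if all(r.passed for r in results) else 1


if __name__ == "__main__":
    # Hypothetical scenarios; in practice each would replay a saved incident log
    # against the candidate stack and apply that incident's checks.
    suite = {
        "reflective_pallet_wrap": lambda: [],
        "localization_drift": lambda: ["max error 0.40 m > 0.25 m"],
    }
    exit_code = report(run_suite(suite))
    # In CI, pass this to sys.exit() so any failing scenario fails the job.
    print("exit code:", exit_code)
```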

    The goal is to make it easier for perception, localization, and prediction teams to run regression tests built from real-world logs.

    I'd love to hear any reasons I haven't considered for why this wouldn't be useful in practice. I'm sure plenty of experienced robotics engineers have tried to build or use something like this before. Feedback on the concept would be awesome!

  • Webhix 3 hours ago
    [dead]