> what are some of the notable issues that were find using the DST approach
We've discovered a few distributed deadlocks. And in general it's been incredibly helpful in exercising any parts of the system that involve caches or eventual consistency, as these can be really hard to reason about otherwise.
> if a LLM system would be able to help analyze the TRACE logs
Neat idea! For us, the logs are typically being dug into only if there is a failure condition for the test as a whole. Often times we'll inject additional logging or state monitoring to better understand what led to the failure (which is easy enough to do given the reproducibility of the failure in the sim). Trace logs are also being analyzed in the context of the "meta-test", but that's just looking for identical outputs. (More about that here: https://github.com/tokio-rs/turmoil/issues/19#issuecomment-2... )