I ended up using LangGraph to manage the workflow, Ansible for the 'hands' (execution), and a local SQLite DB to calculate standard deviations for anomaly detection so the LLM stays out of the math business. Curious to hear how others are handling stateful orchestration for AIOps.
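For anyone wondering what the SQLite part might look like, here is a minimal sketch of the idea, not my exact code. The `metrics` table, its columns, the `web-1` / `cpu_pct` sample data, and the 3-sigma threshold are all made up for illustration. SQLite has no built-in standard deviation aggregate, so this sketch derives a population standard deviation from `AVG(value)` and `AVG(value * value)`; the LLM only ever sees the resulting True/False.

```python
import math
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a hypothetical metrics table and sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (host TEXT, name TEXT, value REAL)")
    baseline = [50.0, 52.0, 48.0, 51.0, 49.0, 50.0, 53.0, 47.0]
    conn.executemany(
        "INSERT INTO metrics (host, name, value) VALUES ('web-1', 'cpu_pct', ?)",
        [(v,) for v in baseline],
    )
    return conn


def baseline_stats(conn, host, name):
    """Return (mean, population std dev) for a metric, or None with < 2 samples."""
    n, mean, mean_sq = conn.execute(
        "SELECT COUNT(*), AVG(value), AVG(value * value) "
        "FROM metrics WHERE host = ? AND name = ?",
        (host, name),
    ).fetchone()
    if n < 2:
        return None
    # Var(X) = E[X^2] - E[X]^2; clamp at 0 to absorb floating-point error.
    variance = max(mean_sq - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def is_anomaly(conn, host, name, value, z_threshold=3.0):
    """Flag a reading whose z-score against the stored baseline exceeds the threshold."""
    stats = baseline_stats(conn, host, name)
    if stats is None:
        return False  # not enough history to judge
    mean, std = stats
    if std == 0.0:
        return value != mean  # flat baseline: any deviation is unusual
    return abs(value - mean) / std > z_threshold


if __name__ == "__main__":
    conn = build_demo_db()
    print(is_anomaly(conn, "web-1", "cpu_pct", 51.0))
    print(is_anomaly(conn, "web-1", "cpu_pct", 95.0))
```

The workflow node would call something like `is_anomaly` and pass only the boolean (plus maybe the mean and std) into the prompt, so the model reasons about the result instead of doing arithmetic.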