What it does: It uses an Isolation Forest model to analyze system metrics. It learns what "normal" behavior looks like and triggers automated recovery (restarting containers, clearing caches) when it detects a multivariate anomaly.
Key Tech:
Python 3.11 (Async/Await)
Machine Learning (Isolation Forest)
Structured JSON Logging
Docker SDK for automated healing
I'm looking for feedback on how other engineers are structuring "Healer" modules to avoid restart loops in automated recovery.