2 pointsby PkLavc3 hours ago1 comment
  • PkLavc3 hours ago
    I built Aegis Sentinel because I wanted to move beyond simple threshold-based alerting. Traditional monitoring (CPU > 80%) often misses silent failures like memory leaks that signify an impending crash.

    What it does: It uses an Isolation Forest model to analyze system metrics. It learns what "normal" behavior looks like and triggers automated recovery (restarting containers, clearing caches) when it detects a multivariate anomaly.

    Key Tech:

    Python 3.11 (Async/Await)

    Machine Learning (Isolation Forest)

    Structured JSON Logging

    Docker SDK for automated healing

    I'm looking for feedback on how other engineers are structuring "Healer" modules to avoid restart loops in automated recovery.