Hi HN! I built Khaos to solve a problem I kept hitting: testing Kafka monitoring and alerting with realistic traffic patterns.
It lets you:
- Simulate scenarios like consumer lag, hot partitions, broker failures, and rebalance storms
- Generate realistic message schemas with Faker (names, emails, addresses, etc.)
- Run correlated event flows across multiple topics
- Test against a local Docker cluster or an external Kafka cluster (Confluent Cloud, MSK, etc.)
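To give a feel for what "correlated event flows" means, here is a rough stdlib-only sketch. It is not Khaos's actual code or DSL: the topic names (`orders`, `payments`, `shipments`), the field names, and the `correlated_flow` helper are all made up for illustration. The point is just that events on different topics share a key, so downstream joins and alerts see consistent data.

```python
import random
import uuid


def correlated_flow(rng):
    """Yield (topic, event) pairs for one order, linked by a shared order_id.

    Topic and field names are hypothetical, chosen only for this sketch.
    """
    # Derive the ID from the seeded RNG so the flow is reproducible.
    order_id = str(uuid.UUID(int=rng.getrandbits(128)))
    amount = round(rng.uniform(5, 500), 2)
    yield "orders", {"order_id": order_id, "amount": amount}
    yield "payments", {"order_id": order_id, "amount": amount, "status": "captured"}
    yield "shipments", {"order_id": order_id, "carrier": rng.choice(["ups", "dhl"])}


events = list(correlated_flow(random.Random(42)))
for topic, event in events:
    print(topic, event)
```

In a real setup each pair would go to a Kafka producer, and Faker would presumably fill in the richer fields (names, addresses); both are left out here to keep the sketch dependency-free.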
Example:
khaos run consumer-lag -d 60
This spins up a 3-broker cluster, creates topics, produces/consumes at configured rates, and injects the incident (slow consumers) at the scheduled time.
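If it helps to picture the lag scenario, here is a toy model in plain Python (no Kafka involved, and not how Khaos implements it internally). The function name, the rates, and the incident time are assumptions picked for illustration: the consumer keeps up until the incident, then slows down and lag grows linearly.

```python
def simulate_lag(duration_s, produce_rate, consume_rate, incident_at_s, slow_consume_rate):
    """Return per-second consumer lag (in messages) for a toy producer/consumer pair."""
    lag = 0
    history = []
    for t in range(duration_s):
        # Normal consume rate before the incident, slowed rate afterwards.
        rate = consume_rate if t < incident_at_s else slow_consume_rate
        backlog = lag + produce_rate      # messages available to consume this second
        consumed = min(backlog, rate)     # a consumer cannot read more than exists
        lag = backlog - consumed
        history.append(lag)
    return history


lag = simulate_lag(
    duration_s=60,
    produce_rate=100,       # msgs/s produced (illustrative)
    consume_rate=120,       # healthy consumer keeps up
    incident_at_s=30,       # slow consumers kick in here
    slow_consume_rate=40,   # msgs/s after the incident
)
print(lag[29], lag[59])  # 0 1800
```

This is the shape a lag alert should catch: flat at zero, then a steady climb of 60 messages per second once the consumers slow down.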
Built with Python, confluent-kafka, and Typer. A single `uv tool install khaos-cli` gets you started.
Would love feedback on the scenario DSL and what other failure modes would be useful to simulate.