The Problem: Most "multi-AI" systems ask the same question to multiple models and aggregate responses. But models never see each other's reasoning, so you miss out on the refinement and synthesis that comes from real deliberation.
What's Different:
- Models see previous responses and refine their positions across rounds
- Evidence-based: Models can request tools (read files, search code) mid-debate to back up claims
- Convergence detection: Automatically stops when consensus is reached or positions solidify (see the sketch after this list)
- Decision graph memory: Past deliberations inform future ones
- Production-ready: 113+ tests, structured logging, graceful error handling
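To make the round structure concrete, here is a minimal sketch of the deliberation loop with a simple consensus check. The names (`Response`, `deliberate`, `model.respond`) are illustrative assumptions, not the project's actual API:

```python
# Illustrative sketch only: Response, deliberate(), and model.respond()
# are hypothetical names, not this project's actual API.
from dataclasses import dataclass


@dataclass
class Response:
    model: str
    position: str   # e.g. "REST" or "GraphQL"
    rationale: str


def deliberate(question: str, models: list, max_rounds: int = 3) -> list[Response]:
    """Run up to max_rounds; stop early once every model backs the same position."""
    transcript: list[Response] = []
    for _ in range(max_rounds):
        # Each model sees the full transcript so far and may refine its position.
        round_responses = [m.respond(question, transcript) for m in models]
        transcript.extend(round_responses)
        # Convergence check: consensus is reached when all positions agree.
        if len({r.position for r in round_responses}) == 1:
            break
    return transcript
```

A real convergence check would also catch the "positions solidified without agreeing" case (e.g. no position changed between rounds), but the early-exit shape is the same.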
Example Use Case: Ask "Should we use REST or GraphQL?" Round 1 collects initial positions; in Round 2, Claude responds to GPT's federation concerns by citing actual codebase complexity; by Round 3 the models reach consensus with concrete trade-offs documented.
How It Works: Uses the Model Context Protocol (MCP), so it plugs into Claude Code, Cursor, or any MCP client. Supports Claude, GPT, Gemini, and local models via llama.cpp/Ollama. Responses include voting (option + confidence + rationale), tool execution records, and AI-generated summaries.
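Roughly, each model's contribution to a round carries a vote, any evidence it gathered, and a summary. The field names below are my own guess at the shape, not the project's actual schema:

```python
# Rough shape of one model's contribution to a round, as described above.
# Field names are assumptions, not the project's actual schema.
from dataclasses import dataclass, field


@dataclass
class Vote:
    option: str        # the position the model backs, e.g. "GraphQL"
    confidence: float  # 0.0-1.0
    rationale: str     # short justification for the vote


@dataclass
class ToolExecution:
    tool: str          # e.g. "read_file" or "search_code"
    arguments: dict    # what the model asked for
    result: str        # evidence fed back into the debate


@dataclass
class ModelResponse:
    model: str
    vote: Vote
    tool_executions: list[ToolExecution] = field(default_factory=list)
    summary: str = ""  # AI-generated summary of the model's reasoning
```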
Built this because I was frustrated with getting contradictory advice from different models and having to manually synthesize. Curious what the community thinks - are there other use cases where deliberative AI makes sense?
Limitations I'm aware of: Can get expensive with reasoning models (60-180s timeouts recommended), and context windows grow with debate length.
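If you're tuning those timeouts yourself, a per-model budget keeps a slow reasoning model from stalling a whole round. The values below mirror the 60-180s recommendation; the model keys and the `query()` coroutine are placeholders, not the project's API:

```python
# Per-model timeout budget; values follow the 60-180s recommendation above.
# The model keys and query() coroutine are placeholders, not the project's API.
import asyncio

TIMEOUTS_S = {
    "reasoning-model": 180,  # slow, deliberate models get the long end
    "standard-model": 60,
}


async def query(model_name: str, prompt: str) -> str:
    # Stand-in for a real model call; replace with your provider's client.
    await asyncio.sleep(1)
    return f"{model_name}: (answer to) {prompt}"


async def ask_with_timeout(model_name: str, prompt: str) -> str | None:
    try:
        return await asyncio.wait_for(
            query(model_name, prompt),
            timeout=TIMEOUTS_S.get(model_name, 60),
        )
    except asyncio.TimeoutError:
        # Drop this model from the round rather than stall the debate.
        return None
```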