I've spent 20 years watching data quality problems get detected downstream — after the bad record is already in the warehouse, the dashboard, the report. I built OpenDQV to move that check upstream: validate records against a YAML data contract before they enter the pipeline.
The core idea: a contract defines what valid data looks like for a domain (banking transaction, healthcare patient, logistics shipment). Every write goes through /validate. Bad data is rejected at the source, not discovered three sprints later.
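To make that concrete, here is roughly what a contract for a banking transaction could look like. The field names and rule keywords below are illustrative only, not OpenDQV's actual schema; the rule types (regex, min/max, date_format) are drawn from the v1.0.0 feature list:

```yaml
# Hypothetical contract sketch -- not OpenDQV's actual schema
contract: bank_transaction
status: draft            # lifecycle: draft -> review -> active
fields:
  txn_id:
    rules:
      - type: regex
        pattern: "^TXN-[0-9]{10}$"
  amount:
    rules:
      - type: min
        value: 0.01
      - type: max
        value: 1000000
  booked_at:
    rules:
      - type: date_format
        format: "%Y-%m-%dT%H:%M:%SZ"
```

A record that fails any rule is rejected at /validate before it ever reaches the warehouse.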
What's in v1.0.0:

- 24 rule types (regex, min/max, date_format, lookup, checksum, cross_field_range, geospatial_bounds, and more)
- 30 industry contracts spanning banking, healthcare, insurance, logistics, pharma, and 22 other domains
- Contract lifecycle: draft → review → active, with maker-checker approval. Active contracts are immutable: nothing can silently change a rule in production
- MCP server that works with Claude Desktop and Cursor, with a write guardrail so agents can propose contracts but can't activate them without human approval
- Hash-chained audit log with NTP clock sync at startup
- Importers from Great Expectations, dbt, Soda Core, ODCS 3.1, CSVW, OTel, and NDC
- Python SDK, CLI, and a Streamlit governance workbench
- ~208 req/s on a 2017 laptop (single container, 4 workers)
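The hash-chained audit log deserves a word: each entry carries the hash of the one before it, so tampering with any past entry breaks the chain. Here is a minimal generic sketch of the technique in Python (this is an illustration of the idea, not OpenDQV's implementation; `append_entry` and `verify_chain` are hypothetical names):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def append_entry(log, event):
    """Append an audit event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"event": event, "prev_hash": prev_hash}
    # Hash over a canonical (sorted-key) JSON encoding of the body.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append(body)
    return log

def verify_chain(log):
    """Recompute every hash; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        body = {"event": entry["event"], "prev_hash": entry["prev_hash"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous hash, rewriting one entry would force rewriting every entry after it, which is what makes the log tamper-evident.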
What it isn't: it's not a pipeline monitor (Monte Carlo does that), not a dbt test framework (Soda does that). It's a write-time enforcement layer.
Solo project, built in my spare time with Claude Code and my AI team of experts. 1,000+ tests. Runs on Linux, macOS, Windows, and Raspberry Pi.
I'd love feedback and support to take this to the next level: https://github.com/OpenDQV/OpenDQV
Thanks,
Sunny Sharma - OpenDQV Dev