A concrete eval method: run it on a file that caused trouble in the past, then again on the same file after the patch or refactor. Comparing before vs. after tends to be clearer than looking at scores in isolation.
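To make that workflow concrete, here's a minimal sketch of a before/after comparison. The `compute_scores()` function is a hypothetical stand-in for the real metrics (not the tool's actual API), and the old version of the file is pulled with `git show`:

```python
# before_after.py - hedged sketch: score a file before and after a change.
# compute_scores() is a placeholder stand-in, NOT the tool's real metrics.
import subprocess
import sys

def compute_scores(source: str) -> dict:
    """Placeholder metrics; the real tool computes its own deterministic scores."""
    lines = source.splitlines()
    max_indent = max((len(l) - len(l.lstrip()) for l in lines if l.strip()), default=0)
    return {"lines": len(lines), "max_indent": max_indent}

def file_at_revision(rev: str, path: str) -> str:
    """Fetch the file contents at a given git revision (e.g. 'HEAD~1')."""
    return subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        check=True, capture_output=True, text=True,
    ).stdout

if __name__ == "__main__":
    path = sys.argv[1]
    old_rev = sys.argv[2] if len(sys.argv) > 2 else "HEAD~1"
    before = compute_scores(file_at_revision(old_rev, path))
    after = compute_scores(open(path, encoding="utf-8").read())
    for key in before:
        print(f"{key}: {before[key]} -> {after[key]}")
```

Run as `python before_after.py path/to/troublesome_file.py HEAD~1` inside the repo; the point is just the shape of the comparison, swap in the real scorer for `compute_scores()`.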
Tech notes: multi-language via tree-sitter (Python, JS/TS, Java, Go, Rust, C/C++, etc.), 100% local (no telemetry / no external APIs), deterministic (no LLMs for the scores).
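Not the tool's implementation, just a minimal sketch of what a deterministic, purely local tree-sitter metric looks like (here, function count and max statement nesting depth, derived only from the parse tree). It assumes the `tree-sitter` and `tree-sitter-python` pip packages with a recent py-tree-sitter API; the specific node types and metrics are illustrative, not the tool's actual ones:

```python
# Sketch of a deterministic, local tree-sitter metric (not the tool's code).
# Assumes: pip install tree-sitter tree-sitter-python (recent py-tree-sitter API).
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

# Statement kinds that count as a nesting level (illustrative subset).
NESTING_NODES = {"if_statement", "for_statement", "while_statement",
                 "with_statement", "try_statement"}

def structural_metrics(source: bytes) -> dict:
    """Function count and max nesting depth, computed only from the syntax tree."""
    tree = parser.parse(source)
    func_count = 0
    max_depth = 0

    def walk(node, depth):
        nonlocal func_count, max_depth
        if node.type == "function_definition":
            func_count += 1
        if node.type in NESTING_NODES:
            depth += 1
            max_depth = max(max_depth, depth)
        for child in node.children:
            walk(child, depth)

    walk(tree.root_node, 0)
    return {"functions": func_count, "max_nesting": max_depth}

if __name__ == "__main__":
    sample = b"def f(x):\n    if x:\n        for i in range(x):\n            print(i)\n"
    print(structural_metrics(sample))  # same input -> same scores, every run
```

Everything here runs offline against the parse tree, which is what makes the scores reproducible: same file in, same numbers out.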
Happy to answer questions about the metrics.