Novum is a Claude Code extension that runs an autonomous ML research loop with mechanical guardrails designed to reduce result fabrication.
The key idea is that instead of relying on prompts like "don't hallucinate", the system enforces constraints mechanically (e.g., preventing edits to protected result files and enforcing phase gates in the research pipeline).
In a recent test run, a single /research command ran autonomously for about 30 hours: 10 hypotheses tested, 4 iteration cycles, and one champion solution selected.
Happy to answer questions or hear feedback on the guard design and research workflow.
- How strict are the phase gates? Is it a hard checklist, or can the system be more lenient depending on the task?
- When picking the champion solution out of 10 hypotheses, what's actually being measured?
Phase gates are hard — it's a PreToolUse hook (phase-gate-guard.js) that checks prerequisites before allowing state.json updates. If something's missing, the write gets denied. Like Phase 1→2 won't pass without literature-review.md (>2000 words), ≥10 papers in metadata, and a references.bib. Phase 6→7 needs a completed tournament with a champion. No exceptions — the agent just can't advance. There are some softer warnings too, but the main gates are hard blocks.
For champion selection — it's Successive Halving. All hypotheses compete in Round 1 (15% of GPU budget), top half survive to Round 2 (30%), champion gets Round 3 (55%). Each round eliminates the bottom half by score. The score is a weighted mix of metric improvement, mechanism signal quality, compute efficiency, and novelty — weights shift depending on venue target (oral cares more about novelty, poster cares more about raw metric gains).
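The tournament schedule above can be sketched in a few lines. The scoring function and its weights here are illustrative placeholders (Novum's real weights vary by venue target); only the round structure, everyone runs Round 1 on 15% of budget, the top half runs Round 2 on 30%, one champion runs Round 3 on 55%, comes from the description:

```javascript
// Illustrative weighted score over the four criteria mentioned above.
// Real weights shift with the venue target (oral vs. poster).
function score(h, weights) {
  return weights.metric * h.metricGain +
         weights.mechanism * h.mechanismSignal +
         weights.efficiency * h.computeEfficiency +
         weights.novelty * h.novelty;
}

function successiveHalving(hypotheses, totalBudget, weights) {
  // Budget fraction and survivor count per round, per the description:
  // Round 1 = all hypotheses, Round 2 = top half, Round 3 = champion only.
  const plan = [
    { frac: 0.15, keep: Math.ceil(hypotheses.length / 2) },
    { frac: 0.30, keep: 1 },
    { frac: 0.55, keep: 1 },
  ];
  let survivors = hypotheses.slice();
  for (const { frac, keep } of plan) {
    const perHypothesis = (totalBudget * frac) / survivors.length;
    // In the real loop, each survivor would train here for `perHypothesis`
    // GPU-hours; this sketch just re-scores and cuts the field.
    void perHypothesis;
    survivors = [...survivors]
      .sort((a, b) => score(b, weights) - score(a, weights))
      .slice(0, keep);
  }
  return survivors[0]; // champion
}
```

One nice property of this schedule is that weak hypotheses get only the cheap 15% round, so most of the GPU budget concentrates on the finalists.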