Novum is a Claude Code extension that runs an autonomous ML research loop with mechanical guardrails designed to reduce result fabrication.
The key idea is that instead of relying on prompts like "don't hallucinate", the system enforces constraints mechanically (e.g., preventing edits to protected result files and enforcing phase gates in the research pipeline).
In a recent test run, a single /research command ran autonomously for about 30 hours: 10 hypotheses tested, 4 iteration cycles, and one champion solution selected.
Happy to answer questions or hear feedback on the guard design and research workflow.
- How strict are the phase gates? Is it a hard checklist, or can the system be more lenient depending on the task?
- When picking the champion solution out of 10 hypotheses, what's actually being measured?
Phase gates are hard — it's a PreToolUse hook (phase-gate-guard.js) that checks prerequisites before allowing state.json updates. If something's missing, the write gets denied. Like Phase 1→2 won't pass without literature-review.md (>2000 words), ≥10 papers in metadata, and a references.bib. Phase 6→7 needs a completed tournament with a champion. No exceptions — the agent just can't advance. There are some softer warnings too, but the main gates are hard blocks.
For champion selection — it's Successive Halving. All hypotheses compete in Round 1 (15% of GPU budget), top half survive to Round 2 (30%), champion gets Round 3 (55%). Each round eliminates the bottom half by score. The score is a weighted mix of metric improvement, mechanism signal quality, compute efficiency, and novelty — weights shift depending on venue target (oral cares more about novelty, poster cares more about raw metric gains).
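The tournament schedule above can be sketched in a few lines. The scoring function and its weights here are illustrative placeholders (Novum's real weights vary by venue target); only the round structure, everyone runs Round 1 on 15% of budget, the top half runs Round 2 on 30%, one champion runs Round 3 on 55%, comes from the description:

```javascript
// Illustrative weighted score over the four criteria mentioned above.
// Real weights shift with the venue target (oral vs. poster).
function score(h, weights) {
  return weights.metric * h.metricGain +
         weights.mechanism * h.mechanismSignal +
         weights.efficiency * h.computeEfficiency +
         weights.novelty * h.novelty;
}

function successiveHalving(hypotheses, totalBudget, weights) {
  // Budget fraction and survivor count per round, per the description:
  // Round 1 = all hypotheses, Round 2 = top half, Round 3 = champion only.
  const plan = [
    { frac: 0.15, keep: Math.ceil(hypotheses.length / 2) },
    { frac: 0.30, keep: 1 },
    { frac: 0.55, keep: 1 },
  ];
  let survivors = hypotheses.slice();
  for (const { frac, keep } of plan) {
    const perHypothesis = (totalBudget * frac) / survivors.length;
    // In the real loop, each survivor would train here for `perHypothesis`
    // GPU-hours; this sketch just re-scores and cuts the field.
    void perHypothesis;
    survivors = [...survivors]
      .sort((a, b) => score(b, weights) - score(a, weights))
      .slice(0, keep);
  }
  return survivors[0]; // champion
}
```

One nice property of this schedule is that weak hypotheses get only the cheap 15% round, so most of the GPU budget concentrates on the finalists.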