We tried eval platforms, LLM-as-judge, and automated prompt optimizers. None helped with what actually mattered: hidden domain policies that weren’t explicitly written anywhere.
We ended up building our own annotation UI, prompt integration workflow (via Claude Code SDK), and HTML diff-based experiment reports.
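To give a flavor of the reports (not our actual code): a minimal sketch using Python's stdlib difflib.HtmlDiff to render a side-by-side HTML diff of two experiment runs. The file names and function are hypothetical placeholders.

```python
import difflib
from pathlib import Path

def write_experiment_report(baseline_path: str, candidate_path: str, out_path: str) -> None:
    """Render a side-by-side HTML diff of two experiment output files."""
    baseline = Path(baseline_path).read_text().splitlines()
    candidate = Path(candidate_path).read_text().splitlines()
    # HtmlDiff produces a complete, self-contained HTML page.
    html = difflib.HtmlDiff(wrapcolumn=80).make_file(
        baseline, candidate,
        fromdesc="baseline prompt", todesc="candidate prompt",
    )
    Path(out_path).write_text(html)

write_experiment_report("baseline_outputs.txt", "candidate_outputs.txt", "report.html")
```

The real version layers domain-specific annotations on top, but the diff-per-experiment structure is the core idea.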
The biggest lesson: off-the-shelf eval/annotation/prompt-optimization tools are sub-par because they can only be generic.
Curious whether others building AI products have reached the same conclusion.