1 point by gk1 14 days ago | 1 comment

storystarling 14 days ago
Matches my experience trying to stabilize long LangGraph workflows. The regex checks are fine for formatting but miss the semantic drift that happens when you're actually injecting context. The rubric-based approach makes sense, but I'm not sure how a bootstrapped team implements this without the human labeling budget. I've tried using a stronger model to grade the outputs, but the latency overhead is brutal.