1 point by yuvalhaim an hour ago | 1 comment

A local-first testing and optimization harness for iterative content creation: run multi-model A/B tests, refine prompts automatically with rubric-based grading, and generate scored synthetic data. Built for brand content workflows, prompt engineering, and LLM evaluation on your own hardware. It also includes a human-in-the-loop arena for calibrating graders against brand managers' preferences, plus file export tools for synthetic data.
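The post doesn't describe the harness internals. As a hypothetical illustration of what "rubric-based grading" and "matching graders to human preferences" can mean, here is a minimal Python sketch. The rubric criteria, weights, the 1-5 score scale, and every function name are assumptions, not this tool's API. In practice the per-criterion scores would come from an LLM judge; here they are plain dicts so the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float


# Hypothetical rubric for brand content; criteria and weights are illustrative only.
RUBRIC = [
    Criterion("on_brand_voice", 0.5),
    Criterion("clarity", 0.3),
    Criterion("call_to_action", 0.2),
]


def rubric_score(scores: dict[str, float], rubric: list[Criterion]) -> float:
    """Weighted average of per-criterion scores (assumed to be on a 1-5 scale)."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def pick_winner(scores_a: dict[str, float], scores_b: dict[str, float],
                rubric: list[Criterion]) -> str:
    """Return "A", "B", or "tie" for one A/B comparison, by weighted rubric score."""
    a, b = rubric_score(scores_a, rubric), rubric_score(scores_b, rubric)
    if a == b:
        return "tie"
    return "A" if a > b else "B"


def agreement_rate(grader_picks: list[str], human_picks: list[str]) -> float:
    """Fraction of comparisons where the automated grader agrees with the human pick."""
    if len(grader_picks) != len(human_picks):
        raise ValueError("grader and human pick lists must be the same length")
    if not grader_picks:
        raise ValueError("need at least one comparison")
    matches = sum(g == h for g, h in zip(grader_picks, human_picks))
    return matches / len(grader_picks)


if __name__ == "__main__":
    variant_a = {"on_brand_voice": 4, "clarity": 3, "call_to_action": 5}
    variant_b = {"on_brand_voice": 3, "clarity": 5, "call_to_action": 3}
    print(pick_winner(variant_a, variant_b, RUBRIC))
    print(agreement_rate(["A", "B", "A", "tie"], ["A", "B", "B", "tie"]))
```

A low agreement rate would suggest adjusting the rubric weights or the grader prompt until the automated picks track the brand manager's choices. That is one plausible way to use an arena's human votes, not necessarily how this project does it.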