11 points by RobertSerber 17 days ago | 1 comment
  • jaynamburi 3 days ago
    Consistency in AI-generated apps usually comes down to treating prompts + outputs like real software artifacts. What’s worked for us: versioned system prompts, strict schemas (JSON + validators), golden test cases, and regression evals on every change. We snapshot representative inputs/outputs and diff them in CI the same way you’d test APIs. Also important: keep model upgrades behind feature flags and roll them out gradually. Rough sketch below.
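    Roughly what the schema + golden-case setup looks like (a sketch in Python with pydantic; TicketReply, call_model, and golden_cases.json are stand-ins for your own contract, model client, and snapshot file):

      import json
      from pydantic import BaseModel, ValidationError

      # Hypothetical output contract: every completion must parse into this.
      class TicketReply(BaseModel):
          intent: str
          answer: str
          confidence: float

      def validate_output(raw: str) -> TicketReply:
          # A completion that doesn't satisfy the schema is a hard failure.
          return TicketReply.model_validate_json(raw)

      def run_golden_cases(call_model, path="golden_cases.json"):
          # Replay snapshotted inputs and diff structured fields against
          # stored expectations; run on every prompt or model change.
          failures = []
          for case in json.load(open(path)):
              try:
                  out = validate_output(call_model(case["input"]))
              except ValidationError as e:
                  failures.append((case["id"], str(e)))
                  continue
              if out.intent != case["expected"]["intent"]:
                  failures.append((case["id"], f"intent drift: {out.intent}"))
          return failures  # non-empty list fails the CI job

    Wire run_golden_cases into CI so any change that shifts structured fields blocks the merge, same as a failing API test.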

    Real example: in one LLM-powered support tool, a minor prompt tweak changed the tone and broke downstream parsers. We fixed it by adding contract tests (expected fields + phrasing constraints) and running batch replays before every deploy. Think of LLMs as nondeterministic services that need observability, evals, and guardrails, not just “better prompts.”
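    The contract tests + batch replay look something like this (again a sketch; BANNED_PHRASES, call_model, and parse are placeholders for your own tone rules and plumbing):

      import re

      BANNED_PHRASES = [r"(?i)\bunfortunately\b", r"(?i)\bas an ai\b"]  # assumed tone rules
      REQUIRED_FIELDS = {"intent", "answer", "confidence"}

      def contract_check(parsed: dict) -> list[str]:
          # Return every contract violation for one replayed output.
          errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - parsed.keys()]
          for pattern in BANNED_PHRASES:
              if re.search(pattern, parsed.get("answer", "")):
                  errors.append(f"banned phrasing: {pattern}")
          return errors

      def batch_replay(inputs, call_model, parse):
          # Replay a representative corpus through the candidate prompt
          # before deploy; ship only if this comes back empty.
          failures = {}
          for i, text in enumerate(inputs):
              errs = contract_check(parse(call_model(text)))
              if errs:
                  failures[i] = errs
          return failures

    The phrasing constraints are what caught our tone regression: the parser-breaking change also tripped a banned-phrase rule, so the replay failed before the deploy went out.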