The architecture separates two concerns:
1. AI extraction (Gemini 3 Flash) — reads the SOW (statement of work) and extracts structured factual observations: what's present, what's missing, what's vague. No scoring, no judgement.
2. Deterministic Python scoring engine — takes the extraction output and grades it across 8 categories using calibrated decision trees. No AI in the scoring loop at all.
Why the separation? Because LLMs are terrible at consistent scoring. They inflate scores, change their minds between runs, and can't explain their reasoning traceably. The deterministic engine solves all three problems — same input always produces the same score, and every score can be traced back to specific criteria.
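To make the separation concrete, here is a minimal sketch of what a deterministic, traceable scorer can look like. The `Observation` schema and the thresholds are illustrative assumptions, not the real extraction format or calibrated trees:

```python
from dataclasses import dataclass

# Hypothetical shape of the AI extraction output for one category;
# field names are illustrative, not the actual schema.
@dataclass(frozen=True)
class Observation:
    category: str
    present: list[str]   # elements the extractor found in the SOW
    missing: list[str]   # required elements it could not find
    vague: list[str]     # elements present but underspecified

def score_category(obs: Observation) -> tuple[int, list[str]]:
    """Grade one category with a fixed decision tree.

    The same input always yields the same score, and the returned
    trace lists the criteria that determined it.
    """
    trace = []
    if obs.missing:
        trace.append(f"missing required elements: {', '.join(obs.missing)}")
        score = 1 if obs.present else 0
    elif obs.vague:
        trace.append(f"present but vague: {', '.join(obs.vague)}")
        score = 2
    else:
        trace.append("all required elements present and specific")
        score = 3
    return score, trace

obs = Observation(
    category="acceptance_criteria",
    present=["deliverable list"],
    missing=[],
    vague=["sign-off process"],
)
score, trace = score_category(obs)
print(score, trace)  # deterministic: rerunning never changes the result
```

Because no model sits in this loop, a score of 2 here can always be explained by the exact branch taken ("present but vague: sign-off process"), which is the traceability the post describes.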
Stack: React 19 frontend, FastAPI backend, Google Cloud Run, Cloudflare. Analysis takes about 30 seconds.
Free, no sign-up, documents are never stored: sowscanner.com
Happy to discuss the architecture, the scoring methodology, or the trade-offs involved.