5 points by Saurabh_Kumar_ 25 days ago | 2 comments
  • Saurabh_Kumar_ 25 days ago
    Hey HN, I’m Saurabh, founder of SyncAI.

    While previously building fintech apps, I realized that GPT-4 is great, but getting it to read complex, messy invoices reliably (99.9%+ accuracy) is a nightmare. A 5% error rate is fine for a chatbot, but fatal for Accounts Payable.

    I got tired of writing RegEx wrappers and retry logic, so I built SyncAI – a 'Safety Layer' for AI Agents.

    How it works technically:

    We ingest the PDF and run it through a mix of OCR + LLMs.

    We calculate a 'Confidence Score' for every field extracted.

    If confidence is 95% or higher, it goes straight to your webhook.

    If it's below 95%, it routes to a Human-in-the-Loop (HITL) queue where a human verifies just that specific field.

    Your Agent gets a strictly typed JSON 'Golden Record'.
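
    To make the routing concrete, here's a stripped-down sketch of the field-level gate. The field names and data structures are simplified for illustration; the real pipeline does more:

      from dataclasses import dataclass

      CONFIDENCE_THRESHOLD = 0.95  # the 95% cut-off described above

      @dataclass
      class ExtractedField:
          name: str          # e.g. "invoice_total", "vendor_name"
          value: str
          confidence: float  # 0.0-1.0, produced by the OCR + LLM pass

      def route_fields(fields):
          """Split extracted fields into auto-approved vs. human-review queues."""
          auto_approved, needs_review = [], []
          for f in fields:
              if f.confidence >= CONFIDENCE_THRESHOLD:
                  auto_approved.append(f)   # goes straight to your webhook
              else:
                  needs_review.append(f)    # routed to the HITL queue, field by field
          return auto_approved, needs_review

      # Only the low-confidence field gets escalated, not the whole document:
      fields = [
          ExtractedField("vendor_name", "Acme Corp", 0.99),
          ExtractedField("invoice_total", "1,204.50", 0.82),
      ]
      auto, review = route_fields(fields)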

    Tech Stack: Python/FastAPI backend, React for the review dashboard, and we use a fine-tuned model for the routing logic.
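
    On the output side, the 'Golden Record' is just a strict schema your agent can rely on. Roughly, receiving it on your end could look like this (field names here are illustrative, not the exact production schema):

      from fastapi import FastAPI
      from pydantic import BaseModel, Field

      app = FastAPI()

      class GoldenRecord(BaseModel):
          # Illustrative fields; the real schema carries more detail.
          invoice_id: str
          vendor_name: str
          invoice_total: float = Field(ge=0)
          currency: str = "USD"
          human_verified_fields: list[str] = []  # which fields went through the HITL queue

      @app.post("/webhook/invoices")
      async def receive_golden_record(record: GoldenRecord):
          # Pydantic rejects anything with missing fields or wrong types,
          # so downstream agents never see a half-parsed document.
          return {"status": "accepted", "invoice_id": record.invoice_id}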

    The OCR Challenge: I know you guys are skeptical (as you should be). So I built a playground where you can upload your messiest, crumpled invoice to try it out without signing up: https://sync-ai-11fj.vercel.app/

    Would love your feedback on the routing logic. I’ll be here answering questions all day!

  • kundan_s__r 24 days ago
    This is a very pragmatic take. The “90% accuracy is a liability” line resonates — in high-stakes systems, partial automation often costs more than it saves.

    What I like here is the field-level confidence gating instead of a single document score. That maps much better to real failure modes, where one bad value (amount, date, vendor) can invalidate the whole record.

    One question I’m curious about: how stable are the confidence thresholds over time? In similar systems I’ve seen, models tend to get confidently wrong under distribution shift, which makes static thresholds tricky.

    Have you considered combining confidence with explicit intent or scope constraints (e.g., what the system is allowed to infer vs. must escalate), rather than confidence alone?
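
    For concreteness, something like this is what I have in mind (field names made up):

      # Purely illustrative: combine per-field confidence with an explicit policy,
      # so high-stakes fields are always escalated no matter how confident the model is.
      MUST_ESCALATE = {"bank_account", "routing_number"}   # never auto-approve these
      MAY_INFER = {"currency", "payment_terms"}            # safe to infer from context

      def should_escalate(field_name: str, confidence: float, threshold: float = 0.95) -> bool:
          if field_name in MUST_ESCALATE:
              return True                    # policy wins over confidence
          if field_name in MAY_INFER:
              return False                   # low-stakes fields never block the record
          return confidence < threshold      # everything else falls back to the threshold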

    Overall, this feels much closer to how production systems should treat AI — not as an oracle, but as a component that earns trust incrementally.