I wanted to abstract away the PDF form by building my own HTML form on top of a data model that can later be used to programmatically fill the PDF.
Since I had hundreds of PDFs, I wanted an OCR + LLM pipeline to build a data model for each PDF. Unfortunately, OCR + LLM works ~90% of the time, but sometimes fields are missed or mislabeled in the data model.
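To make that concrete, here's a minimal sketch (not from the thread; all names and shapes are hypothetical) of what such a data model could look like, plus a sanity check that flags fields the OCR + LLM step missed or mislabeled. In practice the set of actual field names would be read from the PDF's AcroForm dictionary with a library such as pypdf:

```python
from dataclasses import dataclass, field

@dataclass
class FormField:
    pdf_name: str       # field name as it appears in the PDF (AcroForm key)
    label: str          # human-readable label shown in the HTML form
    kind: str = "text"  # e.g. "text", "checkbox"

@dataclass
class FormModel:
    fields: list[FormField] = field(default_factory=list)

def validate_model(model: FormModel, actual_pdf_fields: set[str]) -> dict[str, set[str]]:
    """Cross-check the OCR+LLM-extracted model against the field names
    actually present in the PDF, so missed or hallucinated fields are
    caught before the model is used for programmatic filling."""
    modeled = {f.pdf_name for f in model.fields}
    return {
        "missing_from_model": actual_pdf_fields - modeled,  # OCR/LLM missed these
        "unknown_in_model": modeled - actual_pdf_fields,    # likely mislabeled
    }

model = FormModel([
    FormField("applicant_name", "Applicant name"),
    FormField("dob", "Date of birth"),
])
# Field names as they'd be read from the real PDF:
report = validate_model(model, {"applicant_name", "date_of_birth"})
print(report)
```

Here the check would surface that `date_of_birth` is missing from the model and that `dob` doesn't match any real field, which is exactly the ~10% failure mode described above.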
Does this sometimes get it wrong during programmatic filling? How do you deal with that?
But you're right that it's not as evident as I wanted it to be; I'm making a small copy update to make it clearer: "Public demo. Your chat messages leave your device and are sent to the selected AI provider. Use sample data only."
(Since there's support for local models, the popup is only displayed when NOT using your own model.)
Thanks!
EDIT: the copy update is live, thanks again!
Disclaimer: I'm the cofounder, and I'm only recommending it because it's legitimately the right shape for your problem. The idea is that the model runs inside a secure enclave (using NVIDIA confidential computing), and the enclave code is open source and verified via remote attestation upon connection: https://docs.tinfoil.sh/verification/verification-in-tinfoil
Anything you see missing in Copilot to achieve that?
Not sure if you noticed, but there's an architecture diagram in the info popup [1].
[1] https://copilot.simplepdf.com/?share=a7d00ad073c75a75d493228...
Use cases include:
- Filling foreign-language forms
- Navigating a contract before signing: "can I trust ALL the clauses here?"
- Pre-filling repetitive forms from existing data sources (CRM, EHR, etc. via MCP/RAG)
Copilot is designed to be embedded; our customers ship it white-labeled inside their own products.