I built a PDF Q&A tool that highlights exact source locations. Any feedback? (pdf-insight.com)

1 point by jacoblav 3 hours ago | 3 comments

jacoblav 2 hours ago
I've been building PDF Insight (https://pdf-insight.com), an AI tool that answers questions about PDFs and shows exactly where each answer came from.
The core problem: My friend is an accountant. Every tax season he manually ctrl+F's through dozens of client PDFs (T4s, RRSP receipts, bank statements) to extract numbers. I watched him spend 20 minutes finding one RRSP contribution total across 4 documents.
So I built this. Upload multiple PDFs, ask "what's the total RRSP contribution?" and get the answer with yellow highlights on the exact source text.
Technical stack:
- Backend: FastAPI + pdfplumber for text extraction
- PDF rendering: react-pdf with custom highlight overlay positioning
- Chunking: Layout-aware splitting that respects tables and preserves bbox coordinates
- Retrieval: Hybrid approach (BM25 + semantic embeddings + numeric normalization for currency/percentages)
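For anyone curious what the numeric normalization step in the retrieval layer might look like, here is a minimal sketch. It is not PDF Insight's actual code: the function name `normalize_numbers` and the regex are assumptions made for illustration. The idea is to reduce values like "$50,000.00" and "50000" to the same canonical token so that keyword matching (BM25 or otherwise) treats them as equal.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional currency symbol, digits with optional
# thousands separators, optional decimal part, optional percent sign.
_NUMBER = re.compile(r"[$€£]?\s?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?\s?(%)?")


def normalize_numbers(text: str) -> list[str]:
    """Return canonical tokens for every number found in `text`.

    Currency symbols, thousands separators, and trailing zeros are dropped;
    a percent sign is kept so "12.5%" stays distinct from "12.5".
    """
    tokens = []
    for match in _NUMBER.finditer(text):
        integer = match.group(1).replace(",", "")
        fraction = match.group(2) or ""
        percent = match.group(3) or ""
        value = Decimal(integer + fraction)
        # normalize() strips trailing zeros; "f" avoids scientific notation.
        tokens.append(format(value.normalize(), "f") + percent)
    return tokens


print(normalize_numbers("Total RRSP contribution: $50,000.00"))
print(normalize_numbers("Return of 12.50% on 50000"))
```

Indexing these tokens next to the raw chunk text would let a query phrased as "50000" hit a chunk that only contains "$50,000.00". A real pipeline would likely also need to handle negative amounts, parenthesized values, and locale-specific separators.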
The hard part was highlight precision. Early versions highlighted entire pages. Now it targets specific values (e.g., "$50,000.00") by extracting highlight_targets from the LLM response and matching them to chunk bboxes.
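A rough sketch of how matching a highlight target to a bbox could work, under stated assumptions: words are plain dicts with `text`, `x0`, `top`, `x1`, and `bottom` keys (the shape pdfplumber's `extract_words()` produces), and `find_target_bbox` is a hypothetical helper, not the tool's real implementation. It looks for the target as a run of consecutive words and returns the union of their boxes.

```python
from typing import Optional

BBox = tuple[float, float, float, float]  # (x0, top, x1, bottom)


def _canon(token: str) -> str:
    # Drop surrounding punctuation and thousands separators so that
    # "$50,000.00" and "$50000.00," compare equal.
    return token.strip(".,;:()").replace(",", "").lower()


def find_target_bbox(words: list[dict], target: str) -> Optional[BBox]:
    """Return the bounding box covering the first run of consecutive
    words that matches `target`, or None if there is no match."""
    wanted = [_canon(t) for t in target.split()]
    if not wanted:
        return None
    for start in range(len(words) - len(wanted) + 1):
        window = words[start:start + len(wanted)]
        if [_canon(w["text"]) for w in window] == wanted:
            return (
                min(w["x0"] for w in window),
                min(w["top"] for w in window),
                max(w["x1"] for w in window),
                max(w["bottom"] for w in window),
            )
    return None


# Hypothetical words from one chunk, with made-up coordinates.
chunk_words = [
    {"text": "RRSP", "x0": 72.0, "top": 100.0, "x1": 104.0, "bottom": 112.0},
    {"text": "contribution:", "x0": 108.0, "top": 100.0, "x1": 180.0, "bottom": 112.0},
    {"text": "$50,000.00", "x0": 200.0, "top": 100.0, "x1": 262.0, "bottom": 112.0},
]
print(find_target_bbox(chunk_words, "$50,000.00"))
```

Returning None when nothing matches gives the caller a natural place to fall back to highlighting the whole chunk instead of the whole page. Targets that wrap across lines would need one box per line rather than a single union.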
Free tier: 10 queries/month. Would love feedback from anyone who deals with multi-document PDF workflows.
isjdjsidjsidb 3 hours ago
[dead]