1 point by jacoblav 3 hours ago | 3 comments
  • jacoblav 2 hours ago
    Post: I've been building PDF Insight (https://pdf-insight.com) - an AI tool that answers questions about PDFs and shows exactly where the answer came from.

    The core problem: My friend is an accountant. Every tax season he manually ctrl+F's through dozens of client PDFs (T4s, RRSP receipts, bank statements) to extract numbers. I watched him spend 20 minutes finding one RRSP contribution total across 4 documents.

    So I built this. Upload multiple PDFs, ask "what's the total RRSP contribution?" and get the answer with yellow highlights on the exact source text.

    Technical stack:
    - Backend: FastAPI + pdfplumber for text extraction
    - PDF rendering: react-pdf with custom highlight overlay positioning
    - Chunking: Layout-aware splitting that respects tables and preserves bbox coordinates
    - Retrieval: Hybrid approach (BM25 + semantic embeddings + numeric normalization for currency/percentages)
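To illustrate the numeric normalization piece of the retrieval step, here is a minimal sketch, not the actual PDF Insight code. It assumes the goal is to make a query like "50000" match "$50,000.00" in a document by reducing both to one canonical string. The function name `normalize_numbers` and the regex are hypothetical and cover only simple `$`, `€`, `£` and `%` cases.

```python
import re
from decimal import Decimal, InvalidOperation

# Hypothetical pattern: optional currency symbol, a number with optional
# thousands separators and decimals, and an optional trailing percent sign.
# The lookbehind skips digits embedded in words such as "T4".
_NUM_RE = re.compile(
    r"(?<![\w.,])(?P<cur>[$€£])?\s?"
    r"(?P<num>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"
    r"\s?(?P<pct>%)?"
)


def normalize_numbers(text: str) -> list[str]:
    """Return canonical forms of the numbers found in text.

    "$50,000.00" and "50000" both become "50000"; "5.50%" becomes "5.5%".
    """
    canonical = []
    for match in _NUM_RE.finditer(text):
        raw = match.group("num").replace(",", "")
        try:
            value = Decimal(raw)
        except InvalidOperation:
            continue
        # normalize() drops trailing zeros; "f" avoids scientific notation.
        token = format(value.normalize(), "f")
        if match.group("pct"):
            token += "%"
        canonical.append(token)
    return canonical
```

Canonical tokens like these could be indexed alongside the raw chunk text so that BM25 sees the same token for every formatting of a value. How the real system combines them with the BM25 and embedding scores is not stated in the post.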

    The hard part was highlight precision. Early versions highlighted entire pages. Now it targets specific values (e.g., "$50,000.00") by extracting highlight_targets from the LLM response and matching them to chunk bboxes.
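The matching step described above can be sketched roughly as follows. This is an assumption about how such matching might work, not the author's implementation. It assumes each word is a dict with `text`, `x0`, `x1`, `top` and `bottom` keys (the shape pdfplumber's `extract_words()` returns), and the helper name `find_target_boxes` is hypothetical.

```python
def find_target_boxes(words, target):
    """Return (x0, top, x1, bottom) boxes for each occurrence of target.

    words: list of dicts with "text", "x0", "x1", "top", "bottom" keys.
    target: a highlight target string such as "$50,000.00".
    """
    punctuation = ".,;:()"
    tokens = [t.strip(punctuation) for t in target.split()]
    n = len(tokens)
    boxes = []
    if n == 0:
        return boxes
    # Slide a window of n consecutive words over the chunk and compare
    # after stripping surrounding punctuation.
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [w["text"].strip(punctuation) for w in window] == tokens:
            boxes.append((
                min(w["x0"] for w in window),
                min(w["top"] for w in window),
                max(w["x1"] for w in window),
                max(w["bottom"] for w in window),
            ))
    return boxes
```

A multi-word target that wraps across two lines would produce one oversized box with this approach; a fuller version would probably emit one box per line.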

    Free tier: 10 queries/month. Would love feedback from anyone who deals with multi-document PDF workflows.

  • isjdjsidjsidb 3 hours ago
    [dead]