6 points by pfdomizer 3 hours ago | 2 comments
  • leechii1337 2 hours ago
    How does this compare to e.g. Docling or MinerU? It's hard to keep track of all the OCR libs that are being posted.
    • pfdomizer 2 hours ago
      Docling and MinerU are great for structured output like markdown and table extraction, but they run at 1-5 pages/s because of the VLMs under the hood.

      Turbo-OCR gives you bounding boxes, text, and layout regions at several hundred img/s, depending on text density. When you have many PDFs to process, that makes a huge difference. You can always pipe the output into a VLM for the pages that need deeper extraction. Structured extraction and markdown output are on the roadmap (without sacrificing too much speed).
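
      To make the "fast pass first, VLM only where needed" idea concrete, here is a rough Python sketch. It does not use the real Turbo-OCR API: `Region`, `PageResult`, `HARD_LABELS`, `needs_vlm`, `split_pages`, and `estimate_seconds` are all hypothetical names, and the layout labels ("text", "table", "formula") are assumed for illustration. The routing rule is just one possible heuristic.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One layout region from a fast OCR pass (hypothetical shape)."""
    label: str                        # assumed label, e.g. "text" or "table"
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
    text: str = ""


@dataclass
class PageResult:
    """All regions detected on a single page (hypothetical shape)."""
    page_no: int
    regions: list[Region] = field(default_factory=list)


# Assumption: these region types benefit from deeper, slower extraction.
HARD_LABELS = {"table", "formula"}


def needs_vlm(page: PageResult) -> bool:
    """Return True if any region on the page has a 'hard' label."""
    return any(r.label in HARD_LABELS for r in page.regions)


def split_pages(pages: list[PageResult]) -> tuple[list[int], list[int]]:
    """Split page numbers into (done after fast pass, send to a VLM)."""
    fast, slow = [], []
    for page in pages:
        (slow if needs_vlm(page) else fast).append(page.page_no)
    return fast, slow


def estimate_seconds(n_pages: int, pages_per_second: float) -> float:
    """Wall-clock estimate for a single worker at a fixed throughput."""
    return n_pages / pages_per_second


if __name__ == "__main__":
    pages = [
        PageResult(1, [Region("text", (0, 0, 100, 20), "Hello")]),
        PageResult(2, [Region("table", (0, 30, 100, 90))]),
    ]
    fast, slow = split_pages(pages)
    print("fast pass only:", fast)
    print("needs VLM:", slow)
    # 5 pages/s is the upper end quoted above; 300 img/s is an assumed
    # stand-in for "several hundred".
    print(estimate_seconds(10_000, 5), "s at 5 pages/s")
    print(round(estimate_seconds(10_000, 300), 1), "s at 300 img/s")
```

      The point of the split is that only the pages flagged by `needs_vlm` pay the slow-path cost, so total time depends mostly on how many pages actually contain hard regions.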

  • armando1514 2 hours ago
    Exactly what I was looking for!
    • pfdomizer 2 hours ago
      Thanks, happy to hear that!