2 points by cjlooi a day ago | 1 comment
  • cjlooi a day ago
    PDF-based pipelines are fundamentally lossy and compute-heavy, whether they rely on OCR, GROBID, or LLM-based parsing. They're simply not good enough for accurate scientific agents at scale.

    To fix this, I'm launching ScienceStack API: a lossless, node-based API for scientific papers with LaTeX source, starting with arXiv.

    It currently covers 150k+ arXiv papers, mainly in CS, Math, and Physics.

    Every paper also ships with a WYSIWYG interactive reader at sciencestack.ai/paper/{arxivId}. Example: https://www.sciencestack.ai/paper/2512.24601v1

    I’m giving away five 3-month Pro keys to early commenters who are building in this space (scientific tooling, agents, copilots, RAG, etc.). I’d love to hear what you’re working on.