8 points by abdusm 4 hours ago | 4 comments
  • abdusm 4 hours ago
    Medical bills contain diagnosis codes. Diagnosis codes reveal conditions. We decided no patient should have to send that to a server just to check if they're being overcharged.

    So we built a bill analyzer where everything runs in the browser: Tesseract OCR, code extraction, pricing lookups against Medicare fee schedules, and 3.3M CMS bundling rule checks. Zero network calls after initial load. The hard problem was size. Raw CMS datasets run to tens of megabytes. We shard the data so the first load is 198 KB (a 479x reduction), and detail shards load on demand. We use Zod validation with fail-closed defaults: if data fails schema checks, the feature turns off rather than showing bad numbers.
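
    To make the fail-closed idea concrete, here is a minimal sketch. It is not the tool's code: the post uses Zod, while this version swaps in a hand-written check so it runs with no dependencies. The shard shape (`PriceShard`), the function names, and the example price are all assumptions made up for illustration.

```typescript
// Hypothetical shard shape: billing code -> allowed amount in USD.
type PriceShard = { version: string; prices: Record<string, number> };

// A feature is either on with validated data, or off with a reason.
type Feature<T> = { enabled: true; data: T } | { enabled: false; reason: string };

// Hand-rolled stand-in for a schema check (the post uses Zod for this step).
function parsePriceShard(raw: unknown): PriceShard | null {
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  const version = obj.version;
  if (typeof version !== "string") return null;
  const prices = obj.prices;
  if (typeof prices !== "object" || prices === null || Array.isArray(prices)) return null;
  const table = prices as Record<string, unknown>;
  for (const key of Object.keys(table)) {
    const value = table[key];
    // Reject anything that is not a finite, non-negative number.
    if (typeof value !== "number" || !isFinite(value) || value < 0) return null;
  }
  return { version, prices: table as Record<string, number> };
}

// Fail closed: any schema failure disables the feature instead of passing partial data on.
function loadPricing(raw: unknown): Feature<PriceShard> {
  const shard = parsePriceShard(raw);
  return shard === null
    ? { enabled: false, reason: "pricing shard failed validation" }
    : { enabled: true, data: shard };
}

// Made-up example values, not real fee schedule data.
const good = loadPricing({ version: "example", prices: { "99213": 92.5 } });
const bad = loadPricing({ version: "example", prices: { "99213": "92.50" } });
```

    In this sketch `good.enabled` is true, while `bad.enabled` is false because one price arrived as a string. The UI would hide the pricing comparison in the second case rather than guess.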

    12 sprints to get OCR to 95.0% F1 across 19 real bills. The failure modes are specific to medical documents: thermal printer ink where $45 becomes $4,500, layouts where every code shifts one column right, ZIP codes in headers extracted as charge amounts. We built a 7-stage filter pipeline to catch these before they reach the pricing engine.
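
    As a rough illustration of what a staged filter can look like, here is a sketch with two stages. These are not the seven stages from the tool, which the post does not enumerate. The `Candidate` shape, the stage names, and the regex are assumptions. The sketch covers only the header and ZIP code case; catching a $45 that was read as $4,500 would need whole-bill context such as a stated total.

```typescript
// One OCR'd token that might be a charge amount (hypothetical shape).
type Candidate = { text: string; region: "header" | "line_item" };

// A stage returns true to keep the candidate, false to drop it.
type Stage = (c: Candidate) => boolean;

// Hypothetical stage: header text holds addresses and account numbers, not charges.
const notInHeader: Stage = (c) => c.region !== "header";

// Hypothetical stage: require currency formatting with two decimal places,
// which rejects bare integers such as a 5-digit ZIP code.
const looksLikeCurrency: Stage = (c) =>
  /^\$?(\d{1,3}(,\d{3})+|\d+)\.\d{2}$/.test(c.text);

// A candidate survives only if every stage keeps it.
function runPipeline(candidates: Candidate[], stages: Stage[]): Candidate[] {
  return candidates.filter((c) => stages.every((stage) => stage(c)));
}

const kept = runPipeline(
  [
    { text: "90210", region: "header" },      // ZIP code in the header
    { text: "$45.00", region: "line_item" },  // plausible charge
    { text: "12345", region: "line_item" },   // bare integer, no decimals
  ],
  [notInHeader, looksLikeCurrency],
);
```

    Only the `$45.00` candidate survives here. Keeping each stage as a small predicate makes it easy to add or reorder stages as new failure modes show up.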

    The bundling checks are exhaustive. If a hospital bills code A and code B separately, but CMS says B is included in A, that's an unbundling violation. Most audit tools run this server-side. We load all 3.3M pairs into the browser via sharded JSON and in-memory indexing.
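
    For anyone curious what the in-memory lookup might look like, here is a minimal sketch. It assumes each rule is a (comprehensive, component) pair, which is how the A and B example above reads. The names `BundlingPair`, `buildIndex`, and `findUnbundled` are made up for illustration, and shard loading is left out.

```typescript
// Hypothetical rule shape: `component` is included in `comprehensive`.
type BundlingPair = { comprehensive: string; component: string };

// Index the pairs so each lookup is a Map get plus a Set membership test.
function buildIndex(pairs: BundlingPair[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const { comprehensive, component } of pairs) {
    let components = index.get(comprehensive);
    if (!components) {
      components = new Set<string>();
      index.set(comprehensive, components);
    }
    components.add(component);
  }
  return index;
}

// Return every pair where both codes were billed separately on the same bill.
function findUnbundled(billed: string[], index: Map<string, Set<string>>): BundlingPair[] {
  // Deduplicate so a code billed twice does not produce duplicate hits.
  const unique = billed.filter((code, i) => billed.indexOf(code) === i);
  const hits: BundlingPair[] = [];
  for (const comprehensive of unique) {
    const components = index.get(comprehensive);
    if (!components) continue;
    for (const component of unique) {
      if (components.has(component)) hits.push({ comprehensive, component });
    }
  }
  return hits;
}

// Placeholder codes, matching the "code A and code B" wording above.
const index = buildIndex([
  { comprehensive: "A", component: "B" },
  { comprehensive: "A", component: "C" },
]);
const hits = findUnbundled(["A", "B", "D"], index);
```

    Here `hits` holds the single pair (A, B). The check is quadratic in the number of codes on one bill, which stays small; the 3.3M pairs are only touched through hash lookups.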

    • najarvg 3 hours ago
      This is a great effort, and thanks for this. Any reason why the NBC article was linked rather than the tool itself? I found the tool link inside the article, but was curious why the submission links to the article, and whether pasting a link to the tool in the comments would violate any HN terms.
  • 46493168 2 hours ago
    The software mentioned in the article is Orbdoc. https://orbdoc.com/blog
  • harvey9 4 hours ago
    The link is to an article. Would it be better to link to the software?
  • vivzkestrel 3 hours ago
    How many Americans resort to medical tourism as a viable alternative to beat hospital costs? Any numbers?