The main insight: real medical data is scanned paper with OCR errors, not clean JSON. So we simulate script-aware OCR artifacts (Arabic dot-group confusions, Hebrew shape swaps, Latin diacritic loss) alongside schema variance across facilities.
6 locales ship today: he_IL, ar_SA, ar_EG, es_ES, es_MX, es_AR. No OpenAI key needed for basic generation.
Happy to answer questions about healthcare data challenges or adding new locales.