1 pointby Alechko4 hours ago1 comment

Alechko4 hours ago
Creator here. We work with healthcare orgs in MENA and Latin America and got tired of synthetic data that looks nothing like real hospital records.
The main insight: real medical data is scanned paper with OCR errors, not clean JSON. So we simulate script-aware OCR artifacts (Arabic dot-group confusions, Hebrew shape swaps, Latin diacritic loss) alongside schema variance across facilities.
6 locales ship today: he_IL, ar_SA, ar_EG, es_ES, es_MX, es_AR. No OpenAI key needed for basic generation.
Happy to answer questions about healthcare data challenges or adding new locales.