I have some lecture slides as image-only PDF (Hungarian language with a sparkle of English and Latin (biology)). I tried the tool on it and I had the following experience:
- proofreading with the overlay seems like a good idea, actually it is unusable when the original text has colors, and you need to recognize diacritic marks. Being able to show the original in grayscale or black&white could help. (BW worked, but Grayscale left everything colored)
- For proofreading the ebook mode was the most useful, I immediately spotted lots of errors that I could not see with overlay. A quick switch between the modes would be useful
- Editing text is not efficient when error rate is high (Hungarian language is not supported, that caused it mostly I guess), the interface has high overhead for mass corrections.
Very good idea, I think after a little polish it would even fit my usecase. For more traditional OCR usecases than mine it is probably already great.
A native MacOS or Windows application could use the OCR facilities of the operating system and, in my experience, both produce results that are far better than Tesseract.
Have you looked at EasyOCR?