2 points by hcmhcs0 10 hours ago | 2 comments
  • vunderba 8 hours ago
    Nice job.

    I created something like this over a decade ago for Windows that would let you hit a globally registered shortcut to hover a magnifying glass over text in a windowed/fullscreen game - I used to use it while I was studying Chinese with emulated SNES RPGs.

    Back then the best we could do was Tesseract OCR feeding into the open CC-CEDICT dictionary. It was primitive but sufficed!
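For readers unfamiliar with the dictionary half of that pipeline, here is a minimal sketch of how OCR output could be looked up in CC-CEDICT. It is not necessarily how the original tool worked: the function names `parse_cedict` and `longest_match` are hypothetical, the greedy longest-match strategy is an assumption, and the sample entries in the usage note are illustrative rather than copied from the dictionary file. The only thing taken from CC-CEDICT itself is the entry layout, `Traditional Simplified [pin1 yin1] /gloss/gloss/`.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# CC-CEDICT entry layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

Entry = Tuple[str, List[str]]  # (pinyin, glosses)


def parse_cedict(lines: Iterable[str]) -> Dict[str, Entry]:
    """Index entries by both their traditional and simplified headwords."""
    index: Dict[str, Entry] = {}
    for line in lines:
        if line.startswith("#"):  # header and comment lines
            continue
        m = ENTRY.match(line)
        if m:
            trad, simp, pinyin, glosses = m.groups()
            entry = (pinyin, glosses.split("/"))
            # Simplification for this sketch: keep only the first entry
            # seen for a headword, even if the file lists several.
            index.setdefault(trad, entry)
            index.setdefault(simp, entry)
    return index


def longest_match(text: str, index: Dict[str, Entry],
                  max_len: int = 8) -> Optional[Tuple[str, Entry]]:
    """Return (word, entry) for the longest dictionary word at the start of text."""
    for n in range(min(max_len, len(text)), 0, -1):
        word = text[:n]
        if word in index:
            return word, index[word]
    return None
```

Usage, assuming the OCR step returned the string under the magnifier: build the index once with `parse_cedict(open(path, encoding="utf-8"))`, then call `longest_match(ocr_text, index)` to get the word at the cursor along with its pinyin and glosses.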

    • hcmhcs0 8 hours ago
      That's a really cool use case — OCR on emulated RPGs for language study! I didn't know Tesseract could handle pixel fonts. How well did it work?

      I went with Apple's Vision and Translation frameworks since they were the easiest path for me, but the downside is it requires macOS 15+. I'm thinking about adding Tesseract as an alternative OCR engine to support older versions — sounds like it could work well enough!

      • vunderba 7 hours ago
        Thanks! Honestly? Initially very poorly.

        What I ended up doing was generating around a dozen versions of a screenshot in realtime (all with different combinations of thresholding, segmentation parameters, resolution scaling, and denoising) behind the scenes. Then it would fire Tesseract off on all of them in parallel threads and let them “vote” on the result.

        After I set that up, the accuracy improved significantly.
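The voting step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: `ocr_by_vote` is a hypothetical name, the preprocessing variants and the OCR engine are passed in as plain callables, and a simple majority vote over whole strings is assumed (the original may have voted per character or per word).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional

# Hypothetical types: an "image" is whatever the chosen OCR backend accepts.
Preprocess = Callable[[object], object]
Ocr = Callable[[object], str]


def ocr_by_vote(image: object, variants: Iterable[Preprocess], ocr: Ocr,
                max_workers: int = 8) -> Optional[str]:
    """Run `ocr` on each preprocessed variant in parallel threads and
    return the most common non-empty result, or None if all are empty."""
    variants = list(variants)

    def run(preprocess: Preprocess) -> str:
        # Collapse whitespace so trivially different outputs share a vote.
        return " ".join(ocr(preprocess(image)).split())

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, variants))

    votes = Counter(text for text in results if text)  # ignore empty reads
    if not votes:
        return None
    # On a tie, Counter returns the result that was encountered first.
    return votes.most_common(1)[0][0]
```

With the `pytesseract` wrapper, `ocr` could be `pytesseract.image_to_string`, and each entry in `variants` a small function that thresholds, rescales, or denoises the screenshot before it is handed to the engine.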

        If you're looking for an alternative to Tesseract, I'd actually recommend Surya. I've had a lot of success with it out of the box for OCR on comics.

        https://github.com/datalab-to/surya

        • hcmhcs0 7 hours ago
          That's a clever approach — running multiple preprocessing variants in parallel and letting them vote. Almost like an ensemble for OCR!

          Thanks for the Surya recommendation, I hadn't come across it before. Will definitely check it out!

  • hcmhcs0 7 hours ago
    Hi, I'm the author — a student developer. I've been really into AI agents lately and spend a lot of time reading system instructions and source code on OpenClaw. Problem is, my English isn't great, so I constantly needed to translate.

    Switching to a Google Translate tab every time or asking an AI to translate broke my flow completely. So I built this to translate right where I'm working — no tab switching, no copy-paste.

    Built with Claude Code over a weekend. Happy to answer any questions!