It uses SDL2 for rendering and libcurl for networking, and works with local servers (llama.cpp-style) and, in theory, remote APIs.
The workflow is: open image -> zoom/pan -> draw rectangle -> send -> get text
I wanted something lightweight and easy to understand, without large frameworks. It is also a way to experiment with vision-capable models in a simple pipeline.
Some features:
- rectangle selection UI
- zoom and pan
- cancel running requests
- minimal dependencies
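To give an idea of what zoom/pan plus rectangle selection involves: the dragged rectangle is in window coordinates and has to be mapped back to image pixels before the crop is sent. Here is a minimal sketch of that mapping. It is not taken from the repo. The View and Rect structs, the function name, and the convention that image pixel (pan_x, pan_y) sits at the window origin are all assumptions made for illustration.

```c
/* Hypothetical view state (not from the repo): image pixel (pan_x, pan_y)
   is drawn at the window origin, and one image pixel covers `zoom`
   window pixels. zoom is assumed to be > 0. */
typedef struct { double zoom, pan_x, pan_y; } View;

/* Rectangle in image pixel coordinates. */
typedef struct { int x, y, w, h; } Rect;

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map a mouse drag from (sx0, sy0) to (sx1, sy1), given in window
   coordinates, to a rectangle in image coordinates, clamped to the
   image bounds. Fractional image coordinates are truncated. */
Rect screen_to_image(View v, int sx0, int sy0, int sx1, int sy1,
                     int img_w, int img_h)
{
    /* The user may drag in any direction, so order the corners first. */
    if (sx0 > sx1) { int t = sx0; sx0 = sx1; sx1 = t; }
    if (sy0 > sy1) { int t = sy0; sy0 = sy1; sy1 = t; }

    /* Undo the zoom, then add the pan offset. */
    int x0 = clampi((int)(v.pan_x + sx0 / v.zoom), 0, img_w);
    int y0 = clampi((int)(v.pan_y + sy0 / v.zoom), 0, img_h);
    int x1 = clampi((int)(v.pan_x + sx1 / v.zoom), 0, img_w);
    int y1 = clampi((int)(v.pan_y + sy1 / v.zoom), 0, img_h);

    Rect r = { x0, y0, x1 - x0, y1 - y0 };
    return r;
}
```

With zoom 2.0 and pan (100, 50), a drag from window (300, 200) back to (100, 100) on a 1000x800 image maps to the image rectangle x=150, y=100, w=100, h=50. A selection that extends past the image edge is simply cut off at the border.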
It’s still pretty early, but usable. https://github.com/haschka/ocr_tool