  • fidorka, 6 hours ago
    We're building an open-source tool that makes your screen activity searchable via AI. We wanted Claude/Cursor to know what we'd been working on without having to explain it every time.

    Processing hundreds of screenshots/hour forced us to optimize for token costs.

    The surprise: send video, not images

    - Single screenshot (1698×894): 1,812 tokens

    - Same frame in video: 258 tokens (Gemini 2.5) or ~70 tokens (Gemini 3)

    - Full 8-hour workday: ~$1-3
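    To make the per-frame numbers above concrete, here is a rough sketch of the arithmetic. The token counts are the ones quoted in the list; the workload of 300 frames/hour is a made-up figure for illustration only, and no pricing is assumed.

```python
# Per-frame token counts quoted in the list above.
TOKENS_PER_SCREENSHOT = 1812  # single 1698x894 image
TOKENS_PER_VIDEO_FRAME = {
    "Gemini 2.5": 258,
    "Gemini 3": 70,  # quoted as "~70", so treat as approximate
}

# Hypothetical workload (assumption, not from the post): 300 frames/hour for 8 hours.
FRAMES_PER_DAY = 300 * 8


def savings_factor(image_tokens: int, video_tokens: int) -> float:
    """How many times cheaper (in tokens) a video frame is than a standalone image."""
    return image_tokens / video_tokens


def tokens_for_frames(frames: int, tokens_per_frame: int) -> int:
    """Total input tokens for a given number of frames."""
    return frames * tokens_per_frame


if __name__ == "__main__":
    as_images = tokens_for_frames(FRAMES_PER_DAY, TOKENS_PER_SCREENSHOT)
    print(f"as images: {as_images:,} tokens/day")
    for model, per_frame in TOKENS_PER_VIDEO_FRAME.items():
        as_video = tokens_for_frames(FRAMES_PER_DAY, per_frame)
        factor = savings_factor(TOKENS_PER_SCREENSHOT, per_frame)
        print(f"as video ({model}): {as_video:,} tokens/day, ~{factor:.1f}x fewer")
```

    With the quoted counts, that works out to roughly 7x fewer tokens per frame on Gemini 2.5 and roughly 26x fewer on Gemini 3.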

    Video gives you timestamps for free and compresses well since consecutive frames are nearly identical. We keep costs down by having the LLM write short summaries while running OCR locally for text extraction.
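    For anyone who wants to try the video route, one possible way to turn a folder of screenshots into a clip is ffmpeg. This is only a sketch and not necessarily how our pipeline does it: the function name, the glob pattern, the 1 fps rate, and the libx264 codec are all assumptions chosen for illustration. It only builds the command; run it with subprocess.run(cmd, check=True) if ffmpeg is installed.

```python
import shlex
from pathlib import Path


def build_ffmpeg_command(frames_glob: str, output: Path, fps: int = 1) -> list[str]:
    """Build an ffmpeg argv that stitches image files into an H.264 video.

    Hypothetical helper for illustration. Note that `-pattern_type glob`
    is not available in every ffmpeg build (notably many Windows builds).
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-framerate", str(fps),    # input rate: one screenshot per 1/fps seconds
        "-pattern_type", "glob",   # treat the input path as a shell-style glob
        "-i", frames_glob,
        "-c:v", "libx264",         # inter-frame codec, so near-identical frames compress well
        "-pix_fmt", "yuv420p",     # widely supported pixel format
        str(output),
    ]


if __name__ == "__main__":
    cmd = build_ffmpeg_command("shots/*.png", Path("workday.mp4"))
    print(shlex.join(cmd))
```

    At 1 fps, a frame's position in the clip maps directly to seconds since the first screenshot, which is one way to get the timestamps mentioned above.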