3 points by pgbouncer | 4 hours ago | 1 comment
  • pgbouncer | 4 hours ago
    How tagging works: SigLIP (Google's successor to CLIP) runs locally through ONNX Runtime. Image embeddings are scored against a 68,000-term vocabulary (pulled from WordNet nouns) via dot product plus sigmoid scaling. A self-organizing relevance system adapts the vocabulary to your dataset: frequently matched terms get promoted, while irrelevant ones are demoted to a cold pool. So a photo of red sneakers gets tagged sneakers, footwear, red, fashion without any training or fine-tuning.
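    The scoring step can be sketched roughly like this (a minimal illustration, not the actual photon-imager code; the `Term` struct and `tag_image` function are hypothetical names):

```rust
// Sketch of dot-product + sigmoid scoring against a term vocabulary.
// Each vocabulary term carries a precomputed SigLIP text embedding;
// an image embedding is scored against every term, and labels whose
// sigmoid-scaled score clears a threshold become tags.

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

struct Term {
    label: String,
    embedding: Vec<f32>, // precomputed SigLIP text embedding
}

/// Return (label, score) pairs above `threshold`, highest score first.
fn tag_image(image_emb: &[f32], vocab: &[Term], threshold: f32) -> Vec<(String, f32)> {
    let mut tags: Vec<(String, f32)> = vocab
        .iter()
        .map(|t| (t.label.clone(), sigmoid(dot(image_emb, &t.embedding))))
        .filter(|(_, score)| *score >= threshold)
        .collect();
    tags.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    tags
}
```

    The sigmoid is what makes each term an independent yes/no decision rather than a softmax competition across the whole vocabulary, which is why multiple tags can fire on one image.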

    The progressive encoding system should take the 90-minute cold start (encoding 68K text terms through SigLIP) down to ~30 seconds by encoding a seed vocabulary first, then background-encoding the rest while you're already processing images.
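    The shape of that progressive approach looks something like the following (an illustrative sketch, not the real implementation; `encode_term` stands in for the actual SigLIP text encoder, and `progressive_encode` is a hypothetical name):

```rust
// Sketch of progressive vocabulary encoding: encode a small seed
// vocabulary synchronously (the short cold start), then encode the
// remaining terms on a background thread, merging them into a shared
// vocabulary while images are already being processed.

use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in for the real SigLIP text encoder (assumed interface).
fn encode_term(term: &str) -> Vec<f32> {
    term.bytes().map(|b| b as f32 / 255.0).collect()
}

fn progressive_encode(
    seed: Vec<String>,
    rest: Vec<String>,
) -> (Arc<Mutex<Vec<(String, Vec<f32>)>>>, thread::JoinHandle<()>) {
    // Encode the seed up front: tagging can start as soon as this returns.
    let vocab: Vec<(String, Vec<f32>)> = seed
        .into_iter()
        .map(|t| {
            let emb = encode_term(&t);
            (t, emb)
        })
        .collect();
    let vocab = Arc::new(Mutex::new(vocab));

    // Encode the long tail in the background, appending as terms finish.
    let bg = Arc::clone(&vocab);
    let handle = thread::spawn(move || {
        for term in rest {
            let emb = encode_term(&term);
            bg.lock().unwrap().push((term, emb));
        }
    });
    (vocab, handle)
}
```

    Tags assigned early on only draw from the seed vocabulary, so results get richer as the background pass fills in the remaining terms.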

    It's pure Rust and ships as a single binary: pip install photon-imager, or build from source.

    Would love feedback, contributions, and forks. Some areas where help would be especially welcome:

    - Windows support (currently macOS + Linux only)
    - Additional model backends beyond SigLIP
    - Frontend/UI for browsing tagged collections
    - Database integration examples (pgvector, Qdrant, etc.)