3 points by Beefin 4 hours ago | 1 comment
  • Beefin 4 hours ago
    Author here. I'm the founder of Mixpeek — we build multimodal search infrastructure.

    The core problem: most vector search assumes your query is a sentence or a single image. But we kept getting customers who wanted to pass entire video files as queries — a media company searching their archive with a raw broadcast clip, a legal team querying with a full contract PDF, an IP safety pipeline scanning videos frame-by-frame against a brand index.

    The key insight was that the decomposition pipeline we already use for ingestion (split → embed → store) is the same operation needed at query time — the output is just routed to a search instead of a write. Same extractor, same chunking, same embedding model. This guarantees query and index vectors are always in the same space.
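    A minimal sketch of that idea (hypothetical names, not Mixpeek's actual code): both paths call the same decompose and embed functions, so a query vector can never drift out of the index's embedding space.

    ```python
    # Hypothetical sketch: one shared decompose + embed pipeline for both
    # ingestion and query. The chunker and embedder here are toy stand-ins;
    # a real extractor would be modality-aware (video scenes, PDF pages, ...).

    def decompose(data: bytes, chunk_size: int = 4) -> list[bytes]:
        # Stand-in chunker: fixed-size byte windows.
        return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    def embed(chunk: bytes) -> list[float]:
        # Stand-in embedding: a deterministic toy vector per chunk.
        return [float(len(chunk)), float(sum(chunk) % 97)]

    def ingest(data: bytes, index: list) -> None:
        # Write path: split -> embed -> store.
        for chunk in decompose(data):
            index.append((embed(chunk), chunk))

    def query_vectors(data: bytes) -> list[list[float]]:
        # Query path: identical split -> embed, output routed to search.
        return [embed(chunk) for chunk in decompose(data)]
    ```

    Because `ingest` and `query_vectors` share `decompose` and `embed`, any change to the chunking strategy or embedding model automatically applies to both sides.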

    The execution path is: detect a large input → decompose via the extractor → batch-embed in parallel → run N concurrent ANN searches → fuse results (RRF/max/avg). From the caller's perspective, the API shape doesn't change at all.
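    The fusion step above can be sketched with Reciprocal Rank Fusion, one of the strategies mentioned. This is a generic RRF implementation, not Mixpeek's internals; each inner list represents the ranked hits from one per-chunk ANN search, and k=60 is the constant commonly used in the RRF literature.

    ```python
    # Reciprocal Rank Fusion: merge N ranked result lists into one ranking.
    # A document's fused score is the sum of 1/(k + rank) over every list
    # it appears in, so items ranked well across many chunk searches win.

    def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
        scores: dict[str, float] = {}
        for results in result_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        # Highest fused score first.
        return sorted(scores, key=scores.get, reverse=True)
    ```

    For example, a document that ranks mid-list in all N chunk searches will typically beat one that tops a single list but is absent from the rest — which is usually what you want when the query was decomposed into many chunks.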

    One decision I'd be curious to get feedback on: we explicitly dropped an "auto" mode that would pick chunking strategy based on file type. The right decomposition depends on what you're searching for, not just the file itself. Felt like the wrong abstraction to hide. Curious if others have found ways to make auto-config work well here.

    Happy to answer questions about the fusion strategies, the credit model, or the architecture.