47 pointsby andrewbarba4 days ago2 comments
  • jakozaur3 days ago
    This feels similar to TurboBuffer, which is also built on top of S3 storage. TurboBuffer has been a leader in this space, powering vector search for major companies like Cursor, Linear, and Notion.

    It seems AWS is leveraging its strong S3 brand to compete directly in the vector database market.

    For more context, check out the TurboBuffer architecture docs and Notion’s presentation from the Data Council:

    https://turbopuffer.com/docs/architecture

    https://www.youtube.com/watch?v=_yb6Nw21QxA

    (anti-disclaimer: I'm not affiliated with TurboBuffer in any way.)

    • benji10093 days ago
      turbopuffer is the name btw
  • bob10293 days ago
    I still contend that what most people want is traditional full text search, not another layer of black box weirdness behind the LLM.

    You already have a model with incredibly powerful semantic understanding. Why do we need the document store to also be a smartass? The model can project multiple OR clauses into the search term based upon its interpretation of the context.

    If you are using something like Lucene, queries are extremely fast and the maximum # of supported documents in one index far exceeds what AWS says they can support here.

    • pilotneko3 days ago
      Keyword alone sucks for negation. Searching a set of patient documents for “Which of my patients has COPD?” to get a set of responses that states “COPD not indicated” will not endear you to the physician using your tool. Hybrid (keyword + semantic) is much more all-encompassing.
      • bob10293 days ago
        Forwarding the users query directly to the document store seems ridiculous to me. The whole point is for the LLM to interpret the context and issue multiple targeted queries based upon the interpretation(s) arrived at.

        The LLM is the semantic part. FTS is the keyword part. This is the hybrid you're looking for.

        • pilotneko3 days ago
          Sometimes you are searching for supporting evidence that is semantically related. COPD was just an example, you won’t get a direct keyword match if the Physician is searching for lung disease.