1 point by jamesgresql 5 hours ago | 1 comment

jamesgresql 5 hours ago
This is a no-nonsense walkthrough of doing hybrid search inside Postgres without spinning up a separate search service.
A few takeaways:
- Postgres’s native `tsvector`/`ts_rank` works OK for basic text matching, but it doesn’t account for corpus-wide term statistics (inverse document frequency) the way BM25 does, so rankings can feel “flat” or noisy as soon as you go beyond simple queries. It can also be slow, since `ts_rank` has to read the `tsvector` of every matching row.
- Using a BM25 index (via extensions like `pg_search`) gives you relevance scores similar to what you’d expect from modern search engines, and you can configure stemming and tokenization directly in SQL. BM25 is the star of this story.
- Vector search fills in the semantic gaps (so “database optimization” isn’t limited to exact keywords), but you still don’t want to throw out lexical relevance. The trick is making the two signals complement each other rather than naively summing their raw scores.
- RRF (Reciprocal Rank Fusion) is a neat practical tool here. It sidesteps having to normalize totally different scoring systems by looking only at rank positions.
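For anyone who hasn't seen RRF written out, here's a rough sketch of the idea in plain Python. It isn't the article's SQL, just an illustration: the function name `rrf_fuse`, the doc IDs, and the constant `k = 60` (a commonly used default) are my own assumptions. Each result list contributes `1 / (k + rank)` per document, and the contributions are summed.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc IDs with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a common default, assumed here).
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; only the position matters, not the raw score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties on doc_id so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: one from a BM25 query, one from a vector query.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that show up near the top of both lists (`doc_a`, `doc_c`) end up ahead of ones that appear in only one list, and nothing here depends on BM25 scores and vector distances being on the same scale.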
If you’re building anything where relevance matters (docs, product search, help articles), having BM25 + vector makes a big difference over vanilla FTS + embeddings alone. It also keeps everything in Postgres, which simplifies consistency and ops compared to running an external search cluster.