We do at Discourse, in thousands of databases, and it's leveraged in most of the billions of page views we serve.
> Pre- vs. Post-Filtering (or: why you need to become a query planner expert)
This was fixed in version 0.8.0 via Iterative Scans (https://github.com/pgvector/pgvector?tab=readme-ov-file#iter...)
> Just use a real vector database
If you are running a single service that may be an easier sell, but it's not a silver bullet.
- halfvec (16bit float) for storage - bit (binary vectors) for indexes
Which makes the storage cost and on-going performance good enough that we could enable this in all our hosting.
For anyone who hasn't seen it yet: it turns out many embedding vectors of e.g. 1024 floating point numbers can be reduced to a single bit per value that records if it's higher or lower than 0... and in this reduced form much of the embedding math still works!
This means you can e.g. filter to the top 100 using extremely memory efficient and fast bit vectors, then run a more expensive distance calculation against those top 100 with the full floating point vectors to pick the top 10.
Making the storage cost of the index 32 times smaller is the difference of being able to offer this at scale without worrying too much about the overhead.
By moving the values to a single bit, you’re lumping stuff together that was different before, so I don’t think recall loss would be expected.
Also: even if your vector is only 100-dimensional, there already are 2^100 different bit vectors. That’s over 10^30.
If your dataset isn’t gigantic and has documents that are even moderately dispersed in that space, the likelihood of having many with the same bit vector isn’t large.
https://ieeexplore.ieee.org/abstract/document/6296665/ (https://refbase.cvc.uab.cat/files/GLG2012b.pdf)
I find that cool and surprising.
On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the effective number of graph traversals to do is smaller, and the computational burden mostly involves scanning posting lists on a per-node basis. I imagine to do something similar in postgres, you'd need sharded tables, and complicated application logic to control what you're actually searching.
How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?
I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database) which means we don't have to reindex a trillion things every time we want to add a new type of filter, which with a previous vector database vendor we used took us almost a week.
I can totally see that at a trillion scale for a single shard you want a specialized dedicated service, but that is also true for most things in tech when you get to the extreme scale .
iterative scans are more of a bandaid for filtering than a solution. you will still run into issues with highly restrictive filters. you still need to understand ef_search and max_search_tuples. strict vs relaxed ordering, etc. it's an improvement for sure, but the planner still doesn't deeply understand the cost model of filtered vector search
there isn't a general solution to the pre- vs post-filter problem—it comes down to having a smart planner that understands your data distribution. question is whether you have the resources to build and tune that yourself or want to offload it to a service that's able to focus on it directly
In theory these can be more efficient than plain pre/post filtering.
- Related Topics, a list of topics to read next, which uses embeddings of the current topic as the key to search for similar ones
- Suggesting tags and categories when composing a new topic
- Augmented search
- RAG for uploaded files
also, when i run a discourse search does it really do both a regular keyword search and a vector search? how do you combine results?
does all discourse instances have those features? for example, internals.rust-lang.org, do they use pgvector?
You can upload files that will act as RAG files for an AI bot. The bot can also have access to forum content, plus the ability to run tools in our sandboxed JS environment, making it possible for Discourse to host AI bots.
> also, when i run a discourse search does it really do both a regular keyword search and a vector search? how do you combine results?
Yes, it does both. In the full page search it does keyword first, then vector asynchronously, which can be toggled by the user in the UI. It's auto toggled when keyword has zero results now. Results are combined using reciprocal rank fusion.
In the quick header search we simply append vector search to keyword search results when keyword returns less than 4 results.
> does all discourse instances have those features? for example, internals.rust-lang.org, do they use pgvector?
Yes, all use PGvector. In our hosting all instances default to having the vector features enabled, we run embeddings using https://github.com/huggingface/text-embeddings-inference
- We're IVF + quantization, can support 15x more updates per second comparing to pgvector's HNSW. Insert or delete an element in a posting list is a super light operation comparing to modify a graph (HNSW)
- Our main branch can now index 100M 768-dim vector in 20min with 16vcpu and 32G memory. This enables user to index/reindex in a very efficient way. We'll have a detailed blog about this soon. The core idea is KMeans is just a description of the distribution, so we can do lots of approximation here to accelerate the process.
- For reindex, actually postgres support `CREATE INDEX CONCURRENTLY` or `REINDEX CONCURRENTLY`. User won't experience any data loss or inconsistency during the whole process.
- We support both pre-filtering and post-filtering. Check https://blog.vectorchord.ai/vectorchord-04-faster-postgresql...
- We support hybrid search with BM25 through https://github.com/tensorchord/VectorChord-bm25
The author simplifies the complexity of synchronizing between an existing database and a specialized vector database, as well as how to perform joint queries on them. This is also why we see most users choosing vector solution on PostgreSQL.
[1] https://cdn.hashnode.com/res/hashnode/image/upload/v17434120...
The core part is a quantization technique called RaBitQ. We can scan over the bit vector to have an estimation about the real distance between query and data. I'm not sure what do you mean by "miss" here. As the approximate nearest neighbor index, all the index including HNSW will miss some potential candidates.
maintenance_work_mem begs to differ.
> You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?
You use REINDEX CONCURRENTLY.
> But updating an HNSW graph isn’t free—you’re traversing the graph to find the right place to insert the new node and updating connections.
How do you think a B+tree gets updated?
This entire post reads like the author didn’t read Postgres’ docs, and is now upset at the poor DX/UX.
That kills the indexing process, you cannot let it run with limited amount of memory.
> How do you think a B+tree gets updated?
In a B+Tree, you need to touch log H of the pages. In HNSW graph - you need to touch literally thousands of vectors once your graph gets big enough.
Considering the default value is 64 MB, it’s already throttled quite a bit.
> maintenance_work_mem
sure, but the knob existing doesn't solve the operational challenge of safely allocating GBs of RAM on prod for hours-long index builds.
> REINDEX CONCURRENTLY
this is still not free not free—takes longer, needs 2-3x disk space, and still impacts performance.
> HNSW vs B+tree
it's not that graph updates are uniquely expensive. vector workloads have different characteristics than traditional OLTP, and pg wasn't originally designed for them
my broader point: these features exist, but using them correctly requires significant Postgres expertise. my thesis isn't "Postgres lacks features"—it's "most teams underestimate the operational complexity." dedicated vector DBs handle this automatically, and are often going to be much cheaper than the dev time put into maintaining pgvector (esp. for a small team)
How does it not? You should know the amount of freeable memory your DB has, and a rough idea of peak requirements. Give the index build some amount below that.
> this is still not free not free—takes longer, needs 2-3x disk space, and still impacts performance.
Yes, those are the trade-offs for not locking the table during the entire build. They’re generally considered acceptable.
> it's "most teams underestimate the operational complexity.
Agreed, which is why I don’t think dev teams should be running DBs if they lack expertise. Managed solutions (for Postgres; no idea on Pinecone et al.) only remove backup and failover complexity; tuning various parameters and understanding the optimizer’s decisions are still wholly on the human. RDBMS are some of the most complicated pieces of software that exist, and it’s absurd that the hyperscalers pretend that they aren’t.
HNSW indices are big. Let's suppose I have an HNSW index which fits in a few hundred gigabytes of memory, or perhaps a few terabytes. How do I reasonably rebuild this using maintenance_work_mem? Double the size of my database for a week? What about the knock-on impacts on the performance for the rest of my database-stuff - presumably I'm relying on this memory for shared_buffers and caching? This seems like the type of workload that is being discussed here, not a toy 20GB index or something.
> You use REINDEX CONCURRENTLY.
Even with a bunch of worker processes, how do I do this within a reasonable timeframe?
> How do you think a B+tree gets updated?
Sure, the computational complexity of insertion into an HNSW index is sublinear, the constant factors are significant and do actually add up. That being said, I do find this the weakest of the author's arguments.
And if one needs the transactional/consistency semantics, hybrid/filtered-search, low latencies, etc - consider a SOTA Postgres system like AlloyDB with AlloyDB ScaNN which has better scaling/performance (1B+ vectors), enhanced query optimization (adaptive pre-/post-/in-filtering), and improved index operations.
Full disclosure: I founded ScaNN in GCP databases and currently lead AlloyDB Semantic Search. And all these opinions are my own.
BM25 with query rewriting & expansion can do a lot of heavy lifting if you invest any time at all in configuring things to match your problem space. The article touches on FTS engines and hybrid approaches, but I would start there. Figure out where lexical techniques actually break down and then reach for the "semantic" technology. I'd argue that an LLM in front of a traditional lexical search engine (i.e., tool use) would generally be more powerful than a sloppy semantic vector space or a fine tuning job. It would also be significantly easier to trace and shape retrieval behavior.
Lucene is often all you need. They've recently added vector search capabilities if you think you really need some kind of hybrid abomination.
I got a significant boost (from 65% on average to over 80%) by adding a proper reranker and query rewriting (3 additional phrases to search for).
I think embeddings are overrated in that blog posts often make you believe they are the end of the story. What I've found is that they should be rather treated as a lightweight filtering/screening tool to quickly find a pool of candidates as a first stage, before you do the actual stuff (apply a reranker). If BM25 already works as well as a pre-filtering tool, you don't even need embeddings (with all the indexing headaches).
The point of YAGNI is that you shouldn't over-engineer up front until you've proven that you need the added complexity.
If you need vector search against 100,000 vectors and you already have PostgreSQL then pgvector is a great YAGNI solution.
10 million vectors that are changing constantly? Do a bit more research into alternative solutions.
But don't go integrating a separate vector database for 100,000 vectors on the assumption that you'll need it later.
There's a big opportunity cost involved in optimizing prematurely. 9/10 times you're wasting your time, and you may have found product-market fit faster if you had spent that time trying out other feature ideas instead.
If you hit a point where you have to do a painful migration because your product is succeeding that's a point to be celebrated in my opinion. You might never have got there if you'd spent more time on optimistic scaling work and less time iterating towards the right set of features.
He seemed to really want me to blame everything on my predecessor and call some kind of crisis, and seemed annoyed by my analysis, which was confusing at the time. But yeah, there are absolutely tradeoffs you make early in a startups life, you just have to know where to take shortcuts and where you at least leave the architecture open to scaling. My biggest critique is that they were at least a year, if not two, past the point where they should have left ultra scrappy startup mode that just throws things at the wall and started building with a longer view.
I have also seen a friend build out a flawless architecture ready to scale to millions of users, but never got close to a product fit. I felt he wasted at least 6 months building out all this infra scaffolding for nothing.
Or it should be a well defined problem. It's easier to determine the right solution after you've already encountered the problem, maybe in a past project. If you're unsure, just keep your options open.
So 95% of use-cases.
Most of the rest of his complaints comes down to this is complex stuff. True, but its not a solution, its a tool used in making a solution. So when using pg_vector directly, you probably need to understand databases to a more significant degree than a custom solution that won't work for you the moment your requirements change. You surely need to understand databases more than the author does. He doesn't point to a single thing that pg_vector doesn't do or doesn't do well. He just complains it hard to do.
In summary, pg_vector is a toolkit for building vector based functionality, not a custom solution for a specific use case. What is best for you comes down to your team's skills and expertise with databases and if your specific requirements will change. Choose poorly and it could go very badly.
He very clearly complains that IVFFlat indexes have to be periodically rebuilt, that HNSW has high overhead (both during inserts and rebuilds) and that the query planner is not particularly good at optimizing queries involving this kind of indexes. None of this is a problem if the dataset is puny enough, but deadly if you want to scale up without investing significant engineering.
1. Updates: I wrote my own implementation of the HNSW with many changes compared to the paper. The result is that the data structure can be updated while it receives queries, like the other Redis data types. You add vectors with VADD, query for similarity with VSIM, delete with VREM. Also deleting vectors will not perform just a thumbstone deletion. The memory is actually reclaimed immediately.
2. Speed: The implementation is fast, fully threaded reads, partially threaded writes: even for insertion it is easy to stay in the few hundreds of ops/sec, and querying with VSIM is like 50k ops/sec in normal hardware.
3. Trivial: You can reimplement your use case in 10 minutes including learing how it works.
Of course it costs some memory, but less than you may guess: it supports quantization by default, transparently, and for a few millions of elements (most use cases) the memory usage is very low, totally affordable.
Bonus point: if you use vector sets you can ask my help for free. At this stage I support people using vector sets directly.
I'll link here the documentation I wrote myself as it is a bit hard to find, you know... a README inside the repository , in 2025, so odd: https://github.com/redis/redis/blob/unstable/modules/vector-...
P.S. in the README there is stale mention about replication code being not really tested. I filled the gap later and added tests, fixed bugs and so forth.
So basically, I'd love to have my storage provider give me a vector search API, which I guess is what Amazon S3 vectors is supposed to be (https://aws.amazon.com/s3/features/vectors/)?
Curious to hear what experience people have had with this.
[2] https://dev.to/cocoindex/how-to-build-index-with-text-embedd...
Is this really how it works? That seems like it’s returning an incorrect result.
Chroma implements SPANN and SPFresh (to avoid the limitations of HNSW), pre-filtering, hybrid search, and has a 100% usage-based tier (many bills are around $1 per month).
Chroma is also apache 2.0 - fully open source.
For pre-filtering "You’re still searching millions of vectors" isn't valid argument, because the author does not relate to any alternative, and post-filtering is even worse.
For example, give me product listings that match the search term (by vector search), and are made by company X (copanies being a separate table). Sort by vector similarity of the search term and give me top 100?.
We have even largely moved away from ElasticSearch to Postgres where we can, because it's just so much easier to implement with new complex filters without needing to add those other tables' data to the index of e.g. "products" every time.
Edit: Ah I guess this is touched a bit in the article with "Pre- vs. Post-Filtering" - I guess you just do the same as with ElasticSearch, predict what you'll want to filter with, add all of that to metadata and keep it up to date.
I this taste with most posts about Postgres that don’t come from “how we scaled Postgres to X”. It seems a lot of writers are trying to ride the wave of popularity, creating a ton of noise that can end up as tech debt for readers
From what I've seen is fast, has excellent API, and is implemented by a brilliant engineer in the space (Antirez).
But not using these things beyond local tests, I can never really hold opinions over those using these systems in production.
Especially in the AI and startup space.
ANN-Benchmark exists but it’s algorithm-focused rather than full-stack database testing, so it doesn’t capture real-world ops like concurrent writes, filtering, or resource management under load.
Would be great to see something more comprehensive and vendor-neutral emerge, especially testing things like: tail latencies under concurrent load, index build times vs quality tradeoffs, memory/disk usage, and behavior during failures/recovery
clickbench has 100m rows of data only, which makes it not comprehensive benchmark at all.
It’s worth recognising the strengths of pgvector:
• For small-to-medium scale workloads (e.g., up to millions of vectors, relatively static data), embedding storage and similarity queries inside Postgres can be a simple, familiar architecture.
• If you already use Postgres and your vector workloads are light (low QPS, few dimensions, little metadata filtering / low concurrency), then piggy-backing vector search on Postgres is attractive: minimal added infrastructure.
• For teams that don’t want to introduce a separate vector service, or want to keep things within an existing RDBMS, pgvector is a compelling choice.
From our experience helping users scale vector search in production, several pain-points emerge when scaling vector workloads inside a general-purpose RDBMS like Postgres:
1. Index build / update overhead • Postgres isn’t built from the ground-up for high-velocity vector insertions plus large-scale approximate nearest neighbour (ANN) index maintenance, for example, lacking RaBitQ binary quantization supported in purpose built vector db like Milvus.
• For large datasets (tens/hundreds of millions or beyond), building or rebuilding HNSW/IVF indices inside Postgres can be memory- and time-intensive.
• In production systems where vectors are continuously ingested, updated, deleted, this becomes operationally tricky.
2. Filtered search
• Many use-cases require combining vector similarity with scalar/metadata filters (e.g., “give me top 10 similar embeddings where user_status = ‘active’ AND time > X”).
• Need to understand low level planner to juggle pre-filtering, post-filtering, and planner’s cost model wasn’t built for vector similarity search. For a system not designed primarily as a vector DB, this gets complex. Users shouldn't have to worry about such low level details.
3. Lack of support for full-text search / hybrid search
• Purpose built vector db such as Milvus has mature full-text search / BM25 / Sparse vector support.
Speaking of "production" -- in what world is "10+ GB" a lot of RAM for a database server?
I have to agree: the author should definitely not use Postgres or pgvector in production...
From my non-production experiments with it, the main limitation is that you can only retrieve up to 30 top_k results, which means you can't use it with a re-ranker, or at least not as effectively. For many production use cases that will be a deal breaker.
Yup, I think this here explains the popularity of pgvector. If $64/month seems like a lot to you, just use pgvector. If it seems cheap, then your usage is complex enough to want a proper vector DB.
Yesterday, I had a conversation with someone whether it's better to squeeze it all into the system prompt. Someone else argued that even with large context, it's difficult to fit 10k questions into a system prompt.
So I think there's just this mismatch in usage.
Yes, young engineers get all hot and bothered over the most recent tools but - they have no idea how things work and run.
I worked on a project that wanted to use a hot and frothy vector database. The issue - ok, where are we getting the 1/4-1/2 time person to manage it? Product engineers - derp? what? People who live in node and python cutting edge don't really think about the actual production implications of their choices.
> None of the blogs mention that building an HNSW index on a few million vectors
> can consume 10+ GB of RAM or more (depending on your vector dimensions and
> dataset size). On your production database. While it’s running. For potentially
> hours.
10 GB? Oh jolly gosh! That will almost show up as a pixel or two on my metrics dashboard.Who are these people that run production Postgres clusters on tiny hardware and then complain? Has AWS marketing really confused people into believing that some EC2 "instance size" is an actual server?
The question is, at what point does it not work?
Vacuuming is grief enough. Rebuilding the index sounds like more of a nightmare than with Solr/Lucene. And what happens when indexing fails? In Solr/Lucene it used to mean you were dead. I'm sure they fixed that, but at some level you need to either be behind on one while you reindex or figure out some queueing system that works like transactions.
https://github.com/neuml/txtai/blob/master/examples/78_Acces...
How hard is it to move that process to another machine? Could you grab a dump of the relevant data, spin up a cloud instance with 16GB of RAM to build the index and then cheaply copy the results back to production when it finishes?
> The problem is that index builds are memory-intensive operations, and Postgres doesn’t have a great way to throttle them. You’re essentially asking your production database to allocate multiple (possibly dozens) gigabytes of RAM for an operation that might take hours, while continuing to serve queries.
> You end up with strategies like:
Write to a staging table, build the index offline, then swap it in (but now you have a window where searches miss new data)
Maintain two indexes and write to both (double the memory, double the update cost)
Build indexes on replicas and promote them
Accept eventual consistency (users upload documents that aren’t searchable for N minutes)
Provision significantly more RAM than your “working set” would suggest
> None of these are “wrong” exactly. But they’re all workarounds for the fact that pgvector wasn’t really designed for high-velocity real-time ingestion.short answer--maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. most vector DB solutions offer this ootb. This post is meant to just point out the tradeoffs with pgvector (that most posts seem to skip over)
Question is if that tradeoff is more or less complexity than maintaining a whole separate vector store.
The repo includes plpgsql_bm25rrf.sql : PL/pgSQL function for Hybrid search ( plpgsql_bm25 + pgvector ) with Reciprocal Rank Fusion; and Jupyter notebook examples.
As for inserts being difficult, we basically don't see that because we only update the vector store weekly. We're not trying to index rapidly-changing user data, so that's not a big deal for our use case.
Ok yeah there's PGVector. Then you need something to do full text search. And if you put all that together, you have a complex Postgres deployment.
It seems to make sense for simple operations, but I'd rather just get a search engine / vector database, than try to twist Postgres's arm into a weird setup.
search is also just extension? So, its a strong point: you have one self-contained server with simple installation/maintenance story.
please ask your RDS rep to support it
we (tiger data) are also happy to help push that along if we can help
> You rebuild the index periodically to fix this, but during the rebuild (which can take hours for large datasets), what do you do with new inserts? Queue them? Write to a separate unindexed table and merge later?
What is wrong with REINDEX CONCURRENTLY?> It’s a compelling story. And like most of the AI influencer bullshit that fills my timeline, it glosses over the inconvenient details.
Furthermore, when all the hipster vector database die or go into maintenance mode or get the license rug-pull when the investors come looking for revenue, postgres will still be chugging along and getting better and better.
Anyways, all this vector stuff is going to fade away as context windows get larger (already started over the past 8 months or so).
People who say this really have not thought this through, or simply don't understand what the usecases for vector search are.
But even if you had infinite context, with perfect attention, attention isn't free. Even if you had linear attention. It's much much cheaper to index your data than it is to reprocess everything. You don't go around scanning entire databases when you're just interested in row id=X
As an example, if one is chunking inputs into a RAG, one is basically hardcoding a feature based on locality - which may or may not work. If it works - as in, it is a good feature (the attention matrix is really tail-heavy - LSTMs would work, etc...) - then hey, vector DBs work beautifully. But for many things where people have trouble with RAG, the locality assumption is heavily violated - and there you _need_ the full-on attention matrix.
We're searching across millions of documents, so i doubt it
No. No one in production is trying to use the same instance for all of these use-cases at scale. The fundamental misunderstanding here is assuming or even "demanding" that one instance should be able to provide OLTP, OLAP and vector ops with no compromises. The workloads are fundamentally different and doing serious work requires architecting the solution much more intelligently.
this is a big problem in programmer blog posts. It used to be I could find blog posts by peopel who had actually done the thing ("in anger").
Now it's someone who decided writing up the thing would draw clicks, and googled just enough to write the thing, may or may not have actually even fired it up at all -- may not have even written it, perhaps had AI write it.
It makes any of these blog posts pretty terrible guides.
I used to try at least downvoting these on say reddit when it was obviously not written by someone who had their own actual earned knowledge about the thing, but just gave up, because it's nearly everything.