Postgres Just Cracked the Top Fastest Databases for Analytics(www.mooncake.dev)

366 pointsby moonikakiss4 days ago25 comments

bhoustona day ago
Just to be clear, standard SQL databases are not great for large-scale analytics. I know from first hand experience and a lot of pain.
We tried using Postgres with large analytics at my previous company https://threekit.com but it is an absolute pain. Basically we started to collected detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past a 500 million records it turned out to be impossible to query this table in any thing close to real-time - it was basically untouchable because it was so slow.
I know I could have used some type of daily aggregation combined with a weekly aggregation, etc to roll up the data incrementally. A dev tried this and yeah, it hide the slow queries but then it became inflexible in terms of reporting. And writing and maintaining these cronjobs is a lot of work.
But then I tried using BigQuery on my https://web3dsurvey.com side project and I just recorded raw data and then wrote queries to do real-time aggregation (with a caching layer) in a large variety of ways. And it was near instant and easy and also very cheap.
So then I deployed this strategy over at Threekit.com and it also was easy and fast and cheap. Even more flexible queries than Web3DSurvey at basically no developer cost or infrastructure costs. No more need for aggregating cron-jobs or trying to decide ahead of time how users want to slice and dice the data. Real time, flexible queries on the fly via BigQuery is the way.
Also BigQuery bill for https://web3dsurvey.com is like $0.25 month and it is dealing with millions of records in its 3 month window of stored data. Where as just running the cheapest Postgres SQL server on Google Cloud is like >$25/month and it is a slow one.
I would never go back to traditional SQL for analytics - it was hard, slow, expensive and inflexible. Worst of all worlds.
- atombendera day ago
  Did you use plain Postgres tables or a columnar extension like Timescale, Citus, or pg_mooncake?
  Nobody in their right mind would argue that Postgres without columnar storage is good for analytics. However, it looks like these extensions can be quite decent, at least at queries (you might still run into write performance due to Postgres' OLTP architecture, but I've never benchmarked it). In OLAP terms the size of your data is tiny and this would probably work just fine.
  Personally I'm a huge fan of dedicated databases like ClickHouse. BigQuery gets very expensive as your data and query volume grows.
  - bhoustona day ago
    > Did you use plain Postgres tables or a columnar extension like Timescale or pg_mooncake?
    Plain Postgres tables hosted on Google Cloud SQL on a decent machine.
    > Nobody in their right mind would argue that Postgres without columnar storage is good for analytics.
    Which is what I am saying. Standard means without extensions.
    > Personally I'm a huge fan of dedicated databases like ClickHouse.
    Where do you run it? What type of machine? Can it be server less?
    > BigQuery gets very expensive as your data and query volume grows.
    I didn't experience this, it has been hard for me to make BigQuery cost a lot. Storage costs were basically nothing and we just set the clustering/partition on DB creation to be users/sessions/events and dates respectively. It aligned really well with general query patterns.
    atombendera day ago
    Your comment is confusing to me because it appears to have nothing to do with the article, which is not about using Postgres' heap tables. In fact, you're arguing for a purpose-built storage engine (BigQuery), which is what pg_mooncake also is, except the latter is integrated with Postgres.
    At my company we self-host ClickHouse, but if we started out today we'd definitely adopt ClickHouse Cloud. One big advantage of CH is that tables can be backed by ordinary cloud storage like S3 and GCS, which is incredibly cheap. BQ has support for external Iceberg tables in GCS, but (from what I recall) no write support.
    We are shuffling several TBs/week into BigQuery, and that is extremely expensive. Querying this amount of data (billions of rows per day) is also very expensive unless you purchase flat pricing.
    If your needs are really modest, then BQ is definitely very reasonable.
    bhouston21 hours ago
    I will investigate Clickhouse. It is probably an even better solution that BigQuery when you have a lot of data and have a lot of queries.
    atombender17 hours ago
    Clickhouse is amazing. So is Postgres, but CH excels at its particular use case where Postgres does not.
    While these new Postgres extensions will make Postgres more competitive, CH is built from the ground up for these kinds of analytics workloads in a way that Postgres isn't, e.g. in terms of scalable data ingestion.
    CH also has a rich set of features like materialized views, aggregate function types, and dictionaries, none of which have counterparts in Postgres of these extensions.
- dig1a day ago
  As someone dealing with billions of records on it, BigQuery is far from cheap; G will not charge you much for storage as they will charge you for queries and data transfer.
  AFAIK, the cheapest Postgres server on GCP is very expensive compared to the usual Postgres installation (price/performance).
  - sgarlanda day ago
    Yep. I used it once for a personal project involving GHTorrent [0], and each query was about $20. That may be peanuts for a business, but for someone playing around on their own, it's incredibly prohibitive.
    [0]: https://github.com/ghtorrent/ghtorrent.org/blob/master/gclou...
    sgarland20 hours ago
    EDIT: The site no longer exists, it seems. The project (which is not my creation) was a regularly-updated dump of all GitHub metadata (PR comments, Issues, etc.). Here's an archive snapshot: https://web.archive.org/web/20220126060859/https://ghtorrent...
    bhouston21 hours ago
    On billions of records my query costs were generally less than a cent. But I partitioned and clustered on the key fields.
    chrisldgk21 hours ago
    Either the website this is referring to got hacked or you sold it. Either way, this probably shouldn’t be linked on HN.
    LorenzoGood21 hours ago
    What is hardboiled studios?
  - renegade-ottera day ago
    That and RDS are kind of a rip-off if you are trying to have a sandbox out there, short of running your own instance.
    For that, I found Digital Ocean to be very reasonable.
    eastbounda day ago
    DO’s pricing has increased a lot. I was surprised, I couldn’t start a $5 vm anymore, it’s worth checking again.
    jmholla21 hours ago
    You can, but I think you have to be in one of the regions that supports it. The generally (globally?) available lowest priced VM is $6 a month.
    https://www.digitalocean.com/pricing/droplets
  - michaelmiora day ago
    What do you mean by "the usual Postgres installation"?
- rapfariaa day ago
  > 2B records of user events during their sessions. As it grew past a 500 million records it turned out to be impossible to query this table in any thing close to real-time - it was basically untouchable because it was so slow.
  This is a solved problem, and it seems the technical folks over there lacked the skills to make it work. Having indexes is just the tip of the iceberg. Composite indexes, partitioning, sharding, caching, etc, can lower reads to a few seconds on disk.
  - bhoustona day ago
    > This is a solved problem, and it seems the technical folks over there lacked the skills to make it work. Having indexes is just the tip of the iceberg. Composite indexes, partitioning, sharding, caching, etc, can lower reads to a few seconds on disk.
    Or just use BigQuery and it is works, it is cheaper to run (by 10x to 100x) and can be done by a junior dev rather than a PhD in Database configuration.
    I prefer simple solutions though - I also hate Kubernetes: https://benhouston3d.com/blog/why-i-left-kubernetes-for-goog...
    majkinetora day ago
    I have 50 columns in a table with 10B records in a partitioned table. People can search using any combination. There are a huge number of possible composite indexes, so we don't have any, but we have every column indexed. Most queries take a minute or two, and we also experienced a sudden drop in performance even for queries that used to work OK. Not sure what to do next to improve query speed.
    moonikakissa day ago
    This is an interesting workload. We've heard similar pains for fast filtering on many large column tables.
    For this workload, having a columnstore version of your table will help.
    DM us: https://join.slack.com/t/mooncakelabs/shared_invite/zt-2sepj.... We can help.
    zhousuna day ago
    Yea this is indeed a repeated pattern we saw people requesting (filter on many columns) and we are trying to solve with pg_mooncake. If you are interested, feel free to join mooncake-devs.slack.com to chat more about your use case.
    DrFalkyna day ago
    Indices are one the first things you learn in any decent DB course, so you don’t need a PhD
    But if BigTable just solves the problem it seems the way to go
    PostGres is popular for a reason, it’s ACID unlike BigTable
    You may run into these problems later on, you may not
    furstenheima day ago
    ACID is good and it has an implementation cost (vacuum for example). When doing analytics you do not care for ACID
    bhoustona day ago
    I love Postgres and use it a lot but not for analytics.
  - Isn0guda day ago
    Doesn't sound like solved problem to me if you have to employ more than four different mitigation strategies.
- TuringNYCa day ago
  >> Just to be clear, standard SQL databases are not great for large-scale analytics.
  What do you mean by "standard" SQL? Were you using an OLTP or OLAP db when you faced these difficulties? Also, what makes BigQuery not a "standard SQL database" -- they use GoogleSQL, true, but many DBs have slight SQL variants.
  - bhoustona day ago
    > What do you mean by "standard" SQL?
    No extensions, just the default Postgres Cloud SQL offering from Google Cloud.
    > Also, what makes BigQuery not a "standard SQL database" -- they use GoogleSQL, true, but many DBs have slight SQL variants.
    I view BigQuery as not a standard DB because it really doesn't do transactions, rollbacks, foreign key constraints, indices, stored procedures, etc. I view it as a non-SQL database pretending to be SQL for query convenience.
    gbina day ago
    I think you mean a relational database.
    layer8a day ago
    All the features mentioned are part of SQL.
    jandrewrogersa day ago
    SQL (the DML) is just a query language. It could be implemented on anything.
    Parts of SQL DDL is implementation specific. The SQL standard hardcodes the assumption of (by modern standards) a small-scale system and therefore will deviate significantly on large-scale systems. There is no practical way to make this "standard", you have to understand the implementation it reflects.
    noma day ago
    SQL is a language to interact with a relational database. It is not about the language.
  - braiampa day ago
    According to their other comment standard means what it comes in the box by default.
- zhousuna day ago
  Thanks for the comment but you are mixing some terminologies.
  The core idea of mooncake is to built upon open columnar format + substitutable vectorized engine, while natively integrate with Postgres.
  So it is indeed closer to BigQuery (especially the newer bigquery with iceberg tables) than a 'standard SQL database'. It has all the nice properties of BigQuery (ObjectStore-native, ColumnStore, Vectorized execution...) and scaling is also not impossible.
- Simon_O_Rourkea day ago
  > it was basically untouchable because it was so slow.
  I've worked with a 5B row table on Snowflake with (maybe) no indexes, and while somewhat slow you could still run reasonable queries on it in a minute or two.
  - bhoustona day ago
    I guess I liked the convenience with Bigquery of being able to do significant queries in a second or two on large data sets. It means I can do the query live for the user as they navigate the analytics dashboard and this simplifies design and implementation complexity significantly.
- Galanwea day ago
  > a rapidly growing table of around 2B records of user events during their sessions. As it grew past a 500 million records it turned out to be impossible to query this table in any thing close to real-time
  I mean, I don't know what you call "close to real time", and what kind of query you did, but I have Postgres serving requests from a 20B rows table just fine, with some light tweaking of indexes and partitions (I'm by no means a DBA).
  Unless we are digging much deeper than that in the specifics of the data and queries, I don't think any conclusion can be taken from either experience.
  - jandrewrogersa day ago
    The OP is trying to run a mixed analytic workload which Postgres does poorly even at relatively small scales. No amount of "light tweaking" can fix it, it is intrinsic to the architecture. You can put 100B records in a Postgres table (I have) and it works if your workload is trivial but not well. The Postgres internals are not designed for data this large, it is an OLTP system.
    You can do fast-twitch analytical queries (sub-second, a few seconds at the tail) on complex operational data models where you are inserting millions of records per second on 100B record tables while those queries are running. Just not on Postgres.
  - Eikona day ago
    Same for https://www.merklemap.com/
    The biggest table contains 30B records. A query that uses a B-tree index completes in a few microseconds.
    EXPLAIN ANALYZE SELECT * FROM table_name WHERE id = [ID_VALUE]; Index Scan using table_name_pkey on table_name (cost=0.71..2.93 rows=1 width=32) (actual time=0.042..0.042 rows=0 loops=1) Index Cond: (id = '[ID_VALUE]'::bigint) Planning Time: 0.056 ms Execution Time: 0.052 ms
    jodrellblanka day ago
    If the original comment is talking about saving money and running in Google cloud, could they be using virtual machines and HDD while both you and the parent talking about good speeds (and not money) are using physical servers and SSD?
    Eikona day ago
    I am indeed running PostgreSQL on bare metal with a 20TB NVMe pool, the DB itself is around 18 TB :)
    But that's the way to go, isn't it?
    Galanwea day ago
    I'm just your average Joe using RDS. The whole database itself is 3TB.
    mmsimangaa day ago
    Not OP but OP seemed to be talking about analytical queries. Which typically summarise data over a period of time. For year on year comparisons this is summarising data over two years. So the query would have at least one aggregate function.
    tomnipotenta day ago
    Index-backed point look-ups are not the problem for analytical queries, but rather minimizing disk I/O for large scans with high column or predicate selectivity.
    Once you've optimized for the more obvious filters like timestamp and primary key, like using partitions to avoid using indexes in the first place, you're left with the situation where you need to aggregate over many gigabytes of data and an index doesn't help since your query is probably going to touch every page within the filtered partitions.
    You can solve some of these problems in Postgres, like partitioning, but now you're stuck with random I/O within each page to perform non-SIMD aggregations in a loop. This approach has a ceiling that other implementations like ClickHouse do not.
  - crazygringoa day ago
    These are two totally different use cases.
    Analytics generally means a full table scan or similar for each query, because you're e.g. taking an average of every value.
    You're talking about indexes which are to optimize retrieving a handful (or few thousand) of rows.
  - bhoustona day ago
    Real time means a user can load a web page and see a query results without significant wait. So less than a few seconds.
- sgarlanda day ago
  > Basically we started to collected detailed analytics and thus had a rapidly growing table of around 2B records of user events during their sessions. As it grew past a 500 million records it turned out to be impossible to query this table in any thing close to real-time
  Analytics isn't typically something that needs real-time capabilities, for one.
  > a rapidly growing table [emphasis mine]
  I think I see part of the problem here. If you had a single table, that means it's completely denormalized, so your schema probably looked something like this (or wider):
  CREATE TABLE UserEvent ( id UUID PRIMARY KEY, user_id UUID NOT NULL, user_ip_address TEXT NOT NULL, user_agent TEXT NOT NULL, event_data JSONB, created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), event_type TEXT ); CREATE INDEX UserEvent_user_id_idx ON UserEvent (user_id); CREATE INDEX UserEvent_created_at_idx ON UserEvent (created_at);
  The JSON blob might be anywhere from a few hundred bytes to well over a kilobyte, and probably duplicates data already present as a scalar, like IP address, user agent string, timestamp, etc. I'll use the middle ground and say the JSONB objects are on average 500 bytes when stored. Now, the rest.
  A UUID, if stored as its native type (or BINARY(16) in MySQL - don't sleep on this, MySQL folks; it makes a huge difference at scale) is 16 bytes. That's double the size of a BIGINT, and quadruple the size of an INT4. Also, unless you're using UUIDv7 (or UUIDv1, but no one does), it's not k-sortable. Since Postgres doesn't cluster tuples around the PK [yes, I know all indices in Postgres are technically secondary] like MySQL/InnoDB does, this doesn't immediately thrash the B+tree in the same way, but it does thrash the visibility map, and it does bloat the WAL. There are various arguments for why you shouldn't use a monotonic integer as a surrogate key, but IMO they're largely overblown, and there are workarounds to not publicly disclose it.
  IPv4 addresses, stored in dotted-quad as a string, are a maximum of 15 characters, storing in 16 bytes as TEXT or VARCHAR. If stored instead in the Postgres native INET type, that drops to 7 bytes, plus you get built-in validation. If you had INT4 UNSIGNED available (as MySQL does natively), you could even store them in their numeric form and save another 3 bytes, though you lose the validation.
  User Agent strings are huge, usually over 100 bytes. They're also not that unique, relatively speaking. Even if you need to know the patch version of the browser, anyone with a browser doing automatic updates is going to stay more or less in sync. The point is this could easily be a lookup table, with either a SMALLINT (2^15 - 1 maximum values, or 2^16 - 1 if you use unsigned values; possible with an extension in Postgres) or an INT (2^31 -1 maximum values) as the PK.
  Not going to touch on JSON objects because the things you might want to know are endless. TOAST and de-TOAST can be slow; if you need low latency, you should normalize your data.
  There may or may not be extracted scalars, which can be beneficial during queries. Again, though, lookup tables (or even native ENUM types, if the values are limited) are crucial at scale.
  As it stands, the table will have an average row size of 664 bytes (assuming an average of 12 bytes stored for the IP, 100 bytes stored for the UA, 500 bytes stored for the JSONB, and 12 bytes stored for the event type). That's 332 GB for 500,000,000 rows. You could shave a couple of bytes off by aligning columns [0], which saves 1 GB. If the IP addresses and UA strings were lookup tables, each with an INT4, that saves 104 bytes per row. If you made the PK for the table a BIGINT, that saves another 8 bytes per row. The total savings between column alignment and basic normalization is 114 bytes per row, or 57 GB.
  This doesn't touch on the indices, either. If you're using PG 13+, you get B+tree de-duplication [1] for free, which can help with some denormalized data, but not if you have anything with high cardinality, like a timestamp, or a UUID. With lookup tables, you would of course need to index those FKs (whether or not you're enforcing constraints), which adds some size, but is still a huge net gain.
  > I know I could have used some type of daily aggregation combined with a weekly aggregation, etc to roll up the data incrementally. A dev tried this and yeah, it hide the slow queries but then it became inflexible in terms of reporting. And writing and maintaining these cronjobs is a lot of work.
  And shifting your entire analytics workload isn't a lot of work? Between ROLLUP [2] and MATERIALIZED VIEW [3], which can automatically refresh itself with a cron, this doesn't seem that burdensome.
  > Also BigQuery bill for https://web3dsurvey.com is like $0.25 month and it is dealing with millions of records in its 3 month window of stored data.
  Then you're in the free tier (<= 1 TiB/month of processed data), because after that it's $6.25/TiB. Also worth noting there is a massive difference between millions of rows and billions of rows. The former can be handled by practically any RDBMS on any hardware, with a completely unoptimized schema. The latter requires some thought if you want it to be performant.
  This isn't at all to say that specialized DBs don't have their place, because they absolutely do. If you need a KV store, use a KV store, not an RDBMS. If you need OLAP, use something designed for OLAP. The difference is scale. At startup or side project scale, you can easily do everything (including pub/sub) with an RDBMS, and if you put thought into its design and usage, you can take it a lot farther than you'd think. Eventually, you may hit a point where it's counter-productive to do so, and then you should look into breaking tasks out.
  The issue I see happening time and time again is devs have little to no expertise in DBs of any kind, but since everyone says "Postgres is all you need," they decide to use it for everything, except they don't know what they're doing. If you do that, yeah, you're gonna have problems fairly early on, and then you'll either throw your hands up and decide you really need a bevy of specialized DBs, caches, and message queues (which introduces a lot of complexity), or you'll vertically scale the DB. If you choose the latter, by the time you hit scaling limits, you're easily spending $25K/month on the DB alone. If you opt to hire someone with DB expertise at this point, you'll spend about that if not more in personnel costs, and not only will it take them weeks if not months to unravel everything, your devs will be constantly complaining that queries are now "too complex" because they have to do some JOINs, and they're being told to stop chucking everything into JSON. If instead, you took at most a week to learn some RDBMS basics by a. reading its manual front-to-back b. hands-on experience, trying things out you could almost certainly get much farther on much less.
  [0]: https://www.enterprisedb.com/blog/rocks-and-sand
  [1]: https://www.postgresql.org/docs/current/btree.html#BTREE-DED...
  [2]: https://www.postgresql.org/docs/current/queries-table-expres...
  [3]: https://www.postgresql.org/docs/current/rules-materializedvi...
- cyptusa day ago
  same. started writing data into parquet to analyze from there with a big gain in performance and hosting costs
saisrirampura day ago
Sai from PeerDB/ClickHouse here. Nice to see the progress on this project! I wanted to leave a few notes:
For analytics on transactional data, it looks like you'd still need to use logical replication (https://github.com/Mooncake-Labs/pg_mooncake/issues/90). Logical replication is somewhat similar to an ETL/CDC experience, though it's more Postgres-native. Managing logical replication at a production grade isn't trivial — it's quite common for customers to use PeerDB for homogeneous replication due to performance, manageability and observability issues with logical replication.
One potential value of extensions is Postgres compatibility for your queries. However, I'm curious about the SQL coverage on columnar tables — for example, advanced constructs like partitioned tables, prepared statements, correlated subqueries, RCTEs, triggers, and more. While it seems there’s ongoing work to improve compatibility, achieving full Postgres compatibility is challenging. In some cases, you might need to fall back to Postgres (instead of DuckDB), sacrificing performance for certain queries.
The biggest challenge we faced at Citus was the constant trade-off between Postgres compatibility and performance — both are complex and constantly moving targets. This was the key reason why Citus couldn't compete with purpose-built databases like Snowflake, ClickHouse, SingleStore. While DuckDB didn’t exist when we built Citus, it's still fundamentally different from Postgres. Even though the DuckDB dialect is somewhat similar to Postgres (as is Snowflake’s), retrofitting two distinct databases — each evolving independently for world's best performance — isn't trivial.
In short, relying on ETL (logical replication) without providing full Postgres compatibility raises the question: is it better to lean on a purpose-built database with a laser focus on performance, or adopt a more generalized approach?
Anyway, I come from a different school of thought — using the right tool for the right job. That said, I love seeing all the progress and evolution in the Postgres community — Postgres will always be my first love!
- zhousuna day ago
  Hi, Zhou From Mooncake labs here.
  Love your work on PeerDB and it's inspiring the evolvement of pg_mooncake (logical replication will be the killing feature for V2)
  The core idea of mooncake is to built upon open columnar format + substitutable vectorized engine, while natively integrate with Postgres:
  1. For small devs, we allow the whole stack to be embedded as a Postgres extension for ease of use
  2. For enterprise, our stack is also purpose-built stack similar to PeerDB + ClickHouse, not a more generalized approach
  We allow a gradual transition from 1 to 2.
  - saisrirampur21 hours ago
    Thank you for the kind words! :)
    1, makes sense.
    On 2, I understand your thinking around purpose-built — but you're retrofitting an analytical database into a transactional database without fully supporting all the features (both in terms of functionality and performance) of either. It's really hard to be truly purpose-built this way. As a result, users might not get the best of both worlds.
    PeerDB is different. We keep Postgres and ClickHouse separate and just move data reliably between them. Users get to query Postgres and ClickHouse in isolation and make the best of each of them.
    Anyway, keep up the good work! Just wanted to share some challenges we've seen before when building an analytics extension (Citus), particularly around chasing both Postgres compatibility and performance.
    zhousun20 hours ago
    Yep what I want say is the line between the two designs is indeed very blur.
    Logical replication with mooncake will try to create a columnar version of a postgres heap table, that can be readable within postgres (using pg_mooncake); or outside postgres (similar to peerdb + clickhouse) with other engines like duckdb, StarRocks,Trino and possibly ClickHouse.
    But since we can purposely build the columnstore storage engine to have postgres CDC in mind, we can replicate real-time updates/deletes(especially in cases traditional OLAP system won't keep up).
    saisrirampur19 hours ago
    I understand. In that scenario, why can't users just use these other query engines directly instead of the extension. You're heavily relying on DuckDB within your extension but may not be able to unleash its full power since you're embedding it within Postgres and operating within the constraints of the Postgres extension framework and interface.
    zhousun18 hours ago
    lol spot-on comment and stay tuned for our v2 :)
    The focus of mooncake is to be a columnar storage engine, that natively integrate with pg, allowing writing from pg, replicating from pg, and reading by pg using pg_mooncake. We want people to use other engine to read from mooncake, and here they are effectively stateless engine, that's much easier to manage and avoids all data ETL problems.
    saisrirampur18 hours ago
    Sounds good. I'm still a bit confused. But will wait for your next version. :) ETL problems still aren't avoided — replicating from Postgres sources using logical replication is still ETL. One topic we didn't chat much is, be careful about what you're signing up for with logical replication — we built an entire company just to solve the logical replication/decoding problem. ;)
- theLiminatora day ago
  Do you believe that things like cedardb might one day remove the distinction between OLAP and OLTP dbs?
  - zhousuna day ago
    Interesting enough, the mooncake team was building SingleStore before, and so far it is the best production-ready HTAP system. One lesson I really learned is, people don't want to switch their system-of-record OLTP system.
    Spoiler-alert: Mooncake will be supporting HTAP use-cases soon, and it comes with better trade-offs: keep your OLTP postgres tables as is, and mooncake adds analytics capacity to those table on up-to-date data.
dleeftinka day ago
> To enhance query execution speed, we embedded DuckDB as the execution engine for columnstore queries
So is it Postgres or DuckDB that cracked the analytics top ?
- moonikakissa day ago
  well, pg_mooncake is a Postgres extension, and Postgres + pg_mooncake is still just Postgres. Users deploy pg_mooncake as a Postgres extension and write and query all tables through psql.
  Fast analytic databases need two key things: columnar storage and a vectorized execution engine. We introduce a columnstore table access method in Postgres with data stored in Parquet) and execute queries on those tables using DuckDB.
  By leveraging DuckDB's execution engine, we avoid reinventing vectorized query execution while keeping everything managed through Postgres.
  More on our architecture: https://www.mooncake.dev/blog/how-we-built-pgmooncake
  - moonikakissa day ago
    One thing I forgot to mention. We're actually faster than DuckDB on parquet. We implemented segment elimination based on parquet metadata.
    We blogged about this: https://www.mooncake.dev/blog/duckdb-parquet
    porkera day ago
    Have you upstreamed the change to DuckDB?
    fmbba day ago
    Reading the linked blog it seems the speed up comes from offloading some lookups to Postgres tables.
    That doesn’t seem like something that makes sense to in DuckDB.
    > pg_mooncake leverages Postgres to store table metadata, eliminating the need to query an external catalog (as DuckDB would require when scanning external Iceberg or Delta Lake tables). Instead, it directly queries the Postgres table to identify which Parquet files to scan, reducing overhead.
    > Additionally, pg_mooncake stores detailed Parquet metadata for each file in Postgres, including column statistics for every row group. These statistics, such as MIN/MAX values for each column, are aggregated to optimize scans. When querying a columnstore table, pg_mooncake uses these statistics to filter out data files and row groups that cannot produce matches.
  - jbergensa day ago
    I'm personally on the side that thinks that pg + x != pg.
    It might not even be possible to install x on some hosted pg servers.
    It is great that you can expand pg and that you and others have but I don't rate it as high as a buil-in solution.
  - gavinraya day ago
    Can I suggest you put this sentence somewhere in the README near the beginning?
    > Fast analytic databases need two key things: columnar storage and a vectorized execution engine. We introduce a columnstore table access method in Postgres with data stored in Parquet) and execute queries on those tables using DuckDB.
  - fifiluraa day ago
    What is the cost of scaling up number of CPUs for parallel processing?
    That was always the culprit for me compared to AWS/Athena and BigQuery. They are dirt cheap on analytics workloads when you can parallelize the calculations to 100 CPUs without really paying any extra.
    With postgres you are stuck with linear cost for scaling up number of CPUs, so everything is slow anyway.
    zhousuna day ago
    zhou from mooncake labs here.
    Good point! Normally for postgres extension it won't be solvable, but for mooncake it is actually not the case!
    The core idea of mooncake is to built upon open columnar format + substitutable vectorized engine, while natively integrate with Postgres.
    So right now it is using duckdb within postgres to run the query, but we can and we will support ad-hoc using other 'stateless engines' like Athena, StarRocks or even spark to run a big query.
  - nextna day ago
    Fast analytics databases need a third key technology. I can't say what. You seem unaware of it.
- tonyhart7a day ago
  both??? I mean seeing their homepage this is clearly their business model no?
- rubenvanwyka day ago
  DuckDB
dsiegel2275a day ago
A question that I have had for a while that I can't seem to find an answer: for teams that are using various columnar store extensions to turn Postgres into a viable OLAP solution - are they doing so in the same instance of their Postgres that they are using for OLTP? Or are they standing up a separate Postgres instance?
I'm trying to understand if there is any potential performance impact on the OLTP workload by including the OLAP in the same process.
- bach4antsa day ago
  And further, with this pg_mooncake extension allowing you to store the data in S3, is Postgres simply providing compute to run DuckDB? I suppose it's also providing a standardized interface and "data catalog."
  - zhousuna day ago
    transactions are also managed by postgres as if they are native table, so that you don't need to worry about coordinating commits between postgres and the S3 data.
- a day ago
  undefined
owenthejumpera day ago
What's the business model? It's a extension that's MIT licensed, yet it has a company and a VC behind it. Sounds like a rug pull waiting to happen
- tesch1a day ago
  Counterpoint: It's actually very astute of them to start like this.
  In order to have a viable business model they need to create value for users. Users are intelligent and will not even consider trying something that has no exit path, should the company disappear.
  What every company hopes to have is customers who are thrilled with the value they deliver and tell everyone what a great deal it is so the company can grow.
  What no company hopes for is to end up like hashicorp, where they end up spending more investment $ than the value they provide, never achieve profitability, and eventually just piss everyone off, and everyone is trying to make the best of a leveraged situation, they end up having to pull the rug. The user's leverage in that situation is something like opentofu, made possible by the license, same as what's being offered here.
  The price of the big columnar dbs is very high, so there's a lot of ground to capture / value to arbitrage/offer in this space - as evidenced by other comments in this thread, how fast the benchmarks are changing, the likelihood of memory prices coming down, etc.
  Aside from that, you have to wonder big picture if the AI space will put significant downward pressure on memory and compute prices with everyone wanting to run local LLMs, might change some fundamental tradeoffs made in db systems. If in 10 years I can get a phone with a 1 TB of fast access RAM to ask siri how to sauté mushrooms, what will 99% of us need columnar store for?
- zhousuna day ago
  Zhou from mooncake labs here.
  Mooncake is built upon open-table formats and substitutable query engines. So it don't need to be just a postgres extension.
  PG_mooncake will stay open-source under MIT, for small devs where everything fits in their postgres, we hope everyone to enjoy the !
  And we would love to help companies outgrown postgres to transition into the modern stack with postgres + mooncake potentially outside pg + iceberg/delta.
- a day ago
  undefined
spapas82a day ago
As others have mentioned the results are from an extension, not postgres.
Unfortunately using an extension that's not "part" of postgres (like pg_stat_statements) is not trivial for most people since there are both technical and legal issues.
- moonikakissa day ago
  try us on Neon Postgres https://neon.tech/docs/extensions/pg_mooncake.
  It should be fairly trivial to get started
- kyrraa day ago
  What are th legal issues?
  - kbolinoa day ago
    Some extensions, like TimescaleDB, have licenses explicitly forbidding their use in hosted products which compete with the companies backing them. In concrete terms, this means they can't be used in any of the major cloud providers' managed database solutions (e.g AWS RDS).
    Others, like Citus, are licensed under the AGPL, which is basically an even stronger version of the GPL that requires anyone hosting a service to provide their modifications to the source code. While this doesn't strictly rule out use in e.g. RDS it does basically say that Amazon would be required to open source any changes they make to get it working in RDS, which is a burden they don't seem interested in taking on, and which might reveal details about how their services work under the hood which they are generally fairly cagey about doing.
    The AGPL is also viewed even more suspiciously than the GPL by corporate lawyers who fear it may lead to virally forcing the company to reveal all of its source code. This fear is probably unfounded, but all copyleft licenses are primarily about expanding end-user freedom and so are treated as undesirable in most businesses that make money from selling software.
  - spapas82a day ago
    Licencing. Postgres has a well known, open source licence making it easy to use. Some extensions do not follow that practice making it more difficult to use.
    Even if the extension has a liberal licence eg Apache or MIT nothing guarantees that the licence won't change in the future.
xkgta day ago
I read the title a couple of times and I'm still not sure it isn't misleading. The benchmarks are not just for Postgres but for Postgres with the Mooncake extension. There are also other results for Postgres with different extensions. While it does rank among the top fastest databases, it is not the fastest and not even within the top 10.
bigtonesa day ago
Looks like they're currently No. 12 on their cited Clickbench benchmark - so not quite in the top 10 three weeks later.
- moonikakissa day ago
  I know! It's crazy how quickly everyone is moving in this space.
  There are some still low-hanging fruit optimizations we're working on (https://github.com/Mooncake-Labs/pg_mooncake/issues/82), and we'd love to get back to the Top 10.
dcreatera day ago
All you need is postgres: part 73
- menaerusa day ago
  Or all you need is a database that allows anyone to skip the hard part and simply write a plugin for.
- cpursleya day ago
  https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
antonmksa day ago
It is not really Postgres, the queries are run on DuckDB. Yeah, DuckDB is really fast for analytical queries.
- moonikakissa day ago
  Well, pg_mooncake is a Postgres extension. So it's really just Postgres :)
  We embed DuckDB in our extension as the vectorized execution engine on columnstore tables.
  Why rebuild a vectorized execution, when DuckDB has a lot of GREAT properties for this workload (similar syntax to Postgres, embedability).
  Here's our architecture: https://www.mooncake.dev/blog/how-we-built-pgmooncake
- moonikakissa day ago
  oh and I totally forgot. pg_mooncake is actually faster than DuckDB on parquet.
  We implement segment-elimination to do so. We even blogged about it: https://www.mooncake.dev/blog/duckdb-parquet
- dcreatera day ago
  Yes but this is an extension that enables abstracting that away for an end user and allows them to just use psql
jota day ago
How is this different from Crunchy Warehouse which is also built on Postgres and DuckDB?
https://www.crunchydata.com/products/warehouse
- moonikakissa day ago
  Architecturally, it's very similar. We have the same vision! And they've done a great job, especially around writing to Iceberg. Some differences:
  1. Our extension is fully open-source. I believe they've open-sourced bits of their stack. 2. We are unopionated about open table formats: Iceberg and Delta. 3. In v0.2, we will support small write workloads. This will open up time-series and HTAP workloads.
- mslot17 hours ago
  It's a similar idea, but Crunchy Data Warehouse was built by several founding engineers of Citus, which lets us speedrun through it :)
  It's a generally available (very solid) product powering some large production workloads, with fully transactional Iceberg, and auto-compaction. All SQL queries and almost all Postgres features are fully supported on Iceberg tables.
  We are also seeing interesting patterns emerging with the ability to load/query csv/json/parquet/shapefile/... directly from S3 in combination with pg_parquet and pg_incremental. For instance, incrementally & transactionally loading CSV files that show up in S3 into Iceberg, or periodically exporting from Postgres to Parquet and then querying with data warehouse.
- rubenvanwyka day ago
  Think that use pure parquet, not iceberg tables.
  - mslot17 hours ago
    We first launched it as "Crunchy Bridge for Analytics" in April last year. At the time, it could mostly query/import/export Parquet/CSV/JSON. Our goal was to build a data warehouse, but getting early feedback was helpful and we did not want to misrepresent the offering.
    In November last year we added Iceberg support and managed storage, so it became a full data warehouse experience with writes and transactions and we renamed it to Crunchy Data Warehouse.
  - jota day ago
    They list "Managed Iceberg tables" top of list of features on that page.
polskibusa day ago
How does mooncake work with std oltp workloads? Can I use Postgres with OLTP , add mooncake and expect duckdb-level of performance for OLAP queries? I know that SAP HANA has some sort of several layers of storage and automatic movement of data between them to allow for such performant OLTP/OLAP hybrid, and I think this is the holy grail for cheap/open source db. Users need OLTP first but don’t want to add Kafka + clickhouse or similar pipelines when OLAP is needed.
- moonikakissa day ago
  In v0.2, this is exactly what we're building.
  Keep your OLTP as it. Deploy Mooncake with logical replication into columnstore tables, and get DuckDB like perf.
  A big part of this is not replacing your OLTP.
mirekrusina day ago
Why timescaleDB is not mentioned anywhere? Genuine question.
- sam_goody20 hours ago
  Actually, it is, in a diagram right at the top of https://github.com/Mooncake-Labs/pg_mooncake.
  However, the Venn diagram does not tell me anything useful, so I agree with the sentiment. Would love it if someone could explain what mooncake is good for and what its strengths and weaknesses are vs Timescale.
sciroba day ago
Any differentiation vs Hydra ? They also put duckdb inside pg https://www.hydra.so/
- zhousuna day ago
  hydra/pg_duckdb embeds duckdb to query existing data on S3. So it is kind of targeting a completely different use case (someone already prepares and shared a dataset and you just want to query it).
  pg_mooncake (&crunchyData) is implementing columnstore tables in postgres, so you can actually use postgres as a data-warehouse (to ingest/ update and run analytics queries)
rubenvanwyka day ago
Looking at the effort being put in it seems there’s a massive market for proper Postgres analytics / columnstore, which is very telling of the pain involved in moving data between systems.
- mosselmana day ago
  Yes this has been my experience too. It isn’t just the costs and complexities involved in the moving of the data, but needing the expertise in your team to do so or at least investing the time of someone in your team to figure it out and maintain it. All the while they can’t do anything that is actually useful.
pier25a day ago
They're already working on 0.2
https://www.mooncake.dev/blog/pgmooncake-v02discussion
- moonikakissa day ago
  oh yeah. We're baking 0.2. Extremely excited about the small-writes. Today, each insert will create a new parquet file.
  This will be fixed in v0.2. Stay tuned, we will have some announcements here coming soon.
xiphias2a day ago
It's cool proof that it's worth to spend the effort for Postgres authors to add vectorization to the default columnstore.
tarun_ananda day ago
Will this work alongside Citus? If yes, what benefits/disadvantages would it have? I presume at some point of time workloads do need a cluster.
- moonikakissa day ago
  In v0.1, we have a dependency on pg_duckdb which does not support citus (https://github.com/duckdb/pg_duckdb/issues/444).
  With what we're working in v0.2, we re-architecture some of our dependencies there, and will be able to scale out and support Citus.
nikitaa day ago
This is an exciting project. Few highlights: - Query processor is DuckDB - as long as it translates PG type system to DuckDB typesystem well - it will be very fast. - Data is stored on S3 in Parquet with Delta or Iceberg metadata. This is really cool. You don't need to push analytical data through WAL - only metadata goes into WAL. This mean fast loading at least in theory, and compatibility with all the Delta/Iceberg ecosystem. - Once they build real-time ingest, you can just push timeseries into this system and you don't need a second system like Clickhouse
- tarun_ananda day ago
  Data is also stored in the local filesystem. How does that compare with vanilla PG and/or S3 based storage?
timsuchaneka day ago
Congrats on the launch! Does it support ordinary db migrations with the normal Postgres interface?
osigurdsona day ago
It would be great if Postgres had native columnar storage.
stoicfungia day ago
what are the differences between pg_mooncake and pg_duck?
- moonikakissa day ago
  Both add the DuckDB extension. But the use-case is slightly different:
  1. pg_duckdb is great for querying existing columnar files (parquet) in Postgres.
  2. Our focus is on helping you write and query your existing Postgres table into a columnar format.
  We spent most of our time on building the table access method for columnstore tables.
rafael3a day ago
Noob question: why is there no Hadoop cluster on Clickbench?
- zhousuna day ago
  people tried to run spark (better hadoop) and failed lol. https://github.com/ClickHouse/ClickBench/pull/139
glass_doora day ago
[dead]
a day ago
undefined