OpenAI: Scaling PostgreSQL to the Next Level(www.pixelstech.net)

190 pointsby thunderbonga month ago15 comments

samwillisa month ago
I was at PGConf last week, and this was one of the most packed talks - a great insight into using Postgres, where most of the conference was fairly inward facing, with talks around the development of Postgres itself (pgconf.dev is very much that one, out of all the others each year).
What you have to remember is that for many teams, when their product takes off, they are not equipped with the deep internal knowledge of how to scale a particular part of their stack. This was an awesome story from a small team having to tackle those challenges, and how they were learning as they went. So, while there are some of those "can't you just", and "whats interesting about this?" comments here, with the narrative of the growth rate, and the very high profile of the product, it was the perfect user talk for a internal development focused conference.
The key insight, and main message, of the talk was that if you are not too write heavy you can scale Postgres to very high read throughput with read replicas and only a single master! That is exactly the message that needs to be spelled out as that covers the vast majority of apps.
As an observation, in the Q&A at the end of the talk the questions, primarily from core Postgres developers, were focused on learning about the use case, and not an opportunity to suggest that they were doing anything wrong (not quite the same as this thread could get). A genuinely awesome group of very friendly and welcoming people in the Postgres community.
- Swizeca month ago
  > if you are not too write heavy you can scale Postgres to very high read throughput with read replicas and only a single master
  The number of interviewees (I do the sys design question) who want to jump straight into massively distributed eventually consistent complicated systems for 5 reads/second is too damn high. 1,000,000 users is not a lot.
  I wish we did better with teaching folks that while we (as an industry) were focused on horizontal this and that, computers got fast. And huge. Amazon will rent you a 32TB RAM server these days. Your database will scale just fine and ACID is far too valuable to throw.
  - cryptonectora month ago
    > The number of interviewees (I do the sys design question) who want to jump straight into massively distributed eventually consistent complicated systems for 5 reads/second is too damn high. 1,000,000 users is not a lot.
    Not only can you get quite far w/ PG and w/o sharding, but if you run out of write bandwidth you might be able to (depending on what your schema looks like) shard your PGs. For example, suppose you have a ride sharing type app: you might have one DB per market with only user and driver data for that market, then have the apps go to the user's home market for authentication, and to the local market for rides (similarly if a driver ends up driving someone to a different market, you can add them there temporarily).
- bohanoaia month ago
  > The key insight, and main message, of the talk was that if you are not too write heavy you can scale Postgres to very high read throughput with read replicas and only a single master! That is exactly the message that needs to be spelled out as that covers the vast majority of apps.
  This is exactly the message I wanted to convey in the talk—thank you so much! -Bohan
  - samwillisa month ago
    No worries, it was a really great talk - one of the highlights of the conference for me!
- Vonnga month ago
  Author here. The post here is a 3rd-party translation and may drifts from original wording in a few spots. Azure and OpenAI sent me some corrections today, so I published an updated English version myself here: https://pigsty.io/blog/db/openai-pg/
  - samwillisa month ago
    Hey! It was great to meet you at the conference, it's a great writeup!
- kremboa month ago
  Do you know where I can find the slides/records of the talks?
  - samwillisa month ago
    I believe the videos of the talks will be published at some point - I think "a few weeks" was mentioned.
- beltera month ago
  You seem to be implying this thread is unkind to the team. Given most of the experience here, probably many are intellectually attracted by the architectural problem of scaling a product like ChatGPT, while examining how a company with almost unlimited funding approaches hiring.
  Statements like these: "The presentation also specifically mentioned that using ORM can easily lead to inefficient queries and should be used cautiously."
  Shows they are not experienced enough, to run this type of infrastructure at scale.
  - andrewguenthera month ago
    How does that statement convey inexperience? The presentation doesn't try to present that as a novel conclusion, it's just a true statement.
VWWHFSfQa month ago
> The presentation also specifically mentioned that using ORM can easily lead to inefficient queries and should be used cautiously.
Every ORM is bad. Especially the "any DB" ORMs. Because they trick you into thinking about your data patterns in terms of writing application code, instead of writing code for the database. And most of the time their features and APIs are abstracted in a way that basically means you can only use the least-common-denominator features of all the database backends that they can support.
I've sworn off ORMs entirely. My application is a Postgres application first and foremost. I use PG-specific features extensively. Why would I sacrifice all the power that Postgres offers me just for some conveniences in Python, or Ruby, or whatever?
Nah. Just write good SQL for your database and the whole system will be very happy.
- bwfan123a month ago
  At one time we had to switch our service out of DB2 into psql. We had to do this with minimum downtime !! ORM helped us since it abstracted the DB (to the orm the sql dialect was a plugin). The application was largely untouched. Also, not all developers can roll their own sqls and sqls embedded in code make it harder to refactor or reason with and eventually sqls will be abstracted into a crud library of sorts anyway.
  - sgarlanda month ago
    > Also, not all developers can roll their own sqls
    If you can't write SQL, don't use an RDBMS until you learn it. This sounds like gatekeeping because it is: I don't understand why so many people are willing to accept that they need to know their main language to use it, but not that they need to know SQL to use it.
- jillesa month ago
  I've been using the Django ORM for years and it feels like an amazing piece of software.
  But recently I started using sqlc. Which turns my queries into Go (simplification). I think this is actually the sweet spot between ORM and rawdogging SQL.
  - VWWHFSfQa month ago
    Django ORM is great if you want it to manage your entire data model and are OK with living entirely within it's little world. But it can't support even basic stuff like composite primary keys, complex filters in a join clause, or any kind of lateral joins at all. It's also among the biggest offenders at hidden (and explosive) N+1s all over your codebase and templates unless you pay very special care. Also, the SQL it generates is usually very naive and often-times just not even performant.
- groonea month ago
  You just never seen a good ORM such as Entity Framework Core
tux3a month ago
Self-hosting postgres is appealing from a flexibility standpoint (don't be locked out of superuser or advanced features), but it sounds a little bit nerve wracking.
I'm now hoping all the cloud providers read this article and start exposing the feature to try disabling an index in the query planner before you drop it for real, that should really become standard procedure. Just in case.
But if you're a large scale company to the point of wanting to own and customize your stack, it can definitely make sense to self-host.
- bohanoaia month ago
  Speaker here — Bohan from OpenAI.
  Just to clarify, we're using Azure Database for PostgreSQL, not a self-hosted setup. I mentioned "Azure Postgres" multiple times during the talk, but I should have been more specific that we're referring to Microsoft’s managed PostgreSQL service. Apologies for the confusion.
  - tux3a month ago
    Sorry, I see this has caused some confusion, I should have clarified in my comment. I was responding to the section by Lao Feng at the end, where their conclusion tries to make the case for Pigsty and self-hosted pg.
- oa335a month ago
  There are already several ways to force postgres to use or not use a particular index. We use cloud-hosted and managed Postgres instances on one of the major provider, and I've used a couple of these strategies to massage the query planner.
  1. Fiddle with the Query Planner settings (this can be done on a per query level as well, so its not global). E.g. enable_indexscan=off, enable_indexonlyscan=off
  2. Add a trivial calculation to the filter clause of the query. E.g. select * from table where indexed_col + 0 = 12345 shouldn't use the index, as the planner won't do the arithmetic.
  3. Use the pg_hintplan extension, which allows you to add comments to your code to urge the planner to use certain data access strategies, including specifying which indices to use. See: https://pg-hint-plan.readthedocs.io/en/latest/hint_table.htm...
  - bostika month ago
    That second trick is neat, I'll need to remember that. I also really wish I had known about it back in 2015, when we had to massage an inconveniently big (at the time) PG database and started to carve it out into smaller, purpose-specific instances.
    Being able to verify that an index was either useless or inefficient without jumping through hoops would have saved quite a lot of time.
- affandara month ago
  (Disclosure: I'm from Azure's Postgres team)
  Clarification: OpenAI does not self-host Postgres. They use Azure's managed PostgreSQL offering (aka Azure Database for PostgreSQL Flexible server).
- evaneliasa month ago
  > I'm now hoping all the cloud providers read this article and start exposing the feature to try disabling an index in the query planner before you drop it for real
  I'm surprised there isn't non-superuser DDL to handle this. For example in MySQL you can ALTER an index to make it INVISIBLE (or equivalently in MariaDB, IGNORED) and the planner won't use it.
- lossoloa month ago
  > but it sounds a little bit nerve wracking.
  > But if you're a large scale company ... it can definitely make sense to self-host.
  I'm not a large company like OpenAI and I've been running various PostgreSQL setups for years—ranging from single-node instances without replication to multi-node, fault-tolerant, highly available configurations with automatic failover and backups, serving 4-5 digits of updates, selects, and inserts per second. I'm not sure what you're referring to, but in my experience, once it's set up, it's almost maintenance-free.
  - hobsa month ago
    No large scale database heterogeneous cluster is maintenance-free - plans change because data changes, query patterns change, resource utilization of certain features grows or shrinks, new application features are launched, new indexes need to be added, things are constantly migrated and might need a special case because the default migration strategy doesn't perfectly handle giant bulk changes, etc etc etc etc
    lossoloa month ago
    You need to add indexes on cloud-hosted instances too, perform migrations there as well, and launch new features in your application in both cases. Almost everything you're talking about has little to do with whether you're self-hosting or using a cloud provider. You still have to deal with most of it either way.
    EDIT: Don't get me wrong, I've also managed Kafka clusters, ClickHouse clusters, Elasticsearch clusters, etc. and I have my share of Zookeeper horror stories. Some of the tools I just mentioned are definitely not maintenance-free. But in my experience, you can't really compare PostgreSQL to them.
- cryptonectora month ago
  > Self-hosting ~~postgres~~ is appealing from a flexibility standpoint [...]
- whalesalada month ago
  self hosting psql is trivial - what is the scary part? thats how we used it for decades until things like RDS came around.
  - sgarlanda month ago
    I would never say that self-hosting anything is trivial. Linux administration, tuning, and troubleshooting can be learned, obviously, and the same is true of RDBMS. Neither is a trivial skillset if you want to actually be able to run stuff at scale, though. There's a massive difference between 25 QPS and 25,000 QPS (TFA states ~40 replicas, and an aggregate of 1,000,000 QPS).
    That shouldn't deter anyone from trying, though. You can't learn if you don't try.
  - olalondea month ago
    Back in the 2000s, Database Administrator (DBA) was a pretty popular job title, and they usually got paid way more than regular software developers. It probably wouldn’t have been like that if managing databases was "trivial".
  - lossoloa month ago
    Yeah, that's baffling to me too. I started self-hosting my databases with MySQL 21 years ago, before the cloud even existed.
    edoceoa month ago
    Before the cloud? More than 21 years ago that damn cloud icon was in every Visio network diagram.
    dragonwritera month ago
    The popularity of the cloud icon in diagrams well precedes the term “cloud computing” being coined for on-demand scalable (usually, but not always, remote-hosted) infrastructure and services that accompanied the explosion of such services in about the mid-00s.
    edoceoa month ago
    IME folk were saying move it to the cloud, with the meaning of a manged provider, before 2005.
    lossoloa month ago
    > Before the cloud?
    Yes, the cloud, in the modern sense (as in, on-demand scalable infrastructure like AWS), was just beginning to emerge back then, AWS launched S3 and EC2 in 2006 so 19 years ago. Other cloud services followed over the next several years.
    dborehama month ago
    Before that it was on whiteboards.
- gjvca month ago
  this sort of thing https://www.oracle.com/uk/autonomous-database/what-is-autono... ?
  "An autonomous database is a cloud database that uses machine learning to automate database tuning, security, backups, updates, and other routine management tasks traditionally performed by DBAs. Unlike a conventional database, an autonomous database performs all these tasks and more without human intervention."
  - tux3a month ago
    That sounds like chaos engineering with extra steps! Have the AI randomly tweak your DB, and see if your infra holds up? :)
    mike_hearna month ago
    It's not AI based. "Autonomous" is just the term Oracle uses to communicate that the database has a ton of self-management features, and that it was a big area of focus over time. If you use the recent versions it's a much less admin-heavy database than is conventional for an RDBMS. For example it can notice what queries are slow and add indexes for you, automatically. That's ordinary statistics and planning, GOFAI if you like.
    Disclosure: I have a part time job with Oracle in the research division, so I very much have conflicts of interest, but OpenAI should listen to me anyway (and this post is my own opinion, obviously). Postgres is great and I use it in personal projects, but the amount of effort our industry collectively puts into working around its limitations and problems is a bit concerning. According to the blog post, their "solutions" to the problems mostly involve just not using the database, which is no solution at all. They could just subscribe to a managed Oracle DB in Azure and every problem they're having would go away. The cost is surprisingly not much different to a managed Postgres. Specifically, in the current version:
    - Database nodes scale horizontally and elastically. There is not such a thing as a primary master and read replicas in an Oracle cluster, although you can have read-through cache nodes (they are not replicas). And because of the unusual way Oracle clusters are integrated with the hardware there also isn't such a thing as replication lag within the cluster: writes are fully synchronous even when scaling horizontally and reads are always up to date.
    - Database clusters are HA. Node lists are automatically distributed to client drivers. If a node fails or you do a rolling upgrade on the cluster, it's transparent to the client beyond a minor elevation in latency as they retry. You can not only do rolling upgrades but also roll them across major versions.
    - The MVCC engine in Postgres is strange and unique. Other databases don't have concepts like vacuuming or attendent table/index bloat. This isn't unique to Oracle, but nonetheless switching to something else means a massive productivity drain gone, just like that.
    - You can do schema changes and do them online. They say they don't allow users to create new tables which I'm sorry but that's ridiculous. It's not their fault but I'd consider a database to which you can't add tables to be basically broken. You can even do arbitrary schema changes online in Oracle because the db can create a new table and copy the data across to the new schema whilst doing incremental sync, then do an in-place rename at the end. That works under heavy load.
    - You can disable indexes, marking them as invisible to the planner whilst still maintaining them.
    - You can multiplex transactions onto sessions and do db-side connection pooling, without things like "bouncers", so idle connection management is less of an issue.
    - You can derive 95/99th percentile latencies from v$active_session_history because the wait times for every SQL statement are available.
    - You can audit schema changes along with everything else.
    On pricing, at these scales it just doesn't matter. An ExaData cluster might actually be cheaper. I knocked up an estimate using Azure's pricing calculator and the numbers they provide, assuming 5TB of data (under-estimate) and HA option. Even with a 1 year reservation @40% discount they'd be paying (list price) around $350k/month. For that amount you can rent a dedicated Oracle/ExaData cluster with 192 cores! That's got all kinds of fancy hardware optimizations like a dedicated intra-cluster replication network, RDMA between nodes, predicate pushdown etc. It's going to perform better, and have way more features that would relieve their operational headache. Even if it was a lot more expensive, OpenAI is opportunity-cost constrained right now, not capital constrained. All time their devs are finding complicated ways to not put things in their melting database is time lost to competitors.
    joshsm5a month ago
    Exadata would definitely be an option and it has it's pros (I use it at work but that decision wasn't mine). Some of the features you forgot to mention are: - The built in bloom filters for reducing number of blocks scanned for FTS
    - Oracle (not specific to Exadata) supports creating reverse key indexes to help address high write loads causing contention on indexes, Postgres you can use a reverse function to reverse a string but not the actual byte representation so doesn't work on int/bigint types
    Oracle doesn't come without it's drawbacks:
    - Oracle is infamous for their licensing costs and audits, TDE for example I believe is a separate licensing cost despite that fact that for most companies encryption as rest is required
    - You have to include the cost of DataGuard into your calculations for TCO because there isn't replication built in like Postgres provides for free to handle DR use cases
    - You have to include the cost of Oracle's point in time archiving backups for restoration in case of failure for TCO, the Postgres community has come up with a few solutions for this that are free. At this point in time, I wouldn't consider only daily backups acceptable.
    - Applications typically work with grapheme clusters, a user doesn't need nor care how many bytes the emojis they filled an input field is, you need to store it and user interfaces are going to enforce 1 emoji as 1 character generally. In existing databases clusters, a lot of places have not turned on extended strings so Oracle strugles to handle Unicode due to its limitation on VARCHAR2 of 4000 BYTES until you turn this on.
    - Oracle completely ignores SQL standards isolation levels, whereas Postgres adheres to them pretty well (ignoring repeatable read interpretations which happens with other RDBMS as well)
    > it can notice what queries are slow and add indexes for you, automatically
    This potentially can cause performance differences in DEV/STG/PROD where an index gets created in STG but then isn't there in PROD. Because of this possibility, it would violate the change management processes where I work but likely wouldn't be an issue for OpenAI.
    > Database nodes scale horizontally and elastically. There is not such a thing as a primary master and read replicas in an Oracle cluster
    This comes at the cost of the possibility of cluster waits, and I've seen them happen for 10 minute time period and are difficult to track down or optimize for in my experience.
    > Database clusters are HA. Node lists are automatically distributed to client drivers. If a node fails or you do a rolling upgrade on the cluster
    Everywhere I have worked does not use the full Oracle driver which is what I believe is required for this. Using the full client requires organization coordination that sometimes doesn't exist or won't exist because not every team wants to use Oracle and DevOps doesn't want to support the multitude of ways to use different database drivers. I primarily work in Java and I've only ever seen the JDBC thin client used which as far as I know doesn't support this, you can create connection strings with a list of servers for production with failover to a list of servers in another region designated as your DR database, but I don't believe they're dynamic.
    > You can multiplex transactions onto sessions and do db-side connection pooling
    This is HIGHLY dependent on what programming language/framework you're using. Spring Boot for example uses Hikari and generally avoids using the internal driver pool which likely prevents this. I also believe using the reactive drivers is required to do this in Java in general. Using this directly means you have a hard dependency to the specific database. While it's rare to change database software it can happen, my work is undergoing a switch from MSSQL to Oracle.
    I could be wrong on any of these, considering I have to use Oracle daily at work I'm more than happy to be wrong and learn something since I can apply it immediately.
    mike_hearna month ago
    Thanks for the insight! Some of these things were new to me :)
    I didn't mention audits and the like because OpenAI are in the cloud, and there are obviously no audits for managed databases. That's relevant for self-hosted setups.
    Both options I specced out included a replica, iirc. At least that's what Microsoft's HA option appears to be (it doubles the price, indeed).
    One of the problems in comparing with Postgres is people tend to advertise mutually exclusive options, sometimes in different comments by different people so it's not their fault. Anything that requires an extension might as well not exist for most Postgres users because they want to outsource DB management to the cloud, and there the cloud provider chooses the extensions not the user. Oracle doesn't have this problem because all the features are built-in and managed databases have access to everything.
    Postgres offers strictly serializable isolation indeed, but IIUC it's basically a form of read locking so will tank performance.
    Re: cluster waits. Multi-node clusters will inevitably have failure modes a single machine doesn't unfortunately, I'd prefer to always scale up first before scaling out. One of the advantages of a managed database is that this becomes the cloud provider's problem.
    Re: node list distribution. I think that's what the SCAN feature does, which is supported in the thin driver: https://docs.oracle.com/en/database/oracle/oracle-database/2...
    Yes, the advanced features are typically not exposed by generic middleware and you will have to change things to take advantage of them, but this is a general issue that affects all kinds of software. Features specific to the Linux kernel are often not exposed in Python or Java either, for instance. Some features are easily enabled by just dropping in UCP, which should be an easy upgrade from Hikari. Very new stuff like tx multiplexing requires more work by the app developer to exploit currently, but hopefully with time frameworks will catch up. Point is, the support is there if you need it. Compared to the gymnastics OpenAI are going through, it's easy stuff.
    anarazela month ago
    > Postgres offers strictly serializable isolation indeed, but IIUC it's basically a form of read locking so will tank performance.
    Postgres' SSI [1] does not block reads. If you have lots of transactions reading and updating a lot of rows the granularity of tracking will become coarser to keep memory usage in bound though.
    [1] https://drkp.net/papers/ssi-vldb12.pdf
    mike_hearna month ago
    You're right, I mis-remembered - Postgres calls them "SIREAD locks" but they aren't actually locks. The model is based on aborting transactions.
    The issue with this - and I'm not saying it's a bad feature because it's not and I'd like to have support for every isolation level everywhere - is that most apps can't tolerate transaction aborts. They need to be written to expect it and loop. Looping can easily turn into livelock if you aren't careful and there aren't great tools or known best practices for handling such a system.
    You also have to be careful because such transactions are only strictly serializable vs other transactions run at the same isolation level. So if you accidentally allow in a mix of transactions you are still exposed to isolation anomalies.
    So it's a useful feature but not something most apps can easily port to.
    jose_zapa month ago
    > Postgres offers strictly serializable isolation indeed, but IIUC it's basically a form of read locking so will tank performance.
    It will not. We use it at work for all transactions and its is very performant.
    joshsm5a month ago
    Sorry, holiday, but I was referring to having a separate region for failover, which is called "Geo redundant backup" in Azure Database for Postgres. Whereas with Exadata you need an entirely separate Exadata rack in a different region with the licensing of DataGuard on top of it.
    There are a lot of extensions that most cloud providers support, auto range partitioning isn't built into Postgres like Oracle but it's supported with an extension for example. Azure Database supports more than AWS last I checked. https://learn.microsoft.com/en-us/azure/postgresql/extension...
    I will have to check with our DBAs if we use the scan feature or not, appreciate you pointing it out.
- diggana month ago
  > but it sounds a little bit nerve wracking.
  As long as you're doing backups (you are doing backups, right?), and validate that those backups work (you are validating that those backups work, right?), what's making you nervous about it?
  - mmcnla month ago
    Doing backups and validating backups is very error-prone and time consuming. To me this reads as: "If you do all the hard complex work yourself, what's making you nervous about it?"
    It's far easier to do backups and database hosting at scale. Database failures are rare, so it's this one-off situation that you have to be prepared for. That requires clearly defined processes, clearly defined roles and responsibilities, and most importantly: feedback from unfortunate incidents that you can learn from. All that is very hard to accomplish when you do self-hosting.
    ownagefoola month ago
    It's actually probably a more difficult problem at scale.
    When you have a single smallish schema, you export, restore, and write automated tests that'll probably prove that backups in 10 minutes ( runtime, development time few days / weeks ). Either the transaction runs or errors, and either the test passes or not.
    The problem when small is obviously knowledge, skills, and procedures.
    Things like:
    - What if the monitoring that alerts me that the backups are down, is also actually down. - What do you mean it's no longer "safe" to kubectl delete pvc --all? - What do you mean there's nobody around with the skills to unfuck this? - What do you mean I actually have to respond to these alerts in a timely manner?
    The reality is, when the database is small, it typically doesn't cost a whole lot, so there's a lack of incentive to really tool and skill for this when you can get a reasonable managed service.
    I typically have those skills, but still use a managed service for my own startup because it's not worth my time.
    Once the bill is a larger than TCO of self-hosting you have another discussion.
    diggana month ago
    > Doing backups and validating backups is very error-prone and time consuming
    Right, but regardless of using a managed database service or self-hosted database, this is something you probably are doing anyways, at least for a production service with real users. Sure, the managed service probably helps with you with the details of how the backup is made, where it's stored, and how the restore process happens, but you still need to validate your backups and the rest, so replicate that process/experience with your self-hosted setup and you're nearly there.
  - mewpmewp2a month ago
    Yeah, that's what is a bit odd to me. I feel like AWS and everything like that is much more of a black box compared to some things like Postgres that is so fully tested, proven to be reliable, etc.
  - tux3a month ago
    I think managing stateless infrastructure is much easier, if anything goes haywire you can expect a readiness probe to fail, k8s quietly takes down the instance, and life continues with no downtime.
    It is also perfectly possible to roll your own highly-available Postgres setup, but that requires a whole another set of precise configuration, attention to details, caring about the hardware, occasionally digging into kernel bugs, and so forth that cloud providers happily handle behind the scene. I'm very comfortable with low-level details, but I have never built my own cloud.
    I do test my backups, but having to restore anything from backups means something has gone catastrophically wrong, I have downtime, and I probably have lost data. Everything to prevent that scenario is what's making me sweat a little bit
    lossoloa month ago
    > occasionally digging into kernel bugs
    No, it doesn't. I've been self-hosting a multi-node, highly available, and fault-tolerant PostgreSQL setup for years, and I've never had to go to that level. After reading your whole post, I'm not sure where you're getting your information from.
    tux3a month ago
    Horror stories stick with me more than success stories, but I'm happy to take the feedback. I'm glad it went well for you, that's a small update for me.
    JimBlackwooda month ago
    > occasionally digging into kernel bugs
    Haha, been there! We recently had outages on kube-proxy due to a missing `—set-xmark` option in iptables-restore on Ubuntu 24.04.
    On any stateful server we always try to be several major versions behind due to issues like above - that really avoids most kernel bugs and related issues.
  - anal_reactora month ago
    In Spain when you want to travel by high-speed train, you need to though security check, just like at an airport. Do the security checks make sense? No. But nobody wants to be the politician that removes the security checks, and then something bad happens. So the security checks stay.
  - fipara month ago
    Backups with periodic restore validation (which is not trivial) are a must, but don’t make database maintenance any less nerve wracking.
    Sure, you won’t lose data, but the downtime …
  - evantbyrnea month ago
    How do you prefer to collect backups when self-hosting postgres?
    ziea month ago
    We use barman too, but we do hourly and daily restores into different database instances.
    So for example our prod db is tootie_prod We setup another instance the restores from barman every hour and renames the db to tootie_hourly.
    We do the same thing daily.
    This means we have backup copies of prod that are great for customer service and dev troubleshooting problems. You can make all the changes you want to _daily or _hourly and it will all get erased and updated in a bit.
    Since _hourly and _daily are used regularly, this ensures that our backups are working too, since they are now a part of our daily usage to ensure they never break for long.
    JimBlackwooda month ago
    Hey, this is a pretty neat idea! I might just use this :)
    ziea month ago
    Please do!
    JimBlackwooda month ago
    Not OP, but:
    Barman on the host with a cronjob for physical backups and as archive/restore command for wal archiving and point in time recovery.
    Another cronjob for logical backups.
    They all ship to some external location (S3/SFTP) for storage.
    I like the above since it adds minimal complexity, uses mainly native postgres commands and gives pretty good reliability (in our setup, we’d lose the last few minutes of data in the absolute worst case).
cryptonectora month ago
> Feature Requests
> Concerning schema changes: they desire PostgreSQL to record a history of schema change events, such as adding or removing columns and other DDL operations.
You can do this right now today by using `EVENT TRIGGER`s. You can check out things like Aquameta[0] (if I remember correctly) to see how it's done.
[0] https://github.com/aquametalabs/aquameta
- weitendorfa month ago
  We are building similar functionality for our own postgres setup right now.
  Of course, postgres is very powerful and you can implement anything like this in many different ways. But at the same time, maintaining DDL history and tracking major changes to the database is a very common requirement, and unfortunately many people don't realize that until they learned that lesson the hard way.
  Relatedly are not DDL changes per-se, but big/important db operations that you want to also keep a record of so that you can look back and understand why something changed. I am not sure if this is the right term, but basically when we update our pricing model or skus, or set custom pricing for someone, we want those updates to be "auditable".
  Actually, I think this is a relatively common use case too: a fully relational model often leaves you with a large number of "static" tables that only change when you're making updates to your application. They support the smaller number of big, important, dynamic tables that your application is regularly modifying. If you had the foresight to recognize that you'd likely need to change those static tables in the future, you probably organized them so you could do so by adding new rows. It is not quite a DDL change but it is a big, risky change to your application logic that only happens rarely, and you basically just want to keep a list of all those big changes in case things get messed up or you find yourself unable to make sense of older data.
  - cryptonectora month ago
    There's lots of extensions that automatically create audit tables and record history in them. The trick is to a) create audit tables for all _existing_ tables, b) create an event trigger so you can have audit tables created automatically for any future new tables. Here's an example of how to do it: https://github.com/twosigma/postgresql-contrib/blob/master/a...
    Another thing you might do is to go with a schema that follows the event shipping pattern. In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables. Then your event tables _are your history/audit_ tables.
    weitendorfa month ago
    Yeah, I guess what I mean is, it's good for very common use cases to have fully supported features in the product itself even if there are third party tools to handle it for you or I already know how to implement it myself.
    I have been working on database features and functionality that I felt I moderately to fully understood and just needed to sit down and implement in the next week, for close to 3 weeks now.
    > In this pattern you have the "truth" held in insert-only tables (deletes and updates not allowed), then turn those "event" tables into ones you can query naturally using VIEWs, MATERIALIZED VIEWs, or live tables that you update with triggers on the event tables.
    This is almost exactly what I'm doing, with an additional versioning column (primary key on (id, version_num)). After I did that I realized that it'd be better to correlate changes with a push_id too because if some operations didn't modify all ids then I wouldn't be able to easily tell which versions were updated at the same time. But then I realized most of my "auditable pushes" would be operations on 3-4 related tables and not just an individual table, so push_ids would be performed on all tables. And also, since not every push modifies every value, it makes sense to model pushes as additions + diffs to the existing table. But then after several pushes constructing the MATERIALIZED VIEW of active values becomes rather complex because I have to convert a sparse tree of diffs across multiple tables into a flat table recursively...
    So yeah it would be pretty nice for postgres to have something that mostly just works to audit changes at either the user, function, or table level.
    cryptonectora month ago
    > it's good for very common use cases to have fully supported features in the product itself
    There are a _lot_ of incredibly useful extensions to PG. What you find useful and necessary someone else might find to be unnecessary bloat. Over time the industry's demands will become clear. Another issue is that different users want auditing done differently, and so it might be difficult for PG to have one solution that fits all use-cases.
  - aftbita month ago
    CREATE TABLE public.audit ( id uuid NOT NULL, created_time timestamp without time zone DEFAULT now() NOT NULL, schema_name text NOT NULL, table_name text NOT NULL, record_id uuid NOT NULL, user_name text, action text NOT NULL, old_data jsonb, new_data jsonb ); CREATE OR REPLACE FUNCTION audit_if_modified_func() RETURNS TRIGGER AS $body$ DECLARE v_old_data JSONB; v_new_data JSONB; BEGIN IF (TG_OP = 'UPDATE') THEN v_old_data := to_jsonb(OLD.*); v_new_data := to_jsonb(NEW.*); IF (TG_TABLE_NAME::TEXT = 'users') THEN v_old_data = v_old_data - 'last_login_time'; v_new_data = v_new_data - 'last_login_time'; END IF; IF (v_old_data <> v_new_data) THEN INSERT INTO audit (id,record_id,schema_name,table_name,user_name,action,old_data,new_data) VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT, TG_TABLE_NAME::TEXT, session_user::TEXT, substring(TG_OP,1,1), v_old_data, v_new_data); END IF; RETURN NEW; ELSIF (TG_OP = 'DELETE') THEN v_old_data := to_jsonb(OLD.*); INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,old_data) VALUES (uuid_generate_v4(), OLD.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_old_data); RETURN OLD; ELSIF (TG_OP = 'INSERT') THEN v_new_data := to_jsonb(NEW.*); INSERT INTO audit (id, record_id,schema_name,table_name,user_name,action,new_data) VALUES (uuid_generate_v4(), NEW.id, TG_TABLE_SCHEMA::TEXT,TG_TABLE_NAME::TEXT,session_user::TEXT,substring(TG_OP,1,1),v_new_data); RETURN NEW; ELSE RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - Other action occurred: % at %',TG_OP,now(); RETURN NULL; END IF; EXCEPTION WHEN data_exception THEN RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [DATA EXCEPTION] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM; RETURN NULL; WHEN unique_violation THEN RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [UNIQUE] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM; RETURN NULL; WHEN OTHERS THEN RAISE WARNING '[AUDIT_IF_MODIFIED_FUNC] - UDF ERROR [OTHER] - SQLSTATE: % SQLERRM: %',SQLSTATE,SQLERRM; RETURN NULL; END; $body$ LANGUAGE plpgsql SECURITY DEFINER; CREATE TRIGGER audit_logger_accounts AFTER INSERT OR UPDATE OR DELETE ON accounts FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func(); CREATE TRIGGER audit_logger_users AFTER INSERT OR UPDATE OR DELETE ON users FOR EACH ROW EXECUTE PROCEDURE audit_if_modified_func();
    weitendorfa month ago
    It's not that maintaining some kind of audit history is an insurmountable problem, it's that it's both a very common need and the "right" way to do it can take significantly longer to set up than a quick naive approach. Once you take into account batching (many if not all of your pushes to auditable tables that are typically static will happen in batches within that table and across multiple other tables, because you are updating your application logic or implementing some new feature) and the need to not blow up your storage with audit logs by recording diffs, and the need to be able to "replay" or "rollback", etc. it's quite a lot to handle
    And again, it's not impossible to do any of that. I just think it seems repeatable enough that it could be a first-party feature, and I would prefer to just enable or configure that feature and move on to other problems.
    Going from something like your naive approach which I assume an LLM generated for you, to something productionized enough to bet your business on it, is not always trivial.
    aftbita month ago
    >Going from something like your naive approach which I assume an LLM generated for you, to something productionized enough to bet your business on it, is not always trivial.
    I'll admit, this hurts to hear a bit. This is real production code I've been running in a webapp that initially launched targeting PostgreSQL 9.5. This is not some crazy web-scale situation - these specific tables only experience at most 1 qps of write load - but this has solved the problem for this business as it stands here.
- tudorga month ago
  We use event triggers both in pgroll [0] and pgstream [1] to capture DDL changes. It's a feature of pgroll that it captures the schema migration history, regardless if you do the changes via pgroll or with another tool. And pgstream uses it to put the schema changes in the logical replication stream.
  One thing to be aware is that on many Postgres DBaaS EVENT TRIGGERS are not allowed, because they generally require superuser. But RDS and Aurora do support them, we (Xata) support them of course, and I think Supabase is working on adding support for them.
  [0]: https://github.com/xataio/pgroll
  [1]: https://github.com/xataio/pgstream
- erichansona month ago
  Thanks for remembering Aquameta and stay tuned! Some cool new stuff coming out shortly.
  - cryptonectora month ago
    :)
    Hey, I see you've added a web server. How does it compare to PostgREST?
    erichansona month ago
    Big difference is it's implemented in pl/pgsql so as Aquameta evolves, there's no external deps. There's a function called endpoint.request(http_verb, URL, post vars) and then it does the rest in plpgsql. A thin Go daemon just takes the request and throws it at the function.
    It does a lot of the same stuff PostgREST does. Automatic REST interface to any database, but it also hosts static resources, and dynamic mapping of URL templates to functions.
    Needs a rewrite though. That's probably the next big dev push. Right now we're rewriting the system catalog (meta) and the data VCS.
    cryptonectora month ago
    Sounds really good. Your vision is excellent, and I admire it and your work.
    BTW, it's very important to get the details of authentication and authorization right. PostgREST does: it validates JWTs and exposes the claims to the SQL application via set_config()s, just like every detail of every request, and it does a `SET` to set the role to whatever the authenticated user is, though this could be done by a SECURITY DEFINER function instead that looks at the claims.
    erichansona month ago
    Real talk.
    https://www.postgresql.org/message-id/flat/894C0144-5BCC-41C...
    set_user extension is AFICT essential for scaling user-per-role.
    cryptonectora month ago
    I'm tempted to build something similar (because I can't use GPL libraries at $WORK and I also can't use PostgREST for reasons), but somewhere between what you do in Aquameta and what PostgREST does, with as much logic as possible in SQL or PlPgSQL.
    Basically I'd have a thin HTTP/REST layer that does a set_config() for every request header and q-param, and which compiles the resource path to a SELECT (for GETs) or a DML (for POST/PUT/DELETE/PATCH), with similar restrictions to PostgREST, though perhaps a bit more liberal.
    For DMLs the local-part would have to be just a table/view, or a function call (what to do with the request body in the case of a function call? reject it).
    For a GET... I've ideas. The simplest would be to have a very simple "language" to compile to SQL, not too unlike PostgREST. The craziest one would be to have a transliteration of literal-value-free (certainly no strings, no quotes, no arrays, no row values, maybe only just integer and boolean literals) SQL statements to local-part, and the code to convert it back to a SELECT would parse the query and make sure it uses no tables/views/functions outside the public schema (and if functions, only ones marked pure), with q-params as... query parameters -- no SQL injection attacks here.
    As in PostgREST the SQL code in the database (but not the caller's SQL) could set_config() things like response headers and status-code. Authentication would be done via SECURITY DEFINER functions that consume whatever Negotiate or Bearer token in the Authorization: header and SET the session role (which would always get RESET before the next request).
    Except for the need for a parser for validating any queries (for the "crazy" design, if I go there) this would make the Java/Go/Rust/Whatever code quite simple, and w/o requiring any dynamic SQL in the DB.
venriqa month ago
I don't deny how powerful PostgreSQL is, but it appears that the decision to choose PostgreSQL was the most significant issue all along. It's shocking how often we find teams selecting Postgres to implement a solution that it's not suitable for. Yet, you see a document with a forced and non-logical narrative trying to justify the decision.
With the NewSQL options available today, which provide distributed relational databases with multi-masters out of the box, it seems to me that many teams select Postgres simply because that's all they know, and that's the source of the problem.
williamdclta month ago
Not super interesting, this is fairly basic stuff that you'll encounter at orders of magnitude smaller scale than OpenAI. Creating indexes CONCURRENTLY, avoiding table rewrites, smoothing out traffic, tx timeouts, read replicas... It's pretty much table stakes, even at 10000x smaller scale.
Their requests to Postgres devs aren't anything new either, everyone has wished for it for years.
The title is kind of misleading: they're not scaling it to the "next level", they're clearly struggling with this single-master setup and trying to keep it afloat while migrating off ("no new workloads allowed"). The main "next scale" point is that they say they can "scale gracefully under massive read loads" - nothing new, that's the whole point of read replicas and horizontal scaling.
Re: "Lao Feng Q&A":
> PostgreSQL actually does have a feature to disable indexes. You can simply set the indisvalid field to false in the pg_index system catalog [...] It’s not black magic.
No. It's not documented for this use, so it's not a feature. It's fooling around with internals without guarantees of what this will do (it might do what you want today, it might not in the next release). Plus as they point out, managed Postgres providers don't let you fiddle with this stuff (for good reasons, as this is not a feature).
> there’s a simpler solution [to avoiding accidental deletion of used indexes]: just confirm via monitoring views that the index is not being used on either primary or replicas
That doesn't quite solve all the same problems. It's quite frequent that an index is in use, but is not _needed_: another index would also work (eg you introduced a new index covering an extra column that's not used in this query). Being able to disable an index would allow checking that the query plan does use the other index, rather than praying and hoping.
- sgarlanda month ago
  > they're clearly struggling with this single-master setup and trying to keep it afloat while migrating off ("no new workloads allowed").
  TFA states they’re at 1 million QPS, in Azure. 1 million QPS with real workloads is impressive, doubly so from a cloud provider that’s almost certainly using network-based storage.
  EDIT: they have an aggregate of 1 million QPS across ~40 read replicas, so 25K QPS each, modulo writes. I am less impressed.
  > That doesn't quite solve all the same problems. It's quite frequent that an index is in use, but is not _needed_: another index would also work (eg you introduced a new index covering an extra column that's not used in this query). Being able to disable an index would allow checking that the query plan does use the other index, rather than praying and hoping.
  Assuming your table statistics are decently up to date and representative (which you can check), this basically comes down to knowing your RDBMS, and your data. For example, if it's a text-type column, do both indices have the same operator class (or lack thereof)? Does the new index have a massive column in addition to the one you need, or is it reasonably small? Do the query projections and/or selections still form a left-most prefix of the index (especially important if any queries perform ordering)?
  - williamdclta month ago
    > EDIT: they have an aggregate of 1 million QPS across ~40 read replicas, so 25K QPS each, modulo writes. I am less impressed.
    Yeah that's my point! That's the load per instance that I see at my current company, we just have fewer replicas.
    > Assuming your table statistics are decently up to date and representative (which you can check), this basically comes down to knowing your RDBMS, and your data
    I'm pretty good at this stuff, and I don't often dabble with complex indexes. And yet I don't have 100% confidence. No-one is perfect: maybe I made a mistake in assessing index equivalence, maybe I forgot to check replicas, maybe there's _something somewhere_ that depends on this index without me being aware of it... It's a destructive operation where the only confidence you can have is _theoretical_, not operational: it's kind of crazy and people have been requesting this feature for years for good reasons. If you get it wrong (and getting it right is not trivial), production is on fire and it's potentially hours of downtime (or days, if it's a massive table!).
    For example, RDS forces you to shutdown an instance before deleting it. At this point, if anything was relying on it then alarms go off and you can quickly turn it back on. This should be standard functionality of anything stateful.
    sgarlanda month ago
    > Yeah that's my point! That's the load per instance that I see at my current company, we just have fewer replicas.
    Yup. At a previous company and current, I had single instances handling 120K QPS.
    > If you get it wrong (and getting it right is not trivial), production is on fire and it's potentially hours of downtime (or days, if it's a massive table!).
    You’re not wrong. Hopefully stage is representative enough to gain confidence. For self-hosted, I use the indisnotvalid method, but I do get that it’s not a feature per se.
- jfima month ago
  I'm pretty perplexed as well. They mention that they're not sharding PostgreSQL, without mentioning why in the article, but isn't that an obvious issue to many of their scaling problems?
  I don't really see what it is that they're doing that requires a single master database, it seems that sharding on a per user basis would make things way easier for them.
  - yen223a month ago
    I can sympathise with wanting to stay on a single master architecture as long as possible. Sharding is a major step change in terms of maintenance burden.
  - sarchertecha month ago
    My guess is that they didn’t design their schema around multi tenancy, and going back to pick apart all the shared tables, rewrite queries etc… will take them more time than they want to spend.
davidkuennena month ago
They seem to be using physical replication. I'm currently thinking of switching to logical replication to reduce inter region egress cost.
Do you think that's a good idea? There seems to be many improvements to native logical replication since Postgres 17.
vb-8448a month ago
I wonder how much their performance can improve if they put the write-instance on dedicated servers (with local and very fast ssd) and use managed services only for read-replicas.
redwooda month ago
What I find odd about this is there's no mention of all the other engines that must be in the mix powering different types of queries: I have no doubt they're using a little of everything, from scaling key-value to search, vector search, caches... They must be doing summersaults to avoid over-saturating this over-saturated Postgres env... yet only Postgres is discussed here.
airstrikea month ago
Title should really be "Scaling PostgreSQL to the Next Level at OpenAI", which is the actual title of the talk
bhoustona month ago
Argh. Shard the damn database already.
Why are they not sharing by user/org yet? It is so simple and would fix the primary issue they are running into.
All these work arounds they go through to avoid a straight forward fix.
- samwillisa month ago
  The message of the talk was very much that you can scale to massive throughput without having to shard and having only a single master.
  Of course they considered it, but the tradeoffs didn't match what they wanted to do - plus they found you could scale to this level without sharding.
  - vanviegena month ago
    The talk seems to be mostly about all the limitations and workarounds they've had to deal with, because they choose not to shard. Apparently, they have a policy of adding no new functionality to the database, which presumably means additional separate database services being setup for each new feature. That sounds a lot like accumulating tech debt very rapidly, just because sharding is not on the table, for whatever reason.
    bhoustona month ago
    Yeah, when they mentioned that they couldn't put any more services on their main DB because of this issue I did a facepalm. They are building out explicit tech debt now because they are not sharding.
- bohanoaia month ago
  Speaker here — Bohan from OpenAI.
  Our application has hundreds of endpoints, which makes sharding non-trivial. We've already offloaded shardable workloads—particularly write-heavy ones—from PostgreSQL. What remains is primarily read-only and would require substantial effort to shard. Currently, the workload scales well on Azure Database for PostgreSQL, and we have sufficient headroom to support future growth.
  That said, we're not ruling out sharding in the future—it’s just not a near-term priority.
- iampimsa month ago
  Not sure I would qualify sharding a DB that get 1M qps as straight forward. I agree with you that it seems that an org would be a natural sharding key, but we know that at this scale, nothing really is ever straight forward, especially when it's your first rodeo.
  - bhoustona month ago
    > Not sure I would qualify sharding a DB that get 1M qps as straight forward.
    Sharding at the application layer (basically figure out the shard from org/user in your application code prior to interacting with the DB), will scale to any QPS rate. This is what I was referring to.
  - evaneliasa month ago
    That's true, but that's also why you really should shard long before hitting that point...
    If your company is growing at this insane rate, it should be obvious that eventually you must shard. And the longer you delay this, the more painful it will be to accomplish.
- mike_hearna month ago
  Sharding is often not simple. The whole reason you're using a powerful database in the first place is that you want its ability to analyze data and answer complex questions. If you didn't you might as well just use a bunch of NFS mounts: it's sharding and even simpler than a database.
- levkka month ago
  That's exactly what I'm saying!
harisund1990a month ago
Seems like a perfect use case for a distributed database like YugabyteDB. Have you looked into it?
kabesa month ago
"He finally concluded with some requests to the postgres developer community"
... You're one of the most well funded companies in the world, you shouldn't be asking for features to open aource devs, but you should be opening PRs
- onion2ka month ago
  That assumes that every problem can be solved with money, and that the money (or the PRs the money funds) will be welcomed by the community. By talking about what OpenAI needs Bohan is hopefully starting a conversation with the community in order to engage with everyone respectfully and to work with them.
  Railroading an open source project with money or dev time in order to force it to go in the direction you want is not the right way. Those things should be available if the community asks, but they shouldn't be the opening offer.
  - deadbabea month ago
    It’s open source, you make your own fork and move on with or without the community.
    airstrikea month ago
    OK but then you lose the benefit of all future upstream changes and now you have to manage constant rebasing
    Engaging the community and intelligently advocating for improvements is a way to contribute to projects as well, especially if you're willing to use a disposable forks to explore the design space, put forth RFCs, PRs, etc.
    WesolyKubeczeka month ago
    > OK but then you lose the benefit of all future upstream changes and now you have to manage constant rebasing
    I thought they were swimming in enough money to hire someone to do the rebasing. Or dogfood their models to do the same.
    deadbabea month ago
    Oh the horror of having to do some rebasing in a world where LLMs do all the hard work for you anyway. And at an AI company with unlimited access to compute resources no less.
    cryptonectora month ago
    You really don't want to do that with PG. Keeping your patches rebased will be a huge pain.
    deadbabea month ago
    Use LLMs.
    cryptonectora month ago
    No, not for that.
    deadbabea month ago
    For anything, they probably excel at these sort of tasks.
    cryptonectora month ago
    I use https://gist.github.com/nicowilliams/ea2fa2b445c2db50d2ee650... for this.
    deadbabea month ago
    You could expose this as a tool for an LLM.
  - croesa month ago
    I thought OpenAI has an AI that can code.
- samwillisa month ago
  Going to PGconf.dev, explaining what you are doing to the core devs, the problems you faced, and what you would love to have as feature in Postgres, is exactly the right way to go about improving Postgres.
  Blindly opening "PRs" (Postgres doesn't work in this way, it's diffs on a mailing list) would not get you anywhere very fast. You need buy in and consensus from the wider development team.
- ziea month ago
  Just like with any other OSS project, you have to convince the upstream developers that your PR is worth merging.
  You don't do that by throwing PR's over the wall and then moving on. You do that by being part of the community.
  That said sometimes you just don't have the resources to engage another community at the moment, so you push the PR over the wall anyway, assume it won't ever land and act accordingly.
  The smaller and less impactful the change, the bigger chance of it landing. I'm always clear with my PR's that I push over the wall though: I probably won't be around to maintain this, feel free to not merge, etc. I also try to thank them for their service to the community and share how their code made my life easier.
- victorbjorklunda month ago
  It does not matter if you open a PR if the project is not open to that solution. You can't just add any random thing to Postgres as an outsider. You first have to convince the people in the project that X is a good thing to add and only after that does it make sense to actually implement it (and even then unlikely that OpenAI has a developer on staff with enough Postgres experience to actually write it. More likely they will then just sponsor a dev from the project).
- cryptonectora month ago
  > ... You're one of the most well funded companies in the world, you shouldn't be asking for features to open aource devs, but you should be opening PRs
  PG is really difficult to contribute to because it's such a fast moving target. You get your patches into one commitfest and then you don't get them accepted in time, now you're into the next commitfest, and now you have to rebase across 1,000 commits, lather, rinse, repeat.
  Contributing to PG is nearly a full-time job!
  I bet it's much easier to find an existing committer to PG and pay them as consultants to do the work you need.
  And as siblings point out, you have to figure out what the upstream might be willing to accept, and they might have to tell you that. This requires a conversation. Presenting to them is a way to start that conversation.
- dahcryna month ago
  that's not how that works.
  Money doesn't mean "I built whatever I want and Postgres will evolve into whatever I want by pushing my code". They still need to align and plan. Sure, they'll build the things and contribute, but they don't own it and still need to accomodate the wishes of the project
philosophtya month ago
OpenAI and these companies hires inexperienced people with zero operational experience and this is how they run things. It's almost funny if you didn't see how unreliable the end result was.
Postgres is powerful but just not suited for this role. But if your only tool is a hammer...
- phillipcartera month ago
  OpenAI and these companies do nothing of the sort.
  - a month ago
    undefined
  - philosophtya month ago
    Yes, they absolutely do and this is an example of it.
    phillipcartera month ago
    I would advise actually clicking through to see who the speaker is, who is categorically not a new grad nor someone with no experience in this domain.
    philosophtya month ago
    I don't want to single anyone out but if you think this is "expert" level experience you don't have the relevant experience yourself.
    I actually don't want to be overly critical but I do find the arrogance of these companies annoying.
    phillipcartera month ago
    Based on your replies I have zero reason to believe you know any better at all, considering the false statements you've already made (trivially proven, no less!) and lack of any meaningful critique of the source material.
beltera month ago
Many of these scaling issues would be solved, if they simply used AWS and Aurora Postgres.
https://pages.cs.wisc.edu/~yxy/cs764-f20/papers/aurora-sigmo...
https://youtu.be/pUqVCK7Ggh0
- sjdidhshaba month ago
  Aurora is Postgres compatible but it’s not equivalent. It was my understanding that the underlying implementation was distinctly different from regular Postgres. So while maybe not as risky as dropping in MySQL, dropping in aurora is still akin to switching to a brand new db and certainly not simple.
  Or are you saying they should have started on Aurora from the start?
  - beltera month ago
    Yes they should have started with Aurora Postgres. Note that for Aurora Postgres you can just import a Postgres backup and its the same SQL Code. Its Postgres+plus....
- sgarlanda month ago
  Oh yeah, separating compute and storage by a large physical distance was a great idea that certainly had no downsides! It's so awesome that they added "Optimized Reads" as an option, which is literally just running the DB on a server with a local NVMe drive - you know, how people used to do things.
  The only feature that Aurora (MySQL) has that is remotely impressive is its ability to restart the DB process without losing the buffer pool. Aurora (Postgres) has no interesting differentiations.
  I've benchmarked both, with prod-like workloads, against some 12 year old Dell R620s I have, which have NVMe drives exposed via Ceph over Infiniband. The ancient servers handily beat Aurora and RDS on everything except when the latter had an instance type with a local NVMe drive, at which point it's just superior clock speed and memory throughput.
  I despise Aurora with a burning passion. AWS successfully hoodwinked companies everywhere with bullshit, and are absolutely raking in cash because of it.
  - evaneliasa month ago
    > The only feature that Aurora (MySQL) has that is remotely impressive
    Aurora is one of the only options if you need low-lag physical replication in a MySQL-compatible environment. That makes it operationally feasible to execute large/heavy writes or DDL which would normally cause too much replication lag on traditional (async binlog-based) MySQL replicas.
    Granted, there's some important fine print: long transactions will still block InnoDB purge of old row versions, and in Aurora that's cluster-wide. But in any case, personally I'd consider nearly-lag-free replication to be an important differentiator. This can be leveraged in interesting ways, for example CashApp's `spirit` OSC tool (https://github.com/block/spirit) can do online schema changes blazing-fast because it doesn't need to throttle its write rate to avoid replication lag.
    Scale-to-zero is also nice for dev/test environments.
    That said, I do agree with your overall point that Aurora was majorly over-marketed. And Amazon's capture of so much revenue in the MySQL space has been extremely detrimental for the MySQL ecosystem, especially considering Aurora's modifications are proprietary/closed-source.
    sgarlanda month ago
    Re: lag, that’s a fair point, but IME devs don’t care if it’s 10 msec or 1000 msec, they don’t trust it either way. “I have to read after write.” I know MySQL doesn’t have RETURNING, maddeningly, but I’ve tried explaining how they could use LAST_INSERT_ID() to no avail.
  - beltera month ago
    > The only feature that Aurora (MySQL) has that is remotely impressive is its ability to restart the DB process
    I dont really care about Aurora MySQL...only Aurora Postgres, but you forgot about Parallel Query and Clones. For clones you dont pay for the extra storage for the new database, only the delta if you add new data...
    https://aws.amazon.com/blogs/aws/new-parallel-query-for-amaz...
    https://aws.amazon.com/blogs/aws/amazon-aurora-fast-database...
    "...AWS successfully hoodwinked companies everywhere with bullshit, and are absolutely raking in cash because of it."
    Really?...
    "How Twilio modernized its billing platform on Amazon Aurora MySQL" - https://aws.amazon.com/blogs/database/how-twilio-modernized-...
    "No observable Aurora downtime taken in over 5 months of experimentation, and almost 2 months of running shadow production..
    Steady state metrics on over 40 accumulated days of live production data across all Aurora clusters:
    - Over 46 billion transaction records indexed and available, compared to less than one billion stored in the former online Redis system
    - 4.8 TB of data across all tables
    - Over 11,000 active database connections to all clusters
    - Less than 10 milliseconds median end-to-end transaction run latency
    - Less than 60 milliseconds 99th percentile end-to-end transaction run latency..."
    "Increasing Scalability and Reducing Costs Using Amazon Aurora Serverless with BMW" - https://aws.amazon.com/solutions/case-studies/bmw-group-auro...
    "FINRA CAT selects AWS for Consolidated Audit Trail" - https://aws.amazon.com/blogs/publicsector/finra-cat-selects-...
    https://aws.amazon.com/rds/aurora/customers/
    sgarlanda month ago
    > For clones you dont pay for the extra storage for the new database, only the delta if you add new data...
    Considering how much they’re charging you just to query storage, that’s still a net negative. If anything, you’re going to pay MORE since you’re probably querying more.
    > No observable Aurora downtime taken in over 5 months of experimentation
    I manage somewhere north of 500 Aurora instances spread across dozens of clusters. We have one drop out at least weekly, if not more often.
    > Over 46 billion transaction records indexed and available, compared to less than one billion stored in the former online Redis system
    This isn’t unique to Aurora.
    > 4.8 TB of data across all tables
    Neither is this; also, it’s honestly not that big.
    I doubt we’re going to convince each other of anything here.
    beltera month ago
    > We have one drop out at least weekly, if not more often.
    You mean an instance? A cluster wont go down because of that.
    I dont work for AWS :-) and dont want to convince you of anything. But there is a reason why they developed Aurora, and DynamoDB and it was not because some software developer had hours to waste...
    sgarlanda month ago
    Yes, an instance. You know you can set up hot standby for vanilla MySQL and Postgres as well and achieve the same thing, right?
    They developed those to make a shit ton of money, and on that front, they succeeded.
- beltera month ago
  Surely someone can articulate the flaw...Unless, of course, there isn’t one worth mentioning....
  - cbg0a month ago
    They're deployed on Azure and have a deep partnership with Microsoft, so they can't "simply" use a different cloud.
    Also, recommending a black box managed solution isn't an option for some large companies that have their own hardware & datacenters and which may want to use open source solutions they can easily deploy, fork and support themselves to keep costs under control.
    beltera month ago
    They are one of the most well capitalized company/startup/foundation/non-profit in the planet and just spent 6,5 billion to hire a designer.
    They should be using the best technical and cheapest solution, and they owe it to their investors. At their scale they will never be able to use anything else than a cloud solution.
    They could solve these issues at the number of users they report, for a monthly bill below 25 million dollars.
    "6,311 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed more than 376 billion transactions, stored 2,978 terabytes of data, and transferred 913 terabytes of data" - https://aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2...
    CharlesWa month ago
    > At their scale they will never be able to use anything else than a cloud solution.
    That's definitely not true, and there are many companies doing higher volumes at a fraction of the cost-per-query.
    Although scale doesn't force companies into public-cloud database systems, considerations like capital, time-to-market, and business strategy often do. In this case, OpenAI is trading a significantly higher per-query cost for benefits like improved agility, turnkey compliance, etc.
    iampimsa month ago
    but that'd be real money, not the Monopoly money they used to buy Ive/Windsurf...
- v5oa month ago
  [dead]