pg_usleep(1000L);
Virtually any time you put a fixed sleep in your program, it's going to be wrong. It is either too long or too short, and even if it is perfect, there is no guarantee your program will actually sleep for exactly the requested amount of time. A condition variable or something similar would be the right thing to use here. Of course, if this code path really is taken only very infrequently, you can get away with it. But assumptions are not always right; it's better to make the code robust in all cases.
Unfortunately somebody decided that transaction locks don't need to be maintained when running as a hot standby, which turns that code into a loop around the usleep...
(I do agree that sleep loops suck and almost always are a bad idea)
This is technically true, but in the same way that under a preemptive operating system there's no guarantee that you'll ever be scheduled. The sleep is a minimum time you will sleep for, beyond that you already have no guarantee about when the next instruction executes.
On a more serious note, any reasonable sleep API should either report all wake ups, no matter the reason, or none except for the timeout expiration itself. Any additional "reasonable" filtering is a loaded footgun because different API users have different ideas of what is "reasonable" for them.
Makes me wonder how many other Postgres processes might ignore SIGTERM under edge conditions. Do folks here test signal handling during failovers or replica maintenance? Seems like something worth adding to chaos tests.
Timeout in sysvinit and service files, for a graceful exit, is typically 900 seconds.
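For systemd units the relevant knob is TimeoutStopSec; a hypothetical drop-in override for a Postgres service might look like this (values are illustrative, not a recommendation):

```ini
# Hypothetical override, e.g. via: systemctl edit postgresql.service
[Service]
# Allow a slow graceful shutdown (long checkpoint, etc.) before escalating
TimeoutStopSec=900
# What "systemctl stop" sends first
KillSignal=SIGTERM
# After TimeoutStopSec expires, remaining processes get SIGKILL
SendSIGKILL=yes
```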
Most modern DBMS daemons will recover after a SIGKILL, most of the time, especially if you're using transactions. But startup will then be slower, as the server churns through and resolves outstanding work during recovery.
(unless you've modified the code to short cut things, hello booking.com "we're smarter than you" in 2015 at least, if not today)
The "SIGKILL or keep waiting?" decision comes down to whether you believe the DBMS is actually making progress towards shutdown. Proving that the shutdown is progressing can be difficult, depending on the situation.
Sure, it's more like fuzzing then. But running it in production, on systems willing to send in big reports for logged watchdog alerts, would give nearly free monitoring of very diverse and realistic workloads. That would hopefully cover the relevant-in-practice codepaths with many samples before a signal ever happens to hit those rarer but still relevant paths...
The Linux kernel works the same way.
What are other big open source projects using if they are not using mailing lists?
I could imagine some using GitHub but IME it's way less efficient than mailing list. (If I were a regular contributor to a big project using GitHub, me being me, I would probably look for or write myself a GitHub to e-mail gateway).
I can understand not wanting to use GitHub/GitLab/etc. for various reasons. But I don't understand how usability, compared to mailing lists, is one of them.
How is a set of 9+ mailing lists any better? It has significantly worse discovery and search tools unless you download all the archives. So you're creating a hurdle for people there already.
Then you have people use all kinds of custom formatting in their email clients, so consistent readability is out the window.
People will keep top-posting (TOFU), transforming the inconsistent styles in the process. That creates an unnecessarily complicated "detect quotes" problem for your email client, or you have to keep reminding people.
Enforcing structure of any kind in email lists seems so tedious. I'm not advocating for bugzilla-style "fill out these 20 nonsensical fields before you can report anything", but some minimal structure enforced by tooling, as opposed to manual moderation, seems very helpful to me.
Call me old fashioned, but there is no better way of discussing things online asynchronously than a threaded medium (video calls are better, but face-to-face doesn't scale). And we don't have a better tool for managing threaded discussions than mail clients (or Usenet clients, but they handle threading almost identically).
Linear discussion forums (which GitHub issues are) are just inferior.
This isn't the only place Postgres can act like this, though. I've seen similar behavior when a foreign data wrapper times out or loses its connection, and had to resort to either kill -9 or attaching a debugger to the process and closing the socket, which, oddly enough, also worked.
Might be worth generalizing this approach to also handle that kind of failure.
The Postgres team needs to go back to the drawing board and avoid polling altogether and more importantly have event-listener based approaches for primary and replicas separately
It's great to see ClickHouse contributing to Postgres though. Cross-pollination between two large database communities can have a multiplying effect.
This isn't hit in ongoing replication, this is when creating the resources for a new replica on the publisher. And even there the poll based thing is only hit due to a function being used in a context that wasn't possible at the time it was being written (waiting for transactions to complete in a hot standby).
> The Postgres team needs to go back to the drawing board and avoid polling altogether and more importantly have event-listener based approaches for primary and replicas separately
That's, gasp, actually how it already works.
Right?
Armchair software engineers are so incredibly annoying.
Postgres codebase is a shining example of good engineering. They walk a very fine line between many drawbacks. Code is battle tested (and surprisingly readable). Not many people (or groups) can do much better - and if they think they can, they are probably going to learn a lesson in humility.
To be fair, we have/PG has plenty crud-y code... Some of it even written by yours truly :(. Including some of the code involved here.
But just saying "go back to the drawing board" without really understanding the problem isn't particularly useful.
As mentioned, there should be read-replica-specific code that works using event listeners. Repurposing the primary's code is what causes the issue.
The following pseudocode gives a better picture of the next steps:
    // Read-replica-specific implementation (pseudocode)
    void StandbyWaitForTransactionCompletion(TransactionId xid)
    {
        // Register to receive WAL notifications about this transaction
        RegisterForWALNotification(xid);

        while (TransactionIdIsInProgress(xid))
        {
            // Wait for a WAL record indicating transaction completion
            WaitForWALEvent();
            CHECK_FOR_INTERRUPTS();

            // Process any incoming WAL records
            ProcessIncomingWAL();
        }

        UnregisterForWALNotification(xid);
    }
On the other hand, "quirks like these" are the reason you should not update mission-critical systems too often.
If you think leaking memory or handles is bad, watch what happens when a turbo starts leaking oil in to the intake tract of a diesel engine. It's exciting, as long as you're not paying for it or in the shrapnel zone.