Databases/protocols that differentiate between connection, session, and call scopes can reliably perform resets without having to tear down the entire connection.
Retrying is safe with no client-side transaction idempotency tokens needed, as the error handling will roll back only the changes made by the failed call, not the entire transaction or session.
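A rough analogue in plain SQL, for engines without call-scoped resets, is statement-level rollback via savepoints. A sketch (the accounts table and names are made up):

    BEGIN;
    SAVEPOINT before_call;
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;  -- the "call"
    -- on failure: ROLLBACK TO SAVEPOINT before_call; then retry or continue,
    -- without losing the rest of the transaction
    COMMIT;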
[1]: https://tanelpoder.com/2008/04/10/sqlnet-breakreset-to-clien...
Edit: added the comment about retry and idempotency.
In SSMS you can press Alt + Break to send an out-of-band (OOB) attention message over the active TDS connection, which will cancel the running command.
https://learn.microsoft.com/en-us/openspecs/windows_protocol...
I believe DB2 supports a similar technique.
There are several commands, depending on what you want to reset. If you're only interested in mutable state changeable with SET, you can use RESET ALL instead. This resets configuration settings such as the time zone, statement timeout, and so on.
Postgres has different fine-grained ways to reset other aspects of a session, such as closing cursors (CLOSE ALL), unlistening from notifications (UNLISTEN), deallocating prepared plans (DEALLOCATE), and so on. Finally, there's DISCARD ALL, which resets everything.
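A quick sketch of that menu in one place (all standard Postgres commands):

    RESET ALL;       -- revert settings changed with SET (timezone, timeouts, ...)
    CLOSE ALL;       -- close all open cursors
    UNLISTEN *;      -- stop listening on all notification channels
    DEALLOCATE ALL;  -- drop all prepared statements
    DISCARD ALL;     -- all of the above, plus temp tables, cached plans, etc.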
I agree about error handling. Most clients just disconnect on a connection error.
One gotcha with Postgres is that running queries do not get cancelled when the client disconnects. This has always been a weird one that's quite dangerous, as it can lead to dog-piling: if a client starts a slow query, then times out or closes its connection, the query lives on; then another identical request starts, the same thing happens, and soon you have 100 queries running that are all doing the same thing and getting slower and slower. You have to be very careful with retrying.
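If you do end up with orphaned queries, you can at least find and cancel them by hand; a rough sketch (the 5-minute threshold is arbitrary):

    -- list queries that have been running for more than 5 minutes
    SELECT pid, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND now() - query_start > interval '5 minutes';

    -- cancel one of them by pid (the pid here is just an example)
    SELECT pg_cancel_backend(12345);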
You can configure that these days, at least if the server is running on some common platforms:
https://www.postgresql.org/docs/current/runtime-config-conne...
I don't know why Postgres can't write keepalive messages while executing a query; speculating here, but it's possible that the architecture is synchronous and doesn't support doing anything with the connection while the query is executing. It's an old process-per-connection model where one backend process is started for each connection.
I could have sworn I had read about a new configuration option introduced in Postgres 14, but I can't find anything about it.
[1] https://www.postgresql.org/message-id/e09785e00907280622v9b1...
Someone linked this
https://www.postgresql.org/docs/current/runtime-config-conne...
> client_connection_check_interval (integer)
> Sets the time interval between optional checks that the client is still connected, while running queries. The check is performed by polling the socket, and allows long running queries to be aborted sooner if the kernel reports that the connection is closed.
> This option relies on kernel events exposed by Linux, macOS, illumos and the BSD family of operating systems, and is not currently available on other systems.
> If the value is specified without units, it is taken as milliseconds. The default value is 0, which disables connection checks. Without connection checks, the server will detect the loss of the connection only at the next interaction with the socket, when it waits for, receives or sends data.
> For the kernel itself to detect lost TCP connections reliably and within a known timeframe in all scenarios including network failure, it may also be necessary to adjust the TCP keepalive settings of the operating system, or the tcp_keepalives_idle, tcp_keepalives_interval and tcp_keepalives_count settings of PostgreSQL.
So it appears the server can now indeed detect a lost client mid-query, but only if you configure it (and, per the docs, you may also need to tune TCP keepalives for it to work reliably under network failure).
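Presumably something along these lines (the values are illustrative, not recommendations):

    ALTER SYSTEM SET client_connection_check_interval = '10s';
    -- per the docs, TCP keepalives may also need tuning for network failures:
    ALTER SYSTEM SET tcp_keepalives_idle = 60;
    ALTER SYSTEM SET tcp_keepalives_interval = 10;
    ALTER SYSTEM SET tcp_keepalives_count = 6;
    SELECT pg_reload_conf();  -- apply without a restart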
In case of connection issues, you can check whether the work was actually committed or rolled back. Often the error was propagated to the user, who got an error message, refreshed, checked, retried, and everything was okay in the end. That's probably why the DB engines did not bother micromanaging individual SQL statements.
This is the impedance mismatch. The _connection_ exists at a layer beneath the database/application layer. You're not configuring the _connection_, you're configuring the _session_! And the relationship between connections and sessions is one-to-many, not one-to-one.
Adding an idempotency key is a way to allow the API provider to detect and de-dupe requests. It requires work and collaboration on the part of the API client and the API provider. The API client must generate a unique idempotency key for each logically different operation it attempts to execute, and it must retry requests for operations that appear to have failed using that fixed idempotency key. If the operation is not inherently idempotent, the API provider needs to implement some way of de-duping repeated requests - perhaps storing requests in a durable cache for N hours/days keyed by the idempotency key.
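On the provider side, the de-duping can be as simple as a unique key; a minimal sketch, with a made-up payments schema:

    CREATE TABLE payment_requests (
        idempotency_key uuid PRIMARY KEY,
        created_at      timestamptz NOT NULL DEFAULT now()
    );

    BEGIN;
    -- a duplicate key aborts the transaction, so the debit below runs at most once
    INSERT INTO payment_requests (idempotency_key)
    VALUES (:idempotency_key);  -- client-supplied key, passed as a bind parameter
    -- ... perform the actual debit here, in the same transaction ...
    COMMIT;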
One industry where these "once-and-only-once" execution guarantees are useful is banking. Customers expect their payments to be executed exactly once, so their accounts are debited once and not zero times or 2+ times.
There's an interesting infoq podcast from 2018 with Jason Maude of Starling Bank [2] talking about how they design their bank systems for "once-and-only-once" execution. From memory, Jason decomposes the responsibilities as: API clients are responsible for ensuring operations execute at least once. API providers are responsible for ensuring operations execute at most once.
[1] see e.g. google's API design guidelines for request identification: https://google.aip.dev/155
[2] https://www.infoq.com/podcasts/cloud-based-banking-startup-j...
I'm currently building a bank, and I was asking this with double payments in mind.
To work correctly, application code must be aware of transactions, and be prepared to roll back state and safely retry a transaction on failure.
ORMs like ActiveRecord struggle to do this because they attempt to present a transactional database as a low-friction, in-memory, mutable, non-transactional data model that does not support rollback.
If you lose a connection to the database prior to successfully committing your transaction, this is just another form of transaction failure.
You need to discard any pending application state changes, open a new connection and transaction, requery data required to determine the actual current state, and then if necessary, retry your transaction.
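The shape of that, sketched in plain SQL (the accounts table is made up; detecting the failure and looping is the application's job):

    BEGIN;
    -- requery the actual current state inside the new transaction
    SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
    -- re-derive the change from what the database says, not from stale app state
    UPDATE accounts SET balance = balance - 100 WHERE id = 42;
    COMMIT;
    -- if the connection dropped, or COMMIT failed with e.g. SQLSTATE 40001,
    -- discard local state, reconnect, and run the whole block again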
ActiveRecord can’t do that, hence the ill-advised desire to be able to automatically retry statements on failure — which abandons database-mediated transaction isolation, and incorrectly shifts responsibility for database state preservation/management to the client.
If SQL was trivially idempotent, we wouldn’t need transactions or different levels of transaction isolation in the first place.
If you find yourself so frequently hitting socket timeouts attempting to execute database queries that you need to modify your ORM to work around this problem automatically, your problems likewise almost certainly begin at the schema layer. I'm not going to blame an ORM maintainer for having to deal with that concern in a way that avoids irritating library consumers, and I suppose I can see where it does no good arguing that one's own clients (said consumers) are using the database wrong in order to need such functionality.
I'd have thought to see perf considered to some extent beyond blaming the engine, but I suppose there is a kind of honesty in simply omitting any discussion of causes for these apparently so common "network errors," in cases where they are so common and thus must be so transparently retried that the entire existence of this behavior could be totally overlooked by the application developer nominally responsible for the database interaction.
(I'm curious why a library maintainer needs to assert responsibility over session variables at all times, but I suppose it makes sense from a perspective that one's users are not especially trustworthy, as I suppose most ORM maintainers and especially ActiveRecord's must conclude. I guess it's fair to say that connections needing to manipulate the session that granularly should exist outside the ORM entirely, but then we're back to the whole tiresome architectural question that's vexed the field for now nearly two decades since Neward's "Vietnam" paper: https://web.archive.org/web/20060703024451/http://blogs.tedn....)
When pooling connections, you need to reset the connection when putting it back in the pool. For example, if you grab a connection from the pool and do:
    SET SESSION statement_timeout TO '1s';
    SET timezone TO 'Europe/London';
…and then release the connection, you don't want the next piece of code that claims the connection to inherit these settings. It has nothing to do with the ORM (although the ORM itself may want to set some baseline settings based on config), and everything to do with sharing connections between different parts of a single application.
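In other words, on release the pool (or a release hook) should issue something like:

    DISCARD ALL;  -- or RESET ALL, if undoing SET is all you need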
Why should the engine burden itself with an undo stack (even if only one frame deep) for every client, on behalf of a minority so incoherently implemented that it can't keep straight what it's doing across the span of two (application and ORM library) codebases? A metaphor already exists in the interface for getting a fresh connection: you do this by creating a fresh connection. If you want some other way to accomplish the same task, it's fair to require justification for the effort of providing that second option, and no such justification is presented in the article under discussion.
I don't like ORMs, I think it's long obvious. But to be excruciatingly clear, I grant them the same courtesy as any other library in that they absolutely may declare things explicitly out of scope. Seeing that not done, in a case where the consumer is pretty explicitly screwing around with internals of a database connection belonging to the library, is what's surprising, even in the context of ORMs as a category, where maximalism seems so constantly the order of the day.
Postgres has no native awareness of connection pooling, and so you need to issue certain reset commands to ensure that the connection is "clean" for the next piece of code that claims it. It should be possible to do things like "SET SESSION statement_timeout" without worrying about who the next user of the connection is.
What confuses me is that the Active Record maintainer should argue for modifications to several database engines, so this poor behavior - namely, treating pooled connections as if distinct - can be safe. Or, better said, I can see several justifications both technical and social for such an argument, none of which compels me.
It's a philosophical difference, I suspect. Users who remove a cover labeled "no user-serviceable parts inside" and end up sorry they did so, in my view, learn a valuable lesson about what such covers and labels are for. Others take a kinder or less grandmotherly view. Fair enough; it's a big enough world for everyone, last I checked.
(That said, it is worth noting that the only way any ORM could actually protect itself from this misuse would be by reimplementing enough of each of its supported databases' command parsers to recognize and reject session state mutations. Obviously no one would be mad enough to do so, but from a perspective of what we could call software design or architecture or craftsmanship or even taste, this kind of maximalism in necessary implication could and in my view should be taken to indict the entire paradigm: 'Why worry about the lesion? It's only pre-cancerous.')
I have no particular opinions about ActiveRecord, which I don't use, but in my opinion all poolers ought to do this. The pooler should hand out clean connections so that you can treat them as distinct.
It's about separation of concerns: A pooled connection should behave like a distinct connection, otherwise every single connection has to worry about getting an "unclean" connection.
Sure, Postgres offers SET LOCAL, which reverts the mutated state on commit/rollback, but that requires a transaction, which is not always desirable. For example, let's say you are doing many UPDATEs in batches. You can do this:
    SET SESSION ...;
    UPDATE ...;
    UPDATE ...;
    -- etc.
As opposed to:

    BEGIN; SET LOCAL ...; UPDATE ...; COMMIT;
    BEGIN; SET LOCAL ...; UPDATE ...; COMMIT;
    BEGIN; SET LOCAL ...; UPDATE ...; COMMIT;
    -- etc.
This saves three roundtrips per batch, which can significantly increase throughput. Of course, you can save roundtrips by bundling several statements into a single statement string separated by semicolons, but that's painful to work with and doesn't work with prepared statements or SELECTs.

And sure, you could call RESET ALL yourself before releasing a connection, but if the pool always does this for you, you've solved the problem everywhere and don't need to think about it ever again. (In the apps I work on, we install a global "after release" hook to automatically do RESET ALL.)
Parsing statements isn't needed for this.
You keep mentioning ORMs, but again, it has nothing to do with them.
I also mention it in this context because an ORM multiplies the problem. Everything you describe is indeed very easy in Postgres, but what about the other engines Active Record (or Sequelize, SQLAlchemy, etc) has to support? Now we face a choice between violating our engine-independent abstraction, and declining to offer a useful capability even where it would be no effort - a dilemma entirely imposed, in my view, by the same philosophical maximalism I've been decrying, and one impossible of satisfactory solution: I have long experience of ORMs preferring both solutions, and both are frankly lousy.
We also don't agree on pooled connection semantics, in that I'm okay with expecting callers to know what is and isn't safe to do with that abstraction, but that's a separate issue and not very interesting to me; I'm a grownup, as long as we're using a decent RDBMS and the same convention everywhere, it's fine. Really, I'm just here to grind my axe about ORMs, in hopes someone in their twenties will discover there is a rich and contentious history here, and there are worse problems to have than finding SQL's somewhat archaic syntax and uncommon rigor a little intimidating at first.
Some SQL engines are more sensitive to this problem than others (depending on whether they cache plans in the first place): SQL Server famously utilizes parameter sniffing for performance, which has the positive implication of skipping work, and the negative of skipping work you might need to do.
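For what it's worth, one common escape hatch in T-SQL, when a sniffed plan goes bad for certain parameter values, is to force a fresh plan per execution (table and parameter are made up):

    SELECT *
    FROM Orders
    WHERE CustomerId = @CustomerId
    OPTION (RECOMPILE);  -- compile for this value instead of reusing the cached plan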
My experience has long been that in almost every case where a "database performance" problem occurs while an ORM is in use, presumptively refactoring to eliminate the ORM almost immediately reveals the buried n+1 or other trivially pathological pattern which, if the ORM's implementor knows to try to avoid it, the ORM's interface typically is not sufficiently expressive to disambiguate in any case. (Fair do's: In the uncommon case where that interface is so expressive, one does usually find the implementation accounts for the case.)
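For the unfamiliar, the n+1 shape looks like this (hypothetical posts/comments schema):

    -- what the ORM often emits: one query for the parents...
    SELECT * FROM posts;
    -- ...then one more per parent row
    SELECT * FROM comments WHERE post_id = 1;
    SELECT * FROM comments WHERE post_id = 2;
    -- etc.

    -- versus the single round trip it could have been:
    SELECT p.*, c.*
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id;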
Hence my earlier reference to the "Vietnam" paper, which dissects in extensive (if unfortunately mostly fruitless) detail the sunk cost fallacy at the heart of this problem.
If you do need some session-specific variables, use a connection pooler (you should anyway) that handles that. ProxySQL, for one, tracks the per-session variables that are set, so when a client requests a connection, it hands it one already set to its requested specification (if it’s available).
> The reason this is important is that whenever the write or the read fails, in the overwhelming majority of cases, you don’t know whether the query was executed on the server or not.
Yes, you definitely can know this in almost all cases. Did you receive a result? It succeeded. Did you receive a warning? It succeeded, but maybe read the warning. Did you receive an error? It did not succeed. Parse the error message to determine what went wrong, and if a retry makes sense. Did you receive nothing? Now, and only now, do you not know the state.
It blows my mind the number of devs who haven’t read the API specification for their RDBMS. There is a ton of useful information you can glean that’s often ignored.
> These statements also have a cost on the server side. Each statement requires some amount of memory in the database server. So you have to be careful not to create an unbounded amount of them, which for an ORM isn’t easy to enforce.
Which is why you can limit them. MySQL has a default of 16382; not sure about Postgres.
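In MySQL's case:

    SHOW VARIABLES LIKE 'max_prepared_stmt_count';  -- defaults to 16382
    SET GLOBAL max_prepared_stmt_count = 32768;     -- raise or lower the cap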
Also, to the complaint about them being session-scoped, this is yet another reason to use a connection pooler. They can often share them across connections. I know PgBouncer can.
Read. Application. Manuals.