I mean I get that, what I don't get is why screwing around that way is not explicitly defined as "don't do this, here's why it isn't supported and here's what you should try to do instead; if you ignore all that and still break something, you get to keep both its pieces" in the documentation, rather than trying to work around it with some kind of global reset.
Why should the engine burden itself with an undo stack (even if only one frame deep) for every client, on behalf of a minority so incoherently implemented that it can't keep straight what it's doing across the span of two (application and ORM library) codebases? A metaphor already exists in the interface for getting a fresh connection: you do this by creating a fresh connection. If you want some other way to accomplish the same task, it's fair to require justification for the effort of providing that second option, and no such justification is presented in the article under discussion.
I don't like ORMs, I think it's long obvious. But to be excruciatingly clear, I grant them the same courtesy as any other library in that they absolutely may declare things explicitly out of scope. Seeing that not done, in a case where the consumer is pretty explicitly screwing around with internals of a database connection belonging to the library, is what's surprising, even in the context of ORMs as a category, where maximalism seems so constantly the order of the day.
It has nothing to do with ORMs. It's the same issue if you don't use an ORM. It's about connection pooling, as I explained.
Postgres has no native awareness of connection pooling, and so you need to issue certain reset commands to ensure that the connection is "clean" for the next piece of code that claims it. It should be possible to do things like "SET SESSION statement_timeout" without worrying about who the next user of the connection is.
Right, again, I get that, and the database libraries that I use tend to warn in their documentation about what you should and shouldn't do with connections belonging to a pool, cf. node-postgres docs warning about pooled connections and transactions [1]. The unit of connection pooling is the pool not the connection, etc, that's all fine.
What confuses me is that the Active Record maintainer should argue for modifications to several database engines, so this poor behavior - namely, treating pooled connections as if distinct - can be safe. Or, better said, I can see several justifications both technical and social for such an argument, none of which compels me.
It's a philosophical difference, I suspect. Users who remove a cover labeled "no user-serviceable parts inside" and end up sorry they did so, in my view, learn a valuable lesson about what such covers and labels are for. Others take a kinder or less grandmotherly view. Fair enough; it's a big enough world for everyone, last I checked.
(That said, it is worth noting that the only way any ORM could actually protect itself from this misuse would be by reimplementing enough of each of its supported databases' command parsers to recognize and reject session state mutations. Obviously no one would be mad enough to do so, but from a perspective of what we could call software design or architecture or craftsmanship or even taste, this kind of maximalism in necessary implication could and in my view should be taken to indict the entire paradigm: 'Why worry about the lesion? It's only pre-cancerous.')
Cleaning up the connection removes a serious foot gun that you may not consider when writing code. For example, I work on an application that sets statement_timeout defensively to avoid accidentally holding onto locks for too long. This used SET on a pooled connection, causing a bug where later connections inherited the timeout.
I have no particular opinions about ActiveRecord, which I don't use, but in my opinion all poolers ought to do this. The pooler should hand out clean connections so that you can treat them as distinct.
It's about separation of concerns: A pooled connection should behave like a distinct connection, otherwise every single connection has to worry about getting an "unclean" connection.
Sure, Postgres offers SET LOCAL, which reverts the mutated state on commit/rollback, but that requires a transaction, which is not always desirable. For example, let's say you are doing many UPDATEs in batches. You can do this:
SET SESSION ...
UPDATE ...;
UPDATE ...;
-- etc.
As opposed to:
BEGIN; SET LOCAL SESSION ...; UPDATE; COMMIT;
BEGIN; SET LOCAL SESSION ...; UPDATE; COMMIT;
BEGIN; SET LOCAL SESSION ...; UPDATE; COMMIT;
-- etc.
This saves three roundtrips per batch, which can significantly increased throughput. Of course you can save roundtrips by bundling several statements in a single statements string separated by semicolons, but that's painful to work with and doesn't work with prepared statements or SELECTs.
And sure, you could yourself call RESET ALL before releasing a connection, but if the pool always does this for you, you've solved the problem everywhere and don't need to think about it ever again. (In the apps I work on, we install a global "after release" hook to automatically do RESET ALL.)
Parsing statements isn't needed for this.
You keep mentioning ORMs, but again, it has nothing to do with them.
No, I agree it isn't ORM-specific. I mention it in that context only because for a library dev to take it on themself to try to save the library consumer from themself in this way, strikes me as a good example of the philosophical maximalism - the do-everything, Swiss-army-knife attitude that tries ultimately
to abstract away everything about the underlying datastore - that seems to me to characterize the fundamental perspective taken by ORM developers and maintainers.
I also mention it in this context because an ORM multiplies the problem. Everything you describe is indeed very easy in Postgres, but what about the other engines Active Record (or Sequelize, SQLAlchemy, etc) has to support? Now we face a choice between violating our engine-independent abstraction, and declining to offer a useful capability even where it would be no effort - a dilemma entirely imposed, in my view, by the same philosophical maximalism I've been decrying, and one impossible of satisfactory solution: I have long experience of ORMs preferring both solutions, and both are frankly lousy.
We also don't agree on pooled connection semantics, in that I'm okay with expecting callers to know what is and isn't safe to do with that abstraction, but that's a separate issue and not very interesting to me; I'm a grownup, as long as we're using a decent RDBMS and the same convention everywhere, it's fine. Really, I'm just here to grind my axe about ORMs, in hopes someone in their twenties will discover there is a rich and contentious history here, and there are worse problems to have than finding SQL's somewhat archaic syntax and uncommon rigor a little intimidating at first.
Why should the engine burden itself with an undo stack (even if only one frame deep) for every client, on behalf of a minority so incoherently implemented that it can't keep straight what it's doing across the span of two (application and ORM library) codebases? A metaphor already exists in the interface for getting a fresh connection: you do this by creating a fresh connection. If you want some other way to accomplish the same task, it's fair to require justification for the effort of providing that second option, and no such justification is presented in the article under discussion.
I don't like ORMs, I think it's long obvious. But to be excruciatingly clear, I grant them the same courtesy as any other library in that they absolutely may declare things explicitly out of scope. Seeing that not done, in a case where the consumer is pretty explicitly screwing around with internals of a database connection belonging to the library, is what's surprising, even in the context of ORMs as a category, where maximalism seems so constantly the order of the day.