Don't scale everything in units of rem; we want "zoom text only" to continue to do what it says on the tin. Only scale the things that need to be text-sized.
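A minimal sketch of what that split might look like, assuming a simple article layout. The class name and the specific values are made up for illustration; the only point is which properties follow the text size and which stay fixed.

```css
/* Hypothetical class name and values, for illustration only. */
.article {
  font-size: 1rem;        /* follows the user's text-size setting */
  line-height: 1.5;       /* unitless, so it scales with the font size */
  max-width: 40rem;       /* line length is text-sized: characters per line stay roughly constant */
  padding: 16px;          /* fixed spacing, unaffected by text-only zoom */
  border: 1px solid #ccc; /* a hairline stays a hairline */
}

.article img {
  max-width: 100%;        /* images size to their container, not to the text */
}
```

With "zoom text only", the type and the measure grow while the padding and border stay put. Whether the measure should count as text-sized is a judgment call.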
Is there a reason for certain CSS frameworks like Tailwind to scale absolutely everything in units of rem and avoid px like the plague? Or is that just too far?
Incompetence, and thus obedience to a mistakenly-generalised caricature of "good practice", is my guess. I used to use em everywhere for exactly this reason. (The only reason I didn't shift to "rem everywhere", "flexbox everywhere", "web fonts everywhere", etc. was obedience to some different caricature of "good practice".)
I've never seen a CSS framework that's good. (Unless you count https://simplecss.org/, but that's of very narrow usefulness.) They're all made by the kinds of people who don't understand web technologies, and seek to hide them behind a layer of abstraction so they don't have to think about them any more. Unfortunately for them, there's a reason the web's how it is. Some of it's backwards-compatibility, but most of it is that a large group of very clever people failed to find a way to make it any simpler without breaking something important; and so, the framework people tend to break important things.
If you find yourself reaching for something like Tailwind or Bootstrap, just use inline style attributes. It's easier for everyone involved.
If you need any of those things, you should be using a proper stylesheet. Use <style scoped='scoped'> in the <body>, if you really must.
A crude "not that different from writing inline styles" classy framework just makes it harder to refactor your CSS in future, because it's all tucked away under incomprehensibly-named CSS classes, and there are otherwise-useless <div>s all through the HTML.
Everyone jumped on the only-use-rem train when mobile started dominating, because relative units scale better than absolute pixels.
In practice it doesn't mean much outside of blog-like text, so for layouts it's more of a standard everyone agrees to use everywhere: it's easier to stick to one unit for consistency's sake, and it does the job. I don't think many frontend devs ever test text zoom in a browser anyway; their main concern is variation in screen sizes, that is, device variation rather than users manually zooming text. But I get the argument for limiting it to text.