This might be a stupid question, but why isn't zeroing 8KB of memory a single instruction? It must be so common as to be worthy that all the layers of memory (and indirection) to understand that.
For 8kb? Syscalling in to the kernel, updating the processes’s memory map and then later faulting is probably slower by an order of magnitude or more compared to just setting those bytes to zero.
Memcpy, bzero and friends are insanely fast. Practically free when those bytes are in the cpu’s cache already.
Probably still cause a page fault when the memory is re-accessed though. I suspect even using io_uring will still be a lot slower than bzero if you're just zeroing out 2 pages of memory. Zeroing memory is really fast.
128-bit or 256-bit memsets via SIMD instructions are sufficient to saturate RAM bandwidth, so there wouldn't be much of a gain from having a dedicated instruction.
(By the way, x86 does have a dedicated instruction--rep stosb--but compilers differ as to how often they use it, for the reason cited above.)