This might be a stupid question, but why isn't zeroing 8KB of memory a single in...

astrange · on June 15, 2024

If the region is larger than a page, you can tell the virtual memory system to drop the pages and give you new zero-filled ones instead.
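A rough sketch of what that could look like, assuming Linux and a private anonymous mapping (where `madvise` with `MADV_DONTNEED` makes the next access see zero-filled pages). The helper names `map_anon` and `zero_by_dropping` are made up for illustration, and other kernels may not give the same guarantee:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map len bytes of private anonymous memory.
 * Returns NULL on failure. */
static void *map_anon(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: "zero" a page-aligned range by asking the kernel
 * to discard its pages. Assumption: Linux semantics, where a private
 * anonymous range is repopulated with zero-filled pages on next access.
 * Returns 0 on success, -1 on failure (errno set by madvise). */
static int zero_by_dropping(void *buf, size_t len) {
    return madvise(buf, len, MADV_DONTNEED);
}
```

Note that this still costs a syscall plus a page fault per page on the next touch, which is the overhead the replies are weighing against a plain memset.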

josephg · on June 15, 2024

For 8 KB? Making a syscall into the kernel, updating the process's memory map, and then faulting later is probably slower by an order of magnitude or more than just setting those bytes to zero.

memcpy, bzero, and friends are insanely fast. They're practically free when those bytes are already in the CPU's cache.

astrange · on June 15, 2024

So don't syscall. Darwin has a system similar to io_uring for this.

(But it also has a 16KB page size.)

josephg · on June 16, 2024

That would probably still cause a page fault when the memory is re-accessed, though. I suspect even using io_uring will still be a lot slower than bzero if you're just zeroing out 2 pages of memory. Zeroing memory is really fast.

pcwalton · on June 15, 2024

128-bit or 256-bit memsets via SIMD instructions are sufficient to saturate RAM bandwidth, so there wouldn't be much of a gain from having a dedicated instruction.
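For illustration, a 128-bit memset might look roughly like the sketch below. It assumes SSE2 is available (it falls back to plain `memset` otherwise), and `zero_128` is a made-up name. Real libc implementations are considerably more elaborate:

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: zero len bytes at dst using 16-byte (128-bit)
 * unaligned stores where SSE2 is available, then finish any remaining
 * tail (or the whole buffer, without SSE2) with memset. */
static void zero_128(void *dst, size_t len) {
    unsigned char *p = dst;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, zero); /* one 128-bit store */
        p += 16;
        len -= 16;
    }
#endif
    memset(p, 0, len);
}
```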

(By the way, x86 does have a dedicated instruction--rep stosb--but compilers differ as to how often they use it, for the reason cited above.)
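A minimal sketch of issuing `rep stosb` directly, assuming x86-64 with GCC or Clang inline assembly (the name `zero_rep_stosb` is made up, and other targets fall back to `memset`):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero len bytes at dst with `rep stosb`.
 * The instruction stores AL to [RDI] RCX times, advancing RDI each time.
 * Assumes the direction flag is clear, as the x86-64 ABIs require at
 * function boundaries. */
static void zero_rep_stosb(void *dst, size_t len) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(len) /* RDI = destination, RCX = count */
                     : "a"(0)               /* AL = byte value to store */
                     : "memory");
#else
    memset(dst, 0, len);
#endif
}
```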

anonymoushn · on June 16, 2024

Supposedly rep movsb is faster than SIMD stores on very recent chips, for cases where you aren't actually hitting RAM with all your writes.

tubs · on June 16, 2024

The gain is in power efficiency.

Arm64 provides `dc zva` for this.
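As a hedged sketch of how `dc zva` could be used from C: the code below assumes AArch64 with GCC or Clang inline assembly and Normal (non-Device) memory, reads the block size from `DCZID_EL0`, and falls back to `memset` elsewhere or when the instruction is prohibited. The name `zero_dc_zva` is made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero len bytes at dst, using DC ZVA for the
 * block-aligned middle of the range where the CPU permits it. */
static void zero_dc_zva(void *dst, size_t len) {
    unsigned char *p = dst;
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t dczid;
    __asm__("mrs %0, dczid_el0" : "=r"(dczid));
    if (!(dczid & 0x10)) { /* DZP bit clear: DC ZVA is permitted */
        /* BS field = log2(block size in 4-byte words) */
        size_t block = (size_t)4 << (dczid & 0xf);
        /* Bytes needed to reach the next block boundary (0 if aligned) */
        size_t head = (size_t)(-(uintptr_t)p & (block - 1));
        if (head > len) head = len;
        memset(p, 0, head);
        p += head;
        len -= head;
        while (len >= block) {
            /* Zeroes the whole aligned block containing p */
            __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
            p += block;
            len -= block;
        }
    }
#endif
    memset(p, 0, len); /* tail, or everything on the fallback path */
}
```

The head and tail are handled with `memset` because `dc zva` always zeroes a full aligned block, so using it on a partial block would clobber neighboring bytes.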

saagarjha · on June 15, 2024

Zeroing something that large is not typical. That said, some architectures have optimized zeroing instructions, such as dc zva on ARM.