
This might be a stupid question, but why isn't zeroing 8KB of memory a single instruction? It must be common enough to be worth having all the layers of memory (and indirection) understand it.


If the region is at least a page in size, you can tell the VM system to drop the pages and give you new zero-filled ones instead.
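A minimal sketch of that idea, assuming Linux, where `madvise(MADV_DONTNEED)` on a private anonymous mapping makes later reads return zero-filled pages. Other systems treat the same flag differently, so take this as an illustration and not as portable code. `drop_and_check` and `all_zero` are hypothetical helper names made up for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if every byte in [p, p + n) is zero, 0 otherwise. */
static int all_zero(const unsigned char *p, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0) return 0;
    }
    return 1;
}

/* Maps `pages` anonymous pages, dirties them, asks the kernel to drop them,
 * and reports whether they read back as zero (1), not zero (0), or whether
 * a call failed (-1). The zero-fill behaviour is the Linux one. */
static int drop_and_check(size_t pages) {
    assert(pages > 0);
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0) return -1;
    size_t len = pages * (size_t)page_size;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    memset(buf, 0xAB, len);               /* dirty every page */

    int result = -1;
    if (madvise(buf, len, MADV_DONTNEED) == 0) {
        /* Reading here faults in fresh pages, one fault per page. */
        result = all_zero(buf, len);
    }
    munmap(buf, len);
    return result;
}
```

Each page touched after the `madvise` call takes a page fault before it can be read or written again.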


For 8KB? Making a syscall into the kernel, updating the process's memory map, and then faulting later is probably slower by an order of magnitude or more than just setting those bytes to zero.

memcpy, bzero and friends are insanely fast. Practically free when those bytes are already in the CPU's cache.


So don't syscall. Darwin has a system similar to io_uring for this.

(But it also has a 16KB page size.)


That would probably still cause a page fault when the memory is re-accessed, though. I suspect even using io_uring will still be a lot slower than bzero if you're just zeroing out 2 pages of memory. Zeroing memory is really fast.


128-bit or 256-bit memsets via SIMD instructions are sufficient to saturate RAM bandwidth, so there wouldn't be much of a gain from having a dedicated instruction.

(By the way, x86 does have a dedicated instruction--rep stosb--but compilers differ as to how often they use it, for the reason cited above.)
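For anyone who hasn't seen it, here is a rough sketch of what `rep stosb` looks like from C, assuming GCC or Clang inline assembly on x86 and falling back to `memset` everywhere else. `zero_bytes` is a hypothetical name, and in real code the library `memset` is normally the better choice because it already picks between strategies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zeroes n bytes starting at dst. */
static void zero_bytes(void *dst, size_t n) {
    assert(dst != NULL || n == 0);
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* rep stosb stores AL to [RDI] RCX times, advancing RDI each step.
     * The ABI guarantees the direction flag is clear on function entry. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
#else
    memset(dst, 0, n);                    /* non-x86 fallback */
#endif
}
```

Calling `zero_bytes(buf, 8192)` covers the 8KB case from the question with one instruction plus register setup.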


Supposedly rep movsb is faster than SIMD stores on very recent chips, for cases where you aren't actually hitting RAM with all your writes.


The gain is in power efficiency.

Arm64 provides `dc zva` for this.
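A hedged sketch of how `dc zva` might be used from C, assuming GCC or Clang inline assembly on AArch64. The instruction zeroes one naturally aligned block whose size is reported in `DCZID_EL0`, so unaligned head and tail bytes still need ordinary stores. `zero_region` is a hypothetical name, and the code falls back to `memset` on other targets or when the register reports that the instruction is prohibited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zeroes n bytes starting at dst, using dc zva for whole aligned blocks. */
static void zero_region(void *dst, size_t n) {
    assert(dst != NULL || n == 0);
#if defined(__aarch64__) && defined(__GNUC__)
    uint64_t dczid;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(dczid));
    if (!(dczid & 0x10)) {                        /* bit 4 (DZP) set means prohibited */
        size_t block = (size_t)4 << (dczid & 0xF); /* bits 3:0 hold log2 of words */
        unsigned char *p = dst;
        unsigned char *end = p + n;

        /* Plain stores up to the first block boundary. */
        size_t head = (size_t)(-(uintptr_t)p) & (block - 1);
        if (head > n) head = n;
        memset(p, 0, head);
        p += head;

        /* One dc zva per whole aligned block. */
        while ((size_t)(end - p) >= block) {
            __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
            p += block;
        }

        memset(p, 0, (size_t)(end - p));          /* remaining tail bytes */
        return;
    }
#endif
    memset(dst, 0, n);                            /* portable fallback */
}
```

The block size is commonly 64 bytes, but the code reads it at run time and does not assume it.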


Zeroing something that large is not typical. That said, some architectures have optimized zeroing instructions, such as dc zva on ARM.



