
128-bit or 256-bit memsets via SIMD instructions are sufficient to saturate RAM bandwidth, so there wouldn't be much of a gain from having a dedicated instruction.
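For illustration, here is roughly what a 128-bit SIMD fill looks like in C. This is only a sketch: `simd_memset` is a made-up name, it assumes an x86 target with SSE2 (falling back to plain `memset` elsewhere), and a real libc implementation handles alignment, small sizes, and non-temporal stores far more carefully.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: fill n bytes at dst with the byte value c,
 * using unaligned 128-bit stores where SSE2 is available. */
static void simd_memset(void *dst, int c, size_t n)
{
    assert(dst != NULL || n == 0);
    unsigned char *p = (unsigned char *)dst;

#if defined(__SSE2__)
    /* Broadcast the byte into all 16 lanes of an XMM register. */
    const __m128i v = _mm_set1_epi8((char)c);
    while (n >= 16) {
        _mm_storeu_si128((__m128i *)p, v);
        p += 16;
        n -= 16;
    }
#endif

    /* Tail bytes (or the whole buffer on targets without SSE2). */
    if (n > 0) {
        memset(p, c, n);
    }
}
```

Each loop iteration writes 16 bytes with one store; a 256-bit version would do the same with AVX registers.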

(By the way, x86 does have a dedicated instruction--rep stosb--but compilers differ as to how often they use it, for the reason cited above.)
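If you want to try it yourself, something like this should work with GCC or Clang inline asm. It's a sketch, not how any particular libc does it: `stosb_memset` is a made-up name, it assumes the direction flag is clear (which the usual x86 ABIs guarantee on function entry), and it falls back to `memset` on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at dst with the byte value c
 * using the x86 string-store instruction. */
static void stosb_memset(void *dst, int c, size_t n)
{
    assert(dst != NULL || n == 0);

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* rep stosb stores AL to [RDI/EDI] and repeats RCX/ECX times,
     * advancing the destination pointer after each store. */
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(c)
                     : "memory");
#else
    /* Assumed fallback for non-x86 targets or other compilers. */
    if (n > 0) {
        memset(dst, c, n);
    }
#endif
}
```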



Supposedly rep stosb is faster than SIMD stores on very recent chips, at least for cases where your writes stay in cache and aren't all actually hitting RAM.


The gain is in power efficiency.

Arm64 provides `dc zva` for this.
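Note that `dc zva` only writes zeros, one naturally aligned block at a time, so it covers the zeroing case rather than a general memset. A rough sketch of how it could be used from C with GCC or Clang follows. `zva_zero` is a made-up name; the code reads the block size from `DCZID_EL0`, skips the instruction if the DZP bit says it is prohibited, and falls back to `memset` on anything that isn't AArch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n bytes at dst, using dc zva for the
 * aligned middle part of the buffer when the CPU permits it. */
static void zva_zero(void *dst, size_t n)
{
    assert(dst != NULL || n == 0);
    unsigned char *p = (unsigned char *)dst;

#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t dczid;
    __asm__("mrs %0, dczid_el0" : "=r"(dczid));

    if (!(dczid & 0x10)) {              /* DZP bit clear: dc zva allowed */
        /* BS field is log2 of the block size in 4-byte words. */
        size_t block = (size_t)4 << (dczid & 0xF);

        /* Zero the unaligned head with ordinary stores. */
        size_t head = (size_t)(-(uintptr_t)p & (block - 1));
        if (head > n) {
            head = n;
        }
        memset(p, 0, head);
        p += head;
        n -= head;

        /* Zero whole aligned blocks, one instruction per block. */
        while (n >= block) {
            __asm__ volatile("dc zva, %0" : : "r"(p) : "memory");
            p += block;
            n -= block;
        }
    }
#endif

    /* Tail bytes (or the whole buffer on other targets). */
    if (n > 0) {
        memset(p, 0, n);
    }
}
```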



