
> This complicated decoding makes a pipeline longer. This means that in case of branch misprediction there would be a large penalty.

PDEP / PEXT are single-clock-tick instructions and are far more complex than what I'm proposing here. So is AESRound.

I think you're underestimating the number of gates you can put in parallel and execute in a single stage of the pipeline. In terms of circuit depth, 64-bit PDEP / PEXT are more complicated than, say, a 64-byte parallel adder. PDEP / PEXT need a forward butterfly circuit, an inverse butterfly back, and a decoder in parallel; a 64-byte prefix sum is just one forward butterfly.
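To make the depth argument concrete, here is a rough software sketch of a log-depth prefix sum over 64 elements. It uses a Kogge-Stone-style scan, which is my assumption about the kind of network meant by "one butterfly forward"; the function name `prefix_sum_log_depth` is made up for illustration. Each pass of the loop stands in for one layer of adders that hardware could evaluate side by side, so the number of passes approximates the circuit depth rather than the gate count.

```python
def prefix_sum_log_depth(values):
    """Inclusive prefix sum done in ceil(log2(n)) data-parallel stages.

    Hypothetical illustration: each loop pass models one layer of adders.
    """
    out = list(values)
    n = len(out)
    stages = 0
    shift = 1
    while shift < n:
        # Every output of this stage reads only the previous stage's values,
        # so all n additions within a stage are independent of each other.
        out = [out[i] + out[i - shift] if i >= shift else out[i]
               for i in range(n)]
        shift *= 2
        stages += 1
    return out, stages


if __name__ == "__main__":
    sums, stages = prefix_sum_log_depth([1] * 64)
    print(stages)    # 6 stages for 64 elements
    print(sums[-1])  # 64
```

Six layers of adders cover all 64 lanes, which is the sense in which the depth stays small even though the total amount of hardware is large.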


