
I’m surprised so much branching isn’t more costly.


Branch predictors have gotten really good, and it now often makes more sense to rely on them than to work the branches away.

For example, modern compilers will very rarely introduce conditional moves (cmov) on x86 because they are nearly always slower than simply branching. It might be counterintuitive, but a branch prediction breaks the dependency between the micro-ops of the condition and those of the clause. A cmov cannot do that: if your cmov's condition depends on a load, you need to wait for that load to complete before the cmov can execute, and so does everything that consumes its result.
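
To make the dependency point concrete, here is a rough sketch using a lower-bound search, where each probe address depends on the previous comparison. The function names are made up for illustration, and nothing in C forces the compiler to emit a branch for the first form or a cmov for the second, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>

/* Branchy form: the predictor guesses which half is next, so the load for
   the following probe can be issued before a[mid] has arrived.
   Returns the first index i with a[i] >= key, or n if there is none. */
size_t lower_bound_branch(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Select form: written so the compiler can use a conditional move.
   The next probe address is computed from the loaded value, so each
   load has to wait for the previous one. Same result as above. */
size_t lower_bound_select(const int *a, size_t n, int key) {
    const int *base = a;
    size_t len = n;
    while (len > 1) {
        size_t half = len / 2;
        base = (base[half - 1] < key) ? base + half : base;
        len -= half;
    }
    /* One candidate element is left (none when n == 0). */
    return (size_t)(base - a) + (len == 1 && base[0] < key);
}
```

Both functions return the same index for a sorted array; which one is faster depends on how predictable the comparisons are and on whether the data is already in cache.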

Always benchmark with at-scale data and measure.


> For example, modern compilers will very rarely introduce conditional moves

For conditionally selected data that lives in registers (and occasionally on the stack), GCC seems to always use cmov (after all, it is much cheaper than a branch that may be taken with p=0.5).

You do have a very good point about data dependencies.

Here is an AArch32 vs x86 vs AArch64 vs x64 comparison: https://godbolt.org/z/hEac4sz7h

- for "c ? a : b" (where a, b, c are function arguments), all 4 targets use their equivalent of cmov

- for "c ? *a : *b", the x64 version uses cmov on the address, whereas AArch64 uses a "full" branch

- AArch32 always uses conditional instructions for these 2 expressions, and additionally, "a * (b & 1)" gets optimized into "a & ((b & 1) ? ~0 : (b & 1) /* = 0 */)"
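
For anyone who wants to try this without opening the link, the three expressions can be wrapped roughly like this. The function names and the int/unsigned types are my own guesses, not necessarily what the godbolt snippet uses.

```c
/* Both operands are already in registers: a cheap candidate for cmov/csel. */
int select_value(int c, int a, int b) {
    return c ? a : b;
}

/* Only one of the two loads should happen, so the compiler either
   selects the address first and then loads, or branches. */
int select_deref(int c, const int *a, const int *b) {
    return c ? *a : *b;
}

/* Multiplying by 0 or 1; equivalent to a & ((b & 1) ? ~0u : 0u). */
unsigned mul_low_bit(unsigned a, unsigned b) {
    return a * (b & 1);
}
```

Compile with -O2 for each target and compare the output; the choice between a select and a branch can change with compiler version and flags.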


It depends on the branch predictor. Correct prediction: everything is already loaded and ready. Wrong prediction: flush it all and load again.

If you know the branch predictor algorithm you can optimise for it.

Edit: it’s on p27



