You have five bodies and 16 sse registers. The entire state of your simulation c... | Hacker News


petermcneeley on April 25, 2020 | on: I translated a simple C program to x86_64 and it w...

You have five bodies and 16 SSE registers. The entire state of your simulation can fit into register space, so you never need to access memory during the stepping part of your code. You can unroll all the gravity interactions, so you end up with one large branchless, memoryless block of code. Now that it's completely inline, you can rearrange your dependencies based on the expected latency and throughput of each operation (https://software.intel.com/sites/landingpage/IntrinsicsGuide...). After that you can merge operations where possible (with SSE4 the most you are going to get is 2x, because you are using doubles and a 128-bit register holds only two of them). You may think the full inlining is cheating, but the compiler has the same information, since your bodies list is entirely constant. (Since your dt and your masses are constant, they can also potentially be folded.)

sesuximo on April 25, 2020

Full inlining might be slower if the branch is easy to predict.

chj on April 26, 2020

This. You should avoid memory accesses and branches like the plague.
