The cache-line latency is strongly associated with the number of hash-call repetitions. At 1, 2, 3, and 4 repeats, the latency does not show up in the timestamp counter. From 5 to 9 repeats, the latency builds, adding 2 cycles with each additional repeat. It peaks at 9 repeats, for a total of 10 additional cycles. I can make the latency go away by adding a memory fence after each repetition, but then the total number of cycles added is about 50.
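The measurement described above can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the setup behind the graphs. It assumes an x86-64 timestamp counter read with `__rdtsc()` (bracketed by `_mm_lfence()`, because RDTSC itself is not serializing) and uses `_mm_mfence()` as the memory fence. The post does not name the hash, so `hash_fnv1a` (an FNV-1a-style hash with a seed) and `time_repeats` are hypothetical names. On non-x86 targets the sketch falls back to `clock_gettime`, which counts nanoseconds rather than cycles.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <x86intrin.h>
/* Read the timestamp counter; the lfences keep it ordered with nearby work. */
static inline uint64_t read_counter(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
/* Full memory fence (MFENCE) inserted after each repetition. */
static inline void mem_fence(void) { _mm_mfence(); }
#else
#include <time.h>
/* Assumed fallback for non-x86 targets: counts nanoseconds, not TSC cycles. */
static inline uint64_t read_counter(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
static inline void mem_fence(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
#endif

/* Hypothetical stand-in for the hash: FNV-1a over `len` bytes, with the
   offset basis XORed with `seed` (seed 0 gives plain 64-bit FNV-1a). */
static uint64_t hash_fnv1a(const void *data, size_t len, uint64_t seed) {
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ull ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ull;
    }
    return h;
}

/* Volatile sink so the compiler cannot drop the hash calls. */
static volatile uint64_t sink;

/* Time `repeats` back-to-back hash calls, optionally with a fence after
   each one, and return the smallest elapsed count over `trials` runs. */
static uint64_t time_repeats(int repeats, int use_fence, int trials) {
    static const char key[] = "example-key";
    uint64_t best = UINT64_MAX;
    for (int t = 0; t < trials; t++) {
        uint64_t start = read_counter();
        for (int r = 0; r < repeats; r++) {
            /* Seeding with r keeps each call from being hoisted out of the loop. */
            sink = hash_fnv1a(key, sizeof key - 1, (uint64_t)r);
            if (use_fence)
                mem_fence();
        }
        uint64_t elapsed = read_counter() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Calling `time_repeats(n, 0, 1000)` and `time_repeats(n, 1, 1000)` for n from 1 to 9 would give an unfenced and a fenced series to compare. Whether this stand-in hash reproduces the buildup from 5 to 9 repeats depends on the real workload and CPU. Taking the minimum over many trials is one common way to reduce interrupt and scheduling noise.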
Graphs:
1. The original, with repeats set to 32
2. Repeats 1 to 9
3. Repeats 1 to 9, with a memory fence after each repetition
https://gist.github.com/injinj/138543ccc6a23ceb1fcdc05f46288...