Pathfinder uses horizontal SIMD everywhere (e.g. vec4 stores x/y/z/w in a single SSE register). I measured about a 10% improvement across the board when enabling SSE+SSE2+SSSE3+SSE4.2+AVX vs. using scalar instructions only. The primary wins were from operations like de Casteljau curve subdivision which are very efficient when expressed in vector operations.
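To show why de Casteljau subdivision vectorizes so well, here is a minimal sketch. It is not Pathfinder's actual code. It assumes a hypothetical layout that packs a cubic Bézier's four 2D control points two per 4-lane vector: (P0, P1) and (P2, P3). With that packing, each lerp step computes two interpolated points in one lane-wise operation. `F32x4`, `Cubic`, `split`, and the `xy_xy`/`zw_xy`/... helpers are illustrative names. The sketch uses portable Rust rather than `std::arch` intrinsics. On SSE, each add/sub/mul corresponds to a single `addps`/`subps`/`mulps`, and each lane-combining helper corresponds to a single shuffle.

```rust
use std::ops::{Add, Mul, Sub};

/// Hypothetical 4-lane f32 vector (x0, y0, x1, y1): two 2D points per register.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    fn zip(self, o: F32x4, f: impl Fn(f32, f32) -> f32) -> F32x4 {
        let (a, b) = (self.0, o.0);
        F32x4([f(a[0], b[0]), f(a[1], b[1]), f(a[2], b[2]), f(a[3], b[3])])
    }
    fn splat(v: f32) -> F32x4 {
        F32x4([v; 4])
    }
    /// Lane-wise linear interpolation: interpolates two point pairs at once.
    fn lerp(self, other: F32x4, t: f32) -> F32x4 {
        self + (other - self) * F32x4::splat(t)
    }
    // Lane-combining helpers (each maps to one SSE shuffle).
    fn xy_xy(self, o: F32x4) -> F32x4 {
        F32x4([self.0[0], self.0[1], o.0[0], o.0[1]])
    }
    fn zw_xy(self, o: F32x4) -> F32x4 {
        F32x4([self.0[2], self.0[3], o.0[0], o.0[1]])
    }
    fn xy_zw(self, o: F32x4) -> F32x4 {
        F32x4([self.0[0], self.0[1], o.0[2], o.0[3]])
    }
    fn zw_zw(self, o: F32x4) -> F32x4 {
        F32x4([self.0[2], self.0[3], o.0[2], o.0[3]])
    }
}

impl Add for F32x4 {
    type Output = F32x4;
    fn add(self, o: F32x4) -> F32x4 { self.zip(o, |a, b| a + b) }
}
impl Sub for F32x4 {
    type Output = F32x4;
    fn sub(self, o: F32x4) -> F32x4 { self.zip(o, |a, b| a - b) }
}
impl Mul for F32x4 {
    type Output = F32x4;
    fn mul(self, o: F32x4) -> F32x4 { self.zip(o, |a, b| a * b) }
}

/// Hypothetical cubic Bézier: p01 = (P0, P1), p23 = (P2, P3).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cubic {
    pub p01: F32x4,
    pub p23: F32x4,
}

impl Cubic {
    /// de Casteljau split at parameter t, two points per vector op.
    pub fn split(self, t: f32) -> (Cubic, Cubic) {
        let p12 = self.p01.zw_xy(self.p23); // (P1, P2)
        let q01 = self.p01.lerp(p12, t); // (Q0, Q1)
        let q12 = p12.lerp(self.p23, t); // (Q1, Q2)
        let r01 = q01.lerp(q12, t); // (R0, R1)
        // Lerp R0 toward R1 using a half-swapped copy; the split point S lands in lanes 0-1.
        let s = r01.lerp(r01.zw_xy(r01), t);
        let left = Cubic { p01: self.p01.xy_xy(q01), p23: r01.xy_xy(s) }; // P0, Q0, R0, S
        let right = Cubic { p01: s.xy_zw(r01), p23: q12.zw_zw(self.p23) }; // S, R1, Q2, P3
        (left, right)
    }
}
```

For example, splitting the curve (0,0), (0,4), (4,4), (4,0) at t = 0.5 gives a left half (0,0), (0,2), (1,3), (2,3) and a right half (2,3), (3,3), (4,2), (4,0). The whole split takes six lerps, each covering two points. A scalar version would need twelve per-coordinate lerps, so a cubic split roughly halves its arithmetic instruction count.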

So, at least in this particular application, horizontal SIMD didn't make an enormous difference, but a 10% boost is nice.