Step 1) Write the function using high-level abstractions.
Step 2) Glance over the generated assembly and make sure that it vectorized the way you wanted.
> Glance over the generated assembly and make sure that it vectorized the way you wanted.
Isn't that something you would also need to do in Fortran? IMO Julia makes this very easy with its `@code_*` macros, which is one of the main reasons I use it.
If you write Julia the way you would write Fortran, with explicit argument types, plain for loops, and no allocations, it shouldn’t be too far off. IIRC Fortran has a few semantics that let the compiler optimize more in some cases, such as its rules on aliasing.
But indeed, there are almost certainly fewer performance surprises in Fortran.
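To make the aliasing point concrete, here is a minimal sketch in C rather than Julia or Fortran (the function name `add_scaled` is made up for illustration). Roughly speaking, Fortran lets the compiler assume that array arguments don't overlap. In C you have to make that promise yourself with `restrict`; without it the compiler typically has to prove there is no overlap, emit a runtime overlap check, or fall back to more conservative code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] + k * b[i].
 * `restrict` is the caller's promise that `out` does not overlap `a` or `b`,
 * which is roughly what Fortran assumes for array arguments by default. */
void add_scaled(float *restrict out, const float *restrict a,
                const float *restrict b, float k, size_t n)
{
    /* Cheap sanity check of the contract; it only catches identical pointers,
     * not partial overlap. */
    assert(n == 0 || (out != a && out != b));

    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + k * b[i];
    }
}
```

Whether this actually changes the generated code depends on the compiler and flags, so it is still worth checking the assembly.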
This is the default workflow in every high-level language. Even if I’m writing explicit SIMD intrinsics in C targeting a specific processor, I still have to benchmark and maybe look at the assembly to make sure it did what I intended (or something better).
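As a rough sketch of that workflow in plain C (no intrinsics; `sum_squares` is a made-up name): write the obvious loop, compile with optimization, and read the assembly. With GCC or Clang, something like `cc -O3 -S file.c` writes the assembly to `file.s`. Whether you find vector instructions there depends on the compiler version and target, so treat it as something to check, not something to assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum of squares of 32-bit integers, accumulated in
 * 64 bits so the individual products cannot overflow. */
int64_t sum_squares(const int32_t *x, size_t n)
{
    assert(x != NULL || n == 0);

    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product is computed in 64 bits. */
        total += (int64_t)x[i] * x[i];
    }
    return total;
}
```

An integer reduction is used here on purpose: compilers are free to reorder integer additions, whereas a floating-point sum usually needs extra permission (for example relaxed floating-point flags) before it can be vectorized.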