
Sure, but I'm not sure if that is what the parent poster was saying (that nvcc generates poor quality PTX for newer devices).

It's been a while since I looked at CUDA, but it used to be that NVIDIA was continually extending cuDNN to add support for kernels needed by SOTA models, and I assume those kernels were all hand-optimized.

I'm curious what kinds of models people are writing where not only is there no optimized cuDNN support, but solutions like Triton or torch.compile, and even hand-optimized CUDA C kernels, are too slow. Are hand-written PTX kernels really that common?



Yes. Take a look at, say, CUTLASS: you'll see that they use PTX instructions because there are no intrinsics, much less automatic compiler lowering, for the accelerators they target.
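
To make that concrete: when the compiler offers no intrinsic for an instruction, CUDA C++ code can emit the PTX directly via inline assembly. This is a hypothetical minimal sketch using a trivial add.s32 instruction just to show the mechanism; CUTLASS does the same thing for instructions like mma.sync and cp.async that have no (or late-arriving) intrinsic coverage:

```cuda
// Sketch only: inline PTX from CUDA C++ (compiled with nvcc).
// add_via_ptx is an illustrative name, not a CUTLASS function.
__device__ int add_via_ptx(int a, int b) {
    int c;
    // "=r" binds c as a 32-bit register output; "r" binds a and b as inputs.
    asm volatile("add.s32 %0, %1, %2;" : "=r"(c) : "r"(a), "r"(b));
    return c;
}
```

The same asm-constraint syntax is how hand-written kernels reach newer accelerator instructions before the compiler exposes them any other way.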


Yes, but that's an NVIDIA project, so it would be expected to be hand-optimized, the same as their cuDNN kernels.

I'm more curious about what types of models people in research or industry are developing where NVIDIA-provided support such as this is not enough and they end up writing their own PTX kernels.



