Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> write their stuff in PTX

I wonder if you vould you point me to concrete examples where people write PTX rather than CUDA? I'm asking because I just learned CUDA since it's so much faster than Python!



Here's a rather trivial example of using PTX: https://docs.nvidia.com/cuda/parallel-thread-execution/#spec...

For various micro-bench reasons I wanted to use a global clock instead of an SM-local one, and I believe this was needed. Also note that even CUDA has "lower level"-like operations, e.g. warp primitives. PTX itself is super easy to embed in it like asm.


There isn't a lot of easily accessible examples outside of the corporate world.

Open source authors typically shy away from Nvidia closed source APIs, and PTX is tied to how Nvidia hardware works, so you won't see it implemented for other hardware.

To do what Deepseek did, but didn't want to waste your time and money with Nvidia, you'd use Vulkan. Theres more Vulkan in the world than CUDA.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: