Haha, what a shoddy headline. "Bypasses" and "industry-standard" have no place here.
CUDA is not an industry standard. Vulkan is an industry standard. They did not bypass CUDA... that's like saying if I use Vulkan I'm bypassing OpenGL. PTX is an alternative low-level API provided by Nvidia because of how awful CUDA is for high-performance code.
What DeepSeek wrote could only have been written in either PTX or Vulkan.
Any other company could have done this, and low-latency traders on Wall Street who use Nvidia write their stuff in PTX for obvious reasons.
OpenAI was, is, and always will be absolutely incompetent when it comes to using their hardware effectively... and they're no different than any other company. Reading is not a goddamned super power! Just read the docs!
You can ignore it, the commenter clearly has no idea what they are talking about. PTX is literally the instruction set that CUDA, Vulkan and OpenGL compile to on Nvidia cards in the end. It's assembly for GPUs. And it's infinitely harder to work with. Go to an average technical university and you'll probably find quite a few people who can write CUDA (or OpenGL or Vulkan for that matter). But it would be very surprising if you can find even a single person that can comfortably write PTX.
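To make the "assembly for GPUs" comparison concrete: here's a trivial CUDA kernel and, abridged in a comment, roughly the shape of the PTX that nvcc emits for it. The register names and instruction ordering below are illustrative, not exact compiler output:

```cuda
// Trivial CUDA kernel: one thread per element, c[i] = a[i] + b[i].
__global__ void add(const float *a, const float *b, float *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

/* Roughly what `nvcc -ptx` produces for the kernel body (abridged,
   registers renamed for readability):

   mov.u32    %r1, %ctaid.x;           // blockIdx.x
   mov.u32    %r2, %ntid.x;            // blockDim.x
   mov.u32    %r3, %tid.x;             // threadIdx.x
   mad.lo.s32 %r4, %r1, %r2, %r3;      // i = blockIdx.x * blockDim.x + threadIdx.x
   mul.wide.s32 %rd1, %r4, 4;          // byte offset = i * sizeof(float)
   // ...pointer arithmetic, then:
   ld.global.f32 %f1, [%rd2];          // a[i]
   ld.global.f32 %f2, [%rd3];          // b[i]
   add.f32    %f3, %f1, %f2;
   st.global.f32 [%rd4], %f3;          // c[i]
*/
```

One C-level statement fans out into explicit register moves, loads, and stores, which is exactly why few people write whole programs at this level.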
"Compile to" isn't exactly the correct phrase either.
PTX is not the IL used by Nvidia's drivers, but it does compile directly to it with less slop involved. If you had said "PTX's instructions are analogous to writing assembly for CPUs or any other GPUs (à la Clang's AMDGPU target)", that would probably have been the better way to put it.
Arguably, PTX is closer to being the SPIR-V part of their stack (more than just an assembler's input, but similar in concept). None of Nvidia's tools ever really line up with clean analogies to the outside world; that's the curse of Nvidia's NIH syndrome.
Generally, you're not going to be writing all of your code in PTX, but I find it wild you think people going to "an average technical university" would be unable to use it for the parts they need it for. That says more about you than it does them.
All of Nvidia's docs for this are online, it isn't that hard. Have you tried?
>PTX's instructions are analogous to writing assembly for CPUs
How else would you have understood it? At this level it's literally just pedantry. In the same way you can say C doesn't technically compile to assembly for CPUs. The point is that it's the lower abstraction level that is still (more or less) human readable. But just like in CUDA, you may want to write parts of your code in it if you want to benefit from things that the higher-level language doesn't expose. The terminology might seem different, but in practice it is pretty analogous.
This is somewhat untrue as well. HFT firms, being similarly constrained, have to optimize at this level, much like HFT in crypto, where optimizations are done not in Solidity, nor in Yul, but at the opcode level in Huff. That's the issue with these big tech companies: endless budget, so they throw bad code at ever-larger distributed clusters to overcompensate.
I wonder if you could point me to concrete examples where people write PTX rather than CUDA? I'm asking because I just learned CUDA, since it's so much faster than Python!
For various micro-benchmarking reasons I wanted to use a global clock instead of an SM-local one, and I believe PTX was needed for that.
Also note that even CUDA has "lower level"-like operations, e.g. warp primitives. PTX itself is super easy to embed in CUDA as inline asm.
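For example, reading the GPU-wide nanosecond timer (the %globaltimer PTX special register, which plain CUDA C++ doesn't expose directly, unlike the SM-local clock64()) is a one-line inline asm statement. A minimal sketch, assuming a kernel where you want to time a region:

```cuda
#include <cstdio>

// Read the global nanosecond timer via inline PTX.
// %globaltimer is a PTX special register shared across the device,
// whereas clock()/clock64() read the per-SM cycle counter.
__device__ unsigned long long global_timer_ns() {
    unsigned long long t;
    asm volatile("mov.u64 %0, %%globaltimer;" : "=l"(t));
    return t;
}

__global__ void time_something() {
    unsigned long long start = global_timer_ns();
    // ... the work being measured goes here ...
    unsigned long long end = global_timer_ns();
    if (threadIdx.x == 0 && blockIdx.x == 0)
        printf("elapsed: %llu ns\n", end - start);
}
```

Note the timer's actual resolution is platform-dependent, so this is best for coarse micro-benchmarks rather than cycle-accurate measurement.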
There aren't a lot of easily accessible examples outside of the corporate world.
Open source authors typically shy away from Nvidia's closed-source APIs, and PTX is tied to how Nvidia hardware works, so you won't see it implemented for other hardware.
If you wanted to do what DeepSeek did but didn't want to waste your time and money on Nvidia, you'd use Vulkan. There's more Vulkan in the world than CUDA.
Not in HFT, but I guess maybe for running optimization solvers and forecast models very fast, etc.? Essentially compute models ultimately driving market decisions based on lots of input data.
We do a lot of forecasting and solvers where I am; we just run them on CPUs though. But maybe if you're competing on speed you would?
> Optimization solvers usually don't benefit from GPUs. I think it's because it's sparse matrices and a sequential series of pivots.
This depends a lot on the problem and the algorithm that is used. For example, interior point methods are clearly better suited to running on GPUs than the primal or dual simplex algorithm.