I'm guessing it's the right PyTorch, FlashAttention, TransformerEngine, xformers, and all that for the machine you're on, without a bunch of ninja-built CUDA capability pain.
They explicitly mention PyTorch in the blog post. That's where the big money in Python is, and that's where PyPI utterly fails.