Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Why does it take 60 seconds to load data from RAM to VRAM? Shouldn't the PCIE bandwidth allow it to fully load it in a few seconds?


Because ML infra is bloatware beyond belief.

If it was engineered right, it would take:

- transfer model weights from NVMe drive/RAM to GPU via PCIe

- upload tiny precompiled code to GPU

- run it with tiny CPU host code

But what you get instead is gigabytes of PyTorch + Nvidia docker container bloatware (hi Nvidia NeMo) that takes forever to start.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: