Why does it take 60 seconds to load data from RAM to VRAM? Shouldn't the PCIE ba...

throw_me_uwu · 2025-10-21T18:21:31 1761070891

Because ML infra is bloatware beyond belief.

If it were engineered right, loading would take just three steps:

- transfer model weights from NVMe drive/RAM to GPU via PCIe

- upload tiny precompiled code to GPU

- run it with tiny CPU host code

But what you get instead is gigabytes of PyTorch plus NVIDIA Docker container bloatware (hi, NVIDIA NeMo) that takes forever to start.
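
For scale, here is a rough back-of-envelope for the first step (the raw PCIe copy). It is only a sketch: the x16 link width and the 16 GB model size are my assumptions, not numbers from the thread, and the helper names are made up for illustration. The per-lane rates and 128b/130b encoding are the published figures for PCIe 3.0 and later.

```python
# Back-of-envelope: theoretical time to push model weights over PCIe.
# Assumptions (not from the thread): an x16 link and a hypothetical 16 GB model.

GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}  # per-lane transfer rate, GT/s
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later


def pcie_gb_per_s(gen: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s (ignores protocol overhead)."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # divide by 8: bits -> bytes


def transfer_seconds(size_gb: float, gen: str, lanes: int = 16) -> float:
    """Ideal copy time for size_gb gigabytes at the theoretical link rate."""
    return size_gb / pcie_gb_per_s(gen, lanes)


if __name__ == "__main__":
    model_gb = 16.0  # hypothetical model size
    for gen in GT_PER_S:
        bw = pcie_gb_per_s(gen)
        secs = transfer_seconds(model_gb, gen)
        print(f"PCIe {gen} x16: {bw:.1f} GB/s, {secs:.2f} s for {model_gb:.0f} GB")
```

Real copies land below these numbers (protocol overhead, pageable vs. pinned host memory), but even at a fraction of the theoretical rate the transfer itself is seconds, not a minute.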