I knew the model would have difficulty fitting into a 16GB VRAM GPU, but "you need to load and unload parts of the model pipeline to/from the GPU" is not a workaround I expected.
At that point it's probably better to write a guide on how to set up a VM with a A100 easily instead of trying to fit it into a Colab GPU.
At that point it's probably better to write a guide on how to set up a VM with a A100 easily instead of trying to fit it into a Colab GPU.