The model weights (the thing being updated by the training process) stay loaded in gpu memory during training (the slow part). This could be useful to serialize the model weights to disk when checkpointing or completed, but it's a drop in the bucket compared to the rest of the time spent training.