Clearly we should train a diffusion model to denoise the weights of transformer LLMs. Yo dawg.