pardon my ignorance - what exactly is involved in reimplementing these models?
i assume there's only a superficial description of the architecture, and no weights to load in, so you'll have to train everything from scratch? do we even have their dataset?
Generally these reimplementations come without weights, and the MusicLM one is still a WIP. More mature implementations include instructions on how to train them, plus follow-ups on small-scale/crowd-sourced experiments and research[1].