A system was needed for video, and it turned out to be a good fit for audio too.
Audio and video aren't that different, TBH (audio just has more alpha/blending-style mixing rules and a lower tolerance for missed frames; video has higher bandwidth requirements). It wouldn't surprise me if both pipelines eventually converge completely. Both "need" compositors anyway.
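To put a rough number on the bandwidth gap, here's a back-of-the-envelope sketch. The formats are my own assumptions for illustration (uncompressed 48 kHz 16-bit stereo audio vs. uncompressed 1080p60 video at 4 bytes per pixel), not anything specific to a particular server or device.

```python
def audio_bytes_per_second(sample_rate_hz: int, channels: int, bytes_per_sample: int) -> int:
    """Raw (uncompressed) PCM audio data rate."""
    return sample_rate_hz * channels * bytes_per_sample


def video_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Raw (uncompressed) video data rate."""
    return width * height * bytes_per_pixel * fps


# Assumed example formats, chosen only for illustration.
audio = audio_bytes_per_second(48_000, 2, 2)        # 48 kHz, stereo, 16-bit
video = video_bytes_per_second(1920, 1080, 4, 60)   # 1080p, 4 bytes/pixel, 60 fps

print(f"audio: {audio / 1e6:.3f} MB/s")
print(f"video: {video / 1e6:.3f} MB/s")
print(f"video/audio ratio: {video / audio:.0f}x")
```

So the raw streams differ by three orders of magnitude with these assumed formats, while the audio side is the one that can't tolerate a dropped buffer.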
Consumer audio already works reasonably well, but this apparently brings massive improvements for Bluetooth, especially HFP, the profile used when you use the headphones' built-in mic.
The main benefit imo is for pro audio: you don't need to configure separate tools and manually swap between PulseAudio and JACK every time you want to do pro audio work.
It also manages permissions for recording audio and the screen for Wayland users.