I am curious about the nature of the output being rasterised bitmaps. I would have expected it to be easier for a model to generate output from primitives it has learned as geometric shapes with spatial relationships (what an "arm" looks like as a shape, and so on). I would like to know whether the model has a layer that represents these primitives and then effectively "renders" them to a rasterised image, or whether it really is computing at the level of pixels. So far I have not seen anything other than rasterised pixels.
I guess it matters because most of these images are unusable for further purposes: they can't easily be edited and touched up to fix all the flaws or do the final adaptation. Are there any options that generate the images as something like vector art, which would facilitate the downstream finishing process, rather than as fully rasterised bitmaps? A rough sketch of the kind of thing I mean is below.
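For concreteness, the closest workaround I can imagine is asking a text model to emit SVG markup directly, so the output stays editable as primitives. A minimal sketch, assuming the official openai Python client (the model name and prompts are just placeholders, and I'm not claiming any particular model does this well):

    # Sketch: ask a chat model for SVG markup instead of raster pixels,
    # so the result stays editable in a vector tool like Inkscape.
    # Assumes the openai Python client; the model name is a placeholder.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable text model
        messages=[
            {"role": "system",
             "content": "Reply with a single valid SVG document and nothing else."},
            {"role": "user",
             "content": "Draw a simple stick figure waving, as layered, named groups."},
        ],
    )

    # Save the markup; the shapes remain individually addressable for touch-up.
    svg = response.choices[0].message.content
    with open("figure.svg", "w") as f:
        f.write(svg)

Even a crude result in that form would be worth more to me than a polished bitmap, since the individual shapes remain addressable for the finishing pass.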