I’m not quite sure. I think that adversarial networks work pretty well at image generation.

I think that the problem here is that an SVG is structured information and a raster image is an unstructured blob, and the translation between them requires planning and understanding. Maybe treating an SVG like a raster image in the prompt is the wrong approach. I think that prompting for the image as code (which SVG basically is) would result in better outputs.
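To show what I mean by "SVG is basically code", here is a rough sketch. The SVG content and the `list_shapes` helper are made up for illustration. It only shows that an SVG parses into named elements with attributes, which a raster image does not; it says nothing about how any particular model should be prompted.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical example drawing: a square with a circle on top of it.
svg_source = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <rect x="10" y="10" width="80" height="80" fill="steelblue"/>
  <circle cx="50" cy="50" r="25" fill="gold"/>
</svg>"""


def list_shapes(svg_text):
    """Return (tag, attributes) for each top-level element of an SVG string."""
    root = ET.fromstring(svg_text)
    shapes = []
    for element in root:
        # ElementTree prefixes tags with "{namespace}"; strip that part.
        tag = element.tag.split("}", 1)[-1]
        shapes.append((tag, dict(element.attrib)))
    return shapes


shapes = list_shapes(svg_source)
for tag, attrs in shapes:
    print(tag, attrs)
```

Every shape is an explicit statement with parameters, much like a line of a program, so getting it right looks more like writing code than like painting pixels.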

This is just my uninformed opinion.