Yesterday someone posted an example of the same prompt but with the subject changed to a human, and it was basically trash. The example you've posted actually looks good, all things considered. So yeah, I do think it's something they train on, the same way they train on things in the benchmarks.
The easy way to tell is to try it yourself - run "Generate an SVG of a pelican riding a bicycle" and then try "Generate an SVG of an otter riding a skateboard" and see if the quality of the images seems similar.
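If you want to look at the two results side by side rather than eyeballing them in separate tabs, here is a rough Python sketch. It assumes you supply your own `generate` function that sends a prompt to whatever model you're testing and returns the raw SVG text; that function, the `save_comparison` name, and the output folder are all made up for illustration and not part of any real API.

```python
import html
from pathlib import Path
from typing import Callable

# The two prompts from the comment above.
PROMPTS = {
    "pelican_bicycle": "Generate an SVG of a pelican riding a bicycle",
    "otter_skateboard": "Generate an SVG of an otter riding a skateboard",
}


def save_comparison(generate: Callable[[str], str], out_dir: str = "svg_compare") -> Path:
    """Run each prompt through `generate` and write a side-by-side HTML page.

    `generate` is a hypothetical callable you provide: prompt in, SVG text out.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    cells = []
    for name, prompt in PROMPTS.items():
        svg = generate(prompt)
        # Keep each raw SVG on disk so it can be opened on its own.
        (out / f"{name}.svg").write_text(svg, encoding="utf-8")
        # Inline the SVG next to its prompt for visual comparison.
        cells.append(
            f"<figure><figcaption>{html.escape(prompt)}</figcaption>{svg}</figure>"
        )

    page = out / "index.html"
    page.write_text(
        "<!doctype html><body style='display:flex;gap:2em'>"
        + "".join(cells)
        + "</body>",
        encoding="utf-8",
    )
    return page
```

Open the resulting `index.html` in a browser and judge the quality yourself. Models often wrap their answer in extra text or a code fence, so your `generate` function would need to strip that down to just the `<svg>...</svg>` part first.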