1) Novel words are handled because they are just sequences of common tokens
2) Token -> letter sequence associations are either:
a) Deliberately added to the training set, and/or
b) Naturally occurring in the training set, which, due to its sheer size, almost inevitably contains many, many examples of word-to-letter-sequence associations
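To make point 1 concrete, here's a toy sketch of greedy longest-match subword tokenization (the vocabulary here is made up for illustration, not a real tokenizer's vocab): a novel word decomposes into common tokens the model has seen many times.

```python
# Hypothetical mini-vocabulary of subword tokens, plus single
# characters as a fallback so any word can always be split.
VOCAB = {"flum", "mox", "ing",
         "f", "l", "u", "m", "o", "x", "i", "n", "g"}

def subword_split(word, vocab):
    """Greedy longest-prefix-match tokenization over a fixed vocab."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining prefix first, shrinking until a match.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No match at all: emit the single character as-is.
            tokens.append(word[i])
            i += 1
    return tokens

print(subword_split("flummoxing", VOCAB))  # ['flum', 'mox', 'ing']
```

A made-up word like "flummoxing" never needs its own vocabulary entry; it's covered by pieces that occur constantly elsewhere in the corpus.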
Given how badly models used to fail on tasks like this, and how much better they now do, it's quite likely that model providers have simply added such data to the training set, just as they have added data to improve other benchmarks.
That said, what I was pointing out is that words are represented as token sequences, so a word spelling sample is effectively a seq-2-seq (tokens to letters) sample, and we'd expect the model (which is built for seq-2-seq!) to be able to easily learn and generalize over these.
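To spell out what such a training sample looks like (token splits here are hypothetical, just for shape): the source side is the word's token sequence, the target side is its letter sequence, which is exactly the sequence-to-sequence form a transformer is trained on.

```python
def spelling_sample(token_seq):
    """Build a (source, target) seq-2-seq pair: tokens in, letters out."""
    word = "".join(token_seq)
    return token_seq, list(word)

# Hypothetical tokenization of "strawberry" into two subword tokens.
src, tgt = spelling_sample(["straw", "berry"])
print(src)  # ['straw', 'berry']
print(tgt)  # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']
```

Enough pairs of this shape, whether scraped or deliberately added, and the model can learn the token-to-letters mapping and generalize it to unseen token combinations.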