Hacker News

Because it is unfortunate or because it is incorrect?


I admit it's technically correct that you cannot embed U+0000 in a null-terminated UTF-8 C string. Of course, anybody who actually needed to represent U+0000 in a C string would simply encode it as 0xC0 0x80 and be done with it.


Heh, I wasn't calling you out, I was curious. I've done a little work with multibyte strings in C but never UTF-8, and it's been a very long time, so I didn't remember enough of the details to be sure which you were saying.


But ASCII null-terminated strings can't contain null either, so I don't know why he says that it's just a problem for UTF-8.

Also, if you represent U+0000 as 0xC0 0x80, that's "Modified UTF-8" (the variant Java's JNI uses), not valid UTF-8: standard decoders are required to reject overlong encodings like that pair.



