Hacker News

Because it is unfortunate or because it is incorrect?


I admit it's technically correct that you cannot embed U+0000 in a null-terminated UTF-8 C string. Of course, anybody who actually needed to represent U+0000 in a C string would simply encode it as 0xC0 0x80 and be done with it.


Heh, I wasn't calling you out, I was curious. I've done a little work with multibyte strings in C but never UTF-8, and it's been a very long time, so I didn't remember enough of the details to be sure which you were saying.


But ASCII null-terminated strings can't contain null either, so I don't know why he says that it's just a problem for UTF-8.

Also, if you represent U+0000 as 0xC0 0x80, that's "Modified UTF-8" (the variant Java's JNI uses), not valid UTF-8: standard decoders are required to reject overlong encodings like that pair.



