
𝕿𝖍𝖎𝖘 𝖋𝖊𝖊𝖑𝖘 𝖑𝖎𝖐𝖊 𝖙𝖊𝖗𝖗𝖎𝖇𝖑𝖊 𝖍𝖆𝖈𝖐 𝖇𝖚𝖙 𝕴 𝖑𝖎𝖐𝖊 𝖎𝖙. 𝕹𝖔𝖜 𝕴 𝖈𝖆𝖓 𝖚𝖘𝖊 𝖆𝖑𝖑 𝖐𝖎𝖓𝖉𝖘 𝖔𝖋 𝖋𝖆𝖓𝖈𝖞 𝖋𝖔𝖗𝖒𝖆𝖙𝖙𝖎𝖓𝖌 𝖔𝖓 𝖙𝖍𝖔𝖘𝖊 𝖘𝖎𝖙𝖊𝖘 𝖙𝖍𝖆𝖙 𝖉𝖔𝖊𝖘𝖓'𝖙 𝖘𝖚𝖕𝖕𝖔𝖗𝖙 𝖋𝖔𝖗𝖒𝖆𝖙𝖙𝖎𝖓𝖌.
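
For anyone curious how the trick works: the styled text is ordinary letters swapped for code points in the Mathematical Bold Fraktur range, which sits outside the BMP. A minimal Python sketch, assuming only ASCII letters need mapping (the function name `to_bold_fraktur` is made up for illustration):

```python
# Hypothetical helper: map ASCII letters onto Mathematical Bold Fraktur.
# Capitals start at U+1D56C and small letters at U+1D586; both runs are
# contiguous, so a fixed offset per case is enough.
BOLD_FRAKTUR_UPPER_A = 0x1D56C
BOLD_FRAKTUR_LOWER_A = 0x1D586


def to_bold_fraktur(text: str) -> str:
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_FRAKTUR_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_FRAKTUR_LOWER_A + ord(ch) - ord("a")))
        else:
            # Digits, spaces and punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)


print(to_bold_fraktur("This feels like a terrible hack"))
```

Every mapped letter is an astral code point, so the result takes four bytes per letter in UTF-8.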


Except when the site in question is completely broken wrt astral codepoints.

Which is unexpectedly common, as MySQL's "utf8" can't handle codepoints outside the BMP and will just truncate text at the first astral codepoint[0]. You need MySQL 5.5.3 or later (because adding a whole new encoding in a minor version makes perfect sense) and "utf8mb4" (because why would a codec called "utf8" actually do UTF-8?). And then the regexes are probably broken because it's PHP and developers use neither Unicode mode nor Unicode properties (PCRE's "\w" will not match all Unicode letters, you need "\p{L}" for that; also note that e.g. "🆄" is a symbol, not a letter, although "𝔹" is a letter).

[0] https://mathiasbynens.be/notes/mysql-utf8mb4
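
To make the two points above concrete, here is a rough Python sketch (Python rather than PHP or SQL, purely for illustration). It checks whether a string would fit in a 3-byte-per-character encoding like MySQL's legacy "utf8", and looks up Unicode general categories with the standard `unicodedata` module. The helper names `is_astral`, `fits_mysql_utf8` and `is_letter` are invented for this example, and `is_letter` only approximates what "\p{L}" matches.

```python
import unicodedata


def is_astral(ch: str) -> bool:
    """True if the code point lies outside the BMP (above U+FFFF)."""
    return ord(ch) > 0xFFFF


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8,
    which is the limit of MySQL's legacy "utf8" charset."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


def is_letter(ch: str) -> bool:
    """Approximation of \\p{L}: general category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


samples = ["\u00e9", "\u20ac", "\U0001D539", "\U0001F184"]
for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        f"utf8_bytes={len(ch.encode('utf-8'))}",
        f"astral={is_astral(ch)}",
        f"category={unicodedata.category(ch)}",
        f"letter={is_letter(ch)}",
    )
```

U+1D539 (the double-struck B) reports category Lu, while U+1F184 (the squared U) reports So, and both need four bytes, so neither fits in a 3-byte "utf8" column.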


MySQL is horrible for all the same reasons PHP is horrible, and this applies to Unicode too, except PHP is actually trying to fix its Unicode problems (UTF-8 is the default now, and there are moves towards adding a UString class), while MySQL isn't fixing them.


Like 𝑻𝒘𝒊𝒕𝒕𝒆𝒓! https://twitter.com/egypturnash/status/535105548761309184



