
Let's be honest, the real mess is with UNIX filenames. I dare you to come up with a legitimate use case for allowing newlines and other control characters in a file name.


It's like a built-in unit test: devs must not mangle or assume anything about the filenames they get from the system. They still do, though; I've seen multiple times how my nice umlauts get mangled or my spaces cause scripts to fail.
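
To make the failure mode concrete, here is a minimal sketch of why newline-delimited file listings break and NUL-delimited ones do not (NUL is one of the two bytes a UNIX filename cannot contain, the other being `/`). The file names are made up for illustration, and no real filesystem is touched.

```python
# Hypothetical file names; the second one contains an embedded newline.
names = ["report.txt", "two\nlines.txt", "with space.txt"]

# A tool that prints one name per line produces an ambiguous listing.
newline_listing = "\n".join(names)
parsed_by_newline = newline_listing.split("\n")

# Separating names with NUL (as `find -print0` does) stays unambiguous,
# because NUL can never occur inside a file name.
nul_listing = "\0".join(names)
parsed_by_nul = nul_listing.split("\0")

print(len(names), len(parsed_by_newline), len(parsed_by_nul))
```

The newline-based parse sees four entries for three files, while the NUL-based parse recovers the original list.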


A few years ago I tried naming my home directory with the Unicode pile of poo (💩) and a space in the name to test which parts of my code might break. However, it broke too many third-party tools and scripts that I occasionally needed, so I reverted within a few days.

Though it might be interesting to have an integration test box where the username (and thus all the relevant paths) includes all kinds of oddities - whitespace, emoji, right-to-left marker, etc.
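
A scaled-down version of that idea can be sketched in Python: create a scratch directory whose own name contains a space and an emoji, fill it with awkwardly named files, and check that listing it returns exactly the names that were written. The list `ODD_NAMES` and the helper names are hypothetical, and the sketch assumes a Linux-style filesystem that stores names as given plus a UTF-8 filesystem encoding.

```python
import os
import tempfile

# Hypothetical set of awkward names; extend as needed.
ODD_NAMES = [
    "with space",
    "umlaut-\u00e4\u00f6\u00fc",
    "poo-\U0001F4A9",
    "rtl-\u200fmark",   # contains a right-to-left mark
    "new\nline",        # contains an embedded newline
]

def create_odd_files(root):
    """Create one empty file per odd name under root, then list the directory."""
    for name in ODD_NAMES:
        with open(os.path.join(root, name), "w", encoding="utf-8"):
            pass
    return sorted(os.listdir(root))

def roundtrip_ok():
    """Return True if every odd name survives a create-and-list round trip."""
    # The directory name itself contains a space and an emoji.
    with tempfile.TemporaryDirectory(prefix="odd dir \U0001F4A9 ") as root:
        return create_odd_files(root) == sorted(ODD_NAMES)

print(roundtrip_ok())
```

Pointing your own scripts at a directory like this is a cheap way to find quoting and encoding bugs before a real user's path does.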


Backward compatibility.


With what?


With almost 50 years of Unix history.


I think the point was that UNIX got it wrong, and we've been dealing with the consequences ever since. It's of course too late to change it, so yeah.


Maybe. But 50 years ago UTF-8 didn't exist, Unicode didn't exist, and possibly not even Latin-1 existed. If Unix had enforced a specific encoding (which implies constraints on which byte values can appear in a path byte string), the transition to newer encodings would have been significantly harder.
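
The "path is just a byte string" point can be illustrated with a short Python sketch. It assumes a Linux filesystem that accepts arbitrary bytes in names (macOS, for example, rejects names that are not valid UTF-8); the function name is made up for illustration.

```python
import os
import tempfile

def latin1_name_roundtrip():
    """Create a file whose name is Latin-1 bytes (not valid UTF-8) and read it back."""
    raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9'
    with tempfile.TemporaryDirectory() as root:
        root_b = os.fsencode(root)
        with open(os.path.join(root_b, raw), "wb"):
            pass
        # Passing a bytes path makes os.listdir return bytes names untouched.
        listed = os.listdir(root_b)
        # os.fsdecode uses the surrogateescape error handler, so bytes that
        # cannot be decoded are preserved and os.fsencode can restore them.
        as_text = os.fsdecode(listed[0])
        return listed, os.fsencode(as_text) == raw

listed, restored = latin1_name_roundtrip()
print(listed, restored)
```

The kernel never interprets the name; any encoding is a convention layered on top by userspace, which is why old Latin-1 names and new UTF-8 names can coexist on the same disk.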



