> It was really funny, Uber switched from MySQL to Postgres, and then Postgres to MySQL. So they’ve gone back and forth a couple of times, but when they’ve swapped to Postgres, they had to figure out – they had all this app logic that relied on things being case-insensitive, because the database just doesn’t respect case search.
This suggests to me that the guest here maybe doesn't know about collations? But that doesn't seem likely (possible?) given his stated experience and the roles he's held. So what gives?
Guest here. The context is Uber was on MySQL which by default doesn't respect (or at least in their version and setup at the time) case sensitivity. To MySQL craig is the same as Craig when matching.
I'm fairly familiar with Postgres, but have not heard of a collation that tackles case insensitivity without tweaking the system itself? Postgres you can tweak this at the system level, but I've not seen it in the wild and as per the docs you're now non deterministic. For MySQL it appears you can change this, but seems a surprising default for them they expected case insensitivity.
Yes, another option could have been indexing and searching on lower or upper casing, but they wanted the minimal change to their app.
There was once a conference talk they gave about their migration process from MySQL->Postgres, but I'm not immediately able to find the video online so it may have been removed.
I'm not familiar with pg, but the choice of case sensitivity is a fundamental property of collations in other DBMSes like MSSQL. Web search does make it look like pg only got case-insenstivity-via-collation in v12 (2019), and also requires you to make your own collation with case-insensitivity since it doesn't have any built-in ones with it.
It's been a while and don't remember the specifics, but when we implemented a POSIX-compliant filesystem at Maginatics (acq. by EMC), we used MySQL (Percona actually) to store the FS logic (inode numbers, filenames, etc) and case sensitivity was one of the FS features. Maybe Percona supports it? I do remember (at least at that time) that full utf-8 support was poor though.
Btw, don't get me started about case-sensitive file systems - Windows got it right (case-preserving, but case-insensitive). Why should both /home/ellen/Music and /home/ellen/music exist? :)
Because once you go there, you have to answer questions which are not as easy. Does M match m? Does i match ı? Does a match あ? Does あ match ア? Does m match 𝓂? What happens when they need to coexist?
Great practical note by Josh Berkus on why Uber left Posgresql. Basically: runaway table bloat because Uber had a usecase that postgres doesn't address as well as InnoDB.
The whole VACUUM paradigm is the biggest thing that bugs me about pgsql. The fact that it can actually freeze things always worries me. Can’t this happen constantly in the background like modern GCs?
I learned that about postgres collations and case sensitivity about 6 years back, and it was within 3 months of my first heavy use of postgres. So I wonder if this person was just experiencing their first usage of postgres too? And I've used relational databases for more than 25 years.
I read that as Uber discovered their code relied on MySQL default case insentivity. Migrating to Postgres might have been tricky as collation is a very recent addition in Postgres.
That could be. It was the "because the database just doesn’t respect case search" part that struck me -- to say that MySQL doesn't respect case search is false, and the only scenario I can think of where it can even appear to be the case is if you're new to MySQL and you've made a big assumption without even googling it. Or maybe if you're new to string comparison (and therefore programming?) in general.
This suggests to me that the guest here maybe doesn't know about collations? But that doesn't seem likely (possible?) given his stated experience and the roles he's held. So what gives?