I don't database, but I like to think I have some kind of intuition for storage space requirements, and this article was very confusing.
Ignoring the indexes and just focusing on the main table sizes reported, we have:
- String ("The frequent repetition of these names inflates the size of the table"): 392 MB
- Enum data type ("Internally, an enum type is stored as four-byte floating point number. So it saves space in the table [...]"): 338 MB
- Lookup table ("Also, since a smallint only occupies two bytes, the person_l table can potentially use less storage space than the other solutions"): 338 MB.
I just can't make sense of the numbers, especially given the authors comments that I've quoted.
I'm also wondering about that. But maybe this could be it?
> Surprisingly, the table is just as big as with the enum type above, even though an enum uses four bytes. The reason is that each table row is aligned at a memory address divisible by eight, so PostgreSQL will add six padding bytes after the smallint. If we had more columns and could arrange them carefully, we could see a difference.
This could be the explanation. If the row is padded to 8, bigint is 8, then smallint or enum also use 8. The entries in the string table will be 8 or 16 due to the string length. So one row in person_e and person_l is 16, one row in person_s could be about 20 on average, that is a bit closer to the reality than my intuition, although the storage savings are still less than what I would have expected.
edit:
I did also try out the test and dropped the primary key on the table to compare only enum and string size:
Ignoring the indexes and just focusing on the main table sizes reported, we have:
- String ("The frequent repetition of these names inflates the size of the table"): 392 MB
- Enum data type ("Internally, an enum type is stored as four-byte floating point number. So it saves space in the table [...]"): 338 MB
- Lookup table ("Also, since a smallint only occupies two bytes, the person_l table can potentially use less storage space than the other solutions"): 338 MB.
I just can't make sense of the numbers, especially given the authors comments that I've quoted.
Is this some kind of typo/editing fail?