I'm making a relational lang (https://tablam.org) and this was one of the question marks for me.
The #1 reason sum types was not presented is because "nobody" thought (seriously) about that for long time. See how much "null" have been with us and still you find defenders of it!
The second and more broad problem, is that the relational model was NOT implemented by the RDBMS makers, not in full. To make sum types work, you NEED to support embed relations inside relations:
Frankly, understanding of what the relational model actually is has been missing from the industry even among many people who have made data management their primary focus. IMHO a lot of the penetration of the NoSQL stuff in the last decade was based on misunderstandings of the capabilities and potentials of the relational model.
I think the early history of SQL was too influenced by ISAM-type stores and COBOL flatfiles and the like. Hence the vocabulary around "tables" and "columns" instead of the relations, relvars, tuples, etc. described by the actual relational algebraic model. To many of those with an OO fetish, "relational model" became synonymous with "tabular model", and the industry's early 2000's obsession with object polymorphism, OORMs middleware, etc continued the stall in relational database innovation.
I feel like there is now renewed interest in this stuff, though. Your project looks really neat, for example.
I think you analysis is correct. I also add that is related too to the languages, you point with Cobol, and I also add C, Pascal, Java, etc.
When they have nulls and limited to zero queries, relationships and data modeling capabilities is not hard to see that how they are is reflected in how the DBs are too. I honestly not know about sum types until a few years ago, neither have a clue how much useful they are, so surely even today many are in the same blind spots...
Sum types, how them are stored and how they are used is not the same ie: How are sum types represented in languages like Rust/oCalm? that is tangential as how we experience them.
So, support to embed relations is the step to provide the storage and partially, help in the query, but from the POV of the user them are "normal sum types".
But with that, you get the ability to store relations, that is usefull as-is :)
Oddly enough, 1nf is differently defined depending on which person or database you're asking/using.
Is it
. just to be rectangular?
. values are atomic? (and what does that mean)
. no such thing as nulls? good luck with that
. no duplicate rows (good luck with enforcing that in intermediate tables)
E.F.Codd defines first normal form as eliminating nested relations (in "A Relational Model of Data for Large Shared Data Banks" which introduces the relational model). "Atomic" in this context just means any value which is not a relation.
A relation by definition cannot have duplicate rows, since relations are sets. If "rectangular" means that all rows in a table have the same columns (attributes), this also follows from the definition of relations. So these constraints have to be fulfilled before we can even talk about normalization, since normalization operates on relations. (Nulls are its own can of worms, but dosn't really have anything to do with 1NF.)
Codd introduces 1NF because he realized supporting nested relations would complicate the model and query language for no benefit. It would need special operators for navigate in and out or nested relations, e.g. if you wanted to join tables nested at different levels in other tables. You would need something like XPath on top of SQL. But you can express exactly the same information using foreign keys between tables and keep the language simpler.
The sum-types suggestion shows the problem. If we have sum types as values, presumably they would be composed of other types, right? So sum types could contain other sum types, arbitrary deep. Since you surely would want to query these individual values, you would need operators in SQL to navigate down into sum types. You would need facilities to index and apply constraints, again arbitrary deep down the structure. So the database model and query language gets significantly more complex for dubious benefit.
> So the database model and query language gets significantly more complex for dubious benefit.
Sum types are definitively not a "dubious benefit". Neither nesting relations (that are not that different to join them and will provide better semantics that create a LOT of null columns to flat them).
Will complicate matters and requiere extensions? Sure. But is clear that, today in this century, exist a lot of use-cases where people are resorting to pseudo-nosql (json embedding, to name one of the biggest anti-patterns), and that create more inconsistent features, extensions, weird extra syntax, etc. that anyway are required for it (and could have been cleaner under this idea).
With this, we can get the same benefits and much more.
P.D. Not against have a very simple core and try to model stuff that way, but the major point of a model, like relational, is support how to MODEL.
And we need to model nested thing, and we need to model variants and get rids of nulls and all that. So, sometimes, you need to add complexity + 1 so you get complications - 10.
> To make sum types work, you NEED to support embed relations inside relations
You don't need to embed relations. You don't even need to embed tuples or other composite types, you can just model sum types using a pair of fields. The latter is what Opaleye and Rel8 (Postgres libraries for Haskell) do:
To reply to both my respondents, the link goes exactly to the source for Opaleye's version of the Maybe data type definition. Neatened up, it is
data MaybeFields fields =
MaybeFields {
mfPresent :: Field SqlBool
, mfFields :: fields
}
That is, an Opaleye Maybe (MaybeFields) is a pair. The first component is an SQL bool indicating whether it's a Just or a Nothing. The second component is the payload of the Just. If it's a Nothing the fields of the payload will be set to NULL.
Can be show how is done, with data instead of algorithms? (aka "...show me your tables and I won't usually need your flowcharts; they'll be obvious." ~ Fred Brooks")
The #1 reason sum types was not presented is because "nobody" thought (seriously) about that for long time. See how much "null" have been with us and still you find defenders of it!
The second and more broad problem, is that the relational model was NOT implemented by the RDBMS makers, not in full. To make sum types work, you NEED to support embed relations inside relations:
Then with this, do sum types comes from free: This will make things like storing Logs, metrics and event sourcing so much nicer!How store that is another matter, but I think is not that complicated, just none of the major rdbms have put effort on this.
This is one of my dreams: Making a rdbms with this and other stuff that push the DBs to the modern way of make apps...