I'm making a relational lang (https://tablam.org) and this was one of the questi...

cmrdporcupine · on Oct 4, 2021

Frankly, understanding of what the relational model actually is has been missing from the industry even among many people who have made data management their primary focus. IMHO a lot of the penetration of the NoSQL stuff in the last decade was based on misunderstandings of the capabilities and potentials of the relational model.

I think the early history of SQL was too influenced by ISAM-type stores and COBOL flatfiles and the like. Hence the vocabulary around "tables" and "columns" instead of the relations, relvars, tuples, etc. described by the actual relational algebraic model. To many of those with an OO fetish, "relational model" became synonymous with "tabular model", and the industry's early 2000's obsession with object polymorphism, OORMs middleware, etc continued the stall in relational database innovation.

I feel like there is now renewed interest in this stuff, though. Your project looks really neat, for example.

mamcx · on Oct 4, 2021

I think you analysis is correct. I also add that is related too to the languages, you point with Cobol, and I also add C, Pascal, Java, etc.

When they have nulls and limited to zero queries, relationships and data modeling capabilities is not hard to see that how they are is reflected in how the DBs are too. I honestly not know about sum types until a few years ago, neither have a clue how much useful they are, so surely even today many are in the same blind spots...

goto11 · on Oct 4, 2021

> To make sum types work, you NEED to support embed relations inside relations

If that is the case, sum types would be in violation of first normal form.

mamcx · on Oct 4, 2021

Even considering that normal forms are tools...

Sum types, how them are stored and how they are used is not the same ie: How are sum types represented in languages like Rust/oCalm? that is tangential as how we experience them.

So, support to embed relations is the step to provide the storage and partially, help in the query, but from the POV of the user them are "normal sum types".

But with that, you get the ability to store relations, that is usefull as-is :)

emmelaich · on Oct 4, 2021

Oddly enough, 1nf is differently defined depending on which person or database you're asking/using.

Is it

  . just to be rectangular?
  . values are atomic? (and what does that mean)
  . no such thing as nulls? good luck with that
  . no duplicate rows (good luck with enforcing that in intermediate tables)

goto11 · on Oct 5, 2021

E.F.Codd defines first normal form as eliminating nested relations (in "A Relational Model of Data for Large Shared Data Banks" which introduces the relational model). "Atomic" in this context just means any value which is not a relation.

A relation by definition cannot have duplicate rows, since relations are sets. If "rectangular" means that all rows in a table have the same columns (attributes), this also follows from the definition of relations. So these constraints have to be fulfilled before we can even talk about normalization, since normalization operates on relations. (Nulls are its own can of worms, but dosn't really have anything to do with 1NF.)

Codd introduces 1NF because he realized supporting nested relations would complicate the model and query language for no benefit. It would need special operators for navigate in and out or nested relations, e.g. if you wanted to join tables nested at different levels in other tables. You would need something like XPath on top of SQL. But you can express exactly the same information using foreign keys between tables and keep the language simpler.

The sum-types suggestion shows the problem. If we have sum types as values, presumably they would be composed of other types, right? So sum types could contain other sum types, arbitrary deep. Since you surely would want to query these individual values, you would need operators in SQL to navigate down into sum types. You would need facilities to index and apply constraints, again arbitrary deep down the structure. So the database model and query language gets significantly more complex for dubious benefit.

mamcx · on Oct 5, 2021

> So the database model and query language gets significantly more complex for dubious benefit.

Sum types are definitively not a "dubious benefit". Neither nesting relations (that are not that different to join them and will provide better semantics that create a LOT of null columns to flat them).

Will complicate matters and requiere extensions? Sure. But is clear that, today in this century, exist a lot of use-cases where people are resorting to pseudo-nosql (json embedding, to name one of the biggest anti-patterns), and that create more inconsistent features, extensions, weird extra syntax, etc. that anyway are required for it (and could have been cleaner under this idea).

With this, we can get the same benefits and much more.

P.D. Not against have a very simple core and try to model stuff that way, but the major point of a model, like relational, is support how to MODEL.

And we need to model nested thing, and we need to model variants and get rids of nulls and all that. So, sometimes, you need to add complexity + 1 so you get complications - 10.

tome · on Oct 5, 2021

> To make sum types work, you NEED to support embed relations inside relations

You don't need to embed relations. You don't even need to embed tuples or other composite types, you can just model sum types using a pair of fields. The latter is what Opaleye and Rel8 (Postgres libraries for Haskell) do:

https://hackage.haskell.org/package/opaleye-0.7.1.0/docs/src...

tome · on Oct 10, 2021

To reply to both my respondents, the link goes exactly to the source for Opaleye's version of the Maybe data type definition. Neatened up, it is

    data MaybeFields fields =
      MaybeFields {
        mfPresent :: Field SqlBool
      , mfFields  :: fields
      }

That is, an Opaleye Maybe (MaybeFields) is a pair. The first component is an SQL bool indicating whether it's a Just or a Nothing. The second component is the payload of the Just. If it's a Nothing the fields of the payload will be set to NULL.

mamcx · on Oct 12, 2021

Ok, this is for a Maybe/Option, but can't see how can work for any algebraic type.

I have this for mine:

    struct Ag {
        tag :String, //like "Option.None"
        data:Vec<Value> //The data inside, if any
    }

tome · on Oct 13, 2021

Arbitrary sum types are just a generalisation of what I wrote. Opaleye doesn't have them (yet?) but rel8 does:

https://hackage.haskell.org/package/rel8-1.1.0.0/docs/src/Re...

You have a bool which specifies whether it's left or right, and then two payloads, one for left and one for right.

alexvoda · on Oct 7, 2021

You linked to a file with pages of code. Care to summarize how they model sum types using a pair of fields?

mamcx · on Oct 8, 2021

Can be show how is done, with data instead of algorithms? (aka "...show me your tables and I won't usually need your flowcharts; they'll be obvious." ~ Fred Brooks")

Haskell is hard to decipher for me...

woggy · on Oct 4, 2021

Weird that this pops up today as I was trying to find something like this yesterday. Good luck with it!

alexvoda · on Oct 4, 2021

I am interested in opportunities to cooperate on this.

mamcx · on Oct 4, 2021

Great!, I have the project at https://tablam.org