It's called NoSQL, which removes the need for schema migrations for things like ...

Nick-Craver · on May 4, 2016

> With the foreign key table, performance would suffer, but probably not enough to matter for most use cases.

Citation needed :) That's going to really depend.

I'm not for or against NoSQL (or any platform). Use what's best for you and your app!

In our case, NoSQL makes for a bad database approach. We do many cross-sectional queries that cover many tables (or documents in that world). For example, a Post document doesn't make a ton of sense, we're looking at questions, answers, comments, users, and other bits across many questions all the time. The same is true of users, showing their activity for things would be very, very complicated. In our case, we're simply very relational, so an RDBMS fits the bill best.

_979m · on May 4, 2016

Sorry for being unclear. I'm not proposing NoSQL. I'm saying that many NoSQL users really mainly want NoDDL, which can be implemented on top of Postgres JSON storage while retaining SQL.

- data (string type, int id, json fields)

- fk (string type, int subj_id, int obj_id)

  select
    data.id,
    data.fields,
    fk_1.obj_id as 'foo_id'
    fk_2.obj_id as 'bar_id'
  from data
  join fk as fk_1 on data.id = fk_1.subj_id
  join fk as fk_2 on data.id = fk_2.subj_id
  where 
    data.type = 'my_table'
    and fk_1.type = 'foo'
    and fk_2.type = 'bar'

What would the performance characteristics of that be versus if "foreign keys" are stored in the same table as the data, if fk has the optimal indexes?

xenadu02 · on May 4, 2016

In no specific order:

If your database doesn't enforce the schema you still have a schema, it's just ad-hoc and spread across all your different processes, and no one quite agrees what it is. In the real world as requirements change and your app/service increases in complexity this becomes a constant source of real bugs while simultaneously leading to garbage data. This is not theoretical, we have a lot of direct painful experience with this. Best case scenario your tests and tooling basically replicate a SQL database trying to enforce the schema you used NoSQL to avoid in the first place.

Indexes are fast but they aren't magic. A lot of what a traditional SQL database does is providing a query optimizer and indexes so you can find the data you need really fast. Cramming everything into a few tables means everything has to live in the same index namespace. Yes you can use views and sometimes even indexed views, but then you have a schema so why jump through hoops to use non-optimized storage when the database has actual optimized storage?

Separate database tables can be put on separate storage stacks. A single table can even be partitioned onto separate storage stacks by certain column values. Cramming everything into four tables makes that a lot more complicated. It can also introduce contention (depending on locking strategies) where there wouldn't normally be any.

IMHO most systems would be better served by sharding databases than by using NoSQL and pretending they don't have a schema. If application design prevents sharding then scaling single-master, multiple-read covers a huge number of cases as well. The multiple-master scenario NoSQL systems are supposed to enable is a rare situation and by the time you need that level of scale you'll have thrown out your entire codebase and rewritten it twice anyway.

The key to schema migrations is just to add columns and tables if needed, don't bother actually migrating. Almost all database engines can add columns for "free" because they don't go mutate existing rows. Some can drop columns for "free" too by marking the field as obsolete and only bothering to remove it if the rows are touched.

_979m · on May 4, 2016

Postgres (and at least one other RDBMS) has partial indexes, which pretty much solves the index namespace problem you mention: http://www.postgresql.org/docs/8.0/static/indexes-partial.ht... Partial indexes are integrated into the proof-of-concept repo I linked.

Storing a data type field in the generic storage table enables the same partitioning ability as a standard schema.

99% of NoSQL database users just don't want to deal with migrations, even if they're "free" (another big issue is synchronizing application code state and DB migration state of production, testing, and developer machines), so what they really need is NoDDL, YesSQL.

> Almost all database engines can add columns for "free" because they don't go mutate existing rows. Some can drop columns for "free" too by marking the field as obsolete and only bothering to remove it if the rows are touched.

Didn't know that, thanks.

> It can also introduce contention (depending on locking strategies) where there wouldn't normally be any.

Didn't think of that. I'm aiming this at 99% of NoSQL users in which doing things you could do with SQL requires much more effort, so allowing them to do it with SQL can accept a modest performance degradation, but if you have any good links relevant to how this storage design would affect lock contention, please share.

philliphaydon · on May 4, 2016

A .NET abstraction for PostgreSQL: https://github.com/JasperFx/marten