OLTP, OLAP, Data Warehouses, Data Lakes: what does it all mean?

gxt · on Aug 17, 2022

It's marketing bs. Olap is a special case of oltp with a dedicated dimensional client. Data lakes are just cloud NFS at petabytes scale, data warehouses try to wrap everything (except olap) holistically together. There are layers upon layers of complexity for each of those in production and they all feel insane to me. they all stop short of putting into practice a unified data management theory.

gavinray · on Aug 17, 2022

I'd like to argue that conventional notions of OLTP are ill fitted to modern CRUD/SaaS apps.

OLAP claims to be optimized for read-heavy workloads. But looking at access patterns for SaaS/CRUD apps, data shows the read/write ratio is along the order of 90/10

Additionally, OLTP purportedly deals in entire rows while OLAP is selective about columns and may contain aggregate functions/derived data.

But think about things like GraphQL, Firebase/Supabase, OData, etc

It's common in modern APIs to ask for a selective, minimal number of columns and also calculate derived information to display on a page

To me, the types of apps I've built my entire career sound much more like OLAP than OLTP, but we wouldn't call them that?

andrewf · on Aug 17, 2022

I think the OLAP vs OLTP distinction is most interesting when you run into constraints and are forced to optimize for one at the expense of the other.

Computers are so big and fast today, many applications won't hit that point.

gunnarmorling · on Aug 17, 2022

> It's common in modern APIs to ask for a selective, minimal number of columns and also calculate derived information to display on a page

Yes, but still for common OLTP use cases you're only interested in a very small amount of data, say, the purchase orders of one customer. Whereas for OLAP, your queries touch on vast amounts of data, e.g. all purchase orders by month and product category. That's why typically a row-based storage model makes sense for OLTP, whereas for OLAP typically column-oriented models make sense.

advisedwang · on Aug 17, 2022

OLAP vs OLTP is more than just read vs write -heavy or column vs row -oriented (although it certainly is those things).

RDBMSs used for OLTP also provide consistency and transaction semantics, which are critical to application correctness, but not needed for OLAP uses.

gavinray · on Aug 17, 2022

Realistically what OLAP DBs do you know that don't provide correctness guarantees too though?

For example: https://cloud.google.com/bigquery/docs/reference/standard-sq...

advisedwang · on Aug 17, 2022

True transaction semantics are rare in OLAP world.

As for BigQuery:

> If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently.

Kind of a non-starter.

jimcavel888 · on Aug 18, 2022

Almost none that are less than $500k/year to rent, next to none.

rgavuliak · on Aug 17, 2022

Misses Data Mesh, buzzwords too outdated

supercanuck · on Aug 17, 2022

and Data Fabric

FateOfNations · on Aug 18, 2022

I'm partial to our data swamp.

waynesonfire · on Aug 17, 2022

Only thing interesting mentioned here to me is Apache Iceberg and it looks to be very early in development.

Being able to evolve schemas overtime is useful. I'll look into it again in a few years.

WatchDog · on Aug 18, 2022

A read replica of your OLTP db, will get you a long way before worrying about all of the complexities and inefficiency of the buzzword based approaches.

muad_kyrlach · on Aug 17, 2022

I'm pretty sure this is puff piece for Dremio, which is yet another attempt to create commercial software on top of a valuable set of open-source tools, namely apache arrow.

In general, i think we are in the final throws of discrete, "enterprise" analytical data architecture. The cloud is quickly making analytical uses case just another storage/query product. Especially as tools like AWS Lake Formation and Palantir come to maturity, we won't need special concepts like "Data Lake" any more.

Also -- OLAP, really? 2000 called, it want's it's buzzwords back...