Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
OLTP, OLAP, Data Warehouses, Data Lakes: what does it all mean? (cloudnatively.com)
32 points by tiwarinitish86 on Aug 17, 2022 | hide | past | favorite | 14 comments


It's marketing bs. Olap is a special case of oltp with a dedicated dimensional client. Data lakes are just cloud NFS at petabytes scale, data warehouses try to wrap everything (except olap) holistically together. There are layers upon layers of complexity for each of those in production and they all feel insane to me. they all stop short of putting into practice a unified data management theory.


I'd like to argue that conventional notions of OLTP are ill fitted to modern CRUD/SaaS apps.

OLAP claims to be optimized for read-heavy workloads. But looking at access patterns for SaaS/CRUD apps, data shows the read/write ratio is along the order of 90/10

Additionally, OLTP purportedly deals in entire rows while OLAP is selective about columns and may contain aggregate functions/derived data.

But think about things like GraphQL, Firebase/Supabase, OData, etc

It's common in modern APIs to ask for a selective, minimal number of columns and also calculate derived information to display on a page

To me, the types of apps I've built my entire career sound much more like OLAP than OLTP, but we wouldn't call them that?


I think the OLAP vs OLTP distinction is most interesting when you run into constraints and are forced to optimize for one at the expense of the other.

Computers are so big and fast today, many applications won't hit that point.


> It's common in modern APIs to ask for a selective, minimal number of columns and also calculate derived information to display on a page

Yes, but still for common OLTP use cases you're only interested in a very small amount of data, say, the purchase orders of one customer. Whereas for OLAP, your queries touch on vast amounts of data, e.g. all purchase orders by month and product category. That's why typically a row-based storage model makes sense for OLTP, whereas for OLAP typically column-oriented models make sense.


OLAP vs OLTP is more than just read vs write -heavy or column vs row -oriented (although it certainly is those things).

RDBMSs used for OLTP also provide consistency and transaction semantics, which are critical to application correctness, but not needed for OLAP uses.


Realistically what OLAP DBs do you know that don't provide correctness guarantees too though?

For example: https://cloud.google.com/bigquery/docs/reference/standard-sq...


True transaction semantics are rare in OLAP world.

As for BigQuery:

> If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently.

Kind of a non-starter.


Almost none that are less than $500k/year to rent, next to none.


Misses Data Mesh, buzzwords too outdated


and Data Fabric


I'm partial to our data swamp.


Only thing interesting mentioned here to me is Apache Iceberg and it looks to be very early in development.

Being able to evolve schemas overtime is useful. I'll look into it again in a few years.


A read replica of your OLTP db, will get you a long way before worrying about all of the complexities and inefficiency of the buzzword based approaches.


I'm pretty sure this is puff piece for Dremio, which is yet another attempt to create commercial software on top of a valuable set of open-source tools, namely apache arrow.

In general, i think we are in the final throws of discrete, "enterprise" analytical data architecture. The cloud is quickly making analytical uses case just another storage/query product. Especially as tools like AWS Lake Formation and Palantir come to maturity, we won't need special concepts like "Data Lake" any more.

Also -- OLAP, really? 2000 called, it want's it's buzzwords back...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: