I would love to know if other people in the industry (besides Hickey/Datomic) use the immutable log/stream + integrators approach. From my small experience in enterprise apps: auditability and time travelling are always bolted onto good old SQL tables/snapshots after the fact, and the pain is already baked in.
Depends which industry. If you look at a lot of non-tech industries, they'll use a commercial DB with all those features already in place, rather than hacking up their own data layer. A few years ago I spent some time in the enterprise finance space and learned some unfashionable tech you don't see talked about on Hacker News much. It left me with a new appreciation for what goes on there. A staggering amount of time in tech startups is spent solving and re-solving problems for which off-the-shelf solutions have existed for a long time.
After all, this talk is now 10 years old but appears to be describing features that have been around for much longer. Take your average bank - it will have a bunch of Oracle databases in it. Those already have every feature discussed in this thread and in the talk:
• Incremental materialized view maintenance (with automatic query rewrite to use it, so users don't have to know it exists).
• Exposing logical commit logs as an API, with tooling (e.g. GoldenGate, LogMiner, query change notifications).
• Time travelling SELECT (... AS OF; see the sketch just after this list).
• Lots of audit features.
• Integrated transactional and scalable MQ (no need for Kafka).
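To make the time-travel bullet concrete, here's a minimal sketch of a flashback query run from Python. The orders table, columns and connection details are made up for illustration, and it assumes the python-oracledb driver:

    # Minimal sketch: Oracle flashback query ("time travel" SELECT) from Python.
    # The `orders` table, its columns and the connection details are made up;
    # requires the python-oracledb driver (pip install oracledb).
    import oracledb

    conn = oracledb.connect(user="app", password="secret", dsn="dbhost/ORCLPDB1")
    cur = conn.cursor()

    # Read the table as it looked one hour ago (no bolted-on audit schema needed).
    cur.execute("""
        SELECT order_id, status
        FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)
        WHERE status = 'SHIPPED'
    """)
    for order_id, status in cur:
        print(order_id, status)

How far back you can query depends on the database's undo retention, but the point is that it's a one-line SQL clause rather than an audit schema you design and maintain yourself.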
My experience was that faced with a data processing problem, enterprise devs will tend to just read the user guide for their corporation's database, or ask for advice from a graybeard who already did so. They go write some SQL or an Excel plugin or something old school, ship it, close the ticket, go home. Then a few years later you look at HN and find there's a whole startup trying to sell the same feature.
I got curious about this and took a look at some pricing calculators. The results are pretty counter-intuitive.
Compared to PostgreSQL that you run yourself, Oracle is expensive, because self-hosted PostgreSQL costs nothing if you assume your time is free. But how many people want to run it themselves? Especially as Postgres isn't much fun to admin (fiddling with vacuuming, setting up replication by hand, managing major version upgrades, etc.), and you may not be able to scale this way.
So in reality a lot of companies, and especially startups these days, pay Amazon to run the database for them, and then the cost question becomes: how much more does a cloud-hosted Oracle DB cost vs a cloud-hosted Postgres DB?
Well, an 8 vCPU hosted RDS Postgres in AWS with 32 GB of memory and 100 GB of storage plus another 200 GB of backup storage - so one less powerful than a local DB on my laptop - costs $1,200/month in US East. That's expensive! AWS doesn't let you scale CPU and RAM independently, so I tried to pick something in the middle. For only 100 GB of data you probably don't need 4 physical cores.
So then I checked the OCI (Oracle Cloud) price calculator and specced out a similar database. I picked autonomous serverless (i.e. fully managed), transaction processing+mixed, autoscaling with 8 ECPUs and the same amount of primary/backup storage. They don't let you spec RAM independently, I guess because it's a shared DB so RAM usage is transient and not a VM allocation. The cost came to ~$800/month - significantly cheaper than RDS Postgres, even though Oracle DBs have drastically more features. Many of those features are optimizations that can reduce your database load anyway, so presumably you'd need more Postgres cores to match the equivalent performance if they're used smartly (honestly, I haven't ported an app between Postgres and Oracle, so I don't have first-hand experience with this).
This is pretty surprising. I'd have expected an Oracle DB to cost more, not less. Auto-scaling is part of it (the cost is about double RDS if you turn it off), but then again, this is possible because Oracle has more multi-tenancy and resource isolation features to begin with, so it's reasonable to share a database server and overcommit CPU. With AWS it's a full VM, so you have to stop the DB server manually if you want to save money. Also, OCI is a cheaper cloud than AWS, as it has less brand recognition I guess. This feels a bit like Amazon is exploiting people's mental defaults: lots of devs think AWS and Postgres are the only cloud+DB combination that is reasonable to consider, and apparently they charge on that basis?
I haven't specced out what a hosted bare metal cluster would cost. You can't cluster Postgres in the same way anyway (multiple writable masters with full SQL, no sharding).
In large scale business integration platforms/apps, you have operational systems like SAP and Oracle Service Cloud generating/streaming raw or business events, which are published to message brokers in topics (orders, incidents, suppliers, logistics, etc.). There the data is validated and transformed (filtered, routed, formatted, enriched, aggregated, etc.) into other downstream topics, which can be used to egress to other apps or to enterprise data stores/data lakes. Data governance apps control who has access. Elasticsearch or Splunk for data lineage and debugging. You also have observability systems sandwiched in there as well.
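As a very rough sketch of one validate/enrich hop between topics in that kind of pipeline (the topic names, event fields, broker address and region lookup are all made up, and it assumes the confluent-kafka Python client):

    # Rough sketch of one validate/enrich hop between topics in such a pipeline.
    # Topic names, fields, broker address and the region lookup are illustrative only.
    # Requires the confluent-kafka Python client (pip install confluent-kafka).
    import json
    from confluent_kafka import Consumer, Producer

    REGION_BY_SUPPLIER = {"SUP-001": "EMEA", "SUP-002": "APAC"}  # stand-in reference data

    consumer = Consumer({
        "bootstrap.servers": "broker:9092",
        "group.id": "order-enricher",
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": "broker:9092"})

    consumer.subscribe(["orders.raw"])  # raw business events from the operational system

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())

        # Validate / filter / enrich, then publish to the downstream topic.
        if event.get("status") != "CANCELLED":
            event["region"] = REGION_BY_SUPPLIER.get(event.get("supplier_id"), "UNKNOWN")
            producer.produce("orders.enriched", json.dumps(event).encode("utf-8"))
        producer.poll(0)  # serve delivery callbacks so the producer queue keeps draining

In a real platform this hop would usually be a managed stream-processing job rather than a hand-rolled consumer loop, but the shape is the same: consume from one topic, validate/enrich, publish to the next.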