
I'm curious to see how this compares in real life to TimescaleDB hypertables with compression - which to me, reads as much the same thing. I'm wondering if Citus is possibly bringing a lower-level implementation of the same idea?


The access method approach followed in Citus is indeed lower level and more generic, which means it can be used on both time series data and other types of data.

For time series data, you can use built-in partitioning in PostgreSQL. It's not as easy to use as TimescaleDB, but pg_partman goes a long way, see: https://docs.citusdata.com/en/latest/use_cases/timeseries.ht...
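To make that concrete, a sketch of built-in range partitioning with pg_partman handling partition maintenance might look like this (table and column names are made up for illustration; the `partman.create_parent` call follows the pg_partman 4.x signature, so check against your installed version):

```sql
-- Hypothetical time-series table, partitioned by time using
-- PostgreSQL's built-in declarative partitioning.
CREATE TABLE events (
    device_id  bigint      NOT NULL,
    event_time timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (event_time);

-- Let pg_partman create and maintain daily partitions automatically.
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control      := 'event_time',
    p_type         := 'native',
    p_interval     := 'daily'
);
```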

You can then use the columnar access method to compress old partitions (see the end of the doc), and use distributed tables to shard and parallelize queries and DML.
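Roughly, that could look like the following (a sketch assuming a time-partitioned `events` table already exists; `alter_table_set_access_method` and `create_distributed_table` are Citus UDFs, while the partition name and shard column are hypothetical):

```sql
-- Compress a closed-out partition by switching it to the
-- columnar access method (Citus 10+).
SELECT alter_table_set_access_method('events_2021_02', 'columnar');

-- Shard the parent table across worker nodes so queries and DML
-- are parallelized.
SELECT create_distributed_table('events', 'device_id');
```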


Came here to say this - I was looking to see how the compression compares to TimescaleDB's stated 91% compression.

https://docs.timescale.com/latest/using-timescaledb/compress...


There are a lot of differences that need to be taken into account before making a comparison.

1. TimescaleDB implements the compression at a higher level; the underlying table storage/access method remains the same

2. TimescaleDB doesn't compress the latest data, which keeps writes and edits fast for recent data while still letting you benefit from compression on older, row-based data

3. Although not currently available, it would be possible for a TimescaleDB hypertable to use a column-based access method

4. Any comparison would have to take into account the data model, access patterns (types of queries), ingestion vs. query workload (batch vs. real time), backfilling and editing, etc.

I agree that this (Columnar) would be closer to Parquet.


It always depends on the data, but we've seen 92.5% and more: https://twitter.com/JeffMealo/status/1368030569557286915


(TimescaleDB person)

TimescaleDB users have seen 98% (ie over 50x) compression rates in some real-world cases (e.g., for some IT monitoring datasets), but compression ratio will definitely vary by dataset. (For example, a dataset of just 0s will compress even better! But that's probably not a realistic dataset :-) )

The reality is that Citus and TimescaleDB [0][1] take very different approaches to columnar compression, which result in different usability and performance trade-offs. Ultimately, one should choose the right tool for their workload.

(As an aside, if you have time-series data, no one has spent more time developing an awesome time-series experience on Postgres than the TimescaleDB team has :-) )

Kudos to the Citus team for this launch! I love seeing how different members of the Postgres community keep pushing the state of the art.

[0] Building columnar compression in a row-oriented database (https://blog.timescale.com/blog/building-columnar-compressio...)

[1] Time-series compression algorithms, explained (https://blog.timescale.com/blog/time-series-compression-algo...)


This reads to me more like parquet.



