In a real-life situation where I tried this, the keys were very wide: compound, with text keywords. That's where most of the savings came from, since previously each row was mostly key and not much actual associated data. Sorry to mislead you about TOAST compression.
It was daily data too, literally 10 or more orders of magnitude less ingestion than Timescale is built for, so the arrays fit inside 1-2 postgres pages and writing was absolutely not a problem even for 5-10 years of data.
Timescale may be the right solution when you need, I quote from their docs, "Insert rates of hundreds of thousands of writes per second", or somewhere within a few orders of magnitude of that. Good for the niche, but the niche is uncommon.
Yes, ditching timestamps. The full row configuration used Amazon keywords as keys, so they looked something like start_date|end_date|company_id|keyword_match_type|keyword_text|total_sales_by_day. The match_type and keyword_text were something like 40 bytes per row, so that's where the huge savings came from.
The data was assumed to be contiguous, i.e. there are end_date - start_date + 1 entries in the total_sales_by_day array.
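For concreteness, here's a sketch of what a table like that could look like (column names and types are my guesses from the description above, not the actual schema):

```sql
-- Hypothetical schema matching the row shape described above.
-- One row spans a date range; total_sales_by_day is assumed contiguous,
-- with exactly one entry per day from start_date through end_date.
CREATE TABLE keyword_daily_sales (
    start_date          date      NOT NULL,
    end_date            date      NOT NULL,
    company_id          bigint    NOT NULL,
    keyword_match_type  text      NOT NULL,
    keyword_text        text      NOT NULL,
    total_sales_by_day  numeric[] NOT NULL,
    PRIMARY KEY (company_id, keyword_match_type, keyword_text, start_date),
    -- enforce the contiguity assumption: one array element per day
    CHECK (cardinality(total_sales_by_day) = end_date - start_date + 1)
);
```

The point being that the wide key is stored once per date range instead of once per day, which is where the savings multiply.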
If this data were in Timescale, the giant keys would be compressed on disk, but I believe (I'd need to check) that there would be a lot of noise in memory/caches after decompressing while processing the rows.
Anyway, in conclusion, I do think Timescale has its niche uses, but I've seen a lot of people think they have time series data and need Timescale when they really just have data that is pre-aggregated on a daily or even hourly basis. For these situations Timescale is overkill.
Yeah, fully agreed with the overall thesis. Postgres is quite capable straight out of the box.
I really like the idea of TOAST compression as an archival format too for big aggregates - I’ll have to check out the performance of a (grouping_id, date, length_24_array_of_hourly_averages) schema next time I get an excuse.
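If I do get that excuse, I imagine the table would look roughly like this (names are just my guesses at what that tuple would become):

```sql
-- Hypothetical layout for the (grouping_id, date, hourly averages) idea.
-- Note: Postgres accepts but does not enforce the [24] size annotation,
-- so a CHECK constraint guards the array length instead.
CREATE TABLE hourly_rollups (
    grouping_id     bigint NOT NULL,
    day             date   NOT NULL,
    hourly_averages double precision[24] NOT NULL,
    PRIMARY KEY (grouping_id, day),
    CHECK (cardinality(hourly_averages) = 24)
);
```

One caveat worth benchmarking: a 24-element float8 array is only a couple hundred bytes, well under the ~2 kB tuple threshold where TOAST compression normally kicks in, so the compression win may only show up with wider arrays or longer date ranges per row.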