Hacker News

I want to do something similar and load it into a place like Snowflake. I say "like" Snowflake because it's annoying that they have a $25 a month minimum; otherwise it's already nicely suited to being a personal data warehouse!


BigQuery charges per query and for storage which might work for you. Where did you see the $25 minimum for Snowflake?


BigQuery or Snowflake seem rather extreme for anything I might consider personal data. Even logging as many things as in the article, a local SQLite DB and your preferred backup solution would get you pretty far. It might be easier to set up nice dashboards with a cloud product, though.
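A minimal sketch of that local-SQLite approach, assuming a single append-only events table (the filename, schema, and event kinds are all illustrative, not from the article):

```python
import sqlite3

# Illustrative schema: one append-only table of timestamped events.
conn = sqlite3.connect("personal.db")  # hypothetical filename
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        ts   TEXT NOT NULL,   -- ISO 8601 timestamp
        kind TEXT NOT NULL,   -- e.g. 'weight', 'location', 'commit'
        data TEXT             -- JSON payload
    )
""")
conn.execute(
    "INSERT INTO events (ts, kind, data) VALUES (datetime('now'), ?, ?)",
    ("weight", '{"kg": 72.5}'),
)
conn.commit()

# Ad-hoc analytics are then just SQL:
count, = conn.execute(
    "SELECT count(*) FROM events WHERE kind = 'weight'"
).fetchone()
print(count)
```

Since the whole warehouse is one file, "backup" is just copying `personal.db` somewhere.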


I want the DB to be in the cloud. I'm not on my laptop most of the time, and I'm definitely not on the same laptop more than 20% of the time. Hosting any SQLite or Postgres DB in the cloud means I pay for hot storage, which is significantly expensive if I have, say, 200GB of data. (Which I do; I have a ton of data from my lab days that I also want queryable like this, and I intend to purchase some datasets that I think might be fun to keep at hand.)


I can see the appeal. Those services might be designed for petabytes of data analysis, but the flip side is that you have something ready to go with little maintenance.


Forgot about BigQuery, will have to give it a shot!

The Snowflake minimum is something my colleague got from them when we set up an account for a startup idea.


You can pretty easily store most of this data in any database. You don't need something like Snowflake - stock Postgres can handle just about any analytics query, especially if you're not writing too much.


The issue is paying for compute when you're not using it. I've tried running Postgres on a low-powered Lightsail instance, and while it runs, it takes a ton of time to load data and then to query it. Indexless scans of large tables (say, a DB with all my emails) can take a while!
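The indexless-scan problem is easy to reproduce. A small sketch (using SQLite as a stand-in for Postgres, with a made-up emails table): the same lookup goes from a full table scan to touching only a few index pages once an index exists.

```python
import sqlite3
import time

# Synthetic "emails" table as a stand-in for the real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (sender TEXT, subject TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    ((f"user{i}@example.com", f"subject {i}", "x" * 100) for i in range(200_000)),
)

def lookup():
    return conn.execute(
        "SELECT count(*) FROM emails WHERE sender = 'user12345@example.com'"
    ).fetchone()[0]

t0 = time.perf_counter(); lookup(); scan_time = time.perf_counter() - t0

# Same query after adding an index on the filter column.
conn.execute("CREATE INDEX idx_sender ON emails (sender)")
t0 = time.perf_counter(); lookup(); indexed_time = time.perf_counter() - t0

print(f"full scan: {scan_time:.4f}s  indexed: {indexed_time:.4f}s")
```

Indexes help point lookups enormously, but they don't save you on the "scan everything and aggregate" queries that columnar warehouses are built for - which is the gap the parent comment is describing.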

I've tried the same in Snowflake, using only their smallest warehouse, and everything happens in seconds: loading GBs of data takes seconds, scanning GBs of data takes seconds. Their smallest compute warehouse is still significantly beefier than the CPU you get with a puny Lightsail shared vCPU, and while its disk I/O is poor (since it's reading from S3), the parallelism makes up for it.

In the end I think it's the pricing model that makes the difference. With Snowflake you seamlessly launch a very powerful machine to run your query and only get billed for a minute; with Postgres you need to run your machine all the time. Hot storage (as opposed to S3) is also non-trivially expensive - I haven't seen any way to get 200GB of SSD storage without spending the $25 minimum that Snowflake costs anyway. This is data I don't intend to update or query more than a few times a month, so I really don't want to pay for always-on compute in its name.
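The pricing argument can be sketched as back-of-envelope arithmetic. Every number below is an illustrative assumption, not a quote from any provider; the point is only the shape of the comparison for a few-queries-a-month workload.

```python
# Always-on model: a VM plus block storage, billed whether you query or not.
# (All prices are illustrative assumptions, not real quotes.)
always_on_instance = 20.00   # $/month for a VM big enough to query 200GB quickly
ssd_200gb          = 20.00   # $/month for 200GB of attached SSD

# On-demand model: per-second compute plus cheap object storage.
per_second_rate   = 0.10 / 60        # $ per second of warehouse compute
queries_per_month = 10               # a few sessions a month
seconds_per_query = 60               # assume each query bills a full minute
object_storage    = 200 * 0.023      # ~$/month for 200GB in S3-style storage

always_on = always_on_instance + ssd_200gb
on_demand = per_second_rate * seconds_per_query * queries_per_month + object_storage
print(f"always-on: ${always_on:.2f}/mo  on-demand: ${on_demand:.2f}/mo")
```

Under these assumptions the on-demand model comes out several times cheaper precisely because the compute sits at zero most of the month; the conclusion flips if you query constantly.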


That's a fair point - 200GB of disk costs a bit more if you need an SSD (although in my experience an SSD isn't really necessary unless you have write-heavy workloads or otherwise unusual access patterns).

If you drop the SSD requirement, a cheap dedicated server will blow Snowflake out of the water. As a bonus, you can run your own code (ETL, scrapers, dataviz). Kimsufi has a 4-core box with 2TB (spinning) and 16GB RAM for $17/mo. Personally, that's the route I go for my personal warehouse, as I almost always want to put an API or Django app in front (plus other software like Celery, scrapers, etc.).


This is one of the reasons I've been building my version of this on top of SQLite: it's incredibly cheap. All you need is a writable disk somewhere. I started with a $5/month VPS.


How much data are you storing? How much does 200GB of storage cost? At least on Lightsail it wasn't cheap! And that's not even backed up!


Only about 20GB, so it's pretty inexpensive.

I'm not actually bothering to run backups because theoretically ALL of the data there can be retrieved from other sources - pulled back out of the Twitter archive exports for example.

But another similar project uses tarsnap for backups, which is pretty inexpensive.
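One nice property of the SQLite approach is that taking a consistent snapshot for any backup tool (tarsnap, rsync, whatever) is built into the standard library. A small sketch, with illustrative filenames:

```python
import sqlite3

# Take a consistent snapshot of a live SQLite database before handing the
# copy to whatever backup tool you use. Filenames are illustrative.
src = sqlite3.connect("personal.db")
src.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, kind TEXT)")
src.commit()

dst = sqlite3.connect("personal-backup.db")
src.backup(dst)   # online backup: safe even while other connections write
dst.close()
src.close()
```

The snapshot file can then be shipped off-box; copying the live DB file directly risks catching it mid-write, which `Connection.backup` avoids.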



