Hacker News | eoswald's comments

Hahah, at least you're not getting called every five minutes because you can't shut off the alerts, because it's apparently deployed SOMEWHERE but good luck finding how to access it. Can't wait to see the bill from Twilio because of this lol


Today changed my opinion on them completely. I was willing to give them the benefit of the doubt that they're growing fast, but now I see that they've failed to scale properly and are missing little things that become big things later. I can't take that risk.


Yep, and this is why I'm pissed. They lied. They're completely dependent on GCP. So I gotta do some research; I need something a little more stable (and less dependent on one company's whims) than this. This is bad for them, because it really strikes at the heart of their 'big claim': peaceful software deployments. This is chaos.


Yea, I mean, that's the whole MO of our platform and we failed at that. So yea, that's disappointing and more so for our customers.

I can provide an explanation about the GCP dependency. Yes, we host workloads off GCP, and we have been able to build a good business by performing a cloud exit. However, we were worried that we would have a circular dependency on our own cloud. I don't think we expected to get auto-modded out of our own account, hence we left our DB on CloudSQL.

It was never our intent to deceive people about whether we own our own destiny with our business. After the last GCP issue (when we got auto-ratelimited, which was bad, but survivable), we were assured that this scenario wouldn't happen, but it seems like we have further work to do. Apologies.


I’m very sympathetic and understand that decisions are easy to criticize in hindsight but leaving your database in GCP while moving everything else to your own data centres seems so backwards I can’t even begin to imagine how that could happen. Was this really an intentional design decision?


I have exactly the same architecture. You can easily administer a postgres/mysql on your own infrastructure, but it's also the one thing where backups and availability are super strict. I can easily support multi-region in Google Cloud or AWS and that's way harder to do on-prem, and it's also hard to handle the replication story as safely as with Google Cloud. The hope is that GCP et al. give you safety and availability for the control plane stuff and you can run your data plane on-prem.

At $2m/mo spend, this kind of thing is insane. GCP has never been the most reliable of clouds but this is pretty awful. I would never have expected this.


I have kind of the same architecture. I host multiple dedicated servers and VPS instances in the Hetzner "cloud", but all of these connect to a few databases hosted through Hetzner's web hosting packages for like 20 bucks a month. It sounds insane, but the one thing that absolutely needs to stay online is the database, so not hosting this myself makes sense. And since Hetzner has apparently tuned their dirt cheap databases pretty well, we can hammer them pretty hard without any problems.


> decisions are easy to criticize in hindsight

I mean, the pain we have caused our customers ultimately proves you correct. That said, we made our decisions with the information and constraints that we knew at that moment in time. Railway has hosts in AWS, GCP, and co-los, so coordinating those workloads in a fully distributed manner would be ideal, but at the end of the day, we didn't foresee that we would have our project deleted just like that.

(Even though we did get assurances from them in 2024 that it wouldn't happen again, although we only got auto-rate-limited the last time.)


Thanks for getting things back up (genuinely mean that, btw). Upon logging back in I was prompted to promise I'm not deploying naughty things (I'm not). Was this in response to GCP detecting illegal (prohibited) behavior from something deployed via Railway?


Actually, when I made the TOS check, I put that in Redis. That + the feature flags got reset.


Could you clarify: did an automated process by Google delete a GCP project/account/resource(s)? Like, what exactly were you seeing when trying to get access or see what happened?


They deleted our GCP proj. sans warning. Still working out the details, but that's how this whole thing began.


This is easily explained by "database migrations are incredibly difficult and very risky".


Why CloudSQL? Why not AlloyDB for stability?


Sorry, I have a hard time blaming Google for this, when Railway seems to be having increasing trouble keeping the platform stable. Something like this should NOT take down an ENTIRE service. There should be a backup when literally your business is about being the reliable backend. This just seems like poor planning to me.


I don't quite know what you mean. Do you really expect Railway to use a multi-cloud architecture to host all of their clients' projects? I suspect that would lead to lower availability, all things considered.


Well, by the same token, is it smart to base your ENTIRE architecture on a single cloud? Isn't that why some of us build in fallbacks for AWS-hosted services? I mean, their entire platform, both public and private facing, is running on the same thing. One error, one problem, takes out the entire service.


Taking this at face value, this doesn't happen to AWS clients - at least I don't read about it here.

AWS may have data centers[0] go[1] down[2], but that's within expected bounds of standard ops.

[0] https://hooks.slack.com/services/TJ7HQS7FC/B0B5S7UTBJ4/PUHIC...

[1] https://www.aljazeera.com/news/2025/10/21/what-caused-amazon...

[2] https://netflixtechblog.com/lessons-netflix-learned-from-the...


They literally own their own data centers. That's what's surprising about this. They are lying to their customers when they say they operate their own data center, because obviously they don't if everyone's apps are down with GCP blocking their account.


Is it not possible that they own their own data center and have an unfortunate Google dependency?

Obviously a fiasco but I’m not prepared to call them liars when it could be an honest mistake.


Then don't say you're not a "Cloud on top of a cloud" provider.

They even made fun of cloud providers being down when AWS was down.


I imagine there's also an important difference between:

1. We depend on X but could gracefully migrate to an alternate in a week if we really needed to.

2. All data is mirrored instantly so that we can do seamless fail-over in case X has its own outage.


Oh, I see what you mean. Eh, it's possibly the same reason that AWS essentially goes down when us-east-1 goes down.


Disaster recovery is pretty expensive, right? Especially for their size.


