We're using Fly at my work. It's had multiple outages in the last month that have taken down our production servers. There has been no proactive communication and very little insight beyond "We've identified the issue and are attempting a fix."
We're now 24 hours into an outage that started with everything being taken offline and is now causing intermittent 502 errors. Their status page (https://status.flyio.net/) still shows 99.99% uptime.
Besides the outages, the service is great. But that's a big caveat. We're pretty frustrated and are considering leaving.
Is anyone else in the same situation? If so, what's keeping you there, or what are you leaving for?
I think it might've prevented users from posting on our forums or sending in an email (premium support). I can imagine users looking at the status page and mistakenly thinking their problems were related to the current incident.
I've interpreted "Monitoring" as essentially meaning: "this is fixed, but we're keeping a close eye on the situation". We do not yet have a formal process for incidents such as this one (but we are working on that).
If our users are having issues, that's a problem. Looking at our own metrics, the community forum, and our premium support inbox, I don't believe this to be the case.
Perhaps we should've done a better job of explaining the exact symptoms our users might be experiencing from this particular incident.