Seems like major issues are still ongoing. If anything it seems worse than it did ~4 hours ago. For reference I'm a data engineer and it's Redshift and Airflow (AWS managed) that is FUBAR for me.


It has been quite a while, wondering how many 9s are dropped.

365 days * 24 * 0.0001 is roughly 8 hours, so it has already lost its 99.99% status.


9s don’t have to drop if you increase the time period! “We still guarantee the same 9s just over 3450 years now”.


At a company where I worked, the tool measuring downtime was on the same server, so even when the server was down it still showed 100% uptime.

If the server didn't work, the tool to measure it didn't work either! Genius


This happened to AWS too.

February 28, 2017. S3 went down and took down a good portion of AWS and the Internet in general. For almost the entire time that it was down, the AWS status page showed green because the up/down metrics were hosted on... you guessed it... S3.

https://aws.amazon.com/message/41926/



Five times is no longer a couple. You can use stronger words there.


It happened a murder of times.


Ha! Shall I bookmark this for the eventual wiki page?


https://www.youtube.com/watch?v=HxP4wi4DhA0

Maybe they should start using real software instead of mathematicians' toy langs


Have we ever figured out what “red” means? I understand they’ve only ever gone to yellow.


If it goes red, we aren't alive to see it


I'm sure we need to go to Blackwatch Plaid first.



Published in the same week of October ...9 years ago ...Spooky...


I used to work at a company where the SLA was measured as the percentage of successful requests on the server. If the load balancer (or DNS or anything else network) was dropping everything on the floor, you'd have no 500s and 100% SLA compliance.


Similar to hosting your support ticketing system on the same infra. "What problem? Nobody's complaining"


I’ve been a customer of at least four separate products where this was true.

I can’t explain why Saucelabs was the most grating one, but it was. I think it’s because they routinely experienced 100% down for 1% of customers, and we were in that one percent about twice a year. <long string of swears omitted>


I spent enough time ~15 years back to find an external monitoring service that did not run on AWS and looked like a sustainable business instead of a VC-fueled acquisition target - for our belts-n-braces secondary monitoring tool, since it's not smart to trust CloudWatch to be able to send notifications when it's AWS's shit that's down.

Sadly, while I still use that tool a couple of jobs/companies later, I no longer recommend it because it migrated to AWS a few years back.

(For now, my out-of-AWS monitoring tool is a bunch of cron jobs running on a collection of inexpensive VPSes and my and other devs' home machines.)
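For the curious, a minimal sketch of the kind of check each cron entry runs - the URL and the "notify" command are placeholders, not my actual setup:

    # check_up.py - run from cron on a non-AWS box, e.g. */5 * * * *
    # URL and notify command below are placeholders.
    import subprocess
    import urllib.request

    URL = "https://example.com/healthz"  # hypothetical health endpoint
    TIMEOUT = 10  # seconds

    def is_up(url):
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
                return resp.status == 200
        except Exception:
            return False

    if not is_up(URL):
        # "notify" stands in for whatever out-of-band alert channel you trust
        # (SMS gateway, SMTP on the same box, a phone-call API).
        subprocess.run(["./notify", f"DOWN: {URL}"], check=False)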


Nagios is still a thing and you can host it wherever you like.


Interestingly, the reason I originally looked for and started using it was an unapproved "shadow IT" response to an in-house Nagios setup that was configured and managed so badly it had _way_ more downtime than any of the services I'd get shouted at about if customers noticed them down before we did...

(No disrespect to Nagios, I'm sure a competently managed installation is capable of being way better than what I had to put up with.)


If it's not on the dashboard, it didn't happen


Common SLA windows are hour, day, week, month, quarter, and year. They're out of SLA for all of those now.

When your SLA only holds within a joke of an SLA window, you know you goofed.

"Five nines, but you didn't say which nines. 89.9999...", etc.


These are typically calculated system-wide, so if you include all regions, technically only a fraction of customers are impacted.


Customers in all regions were affected…


Indirectly yes but not directly.

Our only impact was some Atlassian tools.


I shoot for 9 fives of availability.


5555.55555% Really stupendous availableness!!!


I see what you did there, mister :P


I prefer shooting for eight eights.


You mean nine fives.


You added a zero. There are ~8760 hours per year, so 8 hours is ~1 in 1000, 99.9%.


An outage like this does not happen every year. The last big outage happened in December 2021, roughly 3 years 10 months = 46 months ago.

The duration of the outage in relation to that uptime is (8 h / 33602 h) * 100% = 0.024%, so the uptime is 99.976%, slightly worse than 99.99%, but clearly better than 99.90%.

They used to be five nines, and people used to say that it wasn't worth their while to prepare for an outage. With less than four nines, the perception might shift, but likely not enough to induce a mass migration to outage-resistant designs.
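Same arithmetic as a quick sanity check (assuming ~730.5 hours per month and a single 8-hour outage in the window):

    # Uptime since the December 2021 outage, assuming one 8-hour outage
    # in roughly 46 months of ~730.5 hours each.
    hours_in_window = 46 * 730.5      # ~33,603 h
    outage_hours = 8
    uptime = 1 - outage_hours / hours_in_window
    print(f"{uptime:.4%}")            # ~99.9762% - between three and four nines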


Won’t the end result be people keeping more servers warm in other AWS regions which means Amazon profits from their own fuckups?


There was a pretty big outage in 2023


Oh you are right!


I'm sure they'll find some way to weasel out of this.


For DynamoDB, I'm not sure, but I think it's covered. https://aws.amazon.com/dynamodb/sla/. "An "Error" is any Request that returns a 500 or 503 error code, as described in DynamoDB". There were tons of 5XX errors. In addition, this calculation uses the percentage of successful requests, so even partial degradation counts against the SLA.

From reading the EC2 SLA I don't think this is covered. https://aws.amazon.com/compute/sla/

The reason is the SLA says "For the Instance-Level SLA, your Single EC2 Instance has no external connectivity.". Instances that were already created kept working, so this isn't covered. The SLA doesn't cover creation of new instances.
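For what it's worth, this is roughly how a success-rate SLA gets evaluated - the request counts below are made up, and the actual credit tiers are whatever the SLA terms say, not numbers from AWS:

    # Availability as the fraction of requests that did not return 5xx.
    # Request counts are hypothetical; credit tiers belong in your contract.
    def monthly_availability(total_requests, error_responses):
        return 1 - error_responses / total_requests

    avail = monthly_availability(total_requests=1_000_000_000,
                                 error_responses=3_000_000)
    print(f"{avail:.4%}")  # 99.7000% - partial degradation still drags this down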


It's not down time, it's degradation. No outage, just degradation of a fraction[0] of the resources.

[0] Fraction is ~ 1


This 100% seems to be what they're saying. I have not been able to get a single Airflow task to run in the last 7 hours. The ability to query Redshift only recently came back. Despite this, all their messaging is that the downtime was limited to some brief period early this morning and that things have been "coming back online". Total lie, it's been completely down for the entire business day here on the east coast.


We continue to see early signs of progress!


It doesn't count. It's not downtime, it's an unscheduled maintenance event.


Check the terms of your contract. The public terms often only offer partial service credit refunds, if you ask for it, via a support request.


If you aren’t making $10 for every dollar you pay Amazon you need to look at your business model.

The refund they give you isn’t going to dent lost revenue.


Where were you guys the other day when someone was calling me crazy for trying to make this same sort of argument?


I haven't done any RFP responses for a while but this question always used to make me furious. Our competitors (some of whom had had major incidents in the past) claimed 99.99% availability or more, knowing they would never have to prove it, and knowing they were actually 100% until the day they weren't.

We were more honest, and it probably cost us at least once in not getting business.


An SLA is a commitment, and an RFP is a business document, not a technical one. As an MSP, you don’t think in terms of “what’s our performance”, you think of “what’s the business value”.

If you as a customer ask for 5 9s per month, with service credit of 10% of at-risk fees for missing on a deal where my GM is 30%, I can just amortise that cost and bake it into my fee.


it's a matter of perspective... 9.9999% is real easy


Only if you remember to spend your unavailability budget


It's a single region?

I don't think anyone would quote availability as availability across every region they're in?

While this is their most important region, there are a lot of clients that are probably unaffected if they're not in us-east-1.

They COULD be affected even if they don't have anything there, because of the AWS services relying on it. I'm just saying that most customers that are multi-region should have taken their east region out of rotation and be humming along.


It’s THE region. All of AWS operates out of it. All other regions bow before it. Even the government is there.


"The Cloud" is just a computer that you don't own that's located in Reston, VA.


Facts.


The Rot Starts at the Head.


AWS GovCloud East is actually located in Ohio IIRC. Haven't had any issues with GovCloud West today; I'm pretty sure they're logically separated from the commercial cloud.


> All of AWS operates out of it.

I don't think this is true anymore. In the early days, bad enough outages in us-east-1 would bring down everything because some metadata / control plane stuff was there; I remember getting affected while in other regions, but it's been many years since that has happened.

Today, for example, no issues. I just avoid us-east-1 and everyone else should too. It's their worst region by far in terms of reliability because they launch all the new stuff there and are always messing it up.


A secondary problem is that a lot of the internal tools are still on US East, so likely the response work is also being impacted by the outage. Been a while since there was a true Sev1 LSE (Large Scale Event).


What the heck? Most internal tools were in Oregon when I worked in BT pre 2021.


The primary ticketing system was up and down apparently, so tcorp/SIM must still have critical components there.


tell me it isn't true while telling me there isn't an outage across AWS because us-east-1 is down...


I help run quite a big operation in a different region and had zero issues. And this has happened many times before.


If that were true, you’d be seeing the same issues we are in us-west-1 as well. Cheers.


Global services such as STS have regional endpoints, but is it really that common to hit a specific endpoint rather than use the default?


The regions are independent, so you measure availability for each on its own.


Except if they aren't quite as independent as people thought


Well that’s the default pattern anyway. When I worked in cloud there were always some services that needed cross-regional dependencies for some reason or other and this was always supposed to be called out as extra risk, and usually was. But as things change in a complex system, it’s possible for long-held assumptions about independence to change and cause subtle circular dependencies that are hard to break out of. Elsewhere in this thread I saw someone mentioning being migrated to auth that had global dependencies against their will, and I groaned knowingly. Sometimes management does not accept “this is delicate and we need to think carefully” in the midst of a mandate.

I do not envy anyone working on this problem today.


But it is only a partial outage, so it doesn't count. If you retry a million times, everything still works /s


I'm wondering why your and other companies haven't just evicted themselves from us-east-1. It's the worst region for outages and it's not even close.

Our company decided years ago to use any region other than us-east-1.

Of course, that doesn't help with services that are 'global', which usually means us-east-1.


Several reasons, really:

1. The main one: it's the cheapest region, so when people select where to run their services they pick it because "why pay more?"

2. It's the default. Many tutorials and articles online show it in the examples, many deployment and other devops tools use it as a default value.

3. Related to n.2. AI models generate cloud configs and code examples with it unless asked otherwise.

4. Its location makes it Europe-friendly, too. If you have a small service and you'd like to capture a European and North American audience from a single location, us-east-1 is a very good choice.

5. Many Amazon features are available in that region first and then spread out to other locations.

6. It's also a region where other cloud providers and hosting companies offer their services. Often there's space available in a data center not far from AWS-running racks. In hybrid cloud scenarios where you want to connect bits of your infrastructure running on AWS and on some physical hardware by a set of dedicated fiber optic lines us-east-1 is the place to do it.

7. Yes, for AWS deployments it's an experimental location that has higher risks of downtime compared to other regions, but in practice when a sizable part of us-east-1 is down other AWS services across the world tend to go down, too (along with half of the internet). So, is it really that risky to run over there, relatively speaking?

It's the world's default hosting location, and today's outages show it.


> it's the cheapest region

In every SKU I've ever looked at / priced out, all of the AWS NA regions have ~equal pricing. What's cheaper specifically in us-east-1?

> Europe-friendly

Why not us-east-2?

> Many Amazon features are available in that region first and then spread out to other locations.

Well, yeah, that's why it breaks. Using not-us-east-1 is like using an LTS OS release: you don't get the newest hotness, but it's much more stable as a "build it and leave it alone" target.

> It's also a region where other cloud providers and hosting companies offer their services. Often there's space available in a data center not far from AWS-running racks.

This is a better argument, but in practice, it's very niche — 2-5ms of speed-of-light delay doesn't matter to anyone but HFT folks; anyone else can be in a DC one state away with a pre-arranged tier1-bypassing direct interconnect, and do fine. (This is why OVH is listed on https://www.cloudinfrastructuremap.com/ despite being a smaller provider: their DCs have such interconnects.)

For that matter, if you want "low-latency to North America and Europe, and high-throughput lowish-latency peering to many other providers" — why not Montreal [ca-central-1]? Quebec might sound "too far north", but from the fiber-path perspective of anywhere else in NA or Europe, it's essentially interchangeable with Virginia.


Lots of stuff is priced differently.

Just go to the EC2 pricing page and change from us-east-1 to us-west-1

https://aws.amazon.com/ec2/pricing/on-demand/


us-west-1 is the one outlier. us-east-1, us-east-2, and us-west-2 are all priced the same.


There are many other AWS regions than the ones you listed, and many different prices.


This seems like a flaw Amazon needs to fix.

Incentivize the best behaviors.

Or is there a perspective I don't see?


How is it a flaw!? Building datacenters in different regions comes with very different costs, and different costs to run. Power doesn't cost exactly the same in different regions. Local construction services are not priced exactly the same everywhere. Insurance, staff salaries, etc, etc... it all adds up, and it's not the same everywhere. It only makes sense that it would cost different amounts to run the services in different regions. Not sure how you're missing these easy-to-realize facts of life.


I think the cost of a day like Monday due to over relying on a single location outweighs that


What happened on Monday has nothing to do with why services cost different prices in different regions.


No, but it does reflect the dangers of incentivizing everyone to use a single region.

Most people (myself included) only choose it because it's the cheapest. If multiple regions were the same price then there'd be less impact if one goes down.


The problems with us-east-1 have been apparent for a long time, many years. When I started using us-east-1 long ago and saw the problems there, I moved everything to us-west-1 and stopped having those problems. EC2 instances were completely unreliable in us-east-1 (we were running hundreds to thousands at a time), not so in us-west-1. The error rates we were seeing in us-east-1 were awful.

A negligible cost difference shouldn't matter when your apps are unstable due to the region being problematic.


> A negligible cost difference shouldn't matter when your apps are unstable due to the region being problematic.

agreed, but a sizable cohort of people don't have the foresight or incentives to think past their nose, and just click the cheapest option.

So it's on Amazon to incentivize what's best.


People's lack of curiosity, enough to not even explore the other options, is not Amazon's problem.


> 5. Many Amazon features are available in that region first and then spread out to other locations.

This is the biggest one, isn't it? I thought Route 53 isn't even available in any other region.


Some AWS services are only available in us-east-1. Also a lot of people have not built their infra to be portable and the occasional outage isn't worth the cost and effort of moving out.


> the occasional outage isn't worth the cost and effort of moving out.

And looked at from the perspective of an individual company, as a customer of AWS, the occasional outage is usually an acceptable part of doing business.

However, today we’ve seen a failure that has wiped out a huge number of companies used by hundreds of millions - maybe billions - of people, and obviously a huge number of companies globally all at the same time. AWS has something like 30% of the infra market so you can imagine, and most people reading this will to some extent have experienced, the scale of disruption.

And the reality is that whilst bigger companies, like Zoom, are getting a lot of the attention here, we have no idea what other critical and/or life and death services might have been impacted. As an example that many of us would be familiar with, how many houses have been successfully burgled today because Ring has been down for around 8 out of the last 15 hours (at least as I measure it)?

I don’t think that’s OK, and I question the wisdom of companies choosing AWS as their default infra and hosting provider. It simply doesn’t seem to be very responsible to be in the same pond as so many others.

Were I a legislator I would now be casting a somewhat baleful eye at AWS as a potentially dangerous monopoly, and see what I might be able to do to force organisations to choose from amongst a much larger pool of potential infra providers and platforms, and I would be doing that because these kinds of incidents will only become more serious as time goes on.


You're suffering from survivorship bias. You know that old adage about the bullet holes in the planes, where someone pointed out that you should reinforce the parts without bullet holes, because those are the planes that came back.

It's the same thing here. Do you think other providers are better? If people moved to other providers, things would still go down, more likely than not it would be more downtime in aggregate, just spread out so you wouldn't notice as much.

At least this way, everyone knows why it's down, our industry has developed best practices for dealing with these kinds of outages, and AWS can apply their expertise to keeping all their customers running as long as possible.


> If people moved to other providers, things would still go down, more likely than not it would be more downtime in aggregate, just spread out so you wouldn't notice as much.

That is the point, though: Correlated outages are worse than uncorrelated outages. If one payment provider has an outage, chose another card or another store and you can still buy your goods. If all are down, no one can shop anything[1]. If a small region has a power blackout, all surrounding regions can provide emergency support. If the whole country has a blackout, all emergency responders are bound locally.

[1] Except with cash – might be worth keeping a stash handy for such purposes.


Yeah, exactly this. I don’t know why the person who responded to me is talking about survivorship bias… and I suppose I don’t really care because there’s a bigger point.

The internet was originally intended to be decentralised. That decentralisation begets resilience.

That’s exactly the opposite of what we saw with this outage. AWS has give or take 30% of the infra market, including many nationally or globally well known companies… which meant the outage caused huge global disruption of services that many, many people and organisations use on a day to day basis.

Choosing AWS, squinted at through a somewhat particular pair of operational and financial spectacles, can often make sense. Certainly it’s a default cloud option in many orgs, and always in contention to be considered by everyone else.

But my contention is that at a higher level than individual orgs - at a societal level - that does not make sense. And it’s just not OK for government and business to be disrupted on a global scale because one provider had a problem. Hence my comment on legislators.

It is super weird to me that, apparently, that’s an unorthodox and unreasonable viewpoint.

But you’ve described it very elegantly: 99.99% (or pick the number of 9s you want) uptime with uncorrelated outages is way better than that same uptime with correlated, and particularly heavily correlated, outages.


That’s a pretty bold claim. Where’s your data to back it up?

More importantly you appear to have misunderstood the scenario I’m trying to avoid, which is the precise situation we’ve seen in the past 24 hours where a very large proportion of internet services go down all at the same time precisely because they’re all using the same provider.

And then finally the usual outcome of increased competition is to improve the quality of products and services.

I am very aware of the WWII bomber story, because it’s very heavily cited in corporate circles nowadays, but I don’t see that it has anything to do with what I was talking about.

AWS is chosen because it’s an acceptable default that’s unlikely to be heavily challenged either by corporate leadership or by those on the production side because it’s good CV fodder. It’s the “nobody gets fired for buying IBM” of the early mid-21st century. That doesn’t make it the best choice though: just the easiest.

And viewed at a level above the individual organisation - or, perhaps from the view of users who were faced with failures across multiple or many products and services from diverse companies and organisations - as with today (yesterday!) we can see it’s not the best choice.


This is an assumption.

Reality is, though, that you shouldn't put all your eggs in the same basket. And it was indeed the case before the cloud. One service going down would have never had this cascade effect.

I am not even saying "build your own DC", but we barely have resiliency if we all rely on the same DC. That's just dumb.


From the standpoint of nearly every individual company, it's still better to go with a well-known high-9s service like AWS than smaller competitors though. The fact that it means your outages will happen at the same time as many others is almost like a bonus to that decision — your customers probably won't fault you for an outage if everyone else is down too.

That homogeneity is a systemic risk that we all bear, of course. It feels like systemic risks often arise that way, as an emergent result from many individual decisions each choosing a path that truly is in their own best interests.


Yeah, but this is exactly not what the internet is supposed to be. It’s supposed to be decentralised. It’s supposed to be resilient.

And at this point I’m looking at the problem and thinking, “how do we do that other than by legislating?”

Because left to their own devices a concerningly large number of people across many, many organisations simply follow the herd.

In the midst of a degrading global security situation I would have thought it would be obvious why that’s a bad idea.


Services like SES Inbound are only available in 2x US regions. AWS isn't great about making all services available in all regions :/


We're on Azure and they are worse in every aspect: bad deployment of services, and status pages that are more about PR than engineering.

At this point, is there any cloud provider that doesn't have these problems? (GCP is a non-starter because a false-positive YouTube TOS violation get you locked out of GCP[1]).

[1]: https://9to5google.com/2021/02/26/stadia-port-of-terraria-ca...


Don't worry there was a global GCP outage a few months ago


Global auth is and has been a terrible idea.


[flagged]


That’s an incredibly long comment that does nothing to explain why a YouTube ToS violation should lead to someone’s GCP services being cut off.

Also, Steve Jobs already wrote your comment better. You should have just stolen it. “You’re holding it wrong”.


[flagged]


Are you warned about the risks in an active war zone? Yes.

Does Google warn you about this when you sign up? No.

And PayPal having the same problem in no way identifies Google. It just means that PayPal has the same problem and they are also incompetent (and they also demonstrate their incompetence in many other ways).


s/in no way identifies Google/in no way indemnifies Google/

Sorry


> Sorry

No, thank you.


> It just means that PayPal has the same problem and they are also incompetent

Do you consider regular brick-and-mortar savings banks to be incompetent when they freeze someone's personal account for receiving business amounts of money into it? Because they all do, every last one. Because, again, they expect you to open a business account if you're going to do business; and they look at anything resembling "business transactions" happening in a personal account through the lens of fraud rather than the lens of "I just didn't realize I should open a business account."

And nobody thinks this is odd, or out-of-the-ordinary.

Do you consider municipal governments to be incompetent when they tell people that they have to get their single-family dwelling rezoned as mixed-use, before they can conduct business out of it? Or for assuming that anyone who is conducting business (having a constant stream of visitors at all hours) out of a residentially-zoned property, is likely engaging in some kind of illegal business (drug sales, prostitution, etc) rather than just being a cafe who didn't realize you can't run a cafe on residential zoning?

If so, I don't think many people would agree with you. (Most would argue that municipal governments suppress real, good businesses by not issuing the required rezoning permits, but that's a separate issue.)

There being an automatic level of hair-trigger suspicion against you on the part of powerful bureaucracies — unless and until you proactively provide those bureaucracies enough information about yourself and your activities for the bureaucracies to form a mental model of your motivations that makes your actions predictable to them — is just part of living in a society.

Heck, it's just a part of dealing with people who don't know you. Anthropologists suggest that the whole reason we developed greeting gestures like shaking hands (esp. the full version where you pull each-other in and use your other arms to pat one-another on the back) is to force both parties to prove to the other that they're not holding a readied weapon behind their backs.

---

> Are you warned about the risks in an active war one? Yes. Does Google warn you about this when you sign up? No.

As a neutral third party to a conflict, do you expect the parties in the conflict to warn you about the risks upon attempting to step into the war zone? Do you expect them to put up the equivalent of police tape saying "war zone past this point, do not cross"?

This is not what happens. There is no such tape. The first warning you get from the belligerents themselves of getting near either side's trenches in an active war zone, is running face-first into the guarded outpost/checkpoint put there to prevent flanking/supply-chain attacks. And at that point, you're already in the "having to talk yourself out of being shot" point in the flowchart.

It has always been the expectation that civilian settlements outside of the conflict zone will act of their own volition to inform you of the danger, and stop you from going anywhere near the front lines of the conflict. By word-of-mouth; by media reporting in newspapers and on the radio; by municipal governments putting up barriers preventing civilians from even heading down roads that would lead to the war zone. Heck, if a conflict just started "up the road", and you're going that way while everyone's headed back the other way, you'll almost always eventually be flagged to pull over by some kind stranger who realizes you might not know, and so wants to warn you that the only thing you'll get by going that way is shot.

---

Of course, this is all just a metaphor; the "war" between infrastructure companies and malicious actors is not the same kind of hot war with two legible "sides." (To be pedantic, it's more like the "war" between an incumbent state and a constant stream of unaffiliated domestic terrorists, such as happens during the ongoing only-partially-successful suppression of a populist revolution.)

But the metaphor holds: just like it's not a military's job to teach you that military forces will suspect that you're a spy if you approach a war zone in plainclothes; and just like it's not a bank's job to teach you that banks will suspect that you're a money launderer if you start regularly receiving $100k deposits into your personal account; and just like it's not a city government's job to teach you that they'll suspect you're running a bordello out of your home if you have people visiting your residentially-zoned property 24hrs a day... it's not Google's job to teach you that the world is full of people that try to abuse Internet infrastructure to illegal ends for profit; and that they'll suspect you're one of those people, if you just show up with your personal Google account and start doing some of the things those people do.

Rather, in all of these cases, it is the job of the people who teach you about life — parents, teachers, business mentors, etc — to explain to you the dangers of living in society. Knowing to not use your personal account for business, is as much a component of "web safety" as knowing to not give out details of your personal identity is. It's "Internet literacy", just like understanding that all news has some kind of bias due to its source is "media literacy."


You may not be aware of this, but Paypal is unregulated. They can, and have, overreached. This is very different from a bank who has regulations to follow, some of which protect the consumer from the whims of the bank.


I appreciate this long comment.

I am in the middle of convincing the company I just joined to consider building on GCP instead of AWS (at the very least, not to default to AWS).


If you can't figure out how to use a different Google account for YouTube from the GCP billing account, I don't know what to say. Google's in the wrong here, but spanner's good shit! (If you can afford it. and you actually need it. you probably don't.)


The problem isn't specifically getting locked out of GCP (though it is likely to happen for those out of the loop on what happened). It is that Google themselves can't figure out that a social media ban shouldn't affect your business continuity (and access to email or what-have-you).

It is an extremely fundamental level of incompetence at Google. One should "figure out" the viability of placing all of one's eggs in the basket of such an incompetent partner. They screwed the authentication issue up and, this is no slippery slope argument, that means they could be screwing other things up (such as being able to contact a human for support, which is what the Terraria developer also had issues with).


One of those still isn’t us-east-1 though and email isn’t latency-bound.


Except for OTP codes when doing 2fa in auth


100ms isn’t going to make a difference to email-based OTP.

Also, who’s using email-based OTP?


Same calculation everyone makes but that doesn’t stop them from whining about AWS being less than perfect.


We have discussions coming up to evict ourselves from AWS entirely. Didn't seem like there was much of an appetite for it before this but now things might have changed. We're still small enough of a company to where the task isn't as daunting as it might otherwise be.


So did a previous company I worked at. All our stuff was in west-2... then east-1 went down, and some global backend services that AWS depended on also went down and affected west-2.

I'm not sure a lot of companies are really looking at the costs of multi-region resiliency and hot failovers vs being down for 6 hours every year or so and writing that check.


Yep. Many, many companies are fine saying “we’re going to be no more available than AWS is.”


Customers are generally a lot more understanding if half the internet goes down at the same time as you.


Yes, and that's a major reason so many just use us-east-1.


Is there some reason why "global" services aren't replicated across regions?

I would think a lot of clients would want that.


> Is there some reason why "global" services aren't replicated across regions?

On AWS's side, I think us-east-1 is legacy infrastructure because it was the first region, and things have to be made replicable.

For others on AWS who aren't AWS themselves: because AWS outbound data transfer is exorbitantly expensive. I'm building on AWS, and AWS's outbound data transfer costs are a primary design consideration for potential distribution/replication of services.


It is absolutely crazy how much AWS charges for data. Internet access in general has become much cheaper, and Hetzner gives unlimited traffic. I don't recall AWS ever decreasing prices for outbound data transfer.


I think there's two reasons: one, it makes them gobs of money. Two, it discourages customers from building architectures which integrate non-AWS services, because you have to pay the data transfer tax. This locks everyone in.

And yes, AWS' rates are highway robbery. If you assume $1500/mo for a 10 Gbps port from a transit provider, you're looking at $0.0005/GB with a saturated link. At a 25% utilization factor, still only $0.002/GB. AWS is almost 50 times that. And I guarantee AWS gets a far better rate for transit than list price, so their profit margin must be through the roof.
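The back-of-envelope behind those per-GB numbers, assuming a 30-day month:

    # Cost per GB for a flat-rate 10 Gbps transit port at $1500/month.
    port_gbps = 10
    monthly_cost = 1500
    seconds_per_month = 30 * 24 * 3600          # 2,592,000 s
    max_gb = port_gbps / 8 * seconds_per_month  # ~3.24 million GB if saturated
    print(monthly_cost / max_gb)                # ~$0.00046/GB saturated
    print(monthly_cost / (max_gb * 0.25))       # ~$0.0019/GB at 25% utilization
    # vs. roughly $0.09/GB list price for AWS outbound transfer.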


> I think there's two reasons: one, it makes them gobs of money. Two, it discourages customers from building architectures which integrate non-AWS services, because you have to pay the data transfer tax. This locks everyone in.

Which makes sense, but even their rates for traffic between AWS regions are still exorbitant. $0.10/GB for transfer to the rest of the Internet somewhat discourages integration of non-Amazon services (though you can still easily integrate with any service where most of your bandwidth is inbound to AWS), but their rates for bandwidth between regions are still in the $0.01-0.02/GB range, which discourages replication and cross-region services.

If their inter-region bandwidth pricing was substantially lower, it'd be much easier to build replicated, highly available services atop AWS. As it is, the current pricing encourages keeping everything within a region, which works for some kinds of services but not others.


Even their transfer rates between AZs _in the same region_ are expensive, given they presumably own the fiber?

This aligns with their “you should be in multiple AZs” sales strategy, because self-hosted and third-party services can’t replicate data between AZs without expensive bandwidth costs, while their own managed services (ElastiCache, RDS, etc) can offer replication between zones for free.


Hetzner is "unlimited fair use" for 1Gbps dedicated servers, which means their average cost is low enough to not be worth metering, but if you saturate your 1Gbps for a month they will force you to move to metered. Also 10Gbps is always metered. Metered traffic is about $1.50 per TB outbound - 60 times cheaper than AWS - and completely free within one of their networks, including between different European DCs.

In general it seems like Europe has the most internet of anywhere - other places generally pay to connect to Europe, Europe doesn't pay to connect to them.


"Is there some reason why "global" services aren't replicated across regions?"

us-east-1 is so the government can slurp up all the data. /tin-foil hat


Data residency laws may be a factor in some global/regional architectures.


So provide a way to check/uncheck which zones you want replication to. Most people aren't going to need more than a couple of alternatives, and they'll know which ones will work for them legally.


My guess is that for IAM it has to do with consistency and security. You don't want regions disagreeing on what operations are authorized. I'm sure the data store could be distributed, but there might be some bad latency tradeoffs.

The other concerns could have to do with the impact of failover to the backup regions.


Regions disagree on what operations are authorized. :-) IAM uses eventual consistency. As it should...

"Changes that I make are not always immediately visible": - "...As a service that is accessed through computers in data centers around the world, IAM uses a distributed computing model called eventual consistency. Any changes that you make in IAM (or other AWS services), including attribute-based access control (ABAC) tags, take time to become visible from all possible endpoints. Some delay results from the time it takes to send data from server to server, replication zone to replication zone, and Region to Region. IAM also uses caching to improve performance, but in some cases this can add time. The change might not be visible until the previously cached data times out...

...You must design your global applications to account for these potential delays. Ensure that they work as expected, even when a change made in one location is not instantly visible at another. Such changes include creating or updating users, groups, roles, or policies. We recommend that you do not include such IAM changes in the critical, high availability code paths of your application. Instead, make IAM changes in a separate initialization or setup routine that you run less frequently. Also, be sure to verify that the changes have been propagated before production workflows depend on them..."

https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoo...
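If you want to follow that "verify before depending on it" advice in code, here's a rough sketch using boto3's built-in IAM waiter. The role name and trust policy are placeholders, and the waiter only confirms GetRole succeeds from this endpoint, not that every region has seen the change yet:

    # Create an IAM role, then wait for it to be visible before using it.
    # Role name and trust policy are placeholders for illustration.
    import json
    import boto3

    iam = boto3.client("iam")

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow",
                       "Principal": {"Service": "ec2.amazonaws.com"},
                       "Action": "sts:AssumeRole"}],
    }

    iam.create_role(RoleName="example-role",
                    AssumeRolePolicyDocument=json.dumps(trust_policy))

    # Built-in waiter polls GetRole until the role is visible to this endpoint;
    # cross-region propagation can still lag, so callers often retry on top.
    iam.get_waiter("role_exists").wait(RoleName="example-role")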


Global replication is hard, and if they weren't designed with that in mind it's probably a whole lot of work.


I thought part of the point of using AWS was that such things were pretty much turnkey?


Mostly AWS relies on each region being its own isolated copy of each service. It gets tricky when you have globalized services like IAM. AWS tries to keep those to a minimum.


One advantage to being in the biggest region: when it goes down the headlines all blame AWS, not you. Sure you’re down too, but absolutely everybody knows why and few think it’s your fault.


For us, we had some minor impacts but most stuff was stable. Our bigger issue was 3rd party SaaS also hosted on us-east-1 (Snowflake and CircleCI) which broke CI and our data pipeline


This was a major issue, but it wasn't a total failure of the region.

Our stuff is all in us-east-1, ops was a total shitshow today (mostly because many 3rd party services besides aws were down/slow), but our prod service was largely "ok", a total of <5% of customers were significantly impacted because existing instances got to keep running.

I think we got a bit lucky, but no actual SLAs were violated. I tagged the postmortem as Low impact despite the stress this caused internally.

We definitely learnt something here about both our software and our 3rd party dependencies.


cheapest + has the most capacity


You have to remember that health status dashboards at most (all?) cloud providers require VP approval to switch status. This stuff is not your startup's automated status dashboard. It's politics, contracts, money.


Which makes them a flat out lie since it ceases to be a dashboard if it’s not live. It’s just a status page.


Downdetector had 5,755 reports of AWS problems at 12:52 AM Pacific (3:52 AM Eastern).

That number had dropped to 1,190 by 4:22 AM Pacific (7:22 AM Eastern).

However, that number is back up with a vengeance: 9,230 reports as of 9:32 AM Pacific (12:32 PM Eastern).

Part of that could be explained by more people making reports as the U.S. west coast awoke. But I also have a feeling that they aren't yet on top of the problem.


Where do they source those reports from? Always wondered if it was just analysis of how many people are looking at the page, or if humans somewhere are actually submitting reports.


It turns out that a bunch of people checking if "XYZ is down" is a pretty good heuristic for it actually being down. It's pretty clever I think.


It's both. They count a hit from Google as a report of that site being down. They also count the actual reports people make.


So if my browser auto-completes their domain name and I accept that (causing me to navigate directly to their site and then I click AWS) it's not a report; but if my browser doesn't or I don't accept it (because I appended "AWS" after their site name) causing me to perform a Google search and then follow the result to the AWS page on their site, it's a report? That seems too arbitrary... they should just count the fact that I went to their AWS page regardless of how I got to it.


I don't know the exact details, but I know that hits to their website do count as reports, even if you don't click "report". I assume they weight it differently based on how you got there (direct might actually be more heavily weighted, at least it would be if I was in charge).


Down detector agrees: https://downdetector.com/status/amazon/

Amazon says service is now just "degraded" and recovering, but searching for products on Amazon.com still does not work for me. https://health.aws.amazon.com/health/status


Search, Seller Central, Amazon Advertising not working properly for me. Attempting to access from New York.

When this is fixed, I am very interested in seeing recorded spend for Sunday and Monday.


Amazon Ads is down indeed https://status.ads.amazon.com/


This looks like one of their worst outages in 15 years and us-east-1 still shows as degraded, but I had no outages, as I don't use us-east-1. Are you seeing issues in other regions?

https://health.aws.amazon.com/health/status?path=open-issues

The closest to their identification of a root cause seems to be this one:

"Oct 20 8:43 AM PDT We have narrowed down the source of the network connectivity issues that impacted AWS Services. The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers. We are throttling requests for new EC2 instance launches to aid recovery and actively working on mitigations."


I wonder how many people discovered their autoscaling settings went batshit when services went offline, either scaling way down or way up, or went metastable and started fishtailing.


Lambda create-function control plane operations are still failing with InternalError for us - other services have recovered (Lambda, SNS, SQS, EFS, EBS, and CloudFront). Cloud availability is the subject of my CS grad research, I wrote a quick post summarizing the event timeline and blast radius as I've observed it from testing in multiple AWS test accounts: https://www.linkedin.com/pulse/analyzing-aws-us-east-1-outag...


Definitely seems to be getting worse, outside of AWS itself, more websites seem to be having sporadic or serious issues. Concerning considering how long the outage has been going.


That's probably why Reddit has been down too


Dangerous curiosity ask: is the number of folks off for Diwali a factor or not?

I.e. lots of folks who weren't expected to work today, and/or having to round them up to work the problem.


Northern Virginia's Fairfax County public schools have the day off for Diwali, so that's not an unreasonable question.

In my experience, the teams at AWS are pretty diverse, reflecting the diversity in the area. Even if a lot of the Indian employees are taking the day off, there should be plenty of other employees to back them up. A culturally diverse employee base should mitigate against this sort of problem.

If it does turn out that the outage was prolonged due to one or two key engineers being unreachable for the holiday, that's an indictment of AWS for allowing these single points of failure to occur, not for hiring Indians.


It's even worse if it was caused by American engineers who weren't on holiday


Seems like a lot of people are missing that this post was made around midnight PST, and thus it would be more reasonable to ping people at lunch in IST before waking up people in EST or PST.


More info is claiming the problem started around 9:15 the previous day, but brewed for a while. But that’s still after breakfast in IST.


Sometimes I miss my phone buzzing when doing yard work. Diwali has to be worse for that.


Seeing as how this is us-east-1, probably not a lot.


I believe the implication is that a lot of critical AWS engineers are of Indian descent and are off celebrating today.


junon's implication may be that AWS engineers of Indian descent would tend to be located on the West Coast.


Northern Virginia has a very large Indian community.

All the schools in the area have days off for Indian Holidays since so many would be out of school otherwise.


This broke in the middle of the day IST did it not? Why would you start waking up people in VA if it’s 3 in the morning there if you don’t have to?


I bet you haven't gotten an email back from AWS support during twilight hours before.

There are 153k Amazon employees based in India according to LinkedIn.


Missing my point entirely.


Then I missed it too, because I let my Indian coworkers handle production issues after 9 or 10pm unless the problem sounds an awful lot like the feature toggle I flipped on in production is setting servers on fire.

My main beef with that team was that we worked on too many stories in parallel, so information on brand new work was siloed. Everyone caught up after a bit, but coverage of stuff we had just demoed or hadn't demoed yet was spotty.

If I was up at 1 am it was because I had insomnia and figured out exactly what the problem was, and it was faster to fix it than to explain. Or because I woke up really early and the problem still wasn't fixed.


Worst of all: the Ring alarm siren is unstoppable because the app is down and the keypad was removed by my parents and put "somewhere in the basement".


Is it hard wired? If so, and if the alarm module doesn’t have an internal battery, can you go to the breaker box and turn off the circuit it’s on? You should be able to switch off each breaker in turn until it stops if you don’t know which circuit it’s on.

If it doesn’t stop, that means it has a battery backup. But you can still make life more bearable. Switch off all your breakers (you probably have a master breaker for this), then open up the alarm box and either pull the battery or - if it’s non-removable - take the box off the wall, put it in a sealed container, and put the sealed container somewhere… else. Somewhere you can’t hear it or can barely hear it until the battery runs down.

Meanwhile you can turn the power back on but make sure you’ve taped the bare ends of the alarm power cable, or otherwise electrically insulated them, until you’re able to reinstall it.


I'll keep it in mind, thx. I was lucky to find the keypad in the "this is the place where we put electronic shit" in the basement.


Nice. Well, whatever, I’m glad you managed to stop it from driving you up the wall.


I have a Ring alarm. It has a battery backup and is powered by AC adaptor, so no need to turn off entire circuits (but no easy silence). All the sensors I have are wireless (not sure if they offer wired).

I would honestly do your box option. Stuff it in there with some pillows and leave it in the shed for a while.


Yeah, we’ve got a bunch of Ring stuff but not the interior alarm so I wasn’t sure how it worked. I suspected it might have a battery backup and, in that case, desperate times -> desperate measures.


Yeah. We had a brief window where everything resolved and worked and now we're running into really mysterious flakey networking issues where pods in our EKS clusters timeout talking to the k8s API.


Yeah, networking issues cleared up for a few hours but now seem to be as bad as before.


The problems now seem mostly related to starting new instances. Our capacity is slowly decaying as existing services spin down and new EC2 workloads fail to start.
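If you're retrying launches against a throttled control plane, a backoff-with-jitter loop along these lines is the usual shape - the AMI ID and instance type here are placeholders:

    # Retry instance launches with exponential backoff and jitter while the
    # control plane is throttling. AMI ID and instance type are placeholders.
    import random
    import time

    import boto3
    from botocore.exceptions import ClientError

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def launch_with_backoff(max_attempts=8):
        for attempt in range(max_attempts):
            try:
                return ec2.run_instances(ImageId="ami-0123456789abcdef0",
                                         InstanceType="t3.micro",
                                         MinCount=1, MaxCount=1)
            except ClientError as err:
                code = err.response["Error"]["Code"]
                if code not in ("RequestLimitExceeded",
                                "InsufficientInstanceCapacity"):
                    raise
                time.sleep(min(300, 2 ** attempt + random.uniform(0, 1)))
        raise RuntimeError("gave up launching instance")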


Basic services at my worksite have been offline for almost 8 hours now (things were just glitchy for about 4 hours before that). This is nuts.


Have not gotten a data pipeline to run to success since 9AM this morning when there was a brief window of functioning systems. Been incredibly frustrating seeing AWS tell the press that things are "effectively back to normal". They absolutely are not! It's still a full outage as far as we are concerned.


Yep, confirmed worse - DynamoDB now returning "ServiceUnavailableException"


ServiceUnavailableException hello java :)


Here as well…


Agree… still seeing major issues. Briefly looked like it was getting better but things falling apart again.


I noticed the same thing and it seems to have gotten much worse around 8:55 a.m. Pacific Time.

By the way, Twilio is also down, so all those login SMS verification codes aren’t being delivered right now.


SEV-0 for my company this morning. We can't connect to RDS anymore.


Yeah, we were fine until about 10:30 Eastern and have been completely down since then. Heroku customer.


Andy Jassy is the Tim Cook of Amazon

Rest and vest CEOs


Don’t insult Tim Cook like that.

He got a lot of impossible shit done as COO.

They do need a more product minded person though. If Jobs was still around we’d have smart jewelry by now. And the Apple Watch would be thin af.


In addition to those, Sagemaker also fails for me with an internal auth error specifically in Virginia. Fun times. Hope they recover by tomorrow.


Agreed, every time the internal list of impacted services gets shorter, it starts growing again with the next update.

A lot of these are second-order dependencies like Astronomer, Atlassian, Confluent, Snowflake, Datadog, etc... the joys of using hosted solutions for everything.


Before my old company spun off, we didn’t know the old ops team had put on-prem production and our Atlassian instances in the same NAS.

When the NAS shit the bed, we lost half of production and all our run books. And we didn’t have autoscaling yet. Wouldn’t for another 2 years.

Our group is a bunch of people that has no problem getting angry and raising voices. The whole team was so volcanically angry that it got real quiet for several days. Like everyone knew if anyone unclenched that there would be assault charges.


The problem now is: what’s anyone going to do? Leave?

I remember a meme years ago about Nestle. It was something like: GO ON, BOYCOTT US - I BET YOU CAN’T - WE MAKE EVERYTHING.

Same meme would work for Aws today.


> Same meme would work for Aws today.

Not really, there are enough alternatives.


How many of them just run on AWS underneath, though?

And it’s not like there aren’t other brands of chocolate either…


It’s amazing how much you can avoid them by eating food that still looks like what it started as though. They own a lot of processed food.


First time I see "fubar". Is that a common expression in the industry? Just curious (English is not my native language)


It is an old US military term that means “F*ked Up Beyond All Recognition”


FUBAR being a bit worse than SNAFU: "situation normal: all fucked up" which is the usual state of us-east-1


My favorite is JANFU: Joint Army-Navy Fuck-Up.


But you probably have seen the standard example variable names "foo" and "bar" which (together at least) come from `fubar`


Which are in fact unrelated.


Unclear. ‘Foo’ has a life and origin of its own and is well attested in MIT culture going back to the 1930s for sure, but it seems pretty likely that its counterpart ‘bar’ appears in connection with it as a comical allusion to FUBAR.


Foobar == "Fucked up beyond all recognition "

Even the acronym is fucked.

My favorite by a large margin...


Interestingly, it was "Fouled Up Beyond All Recognition" when it first appeared in print back towards the end of World War 2.

https://en.wikipedia.org/wiki/List_of_military_slang_terms#F...

Not to be confused with "Foobar" which apparently originated at MIT: https://en.wikipedia.org/wiki/Foobar

TIL, an interesting footnote about "foo" there:

'During the United States v. Microsoft Corp. trial, evidence was presented that Microsoft had tried to use the Web Services Interoperability organization (WS-I) as a means to stifle competition, including e-mails in which top executives including Bill Gates and Steve Ballmer referred to the WS-I using the codename "foo".[13]'


What people would print and what soldiers would say in the 1940s were likely somewhat divergent.


100%


It used to be quite common but has fallen out of usage.


"FUBAR" comes up in the movie Saving Private Ryan. It's not a plot point, but it's used to illustrate the disconnect between one of the soldiers dragged from a rear position to the front line, and the combat veterans in his squad. If you haven't seen the movie, you should. The opening 20 minutes contains one of the most terrifying and intense combat sequences ever put to film.


Honestly not sure if this is a joke I'm not in on.

There are documented uses of FUBAR back into the '40s.


What do you mean? The movie storyline takes place in 44 at the Battle of Normandy.


I must've misread. I thought you said that it comes from the movie rather than comes up in the movie.


FUBAR: Fucked Up Beyond All Recognition

Somewhat common. Comes from the US military in WW2.


Yes, although it's military in origin.



