
Four minutes to realize SHTF, but a while longer to figure out why:

> Level 3 was aware it had a problem within four minutes, the FCC report said. The problem was difficult to diagnose, however, because no one at Level 3 was aware of the consequences of leaving that particular field empty, nor had anyone at the company previously seen the system behave the way it was behaving.

That is, they didn't know that leaving that field blank is what caused the S to HTF.



I can imagine it once they found out the cause.

"Really? That was it? Are you fucking serious?"


Sounds like most of the manufacturing issues I have been involved in...

'why did the machine break?'

'well the spec says to clean it using chemical X, but the cabinet with X is 75 feet away and the cabinet with chemical Y is next to the machine, so they used Y. They use X and Y interchangeably on other machines, so the technicians (note: high-school grads, great guys but not chemists) thought they were interchangeable on this machine.'

'well why aren't they interchangeable on this machine?'

'Y reacts with the glue used to assemble the machine, which was a change in the newer versions because of EPA regulations, so doing this weekly maintenance task for 8 years was finally enough to degrade the glue'

'so why was chemical Y stored near here?'

'because X has to be kept so many feet from Y. Last year in the efficiency audit we found techs had to walk too far on average to get X, so we moved the X cabinet, which resulted in us moving Y to here.'

'has Y been used on any of the other machines when it shouldn't have been?'

'we don't know and we aren't sure how to check'

Fault trees usually make for really interesting reading.


Nice.

I'm guessing I can't read the source to this particular fault tree, but I wonder where I might find others. Preferably without digging through e.g. troves of court documents and the like.



Ooh, interesting. Thanks!

And I'd somehow gotten it into my head that the NTSB only covered aviation, probably from old TV shows. TIL about their actual name.


Figuring out why can be avoided if you have proper change management. If you root-cause every issue before mitigating, that will drastically increase TTM. Instead, mitigate first and just revert the change.



