Is there an ELI5 on how this can happen? Like, I get it's a boot loop, but what did CrowdStrike do that caused it? How can non-malicious code trigger a boot loop?
I would not call CrowdStrike "non-malicious". It's an incredibly incompetently implemented kit that's sold to organizations as snake oil that "protects them from cybercrime". Its purpose is to let incompetent IT managers "implement something plausible" against cyber incidents, and when an incident happens, it gives them the excuse that "they followed best practices".
It craps up the user's PC while it's at it, too.
I hope the company burns to the ground and large organizations realize it's not a great idea to run a rootkit on every PC "just because everyone else does it".
I have to say, it saved our ass a few months ago. A hacker got access to the server infrastructure of one of our multiple brands and started running PowerShell to weed through the rest, and CrowdStrike notified us (the owning brand) that something was off about the PowerShell being run. Turns out this small brand was running a remote-in tool that had an exploit. Had CrowdStrike not been on that server, we wouldn't have known until someone manually got in there to look at it.
I've had CrowdStrike completely delete a debug binary I ran from Visual Studio. And the module it injects into every single process shows up in all of our logging.
What specifically makes it "incredibly incompetently implemented"? Would you derisively describe any system that can push updates requiring admin access as a "rootkit", or is there some way you envision a "competently implemented rootkit" operating? Your opinion seems incredibly strong, so I'm just curious how you arrived at it. I'm not in IT, but the idea of both rolling out updates remotely and outsourcing the timely delivery of those updates to my door* is a no-brainer.
* if not directly to all my thousands of PCs without testing, which is 100% a "me" task and not a "that cloud provider over there" task
"Rootkit" here means CrowdStrike runs as a kernel-mode driver that can intercept operations (opening files, starting processes, network activity) before they're allowed to execute. It's like letting a third party implant a chip in your brain: if the chip thinks the command in your head is malicious, it stops your brain from ever receiving it.
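To make "intercepts before execution" concrete, here's a minimal sketch of what one such hook can look like on Windows. The FLT_* types and return codes are the real minifilter API; EdrPreCreate and LooksMalicious are made-up names, and this is not CrowdStrike's actual code (their sensor hooks far more than file creates):

    /* Hypothetical sketch, not CrowdStrike's code: a kernel-mode filter
     * driver vetoing a file operation before it executes, via the
     * Windows minifilter API. */
    #include <fltKernel.h>

    static BOOLEAN LooksMalicious(PFLT_CALLBACK_DATA Data)
    {
        /* Stand-in for the vendor's detection logic (signatures and
         * behavioral rules pushed down via content updates). */
        UNREFERENCED_PARAMETER(Data);
        return FALSE;
    }

    /* The Filter Manager calls this *before* a registered file-create
     * operation ever reaches the file system. */
    FLT_PREOP_CALLBACK_STATUS
    EdrPreCreate(PFLT_CALLBACK_DATA Data,
                 PCFLT_RELATED_OBJECTS FltObjects,
                 PVOID *CompletionContext)
    {
        UNREFERENCED_PARAMETER(FltObjects);
        UNREFERENCED_PARAMETER(CompletionContext);

        if (LooksMalicious(Data)) {
            /* Deny it: the requesting process never opens the file. */
            Data->IoStatus.Status = STATUS_ACCESS_DENIED;
            Data->IoStatus.Information = 0;
            return FLT_PREOP_COMPLETE;
        }
        return FLT_PREOP_SUCCESS_NO_CALLBACK;
    }

The flip side of sitting at this layer: the code runs in kernel mode, so a bug here doesn't crash one app, it crashes the whole machine.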
CrowdStrike needs to be the first person in the room so that it can act like the boss. If other people show up before CrowdStrike, there's a possibility they'll somehow prevent CrowdStrike from being the boss. For this reason, CrowdStrike integrates with the boot process in ways most software doesn't: its driver is registered to load at boot, before almost everything else.
Their ability to monitor and intervene against all software on the system also puts them in a position to break all software on the system.
What they did is forget to build a graceful failure mode into the part of their driver that loads content updates: fed a malformed channel file, it reportedly read out of bounds and crashed the kernel, and since the driver loads at boot, every restart crashed the same way. (And what they did on top of that was ship the file without testing.)
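For the curious, a minimal sketch of what "a graceful failure mode" could look like when parsing a content update. The file format and all names here are invented (CrowdStrike's channel-file format is not public); the point is just: validate everything before dereferencing, and reject the file instead of crashing the kernel.

    /* Invented format: validate a content update before trusting it,
     * and reject it (keeping the last-known-good rules) rather than
     * reading out of bounds. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define CONTENT_MAGIC 0x43465331u  /* arbitrary tag for this sketch */
    #define MAX_RULES     4096u
    #define RULE_SIZE     64u          /* fixed-size rule entries here */

    typedef struct {
        uint32_t magic;
        uint32_t version;
        uint32_t rule_count;  /* number of rule entries that follow */
    } ContentHeader;

    typedef enum { LOAD_OK, LOAD_REJECTED } LoadResult;

    LoadResult LoadContentFile(const uint8_t *buf, size_t len)
    {
        ContentHeader hdr;

        if (buf == NULL || len < sizeof hdr)
            return LOAD_REJECTED;  /* truncated file */
        memcpy(&hdr, buf, sizeof hdr);

        if (hdr.magic != CONTENT_MAGIC)
            return LOAD_REJECTED;  /* not a content file at all */
        if (hdr.rule_count > MAX_RULES)
            return LOAD_REJECTED;  /* implausible count */
        if (len < sizeof hdr + (size_t)hdr.rule_count * RULE_SIZE)
            return LOAD_REJECTED;  /* count doesn't match file size */

        /* ... parse each rule, with its own bounds checks ... */
        return LOAD_OK;
    }

    int main(void)
    {
        uint8_t junk[8] = {0};  /* far too short to hold a header */
        puts(LoadContentFile(junk, sizeof junk) == LOAD_REJECTED
                 ? "rejected gracefully"
                 : "loaded");
        return 0;
    }

The incident suggests at least one check of this kind was missing or insufficient.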
My assumption is that when you have graceful failure for something like this, you risk a situation where someone figures out how to trigger it deliberately (feed the sensor a corrupt file and it quietly stops protecting the machine), so it's left out across this huge fleet.
It's likely that there were multiple discussions about graceful failure at the load stage, and that it was decided against for 'security' reasons.
If the threat model includes "someone can feed corrupted files to us" then I would definitely want more robustness and verification, not less.
It's perfectly okay to make the protected services unavailable for security reasons, but a management API should still be available, and the device should periodically query whatever the source of truth is about the "imminent danger". As the uncertainty decreases, the service can be made available again.
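A sketch of that fail-safe behavior, with everything invented (this is nobody's actual design): on a bad update, quarantine it, keep enforcing the last-known-good rules, surface the failure through a management channel, and retry on the next tick.

    /* Invented sketch of fail-safe update handling: reject a bad content
     * update, keep running on the last-known-good rules, and report the
     * failure instead of crashing or silently failing open. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int version; /* ... rules ... */ } RuleSet;

    static RuleSet g_active = { .version = 290 };  /* last-known-good */

    /* Stubs so the sketch runs; a real sensor would talk to its cloud.
     * Here we pretend version 291 arrives and fails validation. */
    static bool FetchUpdate(RuleSet *out) { out->version = 291; return true; }
    static bool Validate(const RuleSet *rs) { return rs->version != 291; }

    static void UpdateTick(void)
    {
        RuleSet candidate;

        if (!FetchUpdate(&candidate))
            return;  /* no update available; try again next tick */

        if (!Validate(&candidate)) {
            /* Quarantine: stay on g_active and surface the failure so
             * the fleet owner (and the vendor) can see it. */
            fprintf(stderr, "update v%d rejected; still enforcing v%d\n",
                    candidate.version, g_active.version);
            return;
        }
        g_active = candidate;  /* promote to active */
    }

    int main(void)
    {
        UpdateTick();  /* prints the rejection; the machine keeps booting */
        return 0;
    }

The design choice being argued for: a bad update degrades you to yesterday's protection, not to an unbootable machine.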
(Sure, then there's the argument against complexity in the kernel... true, but that simply means they need to have all that complexity upstream: testing, QA, staged rollouts, etc. And apparently what they had was not sufficient.)