It should be mentioned that Friday was Delta's busiest day of the year.
I flew out of ATL on Alaska Airlines that day. No wait to taxi since all the Delta flights were grounded - we got in early!
CrowdStrike software runs in kernel mode, which means that when it tries to execute an illegal instruction, access nonexistent memory, or even divide by zero, the system cannot intercept the fault and kill only that process; the whole system goes down. And since the software starts early in the boot process, the machine would crash every time it booted.
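To make the contrast concrete, here's a minimal user-mode sketch in plain C (nothing CrowdStrike-specific): the same bad pointer dereference that blue-screens Windows when a driver does it just gets this one process killed when it happens in ring 3.

```c
/* User-mode fault demo: the OS traps the access violation and
 * terminates only this process. The equivalent dereference inside
 * a kernel-mode driver has no such safety net -- Windows bugchecks
 * (blue screens) the whole machine instead. */
#include <stdio.h>

int main(void) {
    printf("Dereferencing a NULL pointer in user mode...\n");
    volatile int *p = NULL;
    *p = 42;  /* access violation: only this process dies */
    printf("never reached\n");
    return 0;
}
```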
The flaw here is that the kernel-mode software accepted unverified data from the internet without a user-mode component parsing it for validity before passing it down to the kernel. This is bad design from CrowdStrike and a bad decision from Delta for relying on software from a company that doesn't know about basic precautions for kernel-mode code.
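As a sketch of the missing precaution (the header fields and "update file" layout below are invented for illustration, not CrowdStrike's actual channel-file format): a user-mode helper could reject a malformed download before the kernel driver ever touches it.

```c
/* Hypothetical user-mode pre-validation of a downloaded update file.
 * The layout (magic, version, payload length) is made up for this sketch. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define UPDATE_MAGIC 0x43553031u   /* hypothetical "CU01" tag */
#define MAX_PAYLOAD  (1u << 20)    /* sanity cap: 1 MiB */

struct update_header {
    uint32_t magic;
    uint32_t version;
    uint32_t payload_len;
};

/* Returns 1 if the buffer looks like a well-formed update, 0 otherwise. */
int validate_update(const uint8_t *buf, size_t len) {
    struct update_header h;
    if (len < sizeof h) return 0;                  /* truncated download */
    memcpy(&h, buf, sizeof h);
    if (h.magic != UPDATE_MAGIC) return 0;         /* wrong or corrupt file */
    if (h.payload_len > MAX_PAYLOAD) return 0;     /* implausible length */
    if (len != sizeof h + h.payload_len) return 0; /* length must match */
    return 1;  /* only now hand the buffer to the kernel driver */
}
```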
Each machine had to be rebooted with the equivalent of special keypresses to avoid loading extensions, after which you could delete the offending file and reboot normally (this was documented early during the night by CrowdStrike). It remains to be seen whether Delta IT staff were at work fixing the problem as soon as the fix was documented or whether the fix started during regular work hours.
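For what it's worth, the published workaround boiled down to deleting the bad channel file from Safe Mode. Here's a rough sketch of that cleanup as a Win32 program (the C-00000291*.sys pattern is the one named in CrowdStrike's guidance; this would have to run as admin, from Safe Mode, on an unlocked disk):

```c
/* Sketch: remove the faulty channel file(s) per the documented workaround. */
#include <windows.h>
#include <stdio.h>

int main(void) {
    const char *dir = "C:\\Windows\\System32\\drivers\\CrowdStrike\\";
    const char *pattern =
        "C:\\Windows\\System32\\drivers\\CrowdStrike\\C-00000291*.sys";
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA(pattern, &fd);
    if (h == INVALID_HANDLE_VALUE) {
        puts("No matching channel file found.");
        return 0;
    }
    do {
        char path[MAX_PATH];
        snprintf(path, sizeof path, "%s%s", dir, fd.cFileName);
        if (DeleteFileA(path))
            printf("Deleted %s\n", path);
        else
            printf("Could not delete %s (error %lu)\n", path, GetLastError());
    } while (FindNextFileA(h, &fd));
    FindClose(h);
    return 0;
}
```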
Some systems, if they were "slow" enough, were able to come up with networking, download the latest CrowdStrike update (which deleted the offending file), and reboot before crashing. But that's still no excuse for operating at ring 0 in the kernel.
I've heard this is a side effect of an EU antitrust ruling that forces Microsoft to give third parties unrestricted access to the kernel. I have not uncovered anything that validates this, however.
I believe Apple and macOS take a different, safer approach that is pseudo-API driven for certain I/O calls that might be interrogated by DLP and similar software.
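For the curious, macOS exposes this through the user-space Endpoint Security framework, so a security agent never has to live in the kernel at all. A bare-bones sketch (the API calls are real but heavily trimmed; a real client needs the endpoint-security entitlement and root):

```c
/* Minimal Endpoint Security client sketch for macOS.
 * Build: clang es_demo.c -framework EndpointSecurity -o es_demo
 * A crash in this handler kills only this process, not the OS. */
#include <EndpointSecurity/EndpointSecurity.h>
#include <dispatch/dispatch.h>
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    es_client_t *client = NULL;
    es_new_client_result_t res = es_new_client(&client,
        ^(es_client_t *c, const es_message_t *msg) {
            if (msg->event_type == ES_EVENT_TYPE_AUTH_EXEC) {
                /* Decision is made in user space; allow everything here. */
                es_respond_auth_result(c, msg, ES_AUTH_RESULT_ALLOW, false);
            }
        });
    if (res != ES_NEW_CLIENT_RESULT_SUCCESS) {
        fprintf(stderr, "es_new_client failed: %d\n", res);
        return 1;
    }
    es_event_type_t events[] = { ES_EVENT_TYPE_AUTH_EXEC };
    if (es_subscribe(client, events, 1) != ES_RETURN_SUCCESS) {
        fprintf(stderr, "es_subscribe failed\n");
        return 1;
    }
    dispatch_main();  /* keep servicing events */
}
```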
@apl175 The CrowdStrike software evidently started with the corrupt file before requesting the latest update, since the system would crash whenever it booted.
You have kernel mode software that downloads something from the Internet and acts on it without validating it. That says a lot.
The real villains in this are the hotels raising their prices.
Pity they didn't go to court.
It would have made an extremely interesting video for the channel.
My flight got cancelled twice during this.
Oh, I was very happy about the CrowdStrike bug, which locked me out of my office laptop. Got paid for doing nothing all day.
CrowdStrike owes Delta compensation.
One million SkyMiles.
@aoe4_kachow What's that good for, an upgrade to premium economy?
Delta protects more devices with CrowdStrike than American or United.
Furthermore, CrowdStrike can't help manually reboot computers in person anyway.
Exactly. I was involved with remediation of a few thousand affected computers worldwide at my company. Repair required 'boots on the ground' once the procedure was identified, and luckily our processes allowed us to 'deputize' some technical staff to help remotely. CrowdStrike would need to offer up hundreds of engineers to fly all over the world to make a 10-minute repair, but that's not what they were offering. Microsoft would need to offer the same. Delta declining support from Microsoft to save face about ancient systems is 90% malarkey, as Microsoft deals with computers older than that in some industries, such as industrial automation. If they had a support contract with another outside vendor to perform IT work and that vendor assured them they could handle it, that's likely why Delta said, "Nah, we got this. Thanks." Why engage another outside entity and take on the associated risks when you were promised your existing vendor would be just fine?
Delta was likely concerned with access to their physical assets as well as entry into secure areas, as said in the video. If the devices were encrypted, they would need to be 'unlocked' to repair them, which could require additional access to be granted at the centralized IT level to obtain recovery keys. Delta is the third largest airline in the world by destinations, and I bet the vast majority of those destinations required a site visit. Given the sheer number of impacted devices in such far-flung locations, I personally cut Delta some slack at the IT level, and I hope some of their engineers got some time off afterward.
How the company handled the canceled flights and their legal actions going forward is another matter altogether and I'm not privy enough to it to make an informed opinion. Unless a detailed account of the response by way of a third-party investigation is released to the public, it's all conjecture.
The old console-based scheduling system is probably the root cause of the cascading disruption. Most other airlines probably had a browser-based application, where all they needed was a browser and a VPN.
We got delayed 4 hours, but we flew faster, probably in a jet stream, and arrived only 30 minutes late.
Why haven't the two gone to court? I want to see who really did what. Did Delta have outdated technology?
757
My takeaway from this utterly preventable and foreseeable calamity? Delta's garbage CEO complaining bitterly about CrowdStrike's indifference is matched only by Delta's passengers complaining bitterly about the airline's indifference. Did I mention this was preventable?