CrowdStrike Unofficial Retro
- Published on 15 Sep 2024
- Recorded live on twitch, GET IN
Article
DELETE ME DELETE ME
Guest
DELETE ME DELETE ME
My Stream
/ theprimeagen
Best Way To Support Me
Become a backend engineer. It's my favorite site
boot.dev/?prom...
The best way to support me is to support yourself in becoming a better backend engineer.
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
Kinesis Advantage 360: bit.ly/Prime-K...
Get production ready SQLite with Turso: turso.tech/dee...
0:30 “Any sufficiently advanced incompetence is indistinguishable from malice” 🤯
isn't it a play on "any sufficiently advanced technology is indistinguishable from magic", Arthur C. Clarke?
@@crazyfrisbee23 I think it's a counterplay on Hanlon's razor too, the "Never attribute to malice that which can be explained by incompetence" principle.
I know that's how I live my life. I think it's combining Clarke with Hanlon.
0:19 "Practice Enough New and Interesting Skills".... perfect acronym
That's a Falcon® great acronym.
I’ve used “Something Happened In The System” in a couple of reports 😊
Very Underrated Linguistically Vital Acronym.
And you know what? Time In The Saddle will really help you with that!
If you ever feel bad just remember: you will never fuck up as hard as this
Challenge accepted
Actively thinking for ways to accidentally cause this much damage and I don't think I've got what it takes. Best I might manage is prevent some people from submitting to regulators on time and the bank gets a fine... even then plenty of other people might pick up my fkup and fix/flag it. Kinda lends credence to the rumours that this Crowdstrike thing is no mere "oopsie". Systems with way less impact and reach have way more checks and balances.
Hold my beer
You say that. The CEO of CrowdStrike was also the CTO of McAfee when they did the same thing in 2010. Imagine breaking the internet twice.
I work in telecom and I believe in my ability to fuck up my BGP THAT badly
Can we take a moment to appreciate how Prime does not want any patents on his internet history but he's totally chill with searching for 'Hacktivist Furry' without a second thought? What a chad.
Unlike everything else in the news, I'm not at all getting sick of this coverage.
Talking about news, yesterday I was listening to the news on our national radio station here in Montreal (CBC) and they were saying this blackout was caused by a malfunction with Windows Updates... No joke... Luckily they always say how they care about giving us the best accurate news possible, can you imagine if they didn't care?
I was sick of it almost immediately
@@AG-ur1lj You must've had a fun weekend.
@@redditrepo473 would’ve been cool if just 1 of the tech channels I follow had opted to cover something else, but at least I can count on Tosh to release a banger every week
@@redditrepo473 well it would’ve been cool if just 1 of the tech channels I follow had opted to cover a different topic. But I always know that if I make it to Tuesday, Tosh will put another banger out
I've been working on setting up rolling releases where I work and pushing for us to move towards releasing most stuff that way. CrowdStrike has made convincing people much easier.
10:13 Can someone please inform Prime that Falcon/EDR software isn't running ONLY on servers. It runs on a fleet of machines and also servers. It's part of the corporate image that's dumped on every laptop that's shipped off to users. That's why having to visit every laptop to fix the issue, especially remote workers, is particularly painful.
Especially if you first have to walk those remote workers through entering a long bitlocker key
@@bulabulable yeah, it's a shit show.
So... in order for a company to be sure that what they effectively outsourced is actually also working in their entire infrastructure after an update, would it be possible instead to have a honeypot, which you constantly ping after an update and then only install the update if it actually answers? As I learned in other forums, Crowdstrike actually by design disregarded individual settings on updating and just forced the update no matter what, which might have been one reason it went so severe. So maybe ask the company you outsource to what rights you have to those settings with that service installed. Will you be able to control which systems get updated and which not?
Can someone please inform Prime that the Crowdstrike kernel driver was basically a P-code interpreter signed by MS... what broke it was a "malformed" update file that contained all zeros... The kernel-level P-code interpreter didn't check user-space input properly...
General rule: when building kernel code, ALWAYS just trust user-mode input... because why not.
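The "update one machine, ping it, then decide" idea in the question above can be sketched in a few lines. This is only an illustration: `gated_update`, `apply_update` and `is_healthy` are hypothetical names, and nothing here reflects how Falcon or any real update agent works.

```python
from typing import Callable, Iterable


def gated_update(
    canary: str,
    fleet: Iterable[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Update the canary first; touch the fleet only if the canary still answers."""
    apply_update(canary)
    if not is_healthy(canary):
        # The canary went silent after the update, so hold the update back.
        return []
    updated = []
    for host in fleet:
        apply_update(host)
        updated.append(host)
    return updated
```

In practice `is_healthy` would be the "ping" from the question, for example a probe that only succeeds once the canary has rebooted and answered again.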
@@TremereTT where are people getting this information from? the channel files do not contain code and the driver is not an interpreter, the code that broke was there since version 7.11 of the driver back in feb 2024. the channel file is just a bunch of strings, malware signatures stuff like that, they just had buggy parsing code.
“Move fast and break things” stonks go up.
For your rivals? Common w for everyone that hate u
What this article fails to consider is that the companies who pay for Crowdstrike don’t want to have the final say in what security updates get pushed. They don’t want to decide the minutiae of security practices, because THAT’S WHAT CROWDSTRIKE IS FOR
Yeah, they hire those people to be in charge of their security, they trust they don't f-up and in this month only we got two examples of what happens when the ones we trust with our security f-up, one real and the other virtual
How is that at all relevant to the article? The article is looking at what CS likely got wrong and how to avoid these issues. Many companies do want to pay money to push responsibility of security onto another company, really dubious, but again not what the article is discussing.
@@Ash_18037 it's bullet point 1 in the article
I think that is the first point in the article. Users should be able to select an update plan (normal/canary ring). Then the user has to think about it only when he is buying the service. Same as Windows.
I saw at least one (possibly two) first-hand reports that this update overrode the local staged update protocol. Some companies may not want that level of control or couldn’t be in compliance with predetermined requirements for such a control. But some companies certainly do. CS screwed all of them equally. We will see if the stories about the override will be confirmed.
I'm an MLEng, and I spent Monday doing IT over the phone for my company (along with a very large number of Eng/Data/etc), as we had over 3K BSOD'd laptops with bitlockered drives. I now have a new appreciation for people doing tech support. Having to walk people through using a cmd prompt, I have seen so many miscommunications I would not have imagined. Some examples:
People thinking a colon means * or #.
People typing in space when you say space instead of spacebar.
Someone using 'F as in Fam' (obv heard as S as in Sam), even though they were using names for every other letter (when trying to clarify start of Key IDs so I could look up the bitlocker recovery key on the MDAM server).
I did eventually land on a pretty good script going with stuff like 'crowd as in a big group of people, not a fluffy thing in the sky' when they needed to type in crowdstrike.
gotta use the NATO phonetic alphabet. It was made when radios had bad enough audio quality.
And also learning your alpha bravo charlies (ABCs in NATO phonetic alphabet).
@@kevinrineer5356 I found the best strategy was to tailor my choices to (what I could gather) about the caller. Some people were ESL (so familiar/everyday words were best), some people had impressively short attention spans (so silly words kept them engaged).
F as in fam is insane
F as in fam. When gen z has corrupted us all 😂
NATO phonetic alphabet is my go to and making them pull up the alphabet too helped 😂
We started using CrowdStrike in February of 2023 because we were restoring from a ransomware attack. It took out everything. They got in through the firewall and attached to all machines, servers, and backups. We had redundant backups for this reason that were fine, but it was a headache. We didn’t have many apps, but restoring 6 servers and 48 workstations and reentering the data took 5 months to complete. We are a chemical manufacturer, so the work continued and we switched to paper for a period, and that was the data that had to be reentered.
Switzerland is legislating that their public sector software be open source under the premise "public money, public code". We'll see how it plays out, as 3rd-party rights or security concerns can see things exempted from having their code released. That said, this does mean more govt doesn't necessarily equate to more Windows, and even then, thanks to gaming, it is easier than ever to run Windows apps on Linux, though ReactOS is available for anyone who just wants Windows XP but open source.
"Haha, hospital and emergency services no worky because CrowdStrikey updatey. Oh well🤷" - US Govt
Did any US government agency use CrowdStrike?
Airports are owned by Gov. I think that's one of them.
@@santiagohal6747 Airports generally are owned by local or state governments, even though they may receive some federal funding. Also, I haven't heard anything about airports themselves being affected aside from maybe minor things like digital signage. For the most part it was airlines that were affected.
Dave's Garage channel has the best technical information on the CrowdStrike issue I have heard/read so far.
He even explains ring-0 and ring-1 and why crowdstrike dips into your ring-0. It's a good video for newbies just to learn about how the kernel somewhat works in Windows.
@@Sandy-o4p I think you mean ring-3
@@tma2001 Ring-0 == kernel ; Ring-1 == boot_drivers ; Ring-2 ==drivers_loaded_after_init_loads_the_important_stuff ; Ring-3 == application. Depending on the operating system, the setup may only have 3 rings and I think microsoft designed theirs with 4 rings (or more ). Crowdstrike lives in Ring-0 or ~Ring-1 as a boot_driver and it totally cripples the system if the code is bad. They did that to skip the application protection layer, and thus planes don't fly.
@@tma2001 Just for some newbie readers: he states ring-3 because some OSs skip ring-1 and ring-2. Ring-0 is where the kernel lives, but that's kind of the old way of thinking about memory sections. It's just a learning mechanism. Most schools teach ring-0, ring-1, ring-2 and ring-3 now and all are being used. 0=kernel, 1=important_device_drivers_like_motherboard, 2=peripheral_device_drivers, and ring-3 is application. Back in the day ring-3, in Microsoft, is/was taught to be the application layer. Microsoft and other teachers started teaching it differently because it's not really standard. I think many OSs changed the ring-teaching material because of VMs (like Bochs, back in the early 2000s). It gets weird. Either way, crowdstrike screws around with the kernel memory, so they could bypass the protection of the application layer. And planes don't get off the ground due to scheduling because the ticket computers are down. Another example is that Linux and BSD don't use the ring teaching stuff because it's a false mnemonic-human-brain learning technique. It's kind of like the OSI network stack model - it exists because some professor needs to sell books. When you read the code, none of that ring stuff is in there.
no, his video is full of wrong technical details posted by some rando on twitter who was completely wrong in his assessment
Linux was also affected. Falcon for Red Hat, Debian and Rocky Linux has been causing crashing.
Yes, in May, on a Friday push...
@@opposite342 Friday ahaha, they really have a culture of Friday pushes
The solution to the previous BSOD issues was to rename the crowdstrike folders. Holy …. Isn’t that a security issue by itself? You can prevent the antivirus from checking stuff just by deleting a folder or file
Was in a company of two engineers and we did staged releases. 15 SaaS hosting servers, update one, wait a few days.
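A staged release like the one described above (15 servers, start with one, wait, then widen) can be planned with a small helper. A minimal sketch, assuming waves that double in size; `plan_waves` and its parameters are made up for illustration, and the wait between waves is left to the operator.

```python
def plan_waves(hosts: list[str], first: int = 1, factor: int = 2) -> list[list[str]]:
    """Split hosts into waves that start small and grow by `factor` each step."""
    if first < 1 or factor < 1:
        raise ValueError("first and factor must be at least 1")
    waves, start, size = [], 0, first
    while start < len(hosts):
        waves.append(hosts[start:start + size])
        start += size
        size *= factor
    return waves


servers = [f"srv{i:02d}" for i in range(15)]
for number, wave in enumerate(plan_waves(servers), start=1):
    print(f"wave {number}: {len(wave)} server(s)")
```

With 15 servers this gives waves of 1, 2, 4 and 8, so a bad update is caught while it has reached only one machine.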
24:24 A file getting mostly zeroed out could be caused by something as simple as an interrupted file transfer. In fact that might be the most likely cause.
Didn't MS Azure go down a few hours before? I can't say for sure, but I thiiiink I can make a guess at what happened...
But either way, what the hell are they doing without a canary for something like this?!
Okay, so the thing with CrowdStrike is that it also comes with a $1m warranty if you are breached by ransomware - there are obviously a whole bunch of T/Cs. My assumption would be that companies would rather not breach those terms in order to have that $1m warranty. The other thing is also that if you give users the final say in updating and they choose not to - if they get breached, they will spread the news that it was your software that let them down, even if they are to blame...
As an aircraft avionics engineer, I find all of this hilarious and wonder what the fuck the "ordinary" software engineering industry is up to...
Based JSFAVC enjoyer
When you use Crowdstrike, you don't need to worry about attackers...you have to worry about Crowdstrike. 😅
where I work a team of 42 techs spent 45 hours each restoring our systems.
This is going to be a very expensive class action lawsuit.
I am currently stuck at the airport. If i didn't buy flight insurance i might not have been able to financially recover for about 6 months pulling OT. As soon as i heard about the update my first thought was "oh no they probably didn't test and moved their fever dream code straight into prod". It's one thing to know that and another to see the real world consequences of something like this at global scale.
Word is, the signing of the file corrupted it, so there is that vector.
They tested the artifact pre-signing, then they signed the file, which corrupted it
Wow so they didn’t test it then. Wild. Why TF is signing so hard for security vendors of all companies!!
So, what was the point of signing the file if no parts of the software checked it later? (Obviously, it didn't because corruption was never discovered until after the crash)
I heard the problem file was all null values.
If the file can be altered after testing, you're testing in the wrong place...
Article is "DELETE ME DELETE ME"?! 🤯
When you really look at everything that had to occur for this to have happened, you'd have to be a complete fool to think that this was an "accident".
This was NOT an SDLC issue. It was a release management issue.
I’m remembering coming into work a few months ago and learning that Dropbox had decided to change our company’s folder structure. Our scanning system broke completely, and it was a nightmare to get everybody up to speed again.
29:00 this is one of the arguments for not using binary formats for data. If you use text, the software must parse the text. Wrong text will result in parsing error. Loading a binary directly into your software is almost as dangerous as directly loading and executing machine instruction binaries.
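The same "parse it, and fail on bad input" discipline can also be applied to a binary file instead of loading it blindly. The sketch below uses an invented layout (a 4-byte magic marker, an entry count, then length-prefixed entries); it is an assumption for illustration and not CrowdStrike's actual channel file format.

```python
import struct

MAGIC = b"CHNL"  # hypothetical marker, not a real channel file signature


class ParseError(ValueError):
    """Raised when the blob does not match the expected layout."""


def parse_entries(blob: bytes) -> list[bytes]:
    """Parse MAGIC + uint32 count + (uint16 length, payload) entries, checking every bound."""
    if len(blob) < 8 or blob[:4] != MAGIC:
        raise ParseError("missing or wrong magic bytes")
    (count,) = struct.unpack_from("<I", blob, 4)
    entries, offset = [], 8
    for _ in range(count):
        if offset + 2 > len(blob):
            raise ParseError("truncated length field")
        (length,) = struct.unpack_from("<H", blob, offset)
        offset += 2
        if offset + length > len(blob):
            raise ParseError("entry runs past end of file")
        entries.append(blob[offset:offset + length])
        offset += length
    if offset != len(blob):
        raise ParseError("trailing bytes after last entry")
    return entries
```

A file of all zeros fails the first check and raises `ParseError`, which the caller can handle, instead of being dereferenced as if it were valid data.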
It's like no one has ever heard of a formalized Change Management process.
2:56 YES! Canary releases, CrowdStrike.
It's not like 100% of the endpoints will be targeted by the attack you're circumventing with your update.
Your threat intelligence should know a thing or two about the most likely endpoints to be targeted. Maybe start with these, IDK.
Everyone seems to forget that there was a failure of Microsoft Azure before CrowdStrike published its channel file. The Azure outage caused a failure between storage and compute. I feel that CrowdStrike was getting zeros from storage instead of the correct files. This does not absolve CrowdStrike of errors in not checking the validity of data and null pointer dereference. Here is part of the Azure failure notice. Between 21:40 UTC on 18 July 2024 and 12:15 UTC on 19 July 2024, customers may have experienced issues with multiple Azure services in the Central US region due to an Azure Storage availability event. This issue affected Virtual Machine (VM) availability, which caused downstream impact on multiple Azure services, including failures of service management operations and connectivity or availability of services. Services with dependencies on the impacted Virtual Machines would have been affected.
This is a Developer Comment , the perfect response to surface-level investigation and experts
This actually sounds like a plausible way that this could have happened.
Per the CrowdStrike incident analysis report, this doesn't seem to be relevant (I'm sure they'd have already blamed a 3rd party if they could)
Even if it was, the software should be checking the integrity of the channel files. No checksum, no signature etc. It basically downloads files and runs them in kernel mode to avoid having to get the driver re-certified for every update, but no checking to see if they are corrupt or have been tampered with etc. Even before null pointers etc this should have been noticed by the software.
@@georgehelyar In application-level software, devs do that to save time, because even if it fails, only the app fails. But in kernel mode? This is a dev who doesn't understand why he shouldn't, or someone who doesn't care.
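The integrity check asked for in this thread can be as small as comparing a SHA-256 digest published alongside the file. A minimal sketch using only the Python standard library; `verify_sha256` is a hypothetical helper, and where the expected digest comes from is left open.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True only if the payload hashes to the digest published with it."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())
```

A plain checksum like this catches corruption in transit (a zeroed or truncated file). It does not stop deliberate tampering, because an attacker can recompute the digest; that needs a cryptographic signature verified against a trusted key.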
That McKinsey study actually referenced itself.
Regarding the frequency of updates: theoretically, "empty updates" shuffled in with actual zero-day fixes will increase the opsec of blue team's update strategy. Also ensures people are used to the idea of updating often, and the 1-in-a-thousand truly critical update won't send alarm bells to stockholders, clients, etc.
10:00
If you’re working with software that handles ingestion of data from outside your network, VPNs, or allows third party applications/libraries, you need malware detection, especially if you’re PCI. It’s not just a “windows” thing and it’s across all corp images including laptops/workstations
It’s the new digital gold mine - create the possibilities of threats to make “security software”
People died or will die because of this. Transportation, manufacturing, hospital care, bank transfers... lives were lost because of this and they will only get a slap on the wrist...
In theory... Couldn't you modify the channel file to run whatever code you want as a virus at the kernel level? Since clearly the file isn't actually signed you could do any kind of buffer overflow or function pointer shenanigans to your own malicious code pretty easily?
I got the same thoughts. This file sits inside windows directory, so you have to be administrator to update it, and if you are administrator you already can load any drivers. But only signed drivers. Just a small privilege escalation
There were more fundamental Windows drivers that had the same issues. It's what cheaters used to bypass kernel anticheat before most of them were patched.
Article is DELETE ME DELETE ME?
Logging that broke production... Yeah, sounds familiar... As for the thing at hand, sources say that the Channel Files are simply files with pcode and that one was corrupted in flight, for some reason. Now, we have three potential scenarios.
1 - The kernel driver isn't checking some form of signature in the cfiles so it executed trash and...
2 - It has, but signing was the last step and the cfile was corrupted before it was signed
3 - It was the signing step that corrupted the cfile in mem and signed a whole lot of zeros (somehow, this is the one i predict to be true, because... been there...)
p.s. Yeah, having a log of what you were trying to do and skipping steps that failed would have solved this issue with 1 reboot. shame they didn't think of that
p.s.2 the "memory safe" part is BS. If you're dynamically loading something into memory and then executing it, you're gonna get the shaft, no matter what you write it in. Interpreted pcode, maybe, IL code a la C#, maybe, but binary>memory>execute? You're GONNA GET THE SHAFT.
On a server, a vulnerability may be exploited and an EDR can detect that, or basically any exploitation attempt. Falcon can even detect unusual behavior between processes ("pipes").
Question: How many of you have an auto updating browser with god knows what - no prompt to install, no patch notes, in one of your most essential productivity tools. It's a built-in supply chain attack.
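The first p.s. above (log what you are about to do, and skip a step that failed last time) can be sketched as a marker that is written before a risky load and cleared after it. This is a user-space illustration with hypothetical names (`load_with_guard`, `state_file`); a real boot-time driver would need durable storage and far more care.

```python
import json
from pathlib import Path
from typing import Callable


def load_with_guard(name: str, loader: Callable[[], None], state_file: Path) -> bool:
    """Load `name` unless an earlier attempt was recorded and never finished.

    Returns True if the loader ran to completion, False if the item was skipped.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    if state.get(name) in ("in_progress", "quarantined"):
        # The last attempt never reached "ok", so assume it crashed the machine.
        state[name] = "quarantined"
        state_file.write_text(json.dumps(state))
        return False
    state[name] = "in_progress"
    state_file.write_text(json.dumps(state))
    loader()  # if this takes the system down, the marker above survives the reboot
    state[name] = "ok"
    state_file.write_text(json.dumps(state))
    return True
```

With a guard like this, the first boot after a bad file still crashes, but the second boot sees the leftover `in_progress` marker and skips the file, which is the "1 reboot" recovery the comment describes.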
2 words: Smoke Test
That's all that had to be done! 🤯
Genuinely curious: what do you think are the two sides of the DEI issue? At the end of the day someone is saying businesses are too diverse, which obviously isn't true. Should we, as engineers, strive to have less diverse coworkers? Like why should anyone even consider this as an issue, let alone search for some answer in the middle
The problem isn't "diversity" per se. The issue lies in the policies implemented because of it. Businesses have to hire people based on gender/race instead of qualifications because they have to fill a diversity quota. The diverse hires can't be questioned because it will be interpreted as bias against them so everybody is afraid to speak up when the diversity hire makes a mistake. Diversity hires are also more prone, even incentivized, to denounce coworkers they think have a bias against them and no obligation to prove it.
None of this happens if those "diverse" hires have the same rules for both hiring process and working conditions. If one is certain that the company only hires who they consider the best available, there would be no suspicion if your coworker happens to be a woman, gay, trans, black, yellow, gray or a furry.
@@ahumeniy I understand the scenario you are describing, and I am pro-“hiring the right person for the job”, but it doesn’t seem like a widespread enough problem to get worked over. What about networking? What about nepotism? Companies make hiring decisions based on non meritocratic reasons all the time. It’s just easier to point at the marginalized groups and blame them for the worlds problems.
On Linux I love that I can continue working on a project without updates interrupting anything. After I am done with the work and ready to upgrade, I'm also ready for any upcoming problems. And I can directly opt in to updates, before they occur, because I can see what the updates are changing (in the terminal at least). It's my goddamn operating system and all applications managed by it.
That’s not a difference between Linux and Windows. A sysadmin on Windows can do that too for your Windows machine. In fact it’s the consumer Windows default. There are three things in play here: 1) company security-inspired policies force your system to install the update regardless of your wishes, 2) the install happens automagically if and when you boot, which Windows people often do in the morning at start of work, 3) a lot of the mayhem was caused by servers being auto upgraded, not because some laptops broke.
Software Patents protect the Intellectual Property Owners… if you recreate their Intellectual Property in a basement without looking at the patent… you're still liable. ie, ignorance is not a defense.
Would becoming a lawyer, before founder help? I'm so scared of patents... Lol Shouldn't courts consider intent?
His point was that you are going to have even more trouble defending yourself if there is a record of you having viewed that patent in your browser history.
@@ericgoodman3510 that is not how patent law works. View or not view, the invention is registered and recreating it in any form without permission puts you at risk. There is no “i did not know it's an invention” defense.
If only MBAs were held accountable....
Some seem to believe that the "Channel Files" are more like fully operable code (attributes and also loops, conditionals, branches etc) than just patterns to be matched. IF the driver installed into the kernel works as a sort of runtime environment, then this related file could well include a set of working code which, then, also could have serious bugs. The WHQL did their oversight on the Falcon driver, but they cannot anticipate some add-in code from some outside install file.
We once had a little fuckup with 00 in backup files. The issue was that SMB only supports 1MB packages when moving the data. Some copy library we used failed silently by padding the packages to their expected length.
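A silent zero-padding failure like this one keeps the file length intact, so a size check alone does not catch it. One way to guard against it is to compare content hashes after the copy. A minimal sketch, with `file_digest` and `copy_is_intact` as hypothetical helper names:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large backups fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_is_intact(src: Path, dst: Path) -> bool:
    """True only if the destination has the same size and the same content hash."""
    return (
        src.stat().st_size == dst.stat().st_size
        and file_digest(src) == file_digest(dst)
    )
```

Calling `copy_is_intact(source, backup)` right after the transfer turns a silent padding bug into a loud failure at copy time rather than at restore time.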
6:40 Prime, your company is still liable for infringing on a patent even if you don't look at them. The publication of the patent puts everyone on constructive notice of the patent.
Damages are triple if you do it knowingly (and they can prove that) IIRC
Treble damages involves knowing or reckless disregard of the patent's existence and validity.
However, ignorance of a patent's existence is not always a defense against willful infringement. Courts may consider several factors, including:
1. Duty of Care: Companies are expected to perform due diligence to avoid infringing existing patents. Failure to investigate can be seen as reckless behavior.
2. Notice of Infringement: If you have been notified of the patent and continue to infringe, it can contribute to a finding of willfulness.
3. Behavior and Actions: Courts look at the overall behavior, including any steps taken to avoid infringement after being notified.
@@JustBCWi Fair enough, I don't think anyone is saying ignorance is a defense against patent infringement. My understanding is that even if you know a patent's title and/or description, surely you're only going to make things worse (that is, easier to prove willful infringement in court) by then going and reading the patent's actual claims. Do you disagree?
@@PassifloraCerulea This is where I'm going to punt. I'm not a patent attorney, and it's been years since I studied it in law school (IANAL,BIHAJD). What I was saying is: claiming to never read a patent doesn't mitigate being held liable for patent infringement. It might be a way to avoid enhanced damages, but in Halo it looks like it's harder to avoid them. You're on constructive notice with the filed and approved patent.
And just because there's a patent doesn't mean it's enforceable. A guy patented all the ways of folding pocket squares about 25 years ago. He still had the patent even though someone invalidated all of the claims (because all those ways of folding had been in the public domain for about a century).
That said, the SCO v Linux case is instructive. That's what pushed me to law school as a developer back in 2004. Linus was able to show his development of the kernel from the beginning due to having version control. It showed he made rookie mistakes in the beginning, and the product matured. Had he just copypasted from another source, it would have been obvious. But because it showed true independent evolution, it was defensible.
You can also read a patent and use that knowledge to avoid infringement.
Peace. :)
Something I don't think many people consider is that this could've happened to both Linux and Mac (maybe not the boot loop part, but the initial crash yes)
Also a response to Point 1 - Customers have complete control over which sensor version they can put out to their hosts. They have options between automated updates (latest, 1 version behind, or 2 versions behind) or a static version (it has around 10 versions to choose from, last I checked the Windows sensor update policy). Saying that customers have no control is false.
Per point 2, if Point 3 was followed correctly this wouldn't be an issue, and the reason they roll out to everyone at the same time is to update everyone's protection at the same time. Because Point 3 isn't followed, therein lies the issue. So I agree.
Point 4, I agree that it needs fixed, but part (not all, but some) of the blame lies on Microsoft for even signing the kernel module after their testing as well that didn't have the exception handling it required.
A bit less likely though, as blanket policies often allow for more jabs on management of non-Windows servers. Of course the bottom line is nobody should defer their security updates to a live patch service. It's almost never warranted, and guaranteed to fail at some point.
You can hack at Windows all you want, but Linux only doesn't have these issues because in spite of all the fanbois, nobody's made it so it actually works as a client for the masses. Instead of doing your superior patronizing dance, maybe use your skills to change that.
Kids barely know how to use iPhones these days, so I don't see companies wanting to make the switch in the workplace to any distro with current developments. Linux distributions in many ways are great...😊
@@ImEddieful Funnily enough, just learnt that similar CrowdStrike issues have already affected Debian and other Linux systems in the past
@@ImEddieful so are kids glued to their phones or they don't know how to use them to begin with? Which is it boomer? Or do you decide based on what helps you win an argument?
@@rusi6219 I've said nothing of the sort, troll. I am no boomer either, but interesting you find that to be an insult. Go cry to your momma, it seems your panties are on too tight.
I don't always wipe my hard drive, but when I do, I use clownstrike.
lol blame DEI meanwhile ignoring the C-Suite exec that was in charge when this happened last time ...
7:25 "oh damn Shaun of the dev", so good
people have been using "DEI" as the n-word regardless of qualifications, which is bad.
The published C-file was all zeroes, it was not a coding bug but a process one.
It beggars belief that the system DIDN'T have a step to check that the file was something expected and reasonable, and catch it before trying to run it at effing boot level. That's criminal levels of negligence.
that was a red herring as CS confirmed in a blog post - not everyone had an all-zeros channel file, which created a lot of initial confusion. There are many possible reasons for this such as pre-allocating files before updating, wiping a file as a post cleanup step for security etc. Also valid channel files have a magic byte signature at the start. It's also been reported that channel files contain byte code for a VM interpreter in the actual kernel driver CSAgent.sys so it was a code bug.
I work for a bank. I found out afterward that CrowdStrike was rejected because our technical teams couldn't pretest updates.
What's really crazy is that they had the same error YEARS AGO, just on a smaller scale, and didn't fix it. Now that they had it on a massive scale, this will surely make it easier to sue them for damages.
I once routed 60,000 phone lines through a server in Haifa, Israel and then took off for the weekend. I still do not understand why I was still employed on Monday morning.
Many devs and tech influencers like this dude continue to push half-assed mediocre languages like Go lang which, almost a whole decade after Java, still has the gall to ignore the "Billion Dollar Mistake". The arrogance of the Golang creators is suffocating. Yes, Go isn't used in kernel code (yet, fortunately), so it's not directly related to the subject matter. However, null pointer issues plague all kinds of critical software, and any modern language should really offer some level of protection against them (e.g. Swift, Kotlin, Rust, etc).
Bruh be really calling Ken Thompson arrogant (probably one of the humblest CS people there is)
@@rusi6219 I believe he was stoned during the process and the other guys just put his name there 😂
Pigeons eat dead rat
Pigeon gets pancaked by car
Pigeons eat dead rat
The flip side is that any sufficiently shrewd malice is indistinguishable from incompetence
Microsoft had this problem completely solved in their WSUS design. This was decades ago. Why is Microsoft letting anyone push out unvetted changes to kernel-level software without any end-user control? Getting that signature on your driver should come with a requirement that you *never* do this.
How did he get 6am pacific from 4am UTC? What?
... i think i had 4pm pt in my head
And I imagine the crazy scenario of a hidden MITM attack secretly replacing the contents of this channel file with zeroes to see who breaks and learn who uses CS
we had 2500 machines to manually fix.. we were lucky they all had LOM devices on them... highly recommend lights out management
the new bot meta is hilarious😂
Getting rid of Windows does not reduce government though.
FOSS people live in their heads not in the real world
That quote at the beginning is pretty much Hanlon's razor:
"Never attribute to malice that which can be adequately explained by stupidity."
I don't necessarily agree with Hanlon's razor, tbh. It's the kind of thing to be aware of, that sometimes people are just stupid, but I object to the "never" part.
12:51 That's just an old German saying: "Mit Kanonen auf Spatzen schießen." ("Shooting at sparrows with cannons.") So it's sparrows in the original.
0:30 always remember to Value All Great and Interesting New Achievements, it really pays off!
An even bigger issue here than Crowdstrike's untested updates is the reason why the software exists in critical places in the first place.
Possible reasons:
1. Corporate policy
2. Insurance company requirement
3. Incompetent managed service provider
This is not only Crowdstrike thing, we need more people to blame that Adidas fixing of computers and all troubled that caused to people.
Some signage systems don't need guarding software to prevent malware installation. Those are dedicated systems and no one is using their browser or email application.
Crowdstrike correctly applied prime's "nEgAtivE sPaCe PrOgRaMmInG" advice of crashing on error in production code
Real men test in Prod - Titan founder, Crowdstrike and my Partner in our School Software.
The thing is, there is control in the console for software version updates, including pinning: N, N-1 (what most run) and N-2. Supported versions of the agents are tied to the version of the operating system, and when the agent is newer than the version for the supported operating system, you can pin the version.
The channel updates are a different thing, and that workflow is where the failure was. This particular channel file was for monitoring Windows named pipes, which is why it didn't impact Linux and Mac: they don't have these things. The complexity of the operating systems that we have to build software to protect, especially with legacy compatibility, is just too much. But there are so many billions of lines of code we run every day in this modern world.
You know, a green screen would never have had this issue, because the attack surface was actually knowable (telescopes, shoulder surfing, listening to the clacky keyboards from 2km away for timing attacks, etc).
Another comment says the users have to do the updating. My brothers in IT, WE CAN'T EVEN APPLY KNOWN PATCHES in our environments in a timely fashion. And we can't write code of high enough quality: US cybersecurity agencies are BEGGING for software quality to improve from basic memory vulnerabilities and crying for Rust.
Good advice, let the users update, but when you have threat actors targeting your enterprise, being a few months behind means you are one email or drive-by Google ad away from losing your business. It's not realistic.
And to tell me off and agree with the video, crowdstrike released their Preliminary Post Incident Review on their website. 😂
In the last couple of months Crowdstrike affected Linux (twice with a purple screen and it took weeks to fix), Windows (once) and MacOS (once) with similar issues. So Linux is not the answer in this case. You can preach Linux like it's a religion, but it's just another OS with another purpose.
17:40 No system to test? You said it can't be that they do not test before staging stuff for distribution. That's not the only thing needed: things can get broken during packaging and distribution. A company that does this kind of distribution to customers should run a serious number of machines exactly like customers do, updated through the customer update channels. Have a testing/validation team who act as a customer, including paying the fees, calling the helpdesk, running a good variety of the machines you support, etc.
Their actual driver itself does have to go through rigorous testing to verify it won't break machines, that's what the WHQL is and that's required for Microsoft to sign your driver. This is configuration loaded by the driver though, and they did no validation on those files to make sure they're actually usable. So a file full of 0s was sent out and it bricked millions of computers. The driver code that was verified stable didn't change, though. Perhaps WHQL needs to be expanded to include fuzzing of configuration files.
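A rough sketch of what "fuzzing of configuration files" could look like, in Python for readability. Everything here is an assumption for illustration: the `CHNL` signature, the file layout, and the names `parse_channel_file` and `fuzz` are hypothetical and have nothing to do with the real channel file format. The point is only that a loader should reject every malformed input with one well-defined error and never fail in any other way.

```python
import random
import struct

MAGIC = b"CHNL"  # hypothetical 4-byte signature, NOT the real format


class InvalidChannelFile(ValueError):
    """Raised when a channel file fails validation."""


def parse_channel_file(data: bytes) -> list:
    """Parse MAGIC, a 2-byte little-endian entry count, then that many 4-byte entries."""
    if len(data) < 6 or data[:4] != MAGIC:
        raise InvalidChannelFile("bad or missing signature")
    (count,) = struct.unpack_from("<H", data, 4)
    expected = 6 + 4 * count
    # Check the length against the declared count before reading any entry.
    if len(data) != expected:
        raise InvalidChannelFile(f"expected {expected} bytes, got {len(data)}")
    return list(struct.unpack_from(f"<{count}I", data, 6))


def fuzz(rounds: int = 2000, seed: int = 0) -> int:
    """Feed mutated inputs to the parser and return how many were rejected cleanly.

    Any exception other than InvalidChannelFile propagates and fails the run.
    """
    rng = random.Random(seed)
    good = MAGIC + struct.pack("<H3I", 3, 10, 20, 30)
    rejected = 0
    for _ in range(rounds):
        blob = bytearray(good)
        # Overwrite a few random bytes.
        for _ in range(rng.randint(1, 4)):
            blob[rng.randrange(len(blob))] = rng.randrange(256)
        # Sometimes truncate the file as well.
        if rng.random() < 0.3:
            del blob[rng.randrange(len(blob)):]
        try:
            parse_channel_file(bytes(blob))
        except InvalidChannelFile:
            rejected += 1
    return rejected
```

With this layout a file full of zeros is rejected at the signature check, before any entry is read.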
37:00 OK, no more doubts about it: this is negligence, not accident. I look forward to them getting sued into the ground.
Sorry, but I've deployed plenty of Linux boxes as a government contractor. Simply put, for anything critical our government tends to use Linux. It's more reliable, it's also not a US component which is seen as a liability (for example as a tool to enforce export bans). Heck, in some cases we even effectively rebuilt from sources and maintained our own Linux distro just to be sure.
450 jobs are available at CrowdStrike including the CTO
So, this not only happened before, but happened even on Linux as well...
11:41 and that's why you need to separate your life from work. The guy probably casually replied to work chats on his personal PC.
WOW! You gotta push back harder on the DEI bs. "You can blame some level of DEI on certain issues" is as dangerous as the original commenter's statement. This is really bad.
Yeah, I think I am leaving after that comment. Using DEI in this context is almost always a racist dogwhistle, and allowing it to even be mentioned as a possible reason is absurd. People to blame? The developer? Maybe a bit. The other devs that signed off on the change? Maybe a bit. Poor testing and deployments? Maybe a bit. The industry emphasizing profits and speed over everything else? Maybe a bit. But no way in hell is this somehow being put on minorities, shaking my head.
The rolling update thing is untrue here. They provide those updates to combat ongoing malware attacks. Those kinds of updates need to happen very quickly. This isn't just malware signatures, but preemptive and tracing infrastructure driven by cloud computing to act even if given malware just emerged and is unknown.
My code bricked our portable reader devices TWICE because a kernel driver our device used for touch screen used localized input for reading the screen calibration text file at boot. On German locale they use comma instead of dot for decimals, so it tried to read the numbers to know how to transform screen touches into pixels, failed, and rebooted forever :) It had to be manually reinstalled from a pendrive. We of course tested it before automatic update, but we happen to use default C locale. Then, after like half a year we bricked these devices AGAIN with the same shitty kernel driver for the shitty OEM touchscreen we used, because they fixed the bug in the kernel driver (it started to use C locale for reading the numbers in kernel no matter the system locale), and our workaround of using the system locale for writing these numbers during calibration - now caused the error in the first place :)
In my defense we were a company of 15 people, with 2 software developers, boss, a few salespeople and like 10 guys screwing the screws in, and we did have "Add testing in German locale" in our TODO, just not soon enough :)
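A minimal sketch of the locale-proof approach to a calibration file like the one described above, in Python rather than kernel C. The function names and the space-separated file format are assumptions for illustration. The idea: always write with `.` as the decimal separator, accept either separator when reading, and return a "no calibration" result instead of crashing.

```python
def format_calibration(values):
    """Write numbers with '.' as the decimal separator regardless of system locale."""
    # repr() of a float never consults the locale.
    return " ".join(repr(float(v)) for v in values)


def parse_calibration(text):
    """Parse calibration numbers, tolerating ',' as the decimal separator.

    Returns None instead of raising, so the caller can fall back to
    default calibration rather than reboot.
    """
    try:
        return [float(tok.replace(",", ".")) for tok in text.split()]
    except ValueError:
        return None
```

Swapping `,` for `.` is deliberately naive: input with thousands separators such as `1.234,5` fails to parse and lands in the fallback path, which is the safer outcome for a boot-time file.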
18:00 they are testing pre-packaging. They don't test the published version.
The day after my first child was born, Russia invaded Ukraine. The day after my second child was born, the Crowdstrike fiasco happened. Curious to see what the birth of my third child may bring.
What he didn't mention is that according to the Crowdstrike ToS we shouldn't deploy it on highly available servers.
Remember this is a product that's mostly aimed at endpoints, so the most exposed part of your infrastructure.
There are lots of controls, like data exfiltration, that can even be triggered by employees. We should doubt our employees and their ability to keep their endpoints secure (the same as with code).
If I remember correctly (it has been a few years) Falcon can be updated in stages, but for the content updates I don't remember if you can set up an audience-based deploy.
Here in Mexico by law we must have more Linux than windows since windows is a private company of the USA and they are a vulnerability for data security.
40:25 you know I'm really starting to think that this was actually the real reason 😮
I get why they want this on servers after having been put in charge of an actual Windows server. Everything was done by copying files and clicking around using remote desktop software. Apparently Ansible does work on Windows, but it wasn't part of the culture there. Luckily I was fired for incompetence after spending two weeks trying to figure out where the hell they put the equivalent of /etc/fstab on Windows.
CrowdStrike did not break my company laptop. I always turn auto-update off.
Having different opt in levels for "days of stability" makes a lot of sense. Like, at least give me a day with no new updates and then install the latest.
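The "days of stability" idea can be sketched in a few lines of Python. The `Release` type and `pick_release` function are hypothetical names and do not describe any real vendor API: the client simply installs the newest release that has already been public for at least the chosen number of days.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional


class Release(NamedTuple):
    version: str
    published: datetime


def pick_release(releases, min_age_days: int, now: datetime) -> Optional[Release]:
    """Return the newest release that is at least min_age_days old, or None."""
    cutoff = now - timedelta(days=min_age_days)
    eligible = [r for r in releases if r.published <= cutoff]
    return max(eligible, key=lambda r: r.published, default=None)
```

With `min_age_days=1`, an update published a few hours ago is skipped until it has survived a day in the field. The trade-off is the one raised elsewhere in this thread: that day is also a day without the newest detections.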
I just think the entire team missed a bigger mistake as they have been so involved with the details. They should let an independent auditor test their releases to remove biases/presumptions. Sometimes we don't know things work different when you release to the outside world. No amount of internal testing can replicate this.
Why is the article link deleted?
The CrowdStrike software did not check the "channel file" for validity, using a hash comparison and an error-robust loader for example. Simply bad programming. Otherwise kernel drivers are quite safe, as they need to get tested and signed. But this channel file does not, and thus never gets tested by a third party. In combination with bad programming, it was bound to crash.
The reason why Linux doesn't get hacked is because no one cares to. Windows is tied to businesses with money. If Linux were as widely used in corporate settings as Windows, it would have been hacked just as much. Hackers will find a way.
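A minimal sketch of the "hash comparison and error-robust loader" idea, in Python for illustration. The function names are hypothetical, and the assumption is that the expected SHA-256 digest arrives separately from the file (for example in a signed manifest). If the new file is empty, all zeros, or does not match the digest, the loader keeps the last known good content instead of failing.

```python
import hashlib
import hmac


def verify_digest(data: bytes, expected_sha256_hex: str) -> bool:
    """Compare the SHA-256 of data against an expected hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual.encode(), expected_sha256_hex.lower().encode())


def load_with_fallback(new_data: bytes, expected_sha256_hex: str, last_known_good: bytes):
    """Return (content_to_use, accepted_new).

    Falls back to last_known_good when the new file is empty, all zeros,
    or fails the digest check.
    """
    if new_data and any(new_data) and verify_digest(new_data, expected_sha256_hex):
        return new_data, True
    return last_known_good, False
```

A digest check only catches corruption in transit or on disk. It would not catch a file that was published with bad content in the first place, so the parser behind it still has to validate what it reads.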