Real men test in production… The truth about the CrowdStrike disaster

  • Published on 23 Nov 2024

Comments • 2.3K

  • @richdobbs6595
    @richdobbs6595 4 months ago +9322

    I once was an employee at Carbon Black, a competitor to CrowdStrike, working in automated testing. It was competitive with the worst software development practices of any organization I've ever been exposed to. The devs were fairly smart, but the assumption was that the purpose of testing was to bless the code that they had written. I agreed to step up and manually test one dev's code, and I reported back that every time I tried to run the code it killed the process without leaving any diagnostics. The dev said, how can I troubleshoot the problem without any good data? I looked at his code and identified that he was not checking for null pointers but just dereferencing them anyway. This was an important step in getting myself terminated as not being a team player.
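The fix described in this comment, checking for a missing value before using it and saying what went wrong, can be sketched in a few lines. This is a minimal Python analogue in which `None` stands in for a null pointer; it is not the code from the story, and `Session`, `find_session`, and `session_user` are hypothetical names.

```python
from typing import Dict, Optional


class Session:
    """Hypothetical object standing in for whatever the pointer referred to."""

    def __init__(self, user: str) -> None:
        self.user = user


def find_session(sessions: Dict[str, Session], key: str) -> Optional[Session]:
    # dict.get returns None for a missing key, the Python analogue of a null pointer.
    return sessions.get(key)


def session_user(sessions: Dict[str, Session], key: str) -> str:
    session = find_session(sessions, key)
    if session is None:
        # Fail with a diagnostic that names the missing key instead of dying silently.
        raise LookupError(f"no session for key {key!r}")
    return session.user
```

The point is less the check itself than the error message: a failure that names what was missing gives the developer the "good data" the original code never produced.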

    • @GrizikYugno-ku2zs
      @GrizikYugno-ku2zs 4 months ago +338

      "Say you're a Karen without saying you're a Karen" - robot woman's voice

    • @MrDragonorp
      @MrDragonorp 4 months ago +1159

      Bro you weren't a team player. Didn't you hear, real men only fix after production

    • @JoseMonteverde
      @JoseMonteverde 4 months ago +390

      Classic Señor dev

    • @XeenimChoorch-nx8wx
      @XeenimChoorch-nx8wx 4 months ago +69

      ZII. Zero is initialization. Controlled failure vs uncontrolled failure

    • @cybervigilante
      @cybervigilante 4 months ago +179

      @MrDragonorp TAF - Test After Failure

  • @rosgoncharuk2403
    @rosgoncharuk2403 4 months ago +2811

    Anyone who's working in IT should only be surprised that this doesn't happen every month.

    • @XIIchiron78
      @XIIchiron78 4 months ago +223

      Only through the tireless efforts of countless engineers correcting other people's stupidity (and sometimes their own) does the world make it through another day without disaster.

    • @millanferende6723
      @millanferende6723 4 months ago +43

      If only things were about, you know, quality and truth... instead of "who can get the most attention the fastest."

    • @dickduquesne
      @dickduquesne 4 months ago

      that's exactly what I was thinking, indeed

    • @HD-fc4ds
      @HD-fc4ds 4 months ago +19

      Working in IT and working at a multibillion-dollar security company are different things.

    • @tokopiki
      @tokopiki 4 months ago +55

      I'm working in IT and every day I'm amazed how far we've made it without modern civilization folding in on itself. Being part of that clusterf%ck in the belly of the beast is equally awesome and terrifying.

  • @h3w45
    @h3w45 4 months ago +5377

    It's very reassuring to know that I and a programmer at one of the most advanced tech companies have the same practices

    • @PhilLesh69
      @PhilLesh69 4 months ago +60

      I always copy the entire path of the existing code into a -bak or -date directory, then run the new code in place to test it on one production level server, before deploying it to the rest of the servers. That way I can use scp to copy the old working copy back if things really go belly up.
      But I guess when they rely on automation and all kinds of layers of abstraction between them and the code, they cannot do it that simply and easily.
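The backup-then-test-then-roll-back routine in this comment can be sketched as follows. This is a local-directory simplification using `shutil` rather than scp across servers; `deploy_with_backup` and the `health_check` callback are hypothetical names, and a real deployment would need to handle partial copies and permissions.

```python
import shutil
from pathlib import Path
from typing import Callable


def deploy_with_backup(
    live: Path, new_release: Path, health_check: Callable[[Path], bool]
) -> bool:
    """Replace `live` with `new_release`; restore the backup if the check fails."""
    backup = live.with_name(live.name + "-bak")
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(live, backup)        # keep the old working copy
    shutil.rmtree(live)
    shutil.copytree(new_release, live)   # put the new code in place
    if health_check(live):
        return True
    shutil.rmtree(live)                  # things went belly up: roll back
    shutil.copytree(backup, live)
    return False
```

Running this against one server first, and only then against the rest, is what turns a bad release into a single-machine incident instead of a fleet-wide one.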

    • @ForcefighterX2
      @ForcefighterX2 4 months ago +85

      Of course they cannot (simply) test their code in production level environments. Corporations have made un-maintainability into an art form, where a single deployment-step is so automated, but requires so many manual steps as well, that no single person can ever deploy anything easily.
      And when you are new to the organization, and learned for the first time how insanely convoluted their deployment process is, you undoubtedly asked "why!?". But as always the answer is "has grown historically" (legacy). And by the time you entered the organization, it would take weeks or months to re-implement this insane architecture into something which can actually be deployed in a sane manner.
      But we all know to never touch a running system. Even if it's a running nuclear bomb close to detonation.

    • @maleldil1
      @maleldil1 4 months ago

      @PhilLesh69 have you heard about Git?

    • @Koltonjbacon
      @Koltonjbacon 4 months ago +3

      So what’s up with the whole Diddy thing?

    • @scottgillespie2690
      @scottgillespie2690 4 months ago +17

      I don’t see many engineers on LinkedIn with more than a year or two of experience at a company before they move on. It took me a few months to understand our codebase and with all of the reorgs and compounding layers of rotating management it made it difficult for anyone to sit and focus on much of anything for very long.

  • @alkebabish
    @alkebabish 4 months ago +690

    As a web developer, I can confirm testing in production is the best way to go: the added pressure focuses you, and it saves having to push things to production. The way I like to do it is over ftp with notepad. Or if I'm on the toilet I'll use my phone and edit the files directly using the cpanel file manager. If my software was running on all the most vital computers in the world, I imagine that pressure would make me sharp as a knife and I'd never make a mistake.

    • @awrjkf
      @awrjkf 4 months ago +47

      Chad

    • @ian-tumulak
      @ian-tumulak 4 months ago +22

      Real web devs fix in production.

    • @trueperson-o2z
      @trueperson-o2z 4 months ago +21

      Holy fuck I haven't laughed this hard at a comment in months

    • @azibekk
      @azibekk 4 months ago +9

      As a developer I like to watch CI/CD on the toilet too 😂 It really helps me not to sit too long on the toilet

    • @westevo
      @westevo 4 months ago +1

      Giga Chad!

  • @zirconium5849
    @zirconium5849 4 months ago +369

    As a Rust programmer I can confirm that this was our plan to get Rust into production

    • @ashwithchandra2622
      @ashwithchandra2622 4 months ago +18

      Even upgrading to C++23 would work, instead of rewriting the whole codebase in Rust.

    • @lucascamelo3079
      @lucascamelo3079 4 months ago

      @ashwithchandra2622 wait until C++60, safe memory edition.

    • @conquerorofindia
      @conquerorofindia 4 months ago

      @ashwithchandra2622 Works! But fails in prod 😢.

    • @5uryaprakashPi
      @5uryaprakashPi 4 months ago

      @ashwithchandra2622 How? What is so special about C++23?

    • @backslash68
      @backslash68 4 months ago

      @ashwithchandra2622 what feature of C++23 will make it immune to null pointer dereferencing?

  • @smithwillnot
    @smithwillnot 4 months ago +960

    I just finished my first puzzle on Brilliant and got an invitation to a job interview at CrowdStrike. Wish me luck boys.

    • @stsm6192
      @stsm6192 4 months ago +19

      Crack up, like it👍😆

    • @ViriKyla
      @ViriKyla 4 months ago +2

      Lololol

    • @stsm6192
      @stsm6192 4 months ago +8

      @smithwillnot good luck, I know you will do a better job: test check, test check, test check, lol

    • @stsm6192
      @stsm6192 4 months ago +5

      @smithwillnot oh, I forgot: disable automatic updates on any OS before you send out the updated patches, ha ha

    • @generationm2059
      @generationm2059 4 months ago +16

      Don't forget to test in production on a Friday and have another job on standby!

  • @Lustanda
    @Lustanda 4 months ago +5002

    Real men test in production... ON A FRIDAY

    • @pingvingaming
      @pingvingaming 4 months ago +284

      The best code is the Friday, five-minutes-before-going-home code

    • @donaldobrien9171
      @donaldobrien9171 4 months ago +61

      The competent folks are on summer vacation

    • @mesiroy1234
      @mesiroy1234 4 months ago +14

      F yeah, I ain't even a code nerd
      Literally never coded in my life
      BUT I KNOW: DON'T UPDATE ON FRIDAY

    • @sanketjadhavar
      @sanketjadhavar 4 months ago +25

      At 5:30 PM😂😂😂

    • @weho_brian
      @weho_brian 4 months ago +63

      actually real men don't test their code at all, they just push their code and wait for a scream test

  • @ray-mc-l
    @ray-mc-l 4 months ago +2157

    Hahaha damn that "Real men test in production" pic with the submarine guy killed me

    • @TheGunnarRoxen
      @TheGunnarRoxen 4 months ago +202

      it certainly killed the sub guy...

    • @ethereal2620
      @ethereal2620 4 months ago +98

      Wait until you find out that his last name was *Rush.* 😮

    • @Kronabyss_
      @Kronabyss_ 4 months ago +7

      I bet he thought the same thing

    • @XDarkGreyX
      @XDarkGreyX 4 months ago +5

      Jeff made that joke in a vid right after that disaster already. If you enjoyed this, go back and watch that.

    • @Koltonjbacon
      @Koltonjbacon 4 months ago +1

      So what’s up with the whole Diddy thing?

  • @StarLightDotPhotos
    @StarLightDotPhotos 4 months ago +28

    This was 100% a culture issue. I left CrowdStrike in March of 2024 specifically because of these types of quality issues. I never expected anything to blow up this bigly, but the culture that enables this type of thing is why I left.

  • @techgroveusa
    @techgroveusa 4 months ago +56

    Emphasizing quality assurance and the organization's responsibility underscores why continuous integration and proper testing are so crucial.

  • @the_primal_instinct
    @the_primal_instinct 4 months ago +4200

    Commented faster than CrowdStrike devs push into production

    • @DylanEdd_1
      @DylanEdd_1 4 months ago +79

      and way faster than the rollback :D

    • @abhishekpardhi
      @abhishekpardhi 4 months ago +8

      Smooth

    • @joga_bonito_aro
      @joga_bonito_aro 4 months ago +5

      Doubt

    • @timothyvandyke9511
      @timothyvandyke9511 4 months ago +10

      Impossible

    • @pierrecurie
      @pierrecurie 4 months ago +5

      @DylanEdd_1 Considering the BSOD, is automated rollback even possible?

  • @intp
    @intp 4 months ago +519

    The company I work for has under 200 employees, under 30 devs, and we devs are writing education software. But even we have 5 levels of test environments before any change hits production. That's besides automated tests written by the API devs, automated tests written by the front end devs, and automated end-to-end testing by the QA team. Then there are required peer reviews of all code, and the QA dev manual testing. It's scary if a software company with such a critical product is releasing code without at least these guard rails.

    • @JxH
      @JxH 4 months ago +26

      "The company I work for has under 200 employees, under 30 devs, ..."
      FYI - The number zero (two places) is compatible with that sentence structure.

    • @darrennew8211
      @darrennew8211 4 months ago +44

      I heard from one employee that there's no automated testing. Also, this update was flagged to pass all canary testing at individual companies and to deploy everywhere immediately. And the driver itself is flagged that if it fails during boot Windows shouldn't disable it and boot anyway. The file that caused the crash was all zeros content. This is either intentional and someone shorted a lot of stock, or it's criminally negligent.

    • @Kenionatus
      @Kenionatus 4 months ago +30

      @JxH I'd be highly impressed if zero devs managed to pull off that much procedure.

    • @o1-preview
      @o1-preview 4 months ago +24

      worked at a place that had 15 "QA" staff; all they did was click on the functionality (they didn't even read the code), send it to the client to click around, and then push to production. Worst company I've ever worked at, fuck those guys - when I raised this issue I was fired within 2-3 weeks!

    • @Solinaru
      @Solinaru 4 months ago +5

      I have a feeling I know what company you're talking about, not because I worked there but I used to work for a competitor with only one level of testing. 😂

  • @kwan8247
    @kwan8247 4 months ago +2045

    1:13 what the hell is this stock video lmao

    • @gus473
      @gus473 4 months ago +93

      My Class Portrait!

    • @Itsallfun3000
      @Itsallfun3000 4 months ago +13

      F!

    • @OnurErtas-q1o
      @OnurErtas-q1o 4 months ago +121

      I would pay for it.

    • @MarterElectronics
      @MarterElectronics 4 months ago +64

      HR

    • @ClariNerd
      @ClariNerd 4 months ago

      Search “Bret Hart null pointer” and you’ll find it.

  • @GoogleDoesEvil
    @GoogleDoesEvil 4 months ago +64

    This isn't even the first time this quarter CrowdStrike caused a bunch of machines to kernel panic/bug check. In June, Falcon Sensor was causing RHEL 9.4 to kernel panic. In April, it caused Debian to kernel panic. In both of those cases though it was a Linux kernel bug.

    • @SaraMorgan-ym6ue
      @SaraMorgan-ym6ue 4 months ago +5

      CrowdStrike crashed Linux a few months back, crashed Microsoft just now; next it's the Mac's turn, dun dun dun 🤪🤪

  • @NotGarbageLoops
    @NotGarbageLoops 4 months ago +193

    Programmers are generally terrified about missing deadlines and will do whatever you command them to. It's up to the project manager to track delays and ensure the boss is notified in advance that deadlines will be missed. It's up to the boss to ensure they have good project managers and QA testing practices. Yes, this is indeed an organizational failure.

    • @WhiteSharks-wz6kn
      @WhiteSharks-wz6kn 4 months ago +2

      So are all devices that used CrowdStrike unusable now and need a fresh Windows install?

    • @MatthewDeveloper
      @MatthewDeveloper 4 months ago +20

      @WhiteSharks-wz6kn Not really, just boot into safe mode and get rid of the borked driver.
      This is sure going to be annoying for the IT team if they need physical access to do it, and don't forget this must be done for EVERY DEVICE.

    • @akin242002
      @akin242002 4 months ago +7

      @WhiteSharks-wz6kn No. You just need to delete the latest CrowdStrike driver. Usually 2 major steps.
      A) Either get the specific encryption key access to the company laptop/desktop first or go straight into safe mode.
      B) Go to the command prompt and delete the latest CrowdStrike driver file (c-00000291*.sys).
      FYI... I work in IT. Our team of 13 had to go through this process for 700+ employee laptops 💻 on Friday. Some old and some new. Interesting stories to tell at a bar or on Reddit.
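Step B amounts to a wildcard match, and it is worth seeing how much the width of the pattern matters before deleting anything. The sketch below is a dry run that only selects names; the pattern is the one quoted in the comment, while `files_to_remove` and the file names in the usage example are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Pattern quoted in the comment above.
BAD_CHANNEL_FILE = "c-00000291*.sys"


def files_to_remove(names: Iterable[str], pattern: str = BAD_CHANNEL_FILE) -> List[str]:
    """Return the names matching `pattern` (case-insensitively) without deleting anything."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]
```

For example, given the hypothetical names `["c-00000291-aaaa.sys", "c-00000290-aaaa.sys", "other.sys"]`, the default pattern selects only the first, while a looser pattern such as `c-00*.sys` selects the first two. Listing matches before removing them is a cheap guard when the same command is about to be typed on hundreds of machines.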

    • @klausstock8020
      @klausstock8020 4 months ago

      @akin242002 What everyone hears: "Delete the latest CrowdStrike driver file (c-00000291*.sys)."
      What every malware author hears: "Delete all CrowdStrike files (c-00*.sys).".

    • @andrewroberts7428
      @andrewroberts7428 3 months ago +1

      the existence of project managers is often an organizational failure

  • @bryokyo
    @bryokyo 4 months ago +948

    "Most of us will be dead by then", that got me rolling

    • @ImperativeGames
      @ImperativeGames 4 months ago +29

      There is nothing funny about nuclear war.

    • @almaximus03
      @almaximus03 4 months ago +7

      That got me rolling too😅

    • @gustavo9758
      @gustavo9758 4 months ago +22

      I thought I got the joke... until I actually did and was like "wait a minute..."

    • @ArawnOfAnnwn
      @ArawnOfAnnwn 4 months ago +4

      @ImperativeGames So you say. I find it hilarious! 😅😎

    • @trumpetpunk42
      @trumpetpunk42 4 months ago

      In 2021 my previous employer invited a pair of supposed doctors to tell us with a straight face that we would literally all be dead in five years if we didn't get the experimental injection. 2026 confirmed!

  • @cyfrowymuza
    @cyfrowymuza 4 months ago +639

    that's right - a classic null pointer dereference... nobody expects the Spanish Inquisition

    • @traveller23e
      @traveller23e 4 months ago +24

      It's such an insufficient explanation; a null pointer dereference is a symptom, not the root cause.

    • @mesiroy1234
      @mesiroy1234 4 months ago +3

      I ain't even a code nerd
      Literally never coded in my life
      BUT I KNOW: DON'T UPDATE ON FRIDAY 😊

    • @TestyMcTestypants
      @TestyMcTestypants 4 months ago

      They must now sit in the comfy (gamer) chair.

    • @johnsmith1953x
      @johnsmith1953x 4 months ago +3

      LOL! This has been a problem since the late 1970s.
      and it STILL IS!!! OMG!!!!

    • @rj7250a
      @rj7250a 4 months ago +6

      Again, it was not a null pointer, there was a null check in the code.

  • @Guru4hire
    @Guru4hire 4 months ago +1053

    The idea that a rust enthusiast would "prove a point" is the most believable thing in the world.

    • @o1-preview
      @o1-preview 4 months ago +7

      idk I like the point of where this was done on purpose to practice for a real event

    • @rj7250a
      @rj7250a 4 months ago +28

      I mean, if the driver was written in Rust then it would crash anyway, since Rust by default crashes on memory unsafety.
      The C++ code already checked for a null pointer, as mentioned in the last Twitter thread in the video.

    • @alexanderSydneyOz
      @alexanderSydneyOz 4 months ago +3

      "The idea that a rust enthusiast would "prove a point" is the most believable thing in the world."
      Well, other than what actually happened

    • @minerscale
      @minerscale 4 months ago +9

      @rj7250a I feel like a kernel driver should probably have a panic handler that unloads or maybe restarts the driver with a count of the number of retries. That way any unrecoverable errors (bar compiler bugs/unsafe block promises not being kept) will not bring down the system

    • @jfbeam
      @jfbeam 4 months ago +21

      @minerscale I see you don't know much about writing kernel-mode stuff. Unlike a userspace application, nothing is tracked in kernel space, so there's no way to know how to "restart" or unload the offending driver... or anything that has been commingled with it. You have to rely on the driver's shutdown and exit code; once it's done anything "bad", none of its data structures can be trusted, and by extension, the entire kernel, as in ring 0 it could've messed with literally anything.
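For contrast, here is what the restart-with-a-retry-count idea looks like in user space, where a supervisor can contain a failing task. It is a minimal Python sketch with a hypothetical `run_with_retries` helper, and it deliberately does not model a kernel driver: per the reply above, ring 0 code gets no such isolation, which is exactly why the pattern does not carry over.

```python
from typing import Callable


def run_with_retries(task: Callable[[], None], max_restarts: int) -> int:
    """Run `task`, restarting it after a failure up to `max_restarts` times.

    Returns the number of restarts used; re-raises the last error once the
    restart budget is spent, so a persistently broken task still surfaces.
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
```

The bounded count matters: without it, a task that always fails turns into an infinite restart loop, the user-space cousin of a boot loop.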

  • @reedmclean3574
    @reedmclean3574 4 months ago +44

    3:50 This felt like a personal attack, I'm literally writing a to-do list application right now as one of my first apps.

    • @GammaFn.
      @GammaFn. 4 months ago +7

      Don't feel attacked, writing your own to-do app is a rite of passage

  • @RainingArtillery
    @RainingArtillery 4 months ago +23

    Let's also mention that not only does the driver run in kernel mode, but it's flagged as running on boot. That is why this outage was so bad: Bluescreen because of driver -> Reboot, ah, this driver is marked as an essential part of the system that we can't boot without -> Bluescreen. Meaning them rolling out a fix will not fix machines automatically, an IT tech has to go over to every single machine and manually reboot in safe mode to have the fix actually applied.

  • @treyquattro
    @treyquattro 4 months ago +607

    that "real men test in production" meme was sick!

    • @alxk3995
      @alxk3995 4 months ago +21

      That got created right after the incident. But it's gold. 😂

    • @christopherg2347
      @christopherg2347 4 months ago +29

      One could say it was...Titanic.

    • @seanburke424
      @seanburke424 4 months ago +11

      "Everyone has a QA system, but not everyone has a production system"

    • @nicholasvinen
      @nicholasvinen 4 months ago +18

      I don't always test my code, but when I do, I do it in production...

    • @christopherg2347
      @christopherg2347 4 months ago +1

      @seanburke424 That saying doesn't make sense to me.

  • @FireStormHR
    @FireStormHR 4 months ago +186

    WHY DIDNT I KNOW THAT 1:30 VIDEO EXISTS?

    • @wesleyrm
      @wesleyrm 4 months ago +21

      Pure GOLD

    • @csvscs
      @csvscs 4 months ago +10

      Please share a link to it!!!

    • @geeshta
      @geeshta 4 months ago +32

      It's called "Making of WrestleMania: The Arcade Game". It's on YT.

    • @bsherman8236
      @bsherman8236 4 months ago +1

      Some memes never get old

  • @GSBarlev
    @GSBarlev 4 months ago +812

    So apparently CrowdStrike Falcon broke a Debian image about three months ago, but because Linux doesn't actually force software updates, it fucked the VMs of a few dozen nerds who reported the issue and rolled back to the previous image before the entire global ecosystem went down.
    Seems like there's a few lessons to be learned here.

    • @___Kevin
      @___Kevin 4 months ago +13

      Interesting

    • @AlexiosLair
      @AlexiosLair 4 months ago +109

      Classic small stick that holds the entire global infrastructure from collapse

    • @ren3059
      @ren3059 4 months ago +4

      wait wtf

    • @rafazieba9982
      @rafazieba9982 4 months ago +153

      This update was not "forced by Windows". It wasn't even done by Windows. CrowdStrike updated the rules itself.

    • @shroomer3867
      @shroomer3867 4 months ago +49

      The only lesson you need to learn is to shut up and update your Windows system as soon as possible, or else we'll do it for you!
      - Microsoft.

  • @weston8400
    @weston8400 4 months ago +69

    In my experience here's how problem solving with code works.
    "I want to solve this problem with code. Here is my plan"
    "Let's write the code now."
    "Testing the code. Oh no, there are bugs."
    "I fixed the bugs."
    "Oh wait, what's this?"
    "This problem has to do with stuff I can't just fix, guess I'll work around it."
    "I hate my life, this is really hard."
    "This works, but it shouldn't. It looks ugly and I hate it."
    "Whatever, it's working."

    • @mukta4689
      @mukta4689 4 months ago +7

      *pushes code in production*
      *crash*

    • @9tales9f
      @9tales9f 3 months ago +3

      "This works, but it shouldn't."
      and that's where you ask for your friend's device

  • @PSP92262
    @PSP92262 4 months ago +20

    The fact that QA doesn't seem to be a thing anymore is mind-boggling.

    • @backslash68
      @backslash68 4 months ago +2

      what do you think? we are in the Agile era now. Fail fast, fail often, QA is not needed.

  • @mhadi-dev
    @mhadi-dev 4 months ago +240

    "It's an organization failure" - A great programmer once said.

    • @roguegryphonica3147
      @roguegryphonica3147 4 months ago +31

      Because afterwards he was fired for not being a team player.

    • @alexanderSydneyOz
      @alexanderSydneyOz 4 months ago +3

      And a few rogue bank traders

  • @abg44
    @abg44 4 months ago +446

    This goes to show that outsourcing to one single third party for Kernel intrusion detection isn't the best idea ever, lol

    • @pluto8404
      @pluto8404 4 months ago +93

      or having universal automatic updates pushed to your machine.

    • @JxH
      @JxH 4 months ago +9

      So you want Norton, and McAfee, and Kaspersky, and CrowdStrike, and ... ALL installed at once ?

    • @lachlanmckinnie1406
      @lachlanmckinnie1406 4 months ago +54

      @JxH More like some companies use product A, other companies use product B, with no single company using all at once. To use an agricultural analogy, you want a security polyculture, as monoculture is vulnerable to disease.

    • @roganl
      @roganl 4 months ago +24

      @lachlanmckinnie1406 The great clownstrike famine of '24.

    • @robertfiedor7559
      @robertfiedor7559 4 months ago

      m try

  • @BeepBoop2221
    @BeepBoop2221 4 months ago +195

    It's Boeing all over again, engineers and QA replaced with suits.

    • @csibesz07
      @csibesz07 4 months ago +5

      Yeah. From reflex, I compared it to that disaster when explaining to others.

    • @BeepBoop2221
      @BeepBoop2221 4 months ago

      @csibesz07 CrowdStrike is now blaming businesses for not having disaster recovery!

    • @gezenews
      @gezenews 4 months ago

      replaced with slaves.

    • @allangibson8494
      @allangibson8494 4 months ago +12

      Also done in India in a “low cost engineering center”. Lunch time Friday roll out of updates…

    • @angkhoa1216
      @angkhoa1216 4 months ago +5

      @allangibson8494 Hopefully after this fiasco and Trump's being president, the damn suits can stop outsourcing important shit

  • @ericwelsh4853
    @ericwelsh4853 4 months ago +176

    I worked at a small Dot Com in the early 2000's. We had a QA process for pushing changes to the production web sites.
    After the QA department had tested a new release, the QA manager manually signed a form that was printed on a sheet of paper, then that sheet of paper was handed to the sysadmin responsible for deploying changes to production.
    Seems like a foolproof process?
    Nope.
    After working there a few months, the QA manager told me that the producers (product owners) were printing out those forms and forging the QA manager's signature.
    We had no idea we were pushing untested code to production, yet until we found out about this we were being blamed because the production web sites were unreliable.

    • @o1-preview
      @o1-preview 4 months ago +28

      worked this year in a company that had their QA not look at the code at all, have them just test the functionality by clicking shit on the website, then send it to the client (who doesn't understand code) to test it by also clicking shit around, and if QA and the client said ok it was pushed to production.. Once I said code needed to be reviewed I lasted another 2-3 weeks before getting fired! fuck those guys, I hope that reporting their asses actually made something happen, but I doubt it

    • @Kreze202
      @Kreze202 4 months ago +8

      Interned at a major national telecom company as a Security Business Partner. The company had quite a rigid pentesting system where every new system or update requires a form with 2 written signatures, one from the higher ups of the cybersec team that confirms that the new asset is good to go for prod and one from the dev team. Turns out some dev teams (the company had multiple dev teams for different projects) just pushed to prod anyway without ever having this signed form or even requesting one from the cybersec team.

    • @ericepperson8409
      @ericepperson8409 4 months ago +4

      It's still a more robust system than most software companies employ these days. Somehow, in a lot of teams, Agile is thought to mean: if it compiles, it's good to go.

  • @michaelogden5958
    @michaelogden5958 4 months ago +11

    I'm a retired IT guy, part of a team that did global pushes quite regularly. While a flaw in one of our pushes might "only" take down our presence on the web, there were layers upon layers of pre-push testing, staged releases, and so forth. I remember the pucker factor each and every time we did a "for real" push. I empathize when I hear of D'oh!!! misadventures.

  • @systematicpsychologic7321
    @systematicpsychologic7321 4 months ago +1657

    Regarding option 3: just wait to see if in 2025 you start hearing "The new government requested data that unfortunately was irrevocably lost during the Crowdstrike debacle."

    • @elderman64
      @elderman64 4 months ago +102

      Wouldn't be surprised to see that happening just in 2024 itself

    • @Uveryahi
      @Uveryahi 4 months ago +1

      😮! well not that 😮

    • @justsignmeup911
      @justsignmeup911 4 months ago +42

      Funny how that only happens to government systems

    • @remigiuszbloch
      @remigiuszbloch 4 months ago +87

      or Secret Service internal communication history was lost during Crowdstrike situation... as they say: don't let crisis go to waste...

    • @flowerofash4439
      @flowerofash4439 4 months ago +25

      don't tell me they are going to fly a plane straight to a server and blame the asians and their budhism...

  • @b4ttlemast0r
    @b4ttlemast0r 4 months ago +112

    What's crazy is that the update didn't even change any executable file. A change to a data file should not be able to crash the entire program and even operating system.

    • @AIrtfical
      @AIrtfical 4 months ago +11

      Not true. A misconfigured config file (YAML, JSON, TOML) regularly causes parsing crashes. However, it's unacceptable that a tool like this isn't resilient enough to fail safely and gracefully. It runs as the Windows root, or in layer 0; perhaps it crashed because it detected itself, or the OS, as a threat? Unsure, but static config can definitely cause crashes. I'm not sure why the BSOD was happening, unless the OS runtime requires this service to be running, or to fail this way, which would be weird.
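"Fail safely and gracefully" on a bad data file can be sketched as a parser that validates before it trusts anything and returns nothing usable instead of crashing. The format below (a 4-byte `MAGIC` header, one length byte, then the payload) is entirely hypothetical and is not CrowdStrike's real file format; `parse_channel_file` is likewise an illustrative name. The all-zero check reflects a claim made by another commenter in this thread, not a verified fact.

```python
from typing import Optional

MAGIC = b"CFG1"  # hypothetical 4-byte header, not a real format


def parse_channel_file(data: bytes) -> Optional[bytes]:
    """Return the payload if `data` is well formed, else None.

    Returning None lets the caller keep its last known-good configuration
    instead of acting on garbage.
    """
    if len(data) < len(MAGIC) + 1:
        return None                      # too short to hold header and length byte
    if not any(data):
        return None                      # all-zero content
    if data[:len(MAGIC)] != MAGIC:
        return None                      # wrong or missing header
    declared = data[len(MAGIC)]
    payload = data[len(MAGIC) + 1:]
    if declared != len(payload):
        return None                      # declared length disagrees with actual length
    return payload
```

Every rejected case here is cheap to test for, which is the argument for treating a data file as untrusted input even when it comes from your own update server.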

    • @Roboprogs
      @Roboprogs 4 months ago +7

      One level’s data is another level’s code, sometimes.

    • @opposite342
      @opposite342 4 months ago +4

      @AIrtfical
      It's exactly what you said, actually. The program forces itself as a requirement for Windows to be functional

    • @BinToss._.
      @BinToss._. 4 months ago +14

      @AIrtfical It's a boot-start driver. If any boot-start driver experiences an unhandled exception, the entire boot sequence fails. If Windows detects and disables a bad boot-start driver (I don't know if it can), the system would be running (yay), but it would violate company policy by running without required software (uh-oh).

    • @obsolete959
      @obsolete959 4 months ago +7

      Kernel-level operations have to crash the system when encountering an error, because not crashing can lead to far worse outcomes when dealing with direct memory access. It is by design, and smart design at that.
      Now you can argue that not being able to boot without the faulty driver instantly after is not the smartest design, but that's on CrowdStrike for flagging their drivers as boot-start drivers.

  • @johnwilliams3075
    @johnwilliams3075 4 months ago +159

    "Failing upwards" seems to equal "They ~sure~ look great in a suit, let's promote them!". I've seen this over, and over, and over, over the last 30+ years, and it never ends well. It usually goes one of two ways:
    1. The person in charge of a thing ends up being so bad or disinterested in their job that some really important thing ends up spectacularly failing even though they avoid blame (i.e. today's example), and they stick around to screw up the next thing they're put in charge of. Occasionally they suffer the consequences of their ignorance, but by then the organizational and reputational damage is done.
    2. They muck around for a few years, cluelessly rising on the org chart until they shuffle off to some new employer who's even more impressed with their fashion sense, usually leaving behind a two-comma morass of overdue projects, impossible deadlines, expensive and inappropriate software subscriptions, disgruntled technical staff, and the like.

    • @planescaped
      @planescaped 4 months ago +35

      More that they know how to talk. The distance one can get simply by confidently bullshitting your way through life is incredible.

    • @davidtitanium22
      @davidtitanium22 4 months ago

      I'm convinced that people need to be a certain level of psychopath to be "leaders" and it has nothing to do with their competence

    • @markmendez3939
      @markmendez3939 4 months ago +5

      Does anyone remember on what basis Israel chose their first king?
      ... That guy would look good in a crown

    • @XIIchiron78
      @XIIchiron78 4 หลายเดือนก่อน +18

      The thing to understand is that the C level doesn't work for the company or for the customers. They work for the shareholders. So CEOs who make obviously and openly stupid decisions outwardly are often just in effect cooking their books by sacrificing everything else to cut expenses and deliver a quarterly return. And then they bail with a great resume and a bunch of money before everything implodes. Or sometimes even after it implodes, because shareholders don't care and can easily move on to the next legacy brand with their gains. They know when to get out.
      This practice of corporate looting that pervades America started pretty much with Jack Welch who gutted GE while managing to earn an entire cult following for doing so.

    • @szilardfineascovasa6144
      @szilardfineascovasa6144 4 หลายเดือนก่อน

      @@XIIchiron78 Someone that gets it.

  • @duotronic6451
    @duotronic6451 4 หลายเดือนก่อน +6

    When I was in IT, we would release security updates to IT computers & servers & volunteers a week before releasing to the rest of the company.

  • @obsolete959
    @obsolete959 4 หลายเดือนก่อน +8

    What's worse is that Crowdstrike updates bypass staging policies. So even the smart companies that run critical software updates in their own test systems first to make sure they don't break anything before updating all computers still got the CS update forced upon them. So not only did they ignore their own staging and testing policies, they also ignored everyone else's staging and testing policies.

    • @ShawnFumo
      @ShawnFumo 4 หลายเดือนก่อน

      Yeah, the problem seems to be that those staging/testing policies apply to new versions of the sensor, but not to the data definition files. Which might be ok in theory if they were actually bulletproof against bad data files. But no matter what, they shouldn't have sent out the update to all their clients at the same time. Even if they sent it to a few thousand and waited an hour before sending the rest, it probably would have been enough to prevent this huge disaster. Just bad policies on top of bad policies
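
The "send it to a few thousand and wait before sending the rest" idea can be sketched roughly as below. This is only an illustration under stated assumptions: `staged_rollout`, `deploy`, `canary_size` and `max_failure_rate` are hypothetical names, not anything from CrowdStrike's tooling, and a real system would also wait for a soak period and health telemetry between the two phases.

```python
from typing import Callable, List


def staged_rollout(
    hosts: List[str],
    deploy: Callable[[str], bool],
    canary_size: int = 3,
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy to a small canary group first; continue only if it stays healthy.

    `deploy` is a hypothetical callback that pushes the update to one host
    and returns True if the host is still healthy afterwards.
    Returns the list of hosts that received the update.
    """
    canary, rest = hosts[:canary_size], hosts[canary_size:]
    updated: List[str] = []
    failures = 0

    # Phase 1: canary group only.
    for host in canary:
        healthy = deploy(host)
        updated.append(host)
        if not healthy:
            failures += 1

    # Halt the rollout if too many canaries broke.
    if canary and failures / len(canary) > max_failure_rate:
        return updated

    # Phase 2: everyone else (a real rollout would pause here first).
    for host in rest:
        deploy(host)
        updated.append(host)
    return updated
```

With a deploy callback that always reports failure, only the canary hosts are ever touched, which is the whole point: the blast radius is the canary group rather than the full fleet.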

  • @nova_supreme8390
    @nova_supreme8390 4 หลายเดือนก่อน +38

    The prosecutor: Show me on this graph where did Crowdstrike touch you?
    Windows: "points at the kernel and starts to sob"
    The prosecutor: I have no further questions, your honor.

  • @SamBrockmann
    @SamBrockmann 4 หลายเดือนก่อน +226

    Hiring George Kurtz for your C suite seems to be a bad idea.

    • @libertybelllocks7476
      @libertybelllocks7476 4 หลายเดือนก่อน +9

      He might as well retire after this.

    • @SamBrockmann
      @SamBrockmann 4 หลายเดือนก่อน +59

      @@libertybelllocks7476 , that's the problem: he probably will get hired as the CEO somewhere else if he wants to be. Give it a few years, and he'll be fine. Instead of, you know, being poor and unemployed, like he deserves.

    • @loggjohnable
      @loggjohnable 4 หลายเดือนก่อน +5

      He is the founder too

    • @SamBrockmann
      @SamBrockmann 4 หลายเดือนก่อน +17

      @@loggjohnable , which makes it even worse.

    • @JodyBruchon
      @JodyBruchon 4 หลายเดือนก่อน +15

      And yet they hired him for their C++ suite

  • @k98killer
    @k98killer 4 หลายเดือนก่อน +52

    That stock footage of the smiling people all flipping off the camera is golden

    • @TheOneWhoMightBe
      @TheOneWhoMightBe 4 หลายเดือนก่อน

      I think it was personal for the blonde in the background. 😂👌

  • @NFSHeld
    @NFSHeld 4 หลายเดือนก่อน +8

    By the way, Friday 26th is "Admin appreciation day", where you can thank your system administrators who probably spend their weekend reading up on the issue and rebooting all the machines in safe-mode to remove the problematic config file.

  • @misubi
    @misubi 4 หลายเดือนก่อน +5

    I worked in software QA for years. Insane that they literally didn't have a battery of various OS configurations set up to test their builds on, either real or virtual, before live updating. 😮

    • @klausstock8020
      @klausstock8020 4 หลายเดือนก่อน

      It's also crazy that apparently a lot of companies bought and deployed the CrowdStrike software without having their penetration testers penetration test it first.
      "Nah, the marketing guy from CrowdStrike said that they did that test."
      "Did you also ask whether their test was successful?"
      "Yes, but suddenly there were free bottles of Champagne and free ladies everywhere..."

  • @mistersunday_
    @mistersunday_ 4 หลายเดือนก่อน +153

    I opt for multidimensional lizard overlords, because incompetence is scarier

    • @madmax43v3r
      @madmax43v3r 4 หลายเดือนก่อน +5

      It does make sense, they like to do test runs before the main event.

    • @KatR264
      @KatR264 4 หลายเดือนก่อน +8

      This is probably why conspiracy theories have the following they do, in the face of the more likely reality of incompetence.

    • @ThePowerLover
      @ThePowerLover 4 หลายเดือนก่อน

      Why not both?

    • @christophkogler6220
      @christophkogler6220 4 หลายเดือนก่อน +8

      @@KatR264 That's a significant part of the reason. People are distressed by chaos, so they look for patterns and signs to explain things away, and also enjoy feeling like they know more than others. Put those together and you get conspiracy theories that both explain chaos and strange events and let them feel superior for 'seeing the truth'.

  • @mursie100
    @mursie100 4 หลายเดือนก่อน +124

    4:16 this Stockton Rush OceanGate meme is unhinged 💀

    • @ren3059
      @ren3059 4 หลายเดือนก่อน +13

      darkest image☠

    • @kv4648
      @kv4648 4 หลายเดือนก่อน +10

      "...willing to die on that hill" 💀

    • @beskamir5977
      @beskamir5977 4 หลายเดือนก่อน +4

      @@kv4648 More like valley.

    • @Roboprogs
      @Roboprogs 4 หลายเดือนก่อน

      Thanks for context. I thought it was a nuke, rather than a sub. Too tired tonight, I guess.

  • @Xhadp
    @Xhadp 4 หลายเดือนก่อน +13

    I immediately knew this was a management/structural problem not a simple IT/QA "standard" miss. So not at all shocked by that being one key takeaway lesson from this.

  • @jeraldbottcher1588
    @jeraldbottcher1588 4 หลายเดือนก่อน +4

    This boggles my mind as an IT professional. I was part of a team that deployed patches and software for years. This included OS deployment, patch deployment, software deployment, the whole thing, on both workstations and servers. We tested our patches extensively before pushing them out to the entire population of the environment. This 1st included a sandbox environment, then a select user / system environment, then we would stage our patches out over several hours so if something happened we could back out before catastrophe struck. And honestly sometimes we would find problems with the patches, and we would be able to immediately stop, suspend and even back out.
    Yes we would use 3rd party vendor solutions to help with this, and any time we changed ANYTHING we would follow our testing procedures and matrix, normal business. We would never shirk our procedures to test 1st, then deploy. To me this is a total failure of IT Governance and failure to maintain standards. (IT Governance is setting and maintaining standards and policies for the IT Infrastructure)

    • @TheSacredDude
      @TheSacredDude 3 หลายเดือนก่อน

      Also an IT professional. You must be very lucky and VERY sheltered, because way too much of the industry works like this nowadays. It's the kind of thing that happens when you let the normals worm their way in. They immediately make a run on all the leadership positions that all the competent staff don't want to have to do anyway, and then they start getting rid of every policy, procedure, and precaution that could potentially stand in the way of their yearly bonus. Eventually, that shit metastasizes all the way up to the C-Suite, and that's when the seriously unethical and even illegal shit starts happening. I just got kicked off a project this month due to refusing to perform a task that the customer leadership made an extremely public show of ordering us not to touch. The PR hire who was told to take it over waited for months for their leadership to be sequestered in a multi-week meeting, then went psycho on my entire department until my management gave in. There was never any complaint that I was wrong, that I caused any problems, or that I crossed any lines. The official reason is that I was "seen butting heads" too many times. Meanwhile, this guy has almost completely destroyed one application, and is very likely going to tank an upgrade for another, MUCH more vital one by the end of the year.
      tl;dr....Stay where you are. NEVER leave that company.

    • @jeraldbottcher1588
      @jeraldbottcher1588 3 หลายเดือนก่อน

      @@TheSacredDude Alas I retired from that job and no longer have to fight any of those battles

  • @nac.mac.feegle
    @nac.mac.feegle 4 หลายเดือนก่อน +6

    The school of "Hey it compiled, it must work." I've been coding for almost 40 years. Yeah, I'm old. It drives me nuts that we do not learn lessons. Company hiring a guy who thinks delivering and using software is testing should have the entire C-suite fired. What happened to the concept of continuous integration, automated testing? Bosses are always too cheap, arrogant, impatient, whatever to put money into testing. And clients, to be fair, are also disinclined to plan for and budget testing.

    • @0LoneTech
      @0LoneTech 4 หลายเดือนก่อน

      There are languages where that's far closer to truth. Of course some people complain bitterly when GHC says their program is incomplete rather than produce a broken executable. Ada in particular was designed with this goal, published in 1983, but it likely never will get the huge marketing campaigns Rust or Java enjoyed.

  • @lullullullul
    @lullullullul 4 หลายเดือนก่อน +76

    Sheesh I love opening TH-cam to a fresh Fireship 🥺

  • @bdd2ccca96
    @bdd2ccca96 4 หลายเดือนก่อน +6

    a huge part of the blame must go to the CTOs of the corporations. they are the ones who are "testing in production" by allowing auto updates to run on production servers, and without a working DR plan.
    it is gross negligence that any change to production is not run in a testing environment first.

  • @tommy516
    @tommy516 4 หลายเดือนก่อน +109

    "Real Men Test in Production", such a great mem...er, process.

  • @Tony-dp1rl
    @Tony-dp1rl 4 หลายเดือนก่อน +6

    We've started calling the practice of deploying to Production without testing ... CrowdStriking

  • @astrodysseus
    @astrodysseus 4 หลายเดือนก่อน +2

    4:10 well it's both an employee and an organization issue. So many "developers" write bad code (and that's such a gentle way to put it) and have zero professionalism about it. And it's organizational of course because, knowing that, you have to create safeguards around deployments

  • @mrlunatic2022
    @mrlunatic2022 4 หลายเดือนก่อน +179

    I use arch btw

    • @TheExcarlos
      @TheExcarlos 4 หลายเดือนก่อน +5

      Blahahah

    • @pluto8404
      @pluto8404 4 หลายเดือนก่อน +9

      why hospitals and airlines dont run arch, is mind-boggling.

    • @OnurErtas-q1o
      @OnurErtas-q1o 4 หลายเดือนก่อน +1

      Dude, me too!

    • @infinityyworks
      @infinityyworks 4 หลายเดือนก่อน +10

      “It runs on my machine”

    • @L7CK7
      @L7CK7 4 หลายเดือนก่อน +1

      I test in prod btw

  • @AntonAdelson
    @AntonAdelson 4 หลายเดือนก่อน +23

    4:15 that version of "real men test in production" is ... WOW!

  • @rekire___
    @rekire___ 4 หลายเดือนก่อน +82

    First test, you must.
    Production testn't, you don't.
    -Yoda, coding of art

    • @alexandredevert4935
      @alexandredevert4935 4 หลายเดือนก่อน +12

      Never a Friday you release

    • @backslash68
      @backslash68 4 หลายเดือนก่อน

      if ( nullptr == ptr) thou shall write, the unintended equal operator in comparison to avoid (those are called "Yoda conditions" btw.)

  • @DustinRodriguez1_0
    @DustinRodriguez1_0 4 หลายเดือนก่อน +63

    Extra little detail: There wasn't so much a logic problem in the channel file... the channel file was null. Not zero size, but full of nothing but null bytes. And their kernel module apparently does ZERO checking for validity before trying to work with such files. Should be criminal negligence, but that is literally legally impossible since zero enforceable software standards of any kind exist.
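
A minimal sketch of the kind of validity check this comment says was missing. The file layout here is entirely made up for illustration (a 4-byte magic `CHNL` plus a little-endian payload length); the real channel file format is not public, and `parse_content_file` and `InvalidContentFile` are hypothetical names.

```python
import struct

# Hypothetical layout: 4-byte magic, 4-byte little-endian payload length.
MAGIC = b"CHNL"
HEADER = struct.Struct("<4sI")


class InvalidContentFile(ValueError):
    """Raised instead of crashing when a content file is malformed."""


def parse_content_file(data: bytes) -> bytes:
    """Validate an untrusted content file and return its payload."""
    if len(data) < HEADER.size:
        raise InvalidContentFile("file too short for header")
    if not any(data):
        # Every byte is zero: the case described in the comment above.
        raise InvalidContentFile("file contains only null bytes")
    magic, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise InvalidContentFile("bad magic number")
    payload = data[HEADER.size:]
    if len(payload) != length:
        raise InvalidContentFile("payload length does not match header")
    return payload
```

The caller can then catch `InvalidContentFile`, log it, and keep running with the previous known-good file rather than acting on garbage.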

    • @tetrahedrontri
      @tetrahedrontri 4 หลายเดือนก่อน +4

      I shudder at the day I let politicians describe how my code needs to be written. Yikes on that whole concept.

    • @NoConsequenc3
      @NoConsequenc3 4 หลายเดือนก่อน +2

      @@tetrahedrontri thankfully in the USA they've decided that judges are more important than experts when it comes to this kind of thing. Wouldn't want people knowledgeable in a field to make decisions in it.

    • @prezentoappr1171
      @prezentoappr1171 4 หลายเดือนก่อน +1

      ​@@NoConsequenc3
      This saddens me cuz most decisions in Congress are made without first consulting a task force of experts.
      Also why license to a game sux than buying it steam vs Gog.
      Bruce Willis (lack of reference of an old article, prolly mads up but made headline anyway because of lack of cross checking vs ahoy physical games videos)

    • @prezentoappr1171
      @prezentoappr1171 4 หลายเดือนก่อน

      Extra detail: most development on digital laws are from the states - that Bruce Willis iTune article lawyer called for making arguments on that article or any licence to a game instead of owning the game case.
      I think from Eurogamer, but I know it from a hyperlink rabbit hole from chrome suggestions

    • @jobicek
      @jobicek 4 หลายเดือนก่อน +1

      But it's not just their negligence, it's also on the heads of people running those systems. When you have a critical system, one of the things you control is updates. Because every update is a potential disaster.
      Back at university, we had a simple rule - if your code crashes, you're finished; zero points. Never assume anything. Just because a specification says that you'll receive two integers doesn't mean that you'll always receive two integers. Always fail gracefully. There should be no input that causes your program to crash.
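
The "two integers" rule can be illustrated with a small sketch. `parse_two_ints` is a hypothetical helper written for this example: it never raises on malformed text and reports failure with `None` so the caller can decide what to do.

```python
from typing import Optional, Tuple


def parse_two_ints(line: str) -> Optional[Tuple[int, int]]:
    """Parse exactly two whitespace-separated integers, or return None."""
    parts = line.split()
    if len(parts) != 2:
        # Too few or too many fields: reject instead of indexing blindly.
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        # At least one field was not an integer.
        return None
```

A caller would check the result before using it, for example `pair = parse_two_ints(text)` followed by an `if pair is None:` branch that prints an error and moves on.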

  • @matthewthomasomeara
    @matthewthomasomeara 4 หลายเดือนก่อน +9

    The speaker casually rolls over "staggered roll out" as if it's just one of a laundry list of safeguards. But isn't this kinda the big one? Code errors happen. Stagger the roll out and you minimize the damage.

  • @YuNherd
    @YuNherd 4 หลายเดือนก่อน +48

    my hunch is that the juniors are left to fend for themselves to make releases

  • @ilirlluka6789
    @ilirlluka6789 4 หลายเดือนก่อน +39

    Love the skit with Bret "The Hitman" Hart lecturing the computer nerds about dereferencing a null pointer.

  • @trevorkiwoi
    @trevorkiwoi 4 หลายเดือนก่อน +18

    I was watching this video when I remembered that I didn't check for a nullptr before attempting to dereference my variable. Thanks for the reminder

    • @Caellyan
      @Caellyan 4 หลายเดือนก่อน +1

      Now switch to rust and that won't happen 🤣

    • @FemboyCatGaming
      @FemboyCatGaming 4 หลายเดือนก่อน

      @@Caellyan Switch to Rust and your program won't compile

    • @DankMemes-xq2xm
      @DankMemes-xq2xm 4 หลายเดือนก่อน +1

      @@FemboyCatGaming better not to compile, than to compile and have things break in unforeseeable ways

    • @FemboyCatGaming
      @FemboyCatGaming 4 หลายเดือนก่อน

      @@DankMemes-xq2xm Rust's borrowing and shadowing system is far more convoluted than C pointers

  • @jackatk
    @jackatk 4 หลายเดือนก่อน +6

    4:53
    “Pre-planned in advance”
    Bruh

  • @avram202
    @avram202 4 หลายเดือนก่อน +1

    That slap in the back about the null pointer is how my father taught me everything in life, it worked wonders, I'm definitely using it on all my children

  • @axazexz1991
    @axazexz1991 4 หลายเดือนก่อน +137

    Looks like Bjarne Stroustrup pulled all his hair out while creating the language.

    • @YuNherd
      @YuNherd 4 หลายเดือนก่อน +4

      he went malding?

    • @javabeanz8549
      @javabeanz8549 4 หลายเดือนก่อน +7

      Are you sure he didn't get that from writing C code? So he wrote C++ while he still had some hair left.

    • @csibesz07
      @csibesz07 4 หลายเดือนก่อน +10

      He put his hair into c++

    • @trumpetpunk42
      @trumpetpunk42 4 หลายเดือนก่อน +1

      Very relatable

    • @Roboprogs
      @Roboprogs 4 หลายเดือนก่อน

      @@javabeanz8549 nah, I’m pretty sure it was the ++ that did it.

  • @homeboy_jay
    @homeboy_jay 4 หลายเดือนก่อน +23

    "Well, not so fast" @2:41 pie in the face ABSOLUTE FREAKIN GOLD 🤣🤣🤣

  • @ren3059
    @ren3059 4 หลายเดือนก่อน +66

    Real men test in production… Insert OceanGate meme

  • @Troy_Built
    @Troy_Built 4 หลายเดือนก่อน +3

    I've seen several places today that still have the computers messed up. They are running but something else is going on. Files that they try to retrieve are no longer there. Customer histories wiped out. One place said the computers came back up on Friday and this morning the server fried.

  • @douglasphillips1203
    @douglasphillips1203 4 หลายเดือนก่อน +4

    Never attribute to malice that which can be attributed to incompetence. They simply never expected the file to contain null bytes so they never checked for it.

    • @0LoneTech
      @0LoneTech 4 หลายเดือนก่อน +2

      When their business model is entirely predicated on claiming they're less incompetent than the vendors of actually needed software on the same system, and will cover for those, this level of incompetence is gross negligence at best.

  • @watt_the_border_collie
    @watt_the_border_collie 4 หลายเดือนก่อน +11

    I had to Google Bret Hart's clip about dereferencing a null pointer to know if it was AI generated or not. It just looked so random, but I was relieved it was real footage

  • @theApeShow
    @theApeShow 4 หลายเดือนก่อน +4

    1:30 THIS IS GOLD! Where do you even find this stuff?
    Amazing. I love the internet.

  • @mrhaftbar
    @mrhaftbar 4 หลายเดือนก่อน +16

    The printer in Ring 0 is killing me.
    because it is true

    • @roganl
      @roganl 4 หลายเดือนก่อน +3

      That's an artifact of the 90's and the bane of MSFT support - Gots to luv us some 3rd Party drivers - THEY SUCK.

  • @kriscollinstunes
    @kriscollinstunes 4 หลายเดือนก่อน +2

    That transition to the ad read was, well, brilliant!

  • @thegame4027
    @thegame4027 4 หลายเดือนก่อน +15

    The memes and visuals have been next level this video

  • @BalvinderSingh-uh3my
    @BalvinderSingh-uh3my 4 หลายเดือนก่อน +11

    "Real men test in production… "I couldn't help myself but LOL, thumbs up for that alone.

  • @HarrisonLuiEKYiss
    @HarrisonLuiEKYiss 4 หลายเดือนก่อน +4

    4:30 I think this is the same thing as “CrowdStrike didn’t check their code”? There is a CNA article which states that CrowdStrike “Skipped checks”. The article also mentions that the update “should have been pushed to a limited pool first”.

    • @akin242002
      @akin242002 4 หลายเดือนก่อน +1

      Also, rolling it out on a Friday. Programmer sins check list completed.

  • @Lord_Omni
    @Lord_Omni 4 หลายเดือนก่อน +2

    We had bunches of autotest updates from our tester. And they were failing regularly, and there was a tiny piece of code that showed who to blame o) And we had full testing before going to production, and still had minor, rarely occurring bugs there.

  • @rentabestfriend
    @rentabestfriend 4 หลายเดือนก่อน +1

    Damn the transition into the ad was super smooth i barely noticed it

  • @iAmTaki
    @iAmTaki 4 หลายเดือนก่อน +28

    5:06 what do you mean "most of us"? LMAO

    • @se7en2021
      @se7en2021 4 หลายเดือนก่อน +3

      Ww3

  • @JxH
    @JxH 4 หลายเดือนก่อน +5

    0:38 Once upon a time, I posted that picture of McAfee on FB and it was more-or-less immediately taken down.
    Sometimes I think that Anti-Virus / Anti-Malware companies were invented to make Microsoft look simply-excellent in comparison.

  • @andrewwalsh2755
    @andrewwalsh2755 4 หลายเดือนก่อน +5

    It probably was an organisational failure...
    Like Boeing, the manifestation is doors blowing off etc... but the real cause is unwise organisational changes... to boost profits, personal and corporate... at the expense of quality...
    I don't know if Crowdstrike has shareholders, but there will be pressure to increase profitability...
    ... outsource IT to India... employ under qualified, cheaper, staff etc... put pressure on managers to deliver... who put pressure on staff to deliver...
    ... and the manifestation is... global computer failure...

  • @tejonBiker
    @tejonBiker 4 หลายเดือนก่อน +1

    3:05 the perfect moment to include a Miyazaki photo (FromSoftware).

  • @maxdemontbron9720
    @maxdemontbron9720 4 หลายเดือนก่อน

    This is 10x better than all the other explanations I've seen up until now

  • @giantWario
    @giantWario 4 หลายเดือนก่อน +42

    I love how you just casually implied that most people in the world will be dead in the next two years.

    • @ImperativeGames
      @ImperativeGames 4 หลายเดือนก่อน +5

      Nuclear war.

    • @TheOneWhoMightBe
      @TheOneWhoMightBe 4 หลายเดือนก่อน +2

      The living will envy the dead.

    • @XIIchiron78
      @XIIchiron78 4 หลายเดือนก่อน

      AI is here. Most or all of humanity is about to become obsolete. Hell, even worse - they're just competition.

    • @bramvanduijn8086
      @bramvanduijn8086 4 หลายเดือนก่อน

      @@XIIchiron78 No it isn't, those are just next-word or next-pixel predictors. They don't understand anything and they're definitely not thinking. There's no "they" there to do the thinking. These are just glorified calculators. You could build one from vacuum tubes.

  • @DanSoloha
    @DanSoloha 4 หลายเดือนก่อน +8

    1:05 is the best stock footage I’ve ever seen 😂

  • @petersuvara
    @petersuvara 4 หลายเดือนก่อน +9

    Dave from Dave's Garage has the best description of why this happened.

    • @JxH
      @JxH 4 หลายเดือนก่อน +1

      Dave discussed two "Rings", 0 and 1. Here it's four "Rings", 0 to 3.

    • @Knirin
      @Knirin 4 หลายเดือนก่อน +8

      @@JxH While technically x86 has four security rings, in practice most operating systems use just two. Ring 0 for the kernel and Ring 3 or occasionally Ring 1 for all user code.

  • @chounoki
    @chounoki 4 หลายเดือนก่อน +3

    0:06 The blue ball is hilarious. Real or fake?

  • @MakeDataUseful
    @MakeDataUseful 4 หลายเดือนก่อน

    God these videos are getting polished, seamless ad transition

  • @Itsallfun3000
    @Itsallfun3000 4 หลายเดือนก่อน +16

    "But I can't test outside prod my data doesn't exist!"😅

  • @vincenzusgaming
    @vincenzusgaming 4 หลายเดือนก่อน +17

    I feel like the dev who made the mistake won't be punished. The entire fault definitely goes to the QA team

    • @BTrain-is8ch
      @BTrain-is8ch 4 หลายเดือนก่อน +25

      The dev shouldn't be punished. This sort of failure is an institutional one not an individual one.

    • @o1-preview
      @o1-preview 4 หลายเดือนก่อน +6

      the dev probably already got fired.. now, I have no idea if he'll get sued as well and if a judge would understand it

    • @ayylien3070
      @ayylien3070 4 หลายเดือนก่อน +2

      Nah the higherups 100% threw him under the bus.

    • @akin242002
      @akin242002 4 หลายเดือนก่อน +1

      When you push code to production on a Friday without peer review/QA process, you deserve to be fired.

  • @saaofficial5415
    @saaofficial5415 4 หลายเดือนก่อน +3

    Now I understand why null pointers are called the Billion Dollar Mistake 💀

  • @konfushon
    @konfushon 4 หลายเดือนก่อน +1

    Okay that Brilliant ad fit perfectly. I didn't even see it coming

  • @ssgamer5693
    @ssgamer5693 4 หลายเดือนก่อน +1

    Cant believe brilliant is sponsoring this guy😂

  • @Tixnou
    @Tixnou 4 หลายเดือนก่อน +10

    Imagine if this happened to one of the computers that run the Matrix we live in

    • @tinad8561
      @tinad8561 4 หลายเดือนก่อน +1

      This is actually the big argument against simulation theory-an environment of that complexity running any length of time at all wouldn’t have glitches, it’d have bricked by now.

  • @DagothDaddy
    @DagothDaddy 4 หลายเดือนก่อน +200

    What really happened explained below:
    Management doesn't actually read your PRs.

    • @ethanfreeman1106
      @ethanfreeman1106 4 หลายเดือนก่อน +17

      it's worse than that. the entire organization is based around maximizing profit without putting in the work.

    • @leswine1582
      @leswine1582 4 หลายเดือนก่อน

      😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅😅

    • @NN-sp9tu
      @NN-sp9tu 4 หลายเดือนก่อน +8

      You need to test the hell out of any changes to the codebase if a failure can wipe out millions of computers. Human eyes on a PR are not enough

    • @madmax43v3r
      @madmax43v3r 4 หลายเดือนก่อน +1

      It's probably a 20 year old pile of shit, with high turn-over, nobody wants to maintain old crap.

    • @keejj
      @keejj 4 หลายเดือนก่อน +3

      There probably was a problem report, but management couldn't read it because their account didn't have access because they reduced the number of licenses for the tool because it was too expensive.

  • @me_12-vw1vi
    @me_12-vw1vi 4 หลายเดือนก่อน +5

    1:30 bro this made my day

  • @rodeleon2875
    @rodeleon2875 4 หลายเดือนก่อน +1

    if your CI/CD process does not include a checksum validation against known good code as its last step before deployment then you run an outdated process. this validation step was difficult to implement into an existing container based micro services process but proved invaluable many times over. i would think it would be relatively easy in a monolithic build like CS.
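
A rough sketch of such a final checksum gate, assuming the pipeline recorded a SHA-256 digest when the artifact was built and tested. `sha256_hex` and `verify_artifact` are hypothetical helper names; only `hashlib.sha256` and `hmac.compare_digest` are real standard-library calls.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """True only if the artifact matches the digest recorded at test time.

    `expected_sha256` is assumed to be a plain ASCII hex string.
    """
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())
```

The deploy step would refuse to ship when `verify_artifact` returns False, which catches an artifact that was corrupted or swapped somewhere between the test stage and release.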

  • @jessechen6735
    @jessechen6735 4 หลายเดือนก่อน +1

    Thanks for putting together the video I will be using in this coming Wednesday's postmortem.

  • @DiniduPerera
    @DiniduPerera 4 หลายเดือนก่อน +10

    One of the big issues is that crowdstrike tries to do two things differently than their competitors:
    1. They want to be fastest to protect machines around the world from novel new malware techniques.
    2. They want their sensors to be extremely lightweight.
    There are two types of antivirus updates: Agent/Sensor updates, and Content updates.
    Agent updates are slowly rolled out by an IT organization. (This allows IT to test on and brick say, 10 machines before they go and brick 10,000.)
    Content updates (definition updates) are pushed to all machines, because what the bad guys are doing is constantly changing
    Most EDR software vendors make major changes to kernel-level detection logic with Agent updates. Because of Crowdstrike's goals however, they push most of that logic into Content updates. That philosophy and design choice has come back to haunt them.

    • @klausstock8020
      @klausstock8020 4 หลายเดือนก่อน

      CrowdStrike is better than that. Lol, you mention just 10 test machines! 8500000 machines will give you much better coverage! /s
      Not sure whether the /s is really needed here 😉

  • @ujnbhy67
    @ujnbhy67 4 หลายเดือนก่อน +4

    The title is Gold.

  • @AyoDamilareMichael
    @AyoDamilareMichael 4 หลายเดือนก่อน +5

    I knew it's gonna be hard not to mention rust.

  • @rivernet62
    @rivernet62 4 หลายเดือนก่อน +2

    Having made a lot of money preparing for and avoiding Y2K side effects, I can say with confidence that Y2K has absolutely nothing in common with this single point failure.

  • @noclue7080
    @noclue7080 4 หลายเดือนก่อน

    I know absolutely nothing about coding or programming, but I keep watching these videos it makes me feel special

  • @GRamerDim
    @GRamerDim 4 หลายเดือนก่อน +3

    0:11 pyrocynical jumpscare reference