Why Did Facebook Go Down? - Computerphile

  • Published on 4 Oct 2021
  • Just what was it that took Facebook, Instagram & WhatsApp offline on 4th October 2021? - Dr Steve Bagley investigates!
    previously titled "Facebook's Day Off"
    Facebook announcement on the outage: bit.ly/C_FB-Outage
    The visualisation software Steve was using was BGPlay, downloaded from the RIPE website.
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 2.1K

  • @HerrHeisenheim 2 years ago +6074

    What's a phone book you ask? Well, it's like DNS, but for people and their telephone numbers.

    • @caniggiaful 2 years ago +352

      Couldn't compile. Please check for circular references.

    • @EarthboundApocalypse 2 years ago +84

      @caniggiaful - Please reinstall compiler.

    • @sodiboo 2 years ago +53

      How does a phonebook resolve ambiguous usernames? I know DNS only works on domains that already have to be unique, but player usernames don't.

    • @Phroggster 2 years ago +32

      @sodiboo Generally alphabetically, order ascending, but it might depend on your location I guess.

    • @sodiboo 2 years ago +30

      @Phroggster How.. what? How can two identical names be sorted alphabetically? That's still ambiguous and only defines the order they appear, not a strategy to select the desired number.

  • @danielcarlossmd 2 years ago +629

    It's like Stack Overflow going down and its devs having nowhere to search for the fix.

    • @MM-zw8sm 2 years ago +24

      😂😂

    • @HenryTitor 2 years ago +95

      Stackoverflow going down is the most terrifying thing one can say

    • @theKonamacona 2 years ago +35

      So many jobs would be lost on this day

    • @JonatasAdoM 2 years ago +4

      @HenryTitor It would take all overflows with them.

    • @JonatasAdoM 2 years ago +15

      Reminds me about that Onion news where Congress wants to pass a law but they don't remember how to do it anymore. They even vote to call a retired member for help.
      At least I think it goes like this.

  • @jonathanjacobson7012 2 years ago +1082

    Ironically, Facebook's mission statement is best fulfilled when Facebook is disconnected from the Internet.

    • @Dutch3DMaster 2 years ago +220

      A Dutch group advocating for internet privacy sent out a tweet during the outage: "Wow! Facebook is adhering to practically all privacy-statements in the GDPR for about 5 hours already!" :P

    • @Dutch3DMaster 2 years ago +12

      @glas4849 Bits of Freedom ;).

    • @gerdsfargen6687 2 years ago +3

      What's their MS again? Making the world a better place?

    • @Dutch3DMaster 2 years ago +3

      @gerdsfargen6687 I thought it was "Connecting people", which ironically was impossible :P.

    • @LaNoireDetruit 2 years ago +11

      @Dutch3DMaster That was Nokia.

  • @MrRyanroberson1 2 years ago +189

    What's a phone book, grandpa?
    "Well, an old philosopher put it like this: it was the great book of doxing. It told you the names, numbers, and sometimes locations of everyone in town."

    • @MattRose30000 2 years ago +7

      By today's standards it's quite baffling that people would just put their names, phone numbers and addresses into a book that's given to everyone... But I guess social engineering wasn't nearly as advanced back then.

    • @bpj1805 2 years ago +8

      @MattRose30000 What should rather be baffling is that we tolerate people who harass others, who bring their petty squabbles into people's personal spaces. Without those people, doxing would be an impotent act.

    • @kilroy1964 2 years ago +2

      Phone books also make great makeshift boosters!

    • @trempton4106 2 years ago +3

      It is also very interesting that, because you have a phone number, technically you could be disturbed in your everyday life by everybody else in the world who has a phone number.

    • @kilroy1964 2 years ago +2

      @trempton4106 And this is a tactic actually used by authoritarian mobs these days.

  • @thedrusus 2 years ago +8400

    The world briefly became a better place

    • @Slay_No_More 2 years ago +305

      Imagine if all social media just snapped out of existence.

    • @JasminUwU 2 years ago +121

      @Slay_No_More that would be bad because I would lose all my friends

    • @test4274 2 years ago +87

      >look at me how cool I am
      and now go back to reddit, discord, Twitter, you subhuman soylent npc

    • @lukaspinoti107 2 years ago +240

      @test4274 using 4chan style formatting, saying "soylent" and "npc". kinda cring

    • @altmaster3288 2 years ago +127

      @JasminUwU Social media friends aren't real friends.

  • @ferdievanschalkwyk1669 2 years ago +1347

    Love the Apple magic mouse in its natural position, upside down, asking you to recharge it.

    • @TiagoJoaoSilva 2 years ago +27

      Savage (but true)

    • @JNCressey 2 years ago +45

      aEsThEtIc

    • @mrnarason 2 years ago +111

      Apple's brilliant engineering, so you can't charge and use the mouse at the same time

    • @brianransom16 2 years ago +17

      @mrnarason Great at making software, HW not so much.

    • @ThisNameIsBanned 2 years ago +36

      One of the worst-designed pieces of hardware in history.

  • @robertoricardoruben 2 years ago +288

    The phone book names are the same ones the teacher calls out in "Ferris Bueller's Day Off". Brilliant

  • @EllaJameson 2 years ago +27

    Today I learned that with the BGP protocol, you have to constantly scream that you still exist, or everyone will assume you died.

    • @NeonNotch 2 years ago +4

      @Dyanosis it’s not that deep, enjoy the joke

    • @JonatasAdoM 2 years ago +1

      @NeonNotch It's neither a joke nor a meme, yea I'm going there.
      It is more like a fun fact and I'm annoyed at reading "it's a joke" on most threads, even when it has nothing to do with an actual joke!

    • @EgonFreeman 2 years ago

      @NeonNotch It actually _is_ that deep, because your brain is doing that as well! If it doesn't get feedback from your body's senses, it'll start increasing the sensitivity to _provoke_ a reaction.
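
    The "keep shouting that you exist" behaviour in this thread can be sketched as a hold timer. This is a simplified illustration, not a BGP implementation: the `PeerSession` class and its method names are hypothetical, and the 90-second hold time is the value RFC 4271 suggests (with keepalives sent roughly every third of that), which real deployments often change.

```python
class PeerSession:
    """Toy model of how a router decides whether a neighbour is still alive."""

    def __init__(self, hold_time=90.0):
        self.hold_time = hold_time  # seconds of silence tolerated (assumed default)
        self.last_heard = None      # timestamp of the last message from the peer

    def on_message(self, now):
        """Record a KEEPALIVE or UPDATE received at time `now` (in seconds)."""
        self.last_heard = now

    def is_alive(self, now):
        """The peer counts as alive only if it spoke within the hold time."""
        if self.last_heard is None:
            return False
        return (now - self.last_heard) < self.hold_time


session = PeerSession(hold_time=90.0)
session.on_message(now=0.0)
print(session.is_alive(now=30.0))   # True: heard from the peer 30 s ago
print(session.is_alive(now=120.0))  # False: silent for longer than the hold time
```

    Once a session is declared dead, the routes learned over it are dropped, which is why silence is treated the same as an explicit goodbye.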

  • @allesarfint 2 years ago +2005

    It was beautiful, but like everything beautiful it didn't last long.

    • @retmotiv 2 years ago +3

      I love the song of your pfp

    • @VivekYadav-ds8oz 2 years ago +13

      A thing isn't beautiful because it lasts. - Vision

    • @WarDogYtPersonal 2 years ago +2

      Facebook played hide and seek with everybody

    • @ArgentavisMagnificens 2 years ago +2

      Yes, and that's my secret for eternal life

    • @virmirus 2 years ago +3

      I dunno... Stars are beautiful.

  • @grahameida7163 2 years ago +549

    One thing I learned the hard way on this: whenever you work on a remote router, always set a timed reboot on the router before you make the configuration changes. If the worst happens, you just wait for the reboot into the saved configuration rather than the running one, instead of jumping in your car to drive to the remote location.

    • @golebiewsky 2 years ago +16

      JunOS and confirming commits FTW

    • @srproductions8798 2 years ago +7

      This would take too long for all of them to reboot, and Facebook would be down for 5 to 10 mins.

    • @benneboii8117 2 years ago +96

      @srproductions8798 which is a lot less than ~6 hours.

    • @srproductions8798 2 years ago +3

      @benneboii8117 True, but that's 10 mins at first... in total it would eventually be 2 hours, because once it restarts the timer shifts by some seconds. Plus it would need to restart in a specific order, otherwise an error would occur.

    • @sodiboo 2 years ago +28

      @srproductions8798 what? if it succeeded you'd cancel the timed reboot - wdym 2 hours eventually? how?
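
    The timed-reboot trick and the "confirm the commit or it rolls back" approach mentioned in this thread follow the same pattern, roughly sketched below. This is a toy model under stated assumptions: the `Router` class, its method names, and the config keys are hypothetical and do not correspond to any vendor's CLI or API.

```python
import threading


class Router:
    """Toy device with a saved configuration and a running configuration."""

    def __init__(self, saved_config):
        self.saved_config = dict(saved_config)
        self.running_config = dict(saved_config)
        self._rollback_timer = None

    def apply_with_rollback(self, new_config, timeout_seconds):
        """Apply new_config, but revert to the saved config unless confirmed in time."""
        # Arm the safety net first, then make the risky change.
        self._rollback_timer = threading.Timer(timeout_seconds, self._rollback)
        self._rollback_timer.daemon = True
        self._rollback_timer.start()
        self.running_config = dict(new_config)

    def confirm(self):
        """Cancel the pending rollback and make the running config permanent."""
        if self._rollback_timer is not None:
            self._rollback_timer.cancel()
            self._rollback_timer = None
        self.saved_config = dict(self.running_config)

    def _rollback(self):
        # Nobody confirmed in time (perhaps because the change cut them off):
        # fall back to the last saved configuration.
        self.running_config = dict(self.saved_config)
        self._rollback_timer = None
```

    If the change works, the operator is still connected and calls `confirm()`, so the timer never fires. If the change cuts the operator off, doing nothing is enough to get the old configuration back.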

  • @Muthwill 2 years ago +114

    I love how you're talking about DNS, IPs, ISP and several technical terms but need to explain what a phonebook is

  • @LoPhatKao 2 years ago +208

    imagine a world where it took days rather than hours to fix
    personally, i think this timeline lost

    • @steveclem7873 2 years ago

      Benson&Hedges!

    • @ragnkja 2 years ago +2

      @steveclem7873
      Is that meant as a reference to _Cuckoo's Egg?_

    • @CFSworks 2 years ago +1

      @ragnkja My first thought as well

  • @marklonergan3898 2 years ago +806

    It may have been a throwaway question, but I was talking to my 11-year-old nephew a couple of weeks ago and I mentioned a phonebook, and he didn't have a clue. His best guess was a diary that you write names and numbers into, but he couldn't get his head around the idea that we used to get a catalogue of everyone's phone numbers posted out to us every year.

    • @metacob 2 years ago +165

      I don't think I've used a phone book in this century at all... To be honest, your nephew has a point, it is weird! Nowadays you wouldn't just put your name and number in a public place.

    • @soul0360 2 years ago +51

      @metacob Actually, at least in my country, everyone who doesn't specifically opt out via their phone company has their name and number advertised on the internet, in searchable registries/"phone books".
      These same registries are what's used if you have an app installed that gives you the name of an incoming number that is not in your phone's internal contact list.
      So I'd imagine most people, including you, regularly use the equivalent of a phone book.

    • @SineN0mine3 2 years ago +70

      You should have explained that we mostly used phone books as paper weights and door stops, looking up phone numbers in them was just a handy bonus.

    • @metacob 2 years ago +48

      @SineN0mine3 Also: feats of strength. You're not really strong unless you can rip a phone book in half.

    • @metacob 2 years ago +7

      @soul0360 that's scary, what about phone scams and doxxing?

  • @FlorianEagox 2 years ago +641

    it's kinda wild to me how people LOST THEIR MINDs over this.
    Back only a few years ago, stuff would go down and you'd just say, "idk, shit's down right now, I'll try again later"

    • @archivethearchives 2 years ago +163

      I read a lot of people explaining that in a lot of countries, whatsapp is a very common primary source of communication through phones because texting is not cheap in a lot of places. I am not sure about the full context but with a pandemic happening I know remote communication is pretty essential to staying in touch.

    • @FlorianEagox 2 years ago +49

      @archivethearchives ah, okay that makes sense then. For folks that depend on it, it's totally reasonable.

    • @Fataha22 2 years ago +25

      @archivethearchives can confirm
      Here a 1 GB data plan costs the same as 4 SMS

    • @gabrielcoelho1623 2 years ago +73

      @archivethearchives this is pretty much it. I live in Brazil and I can tell you: NO ONE here uses SMS or iMessage or whatever. The only texts you'll receive are from your phone carrier, your bank and those phone confirmation texts when you sign up for something. Day to day conversations with friends, school work, even whole businesses revolve around WhatsApp over here. When it's 100% down like that, it's truly bad.
      It's not just because of the pandemic either, it's always been this way. Well, not always, but I'm pretty sure we've been using it for at least family and friends communication since before the acquisition by Facebook in the early 2010s.

    • @animewow311 2 years ago +33

      @archivethearchives It's not only cost. It's cultural. Where I live SMS is not particularly expensive, but the idea of using it for communication is very unnatural. Like, I don't think people even consider it. WhatsApp is simply the regular means of communication.

  • @4lecsg 2 years ago +167

    "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair."
    Douglas Adams

    • @steveclem7873 2 years ago +1

      wrong=thawteEN2zAA!

    • @kanjakan 2 years ago +22

      That's actually pretty true, yeah. When you believe that something cannot possibly go wrong, you usually don't even try to have any failsafe for it.

    • @well_as_an_expert_id_say 2 years ago +3

      My mind immediately went to the Titanic and the number of lifeboats on board. They're just taking up space anyway, the Titanic could never be sunk

    • @EgonFreeman 2 years ago

      That's pretty on-point. This is why the key first step in disaster management planning is to reliably identify _what can actually break,_ and how. You first have to accept that something could break, so that you design the thing with service accessibility in the first place.

  • @fmdj 2 years ago +215

    We need to start explaining things the other way around: a phone book is like a DNS for phone numbers.

    • @KaiHenningsen 2 years ago +11

      Actually, a phone book is more like the HOSTS.TXT file mailed out or downloaded from SRI-NIC - the thing that caused DNS to be invented because it didn't work so well, one list of all hosts. The only remains left of that are the local /etc/hosts files (I believe they still have the same syntax).

    • @fmdj 2 years ago

      @KaiHenningsen interesting, didn't know about that
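
    For readers who have never opened one, the local `/etc/hosts` file mentioned above is a flat name list: one address per line, followed by one or more names, with `#` starting a comment. A minimal sketch of reading that format follows; the `parse_hosts` helper and the sample entries are made up for illustration, and the addresses come from documentation ranges.

```python
def parse_hosts(text):
    """Build a name -> address table from hosts-file style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # The first entry for a name wins, as in a top-to-bottom lookup.
            table.setdefault(name.lower(), address)
    return table


SAMPLE = """
# address     canonical name       aliases
127.0.0.1     localhost
192.0.2.10    fileserver.example   fileserver   # lab machine
"""

hosts = parse_hosts(SAMPLE)
print(hosts["fileserver"])  # 192.0.2.10
```

    Every machine needs its own up-to-date copy of such a list, which is the scaling problem DNS was designed to remove.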

  • @locust76 2 years ago +1528

    I don’t understand how Facebook didn’t have an out-of-band back door in case their ASN completely vanished from routing tables everywhere. I mean, these guys helped write the book on data center infrastructure Best Practices

    • @casperes0912 2 years ago +190

      Read several of their research papers in my work on creating CRDT systems. They do indeed tend to know what they're doing

    • @lyoselli 2 years ago +138

      They use BGP all the way to their top-of-the-rack switches. Could be that they screwed up their internal network too.
      If that's the case, though, then why don't they have an out-of-band management network based on other hardware than their main net? I don't get it.

    • @ferdievanschalkwyk1669 2 years ago +105

      I am guessing their management net has more traffic, redundancy and complexity than most people's production nets. For all we know, they borked the firmware on all the routers.

    • @jaylewis9876 2 years ago +46

      Perhaps they decided there might be a need to disconnect from the world. Like Switzerland having explosives on the bridges into the Valley as a way to make invasion hard (predating aircraft). Having alternate back doors might be bad.

    • @jonathanmitchell9779 2 years ago +97

      Concur. The notion that a trillion dollar corp doesn't have a separate physical network in place for remote management, in case of disaster, sabotage, etc, seems completely absurd to me. Giving them a pass on this less than 24hrs after because it 'must have been a bgp config mistake' is equally ridiculous. It will be very interesting to see what actually happened.

  • @gasdive 2 years ago +347

    A phone book is like a DNS server that's updated every 12 months.

  • @AttilaSVK 2 years ago +157

    This reminds me of when I worked in tech support at an antivirus company. The IT admin of a mining company called us because he had accidentally pushed the wrong configuration out to all of their computers, including the mission-critical ones down in the mine. I cannot recall what his config was meant to look like, but the wrong one was set up to run a total scan of all hard drives every 10 minutes. Since a single scan took significantly longer than 10 minutes to complete, many parallel scans were running, slowing the system down even more, and every 10 minutes a new scan job would start.
    We had to use a special, last resort tool, which would kill the antivirus kernel, but getting hold of it was hard (it was given out only with a special permission).
    My advice for that IT admin was to try new antivirus configurations on his own PC first (or get a sacrificial one for tests), then push out the config to 10 machines (preferably of tech savvy people). If there are still no problems, push out the config to 20-50 more computers, and so on, but never ever do it to the entire network.

    • @Dutch3DMaster 2 years ago +8

      That's not really possible with something like the internet, there might be a small propagation delay after making the configuration change, but it's pretty much a self-learning thing from there (hence why some experts consider this the weakest link of what makes the internet work, because self learning protocols can also cause the downtime of one very important link to possibly overload another, much weaker one).

    • @hillaryclinton2415 2 years ago +4

      So... you are saying push out the vaxx for a small group first?

    • @Phourc 2 years ago +29

      @hillaryclinton2415 Isn't that just clinical trials?

    • @Crazmuss 2 years ago +1

      You're supposed to have a test environment, actually. Probably even several test environments.

    • @Dutch3DMaster 2 years ago +4

      @hillaryclinton2415 No. You can't compare stuff like this with vaccination campaigns in any way...
      Once a change is made in the internet and the configuration in this case has been made permanent (there's a Cisco term for it that I just can't recall...), the changes will be quick, slightly delayed maybe, but quick enough to keep you from being able to stop the damage being done.
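
    The advice at the top of this thread (one test machine, then 10, then 20 to 50 more, and so on) is a staged rollout. A rough sketch follows, assuming hypothetical `apply_config` and `is_healthy` callbacks supplied by whatever management tool is in use; the stage sizes are simply the ones suggested in the comment.

```python
def staged_rollout(hosts, apply_config, is_healthy, stage_sizes=(1, 10, 50)):
    """Push a config in growing batches and stop at the first unhealthy batch.

    Returns the list of hosts that received the new config.
    """
    done = []
    remaining = list(hosts)
    sizes = list(stage_sizes)
    while remaining:
        # After the listed stages are used up, the last stage takes the rest.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for host in batch:
            apply_config(host)
            done.append(host)
        if not all(is_healthy(host) for host in batch):
            break  # halt here: the rest of the fleet keeps the old config
    return done
```

    With 100 hosts and a fault that shows up on the sixth machine, only 11 hosts end up with the bad configuration instead of all 100.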

  • @kwanarchive 2 years ago +95

    Whoever slipped in that Bueller Bueller Bueller joke deserves a raise.

  • @AlanTheBeast100 2 years ago +202

    I like that FB designed things to lock themselves out of FB.

    • @EvilNeuro 2 years ago

      What?

    • @Soken50 2 years ago +28

      @EvilNeuro Facebook's DNS architecture was briefly a "keys locked in the car and I'm outside" kind of situation until they managed to sort it out

    • @EvilNeuro 2 years ago

      @Soken50 oh my bad.

    • @michaelmwasela5249 2 years ago +7

      @Soken50 So did they break the windows? Or did they have to let some guys into the server rooms without key cards? lol

    • @Soken50 2 years ago +12

      @michaelmwasela5249 I don't have the details but I imagine they found a way to communicate without using the Internet so the people inside could get the credentials from those outside and unbrick their very heavy and expensive paperweight

  • @farhanmahalludin 2 years ago +117

    People thought that their internet went kaput during Facebook's outage, even my ISP had to clarify that it was Facebook's fault and not their network's. Just shows how large Facebook is.

    • @sujimayne 2 years ago +13

      I heard some kids on a bus talking about it and saying "Yeah, I felt so disconnected", "It's like my whole internet was down", "I was so bored, didn't know what to do" and such.
      The only thing I noticed was WhatsApp, and so I just used Viber, Signal and normal messaging. It's important to always have backups and alternatives, and not keep all your eggs in one basket.

    • @Dutch3DMaster 2 years ago +8

      In some countries or very remote areas, Facebook provided an actual internet connection and it was people's gateway to a part of the internet (usually sites Facebook allows to be visited, not everything, but still).
      In my country where this is not the case, I was mainly annoyed by news outlets reporting on this as a problem with the internet, or as if with this not working my society was coming to a standstill, and talking about withdrawal symptoms...
      I had hoped news outlets at least would be a little more sensible in recognizing that there is so much more to do outside Facebook; one of them even went "Now it was Facebook, but let's think about the risk of Google going down for longer than this."

    • @boiwaif 2 years ago

      @sujimayne For some people the internet is Facebook.

    • @steveclem7873 2 years ago

      lloyk has2be reeezykilld spam roytasszz!!?

    • @Roxor128 2 years ago +7

      Meanwhile, I never even noticed, because I don't use Facebook, or anything they own.

  • @RustyBrakes 2 years ago +116

    So could one make an analogy that it's like a company deleting their address from their website, revoking it from Google maps and any other database? The roads could still take you there, but knowledge of where it is quickly dies out

    • @AMTunLimited 2 years ago +26

      More like they took down all of the signs on the building. Then the Google Maps car drives by, notices that they took them down, and deletes the entry itself; then Uber and Lyft notice that Google Maps deleted it and remove it from their routes, and so on.

    • @EgonFreeman 2 years ago +6

      @AMTunLimited This is actually a very nice explanation for the discussed mode of failure, because even if _you_ happen to know it's still there, you can't get directions on Google, or a lift there because Uber doesn't care what _you_ know. :)
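
    The analogy in this thread maps onto a routing table quite directly: the destination still exists, but once its route is withdrawn nobody knows which way to send traffic. A toy sketch follows, assuming a hypothetical `RoutingTable` class and documentation-range prefixes rather than Facebook's real ones.

```python
import ipaddress


class RoutingTable:
    """Toy routing table: prefix -> next hop, with longest-prefix matching."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for an address, or None if no route covers it."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = RoutingTable()
table.announce("192.0.2.0/24", "peer-A")
print(table.lookup("192.0.2.53"))  # peer-A
table.withdraw("192.0.2.0/24")
print(table.lookup("192.0.2.53"))  # None: the server is still there, but unreachable
```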

  • @Novacification 2 years ago +48

    When I was studying to be a software engineer we had a semester on networks. We were assigned network equipment and tasked with setting up a network with security, services and so on. It was common for students to update the router configuration resulting in cutting off the connection that they were using to configure the router and having to get help to reset things. Almost everyone did this (including our group) and it was like a rite of passage.

    • @gustaf0902 2 years ago +8

      Been there, done that

    • @babybirdhome 2 years ago +6

      Mine was creating a firewall DENY rule for a single IP with a /0 instead of /32. OMG that was an awful day!

    • @Dutch3DMaster 2 years ago +2

      That sounds like Cisco lessons; we had to do that as well. I liked the practical side of the lessons a lot, and once even managed to pick a router that, after running all the troubleshooting commands in the book, I deemed broken on one of its interfaces, something later confirmed by someone from the IT department :P.
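
    The `/0` versus `/32` slip described in this thread is easy to see with Python's standard `ipaddress` module. The `rule_matches` helper and the addresses are illustrative assumptions (documentation ranges), not anyone's real firewall syntax.

```python
import ipaddress


def rule_matches(rule_cidr, address):
    """Return True if a rule written as CIDR text would apply to the address."""
    # strict=False masks off the host bits, the way a device would
    # interpret "203.0.113.7/0" as the network 0.0.0.0/0.
    network = ipaddress.ip_network(rule_cidr, strict=False)
    return ipaddress.ip_address(address) in network


admin_ip = "198.51.100.20"  # hypothetical address of the person typing the rule

print(rule_matches("203.0.113.7/32", admin_ip))  # False: only one host is denied
print(rule_matches("203.0.113.7/0", admin_ip))   # True: every IPv4 address is denied
print(ipaddress.ip_network("0.0.0.0/0").num_addresses)  # 4294967296
```

    A `/32` names exactly one address; a `/0` ignores every bit of the address, so a DENY rule with that mask blocks the whole IPv4 internet, including the session used to type it.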

  • @OCPyrit 2 years ago +264

    It's remarkable that one update affected all their servers. I imagined their EU and NA networks, for example, would be kind of separated.

    • @Megaranator 2 years ago +21

      Could be that EU network was dependent on the NA network.

    • @llearch 2 years ago +51

      The issue is that the update in question nobbled (amongst other things) the authoritative DNS servers. So while everything else might be separated, if DNS goes down, EU could well be up and running happily, but unable to be contacted due to the US-side DNS servers being down.
      Redundancies for this would be having a separate set of DNS servers over in the EU, but that's likely not happened for other reasons - GDPR and similar things might have impacted how FB run their internal network. Or they might have been working on it, but it's not yet functional. Or they have it, but the secondaries require the primaries to be online. Or they got both with one shotgun blast. We really don't know, for all that the news is claiming.

    • @benjaminmiddaugh2729 2 years ago +5

      A fairly major hosted application provider recently had an outage caused by an update module on their Gentoo Linux-based systems deleting the approved software package list. This caused all systems that received the update to remove all the software installed on the server, which made them effectively useless (and difficult to fix). They got it fixed, but it took half a day.

    • @olivier2553 2 years ago +4

      What was not said here is that BGP operates not on routers but on what are called autonomous systems (AS), and it seems FB has only one (or two) AS even though they have multiple data centers across the world. So if you mess up BGP, your AS becomes unreachable, and all your datacenters are unreachable at the same time.

    • @autohmae 2 years ago +2

      @olivier2553 it looks like the internal network was also broken, most likely it stopped talking to the outside world because it didn't have an internal network anymore.

  • @freezombie 2 years ago +148

    What I found really interesting is how long it took to get back. You're absolutely right that there's no way to fully prevent something like this from happening (and let's assume they did everything they could), but if you can believe the rumours about keycards, locksmiths, and angle grinders, they probably could have designed their physical security in a way that makes recovery in the event of a catastrophic failure a lot easier than it was.

    • @olivier2553 2 years ago +15

      You have to factor in the pandemic situation: in a normal situation they would have people inside the datacenters 24/7, but with WFH everyone was locked out.

    • @VivekYadav-ds8oz 2 years ago +5

      @olivier2553 bro imagine being locked out in your corporation's data centre. That would be a hellhole for sure.

    • @eigenl 2 years ago +18

      Ease of use and security are always on opposite ends of the same scale. If you make it easy to recover, you make it easy to attack as well.

    • @olivier2553 2 years ago

      @VivekYadav-ds8oz I am the one having the key to override the system :)

    • @freezombie 2 years ago +5

      @eigenl yes, of course. But the rumour is that the key card system used to open the doors also went down with the rest of it, which suggests to me that the security system may not have been sufficiently independent/out-of-band.

  • @flamencoprof 2 years ago +42

    The national network I have worked on since 1975 used to have a thing called "the Shout-down". It consisted of analogue audio circuits over copper only, between major control rooms or what-have-you. Everyone had to call in every morning to check its integrity. It was intended to circumvent network failures and keep communication alive robustly between these places. (As a young man then it reminded me of those old marine speaking tubes from the early 20th century; Phweet! "Half astern!" "Aye-aye, Captain! Half astern it is!")
    I knew the rot had set in when about a decade ago some genius decided that it was old-fashioned copper tech and they moved it to a VOIP connection.

    • @absalomdraconis 2 years ago +4

      Sorry to clue you in, but here's a dirty little secret: if those were lines leased from the telephone company, then it _wasn't_ an analogue direct connect anymore, just a digital simulation. And for any "national network", well, unless they _were_ a telephone company then it was pretty much guaranteed to be a lease.

    • @Dutch3DMaster 2 years ago +2

      @absalomdraconis Pretty sure they were analog. My country has been quite at the forefront in converting stuff to digital because of ADSL coming up at the end of the 90s, and I think the first fiber optic cable between phone hubs was laid down somewhere in the mid-80s, so flamencoprof was definitely talking about analogue-over-copper. We even used a direct 1-to-1 copper connection up until 2015 at the local radio station to connect to our transmitter site, a type of connection already phased out by the national network operator in 2005.

    • @steveclem7873 2 years ago

      @absalomdraconis CEN2REunWhileHowelzaRT10hexBintvEMJurmz!

  • @cheesecakedoublepeanutbutt6511 2 years ago +169

    Whoever did that, thank you for making this world a better place, even just for several hours.

    • @8koi245 2 years ago

      I heard it was an intern lol

    • @Hxcker471 2 years ago

      @8koi245 🤣🤣🤣🤣

    • @dianeconrardy829 2 years ago

      Why be down on Facebook?

  • @tednewkumet 2 years ago +166

    Somewhere out there, there are now two people, one with a C64 and the other with an Atari 800XL, both with dial-up modems. Their whole job is to call into the secret Facebook BBS interface to jump-start the network the next time this happens.

    • @NickiRusin 2 years ago +16

      all we have to do is kidnap these men

    • @eekee6034 2 years ago +1

      LOL! Best comment I've seen today. 🤣

    • @peter2327 2 years ago +5

      Nice idea :) But at least here in Germany, analog landlines have been replaced with IP-only DSL & VoIP, so modems are useless.

    • @Tudorgeable 2 years ago

      @peter2327 this is why we can't have nice things.

    • @peter2327 2 years ago

      @Tudorgeable For example, ENiGMA½ BBS uses OpenSSH as a "modem". Other than that, there are other means of concealed and private communication.

  • @Danny-hj2qg 2 years ago +106

    I also recall phonebooks having addresses and advertisements for local companies.

    • @autohmae 2 years ago +8

      Didn't you mean Yellow Pages?

    • @eekee6034 2 years ago +4

      @autohmae In Britain, Yellow Pages was a separate organization to British Telecom and published separate books. I'm not sure I remember exactly how it worked, but I think businesses got listed in a separate section at the back of the phone book. I do remember they were first allowed to buy adverts (or at least more prominent listing) in the phone book. It was a little bit scandalous because British Telecom was a national institution, not some for-profit company like Yellow Pages.

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      @@autohmae 2ohm\son\lcorls\aaahYuhzzzRedOgrezLawlurch4HMedgCheepz2FatoyENtrollaz

    • @ragnkja
      @ragnkja 2 ปีที่แล้ว +1

      @@eekee6034
      Was the business section (not Yellow Pages) on pink paper, like it was in Norway the last couple of decades before they stopped printing phone books?

    • @eekee6034
      @eekee6034 2 ปีที่แล้ว

      @@ragnkja Now you mention it, I think it was. Pink pages in a blue book.

  • @88Xlmk
    @88Xlmk 2 ปีที่แล้ว +86

    It's like the South Park episode when the Internet went down, but for Facebook. :D
    Edit: I remember a story from my sister's husband. He was a Cisco architect working for a very big and well-known company at the time. One day he came home laughing, because one of his colleagues had made a mistake when entering IP addresses. Apparently he copy/pasted the wrong address and caused a power outage in a city in China.

  • @alangraham6164
    @alangraham6164 2 ปีที่แล้ว +57

    A minor correction is that iBGP is not a separate protocol for use internally but just part of BGP that is called Internal BGP (iBGP) when the peering between devices is within the same AS (autonomous system). An AS is defined by its AS number and if they are the same then that’s iBGP.

    • @abg44
      @abg44 2 ปีที่แล้ว

      So there can be separate ASes with the same number given by IANA?

    • @KaiHenningsen
      @KaiHenningsen 2 ปีที่แล้ว

      @@abg44 Given that the number _is_ what defines an AS, the question doesn't make sense.

    • @michaelmwasela5249
      @michaelmwasela5249 2 ปีที่แล้ว

      @@abg44 You're not allowed to think that way, no no no!

    • @abg44
      @abg44 2 ปีที่แล้ว

      @@KaiHenningsen That was my fault reading the question wrong after pulling a night shift lol. My mistake

    • @abg44
      @abg44 2 ปีที่แล้ว

      @@michaelmwasela5249 woah woah !!
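The distinction drawn in this thread can be sketched in a few lines: the same protocol is called iBGP or eBGP depending only on whether the two peers share an AS number. This is a minimal illustration, not router code. The `Router` class, the router names and the `session_type` helper are hypothetical, and the AS numbers are taken from the range reserved for documentation (RFC 5398).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str  # hypothetical label, for illustration only
    asn: int   # number of the autonomous system the router belongs to


def session_type(local: Router, peer: Router) -> str:
    """Classify a BGP peering by comparing the two AS numbers."""
    return "iBGP" if local.asn == peer.asn else "eBGP"


# AS numbers from the documentation range, not real networks.
edge_a = Router("edge-a", 64496)
edge_b = Router("edge-b", 64496)
transit = Router("transit", 64511)

print(session_type(edge_a, edge_b))   # both peers are in AS 64496
print(session_type(edge_a, transit))  # peers are in different ASes
```

This also shows why the follow-up question has no sensible answer: since the number is the identity, two routers with the same number are by definition in the same AS.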

  • @rabidbigdog
    @rabidbigdog 2 ปีที่แล้ว +10

    'A JCB through a fibre-optic' - such an understatement and incredible how often it happens.

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      goldtfeesch hazza no katraxpilloz!![sparc no alfees!]

  • @ForSquirel
    @ForSquirel 2 ปีที่แล้ว +108

    It honestly couldn't have happened to a better company.

    • @aknkrstozkn
      @aknkrstozkn 2 ปีที่แล้ว +1

      google.

    • @kanjakan
      @kanjakan 2 ปีที่แล้ว +6

      @@aknkrstozkn That will probably legitimately break the internet though.

    • @ToastGreeting
      @ToastGreeting 2 ปีที่แล้ว

      @@kanjakan just use bing

    • @bruhSaintJohn
      @bruhSaintJohn 2 ปีที่แล้ว

      @@ToastGreeting eeeeeewwwwwww dude!

  • @AndyChamberlainMusic
    @AndyChamberlainMusic 2 ปีที่แล้ว +5

    I'm 20 and clearly remember going through phone books as a kid

  • @stephenmatura1086
    @stephenmatura1086 2 ปีที่แล้ว +90

    "Let your fingers do the walking." A phrase more apt nowadays than when it used to advertise Yellow Pages.

    • @Fogmeister
      @Fogmeister 2 ปีที่แล้ว +1

      Oh! I never realised why that was the logo on the yellow pages! Makes sense now!

    • @renragged
      @renragged 2 ปีที่แล้ว

      Hahaha. Soooo true!

  • @pgriggs2112
    @pgriggs2112 2 ปีที่แล้ว +156

    Their badge access failed because they had no route to their service provider. A local authenticator or “break glass” process would work. Physical structures have “Fireman” mode, allowing access to First Responders.

    • @FlorianEagox
      @FlorianEagox 2 ปีที่แล้ว +18

      Ooh Freeman mode?
      Crowbar time!

    • @hanelyp1
      @hanelyp1 2 ปีที่แล้ว +5

      A local cache of authorized access, in case the central database can't be reached.

    • @rhettorical
      @rhettorical 2 ปีที่แล้ว

      @@hanelyp1 Unless the internal network was also down, then the keypads wouldn't work either.

    • @absalomdraconis
      @absalomdraconis 2 ปีที่แล้ว +3

      @@rhettorical : The point of a "break glass" route is resilience _in case of_ such failures.

    • @hanelyp1
      @hanelyp1 2 ปีที่แล้ว +1

      @@rhettorical Keypad wired direct to a computer, which normally queries a master database but has a local cache, and wired directly to the door lock. Of course the local computer and wires to the lock need to be physically secured.
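The "local cache" idea from this thread might look roughly like the sketch below. It assumes a door controller that normally asks a central database and falls back to its last known answers when that database cannot be reached. `BadgeReader`, `FakeCentral` and the badge names are all hypothetical, and nothing here reflects how Facebook's access control actually works.

```python
from typing import Callable, Iterable, Set


class BadgeReader:
    """Door controller that remembers the last answer for each badge."""

    def __init__(self, query_central: Callable[[str], bool]) -> None:
        self._query_central = query_central
        self._cache: Set[str] = set()  # badges last known to be authorised

    def is_allowed(self, badge_id: str) -> bool:
        try:
            allowed = self._query_central(badge_id)
        except ConnectionError:
            # Central database unreachable: fall back to the local cache.
            return badge_id in self._cache
        # Keep the cache in step with the authoritative answer.
        if allowed:
            self._cache.add(badge_id)
        else:
            self._cache.discard(badge_id)
        return allowed


class FakeCentral:
    """Stand-in for the central database, with a switch to simulate an outage."""

    def __init__(self, authorised: Iterable[str]) -> None:
        self.authorised = set(authorised)
        self.online = True

    def __call__(self, badge_id: str) -> bool:
        if not self.online:
            raise ConnectionError("central database unreachable")
        return badge_id in self.authorised


central = FakeCentral({"alice"})
reader = BadgeReader(central)
reader.is_allowed("alice")    # normal operation, answer gets cached

central.online = False        # simulate the outage
alice_during_outage = reader.is_allowed("alice")
mallory_during_outage = reader.is_allowed("mallory")
print(alice_during_outage, mallory_during_outage)
```

The trade-off is that a revoked badge keeps working until the next successful lookup, which is why the thread also stresses physically securing the local controller.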

  • @JennaGetsCreative
    @JennaGetsCreative 2 ปีที่แล้ว +59

    The phone book entry "beuler, f" is awesome 😂

    • @ikagura
      @ikagura 2 ปีที่แล้ว +2

      Save Ferris

  • @exponentmantissa5598
    @exponentmantissa5598 2 ปีที่แล้ว +63

    Everybody has a robust, resilient network until the day it goes down. Same with security: everyone has a secure network until it gets hacked. One thing I have seen in IT big time is that they develop all these processes, procedures and policies to deal with emergencies and disasters, but they rarely ever test them.

    • @Duplicitousthoughtformentity
      @Duplicitousthoughtformentity 2 ปีที่แล้ว +7

      I don’t know if it has a name but I call that the Preparedness Paradox. People always feel so bright and proud of their contingency plans, but indeed, they never test them… like buying a generator for a hurricane, but not stocking up on gasoline.

    • @exponentmantissa5598
      @exponentmantissa5598 2 ปีที่แล้ว +4

      @@Duplicitousthoughtformentity Yup. I know a guy that worked for a major telco. They had a loss of power and their UPS systems got drained. What was supposed to happen was that a generator would kick in. The generator had been tested from time to time, but no one ever did an entire system test. What they discovered was that the switchgear blew up the second they tried to switch to backup power. The hardware that failed was worth 2K, but the loss due to violations of SLAs and other costs was in the tens of millions. Worse yet, they discovered that many of their systems had been running for years, and trying to restart everything led to yet another round of failures and race conditions.

    • @JonatasAdoM
      @JonatasAdoM 2 ปีที่แล้ว +1

      @@Duplicitousthoughtformentity No, not even not stocking up on gasoline.
      *Literally turning ON the generator to check if the thing actually works!*

    • @EgonFreeman
      @EgonFreeman 2 ปีที่แล้ว +1

      An emergency procedure (like a backup) that's not tested _does not qualify as an emergency procedure._ At least for me. If it's not tested, _assume it doesn't work_ and plan accordingly.

  • @DevilbyMoonlight
    @DevilbyMoonlight 2 ปีที่แล้ว +10

    BGP updates very sloooowly... and is full of wonderful little quirks and traps for the unwary, which made it so much fun learning nearly 20 years ago....

  • @dsaints2344
    @dsaints2344 2 ปีที่แล้ว +39

    The scariest thing for me during the outage was that it made me appreciate open-source, decentralized software even more. I get the advantages of a centralized infrastructure: better monitoring of the services behind a company's various products, and better communication between them (integration). But this is the edge-case scenario that reminds us not to build everything under the same roots.

    • @gorillaau
      @gorillaau 2 ปีที่แล้ว +7

      The same issue arises with cloud computing. It's someone else's computer.

    • @deoabhijit5935
      @deoabhijit5935 2 ปีที่แล้ว

      It's more like blockchain

    • @EgonFreeman
      @EgonFreeman 2 ปีที่แล้ว +2

      @@gorillaau And one should _never_ forget that. The "Cloud" is an abstraction, and as Joel Spolsky teaches - they eventually _all_ leak in some way, _no exceptions._

  • @MeneGR
    @MeneGR 2 ปีที่แล้ว +22

    I have learned the hard way: "Reload in 5". If in 5 minutes I'm not able to cancel the reload, that's my lifeline.

    • @typo691
      @typo691 2 ปีที่แล้ว

      What does this mean?

    • @therealnemini
      @therealnemini 2 ปีที่แล้ว +5

      @@typo691 On Cisco routers, "reload" reboots the router without saving the running configuration. You can schedule the reload with the "in" parameter plus a number of minutes. First run "reload in X", then change the configuration. If you lock yourself out, the router reboots after the number of minutes you specified and comes back up with the previous configuration, so you can get back in.

    • @typo691
      @typo691 2 ปีที่แล้ว +3

      @@therealnemini Oh, very cool thanks

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      @@typo691 livapoolizbombaDaddyzinvadingBatolSpaizeeNCraveArse
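The "reload in 5" habit described in this thread is a dead man's switch: schedule a rollback first, make the risky change, and cancel the rollback only if you can still reach the device. The toy model below is a sketch of that pattern, not of Cisco IOS. `ToyRouter` and its methods are hypothetical, only loosely named after the `reload in` and `reload cancel` commands, and time is advanced by hand instead of using a real clock.

```python
from typing import Optional


class ToyRouter:
    """Toy model: a saved config, a running config and a scheduled reload."""

    def __init__(self, saved_config: str) -> None:
        self.saved = saved_config
        self.running = saved_config
        self.now = 0                           # simulated minutes since start
        self.reload_at: Optional[int] = None   # minute at which a reload fires

    def reload_in(self, minutes: int) -> None:
        self.reload_at = self.now + minutes

    def reload_cancel(self) -> None:
        self.reload_at = None

    def configure(self, new_config: str) -> None:
        # Changes only the running config; nothing is saved.
        self.running = new_config

    def tick(self, minutes: int) -> None:
        """Advance simulated time and fire the reload if it is due."""
        self.now += minutes
        if self.reload_at is not None and self.now >= self.reload_at:
            self.running = self.saved  # a reboot loads the saved config
            self.reload_at = None


# Case 1: the change locks the operator out, so the reload is never cancelled.
locked_out = ToyRouter("good config")
locked_out.reload_in(5)
locked_out.configure("bad config")
locked_out.tick(5)
print(locked_out.running)

# Case 2: the change works, so the operator cancels the pending reload.
fine = ToyRouter("good config")
fine.reload_in(5)
fine.configure("new config")
fine.reload_cancel()
fine.tick(5)
print(fine.running)
```

The design point is that the safe path requires no action: doing nothing for five minutes restores the last saved state.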

  • @aland7236
    @aland7236 2 ปีที่แล้ว +17

    This reminded me of an event that happened here in the US back in something like 2014. An engineer at Verizon made some bad commit to a routing configuration and brought down a substantial portion of the internet in the States. That was a fun day. On October 4 2021 the sun shined a bit brighter.

  • @jesusoliveira2
    @jesusoliveira2 2 ปีที่แล้ว +6

    "The thing that's probably happened here is that someone has had a bad day at work, they've pushed something up to their servers and to their switches and things, and it just didn't work." Now, that was detailed, unexpected and fact-based! Someone then thought of disconnecting the Molex cable but it didn't improve things much.

  • @dunebasher1971
    @dunebasher1971 2 ปีที่แล้ว +5

    Loving all the retro computer goodness in the background. I'm seeing an early IBM PC, Atari ST monitor, Apple II and BBC Micro. Sure there's other stuff too.

  • @fostercathead
    @fostercathead 2 ปีที่แล้ว +108

    We're destroying evidence just as fast as we can, and expect our services to be restored as soon as possible.
    We apologize for the inconvenience ...

    • @moistmike4150
      @moistmike4150 2 ปีที่แล้ว +4

      Epic comment. : D

    • @theunknown4209
      @theunknown4209 2 ปีที่แล้ว +1

      Yup

    • @HUYI1
      @HUYI1 2 ปีที่แล้ว +1

      Evidence for what, what were they hiding I wonder...

  • @isyt1
    @isyt1 2 ปีที่แล้ว +2

    Another amazing video.
    I love that computing is such a huge topic that most experts only know about their wee part of it, except Steve seems to know everything about everything lol

  • @PhysioDetective
    @PhysioDetective 2 ปีที่แล้ว +1

    Fascinating. Thank you
    On a different note, your laptop on the edge of the desk was stressing me out!

  • @devnol
    @devnol 2 ปีที่แล้ว +8

    11:00 And this, my friends, is why we always keep two cups and a stretched wire as a communication backup.

  • @t34d1um
    @t34d1um 2 ปีที่แล้ว +14

    It's amazing how clearly you can explain it! I don't know much about tech and still managed to understand the majority of the video :D

    • @Dutch3DMaster
      @Dutch3DMaster 2 ปีที่แล้ว +1

      I honestly explained it to someone like this: "It's like I'm saying 'Hey everyone, I'm not living at house A anymore, but at house B!' All the people who received word about this send their mail to house B, but house B says 'I don't know who this is, return to sender'. Things start to go wrong (I'm not getting particular pieces of mail anymore), and after a while every address that wants to send stuff to me has learned to send everything to house B. This time, however, house A was also the place that stored the credentials for physical access control, which was now clueless about where to find the right data to allow entry to house A."

  • @dread69420
    @dread69420 2 ปีที่แล้ว +11

    That is so true. Federated approaches are much more resilient to these sorts of events.

  • @duncancampbell9742
    @duncancampbell9742 2 ปีที่แล้ว +5

    Really well explained - thanks. Having recently retired after years of data networking, my first thought was for the poor guy that pushed out the change ... and whether he (and his boss) will still be employed next week. I hope they (fb) learn from this and maybe segregate their networks a bit more ... especially (if the media stories are to be believed) the physical security and building access networks so staff can at least have a chance to start diagnosis

  • @shinoobie1549
    @shinoobie1549 2 ปีที่แล้ว +59

    What are the odds of a once in a lifetime catastrophic failure happening exactly when the company was embroiled in a scandal?

    • @VascoCC95
      @VascoCC95 2 ปีที่แล้ว

      what scandal?

    • @sageinit
      @sageinit 2 ปีที่แล้ว +3

      @@VascoCC95 60 minutes

    • @NeedForMadnessSVK
      @NeedForMadnessSVK 2 ปีที่แล้ว +37

      @@VascoCC95 Two things in the last 72 hours. There was a data breach in which 1.5 billion users' data got stolen, and a whistleblower came out with internal documents and research from Facebook showing that Facebook, built to maximize retention time and interactions, promotes hate, polarizing ideas, depression, eating disorders and misinformation. And all the important people at Facebook know this, but they won't change anything because it would make them less money.

    • @RevRussSmith
      @RevRussSmith 2 ปีที่แล้ว +4

      The odds of a once in a lifetime catastrophic failure happening at some arbitrary time are exactly the same as the odds of happening at some other arbitrary time unless you also assume some sort of causal link. That doesn't appear to be the case.
      If it were malicious in intent, someone could have done far more harm to the FB brand by releasing more information of the type that just came out, not by taking FB down.
      If it were an attempt to do something in FB's favor the event would have been a lot shorter and stopped as soon as they realized the effect it was having on stock value.
      From watching the network fail, the most reasonable explanation is the one in this video. It was a catastrophic coding error that caused FB to disconnect itself from the Internet.

    • @shinoobie1549
      @shinoobie1549 2 ปีที่แล้ว

      @@RevRussSmith cool

  • @DjResROfficial
    @DjResROfficial 2 ปีที่แล้ว +56

    It was quite interesting to see how much of Facebook and Instagram is buffered locally on the smartphone. I didn't spot it being offline until I started watching a video and it failed mid-way.

    • @EarthboundApocalypse
      @EarthboundApocalypse 2 ปีที่แล้ว +11

      Wait until you see how much data FB gathers from you when you allow them to access your internal storage. Then you'll see what's really scary about using these apps.

    • @hil449
      @hil449 2 ปีที่แล้ว

      @@EarthboundApocalypse so what? I have nothing to hide lol

    • @AuroraAce.
      @AuroraAce. 2 ปีที่แล้ว +13

      @@hil449 always one with that weak argument

    • @EarthboundApocalypse
      @EarthboundApocalypse 2 ปีที่แล้ว +6

      @@hil449 - It's all about privacy, not about whether you have something to hide. I wouldn't like to be profiled just by my web surfing history, or when I'm shopping.

    • @EarthboundApocalypse
      @EarthboundApocalypse 2 ปีที่แล้ว +1

      @@AuroraAce. - The only weak thing in this world is your mind, for not recognizing violations of your basic rights. I suppose you are one of those who can't hire a lawyer when a problem arises, because you don't believe in laws, principles and rights. You may think you comply with all your obligations as well, but behave like a 'Karen' when no one you know is watching.

  • @jacquesmorrisseau7420
    @jacquesmorrisseau7420 2 ปีที่แล้ว +18

    Love the retro hardware in the back! For my part, I always kept a physical key to our company building because I did not trust the access system. We are now in a new building completely run by keytag access. I guess I'm going to add a physical key to my request list for the new building operator, and thank Facebook for giving me arguments in favor of this.

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      coyENroWStayTinGLassEZ! GraphiotGRAFTedWardsQUINX!!

    • @mikesully110
      @mikesully110 2 ปีที่แล้ว

      lol do you really think corporate will listen to you? Are you one of the bosses? In my place they would ask me why I am so special that I deserve a physical key when everyone else makes do with cards.

    • @jacquesmorrisseau7420
      @jacquesmorrisseau7420 2 ปีที่แล้ว

      @@mikesully110 I am the IT director, my boss is the CEO. Curiously enough, whenever someone in the company does not agree with me, their bandwidth gets mysteriously throttled...

    • @mikesully110
      @mikesully110 2 ปีที่แล้ว

      @@jacquesmorrisseau7420 lol I'm the IT director at my place too but honestly my main concern is, when does my shift end. Bring on UBI if we can't have UBI bring on a 4 day workweek that's all I will say. 1950's optimism promised us 10am - 2pm by now.

  • @as-qh1qq
    @as-qh1qq 2 ปีที่แล้ว

    The clarity of the explanation

  • @mu11668B
    @mu11668B 2 ปีที่แล้ว +15

    Pushing your new code to production when you felt like "it was just some minor errors that don't have to be re-tested" be like:

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      HaRdazeKnight[Mov awe no?]...

  • @krzysztofkwietniewski9100
    @krzysztofkwietniewski9100 2 ปีที่แล้ว +17

    At least *NSA* "Prism" network was still statically *connected* .

    • @bhatkrishnakishor
      @bhatkrishnakishor 2 ปีที่แล้ว +4

      They probably run on adjacent server rack, under the same roof 😉

    • @eekee6034
      @eekee6034 2 ปีที่แล้ว

      I wonder if it would still be up if Google went down. 🤔🤣

    • @eekee6034
      @eekee6034 2 ปีที่แล้ว

      @@glas4849 Possible, but I'm not sure what the motive would be.
      I've been thinking of social-engineering type hacks because there's this situation with China, which I'm told has been going on since the early 90s, and Russia too, and also it seems entirely reasonable to believe the statement that industrial espionage and sabotage are 100% normal now, but you never see anything overt. You know how there's always someone to explain why it's necessary to mess up the entire project for a 1% performance gain (or other excuse)? Some of those people are undoubtedly a bit crazy, but what if some of them were paid to do that by a competitor or a foreign power?

  • @DeoMachina
    @DeoMachina 2 ปีที่แล้ว

    The thumbnail for this is pure genius

  • @RBLevin
    @RBLevin 2 ปีที่แล้ว

    I love the fact that he has an old IBM PC running on the background, an original Mac keyboard sitting around, and a big old CRT on a shelf, too.

    • @eekee6034
      @eekee6034 2 ปีที่แล้ว

      The CRT is badged Atari and looks like an original monitor for the ST series; pretty uncommon and something I really wished I had back in the late 80s. ;) I was looking at it more than listening to the video at first.

  • @boyan3001
    @boyan3001 2 ปีที่แล้ว +3

    Email is the exact example I was thinking about during the whole video! I think it would be a nice topic for Computerphile to discuss: services and systems based on open protocols with various implementations, such as email, vs. proprietary and centralized ones like the popular instant messengers (from ICQ, MSN and Skype to modern ones such as WhatsApp, Viber, FB Messenger etc.) or whole platforms like Facebook and Instagram.

    • @autohmae
      @autohmae 2 ปีที่แล้ว +2

      So basically a video about: centralized, federated and decentralized.

  • @sasuke2910
    @sasuke2910 2 ปีที่แล้ว +73

    It's funny how at the end they seem to be saying it's impossible for Facebook to create something that doesn't have a single point of failure.

    • @sageinit
      @sageinit 2 ปีที่แล้ว +10

      Technically correct given that we don't actually have an Internet, we only have a catenet.

    • @nathanb011
      @nathanb011 2 ปีที่แล้ว +12

      No matter how many servers that facebook makes, if the DNS doesn't tell you where they are, you can't do anything about it.

    • @lordcirth
      @lordcirth 2 ปีที่แล้ว +5

      With a sufficiently broad definition of "point of failure", yeah. Making physical hardware redundant, easy with their budget. Preventing any one admin from making large changes, hard, but possible. Preventing one bug in the distributed software from taking down their network? How?

    • @brendawilliams8062
      @brendawilliams8062 2 ปีที่แล้ว

      Most things aren’t perfectly made.

    • @goeiecool9999
      @goeiecool9999 2 ปีที่แล้ว +1

      I think he was promoting the use of services that are not centrally controlled by one entity. Aka federation.

  • @Daniel-us1dl
    @Daniel-us1dl 2 ปีที่แล้ว +4

    Thank you Mr Philip Seymour Hoffman for your outstanding explanation!

  • @namedeleted5329
    @namedeleted5329 2 ปีที่แล้ว +1

    Hey, almost 2 million subscribers, great job, computerphile!😀👍

  • @support2587
    @support2587 2 ปีที่แล้ว +17

    Reminds me of the AT&T long distance failure. Cascaded due to a code issue.

    • @CTimmerman
      @CTimmerman 2 ปีที่แล้ว +1

      Standard insertion of nonstandard code.

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      egg smekz?...

  • @SKyrim190
    @SKyrim190 2 ปีที่แล้ว +5

    2:43 "You see...a phonebook was like a physical DNS server, where you had this book that you would look up the person's telephone number by their name"
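The phone book analogy maps almost directly onto a lookup table. The sketch below is a toy, not a real resolver: `PHONE_BOOK` and `lookup` are hypothetical names, the hostnames are reserved example domains, and the addresses come from a range set aside for documentation (RFC 5737).

```python
from typing import Dict, Optional

# Name -> number, exactly like a printed phone book.
PHONE_BOOK: Dict[str, str] = {
    "example.com": "192.0.2.10",
    "example.org": "192.0.2.20",
}


def lookup(name: str) -> Optional[str]:
    """Return the address listed for a name, or None if there is no entry."""
    # DNS names are case-insensitive and may carry a trailing dot.
    return PHONE_BOOK.get(name.lower().rstrip("."))


listed = lookup("Example.COM.")
unlisted = lookup("not-listed.example")
print(listed, unlisted)
```

A name with no entry gives you nothing to dial, which is roughly the position users were in during the outage: the servers may have existed, but nobody could find out where they were.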

  • @chrisedwards4929
    @chrisedwards4929 6 หลายเดือนก่อน

    Loved this analysis. Watched it at the time and just rewatched it in light of the Optus outage here. Dr Steve, is there any chance you could take a look at that using the same tool to see if it's the same problem? Thx

  • @Spasmomen
    @Spasmomen 2 ปีที่แล้ว +1

    Thanks for the great explanation! I loved that wire diagram.

  • @TheJaguar1983
    @TheJaguar1983 2 ปีที่แล้ว +8

    We had something like this happen to the Australian internet a few years back. The internet provider Dodo (who I worked for years prior) sent out an update that forced all traffic in the country through Dodo's routers, slowing down the entire country's internet.

    • @deus_ex_machina_
      @deus_ex_machina_ 2 ปีที่แล้ว +3

      Did anyone even notice the difference?

    • @tenseikenzx-3559
      @tenseikenzx-3559 2 ปีที่แล้ว +1

      Probably not, our internet could be a lot better

    • @TheJaguar1983
      @TheJaguar1983 2 ปีที่แล้ว +1

      @@deus_ex_machina_ Yeah, everything was ⅓ to ½ speed.

    • @deus_ex_machina_
      @deus_ex_machina_ 2 ปีที่แล้ว +2

      @@TheJaguar1983 It was banter about Australia's stereotypically slow internet speeds.
      The joke is that it was already so slow that no one even noticed that it had become even slower.

  • @baseddepartment1306
    @baseddepartment1306 2 ปีที่แล้ว +39

    They had to reboot to deploy the skynet version

  • @JCTsFascinatingHobbies
    @JCTsFascinatingHobbies 2 ปีที่แล้ว

    Cool, an Acorn RiscPC, just sat in the background!! One of my favourite machines of all time. I own a StrongARM SA110 upgraded RiscPC 600, which I use, occasionally.

  • @superxxai
    @superxxai 2 ปีที่แล้ว

    Thanks for the video! I have one question. I thought BGP only published changes: once a route is announced, there is no need to announce it again. Is Steve right to say "Facebook stopped announcing...." (min 8:30)? That would mean border routers continually announce their routes. Thanks again. Your videos are so nice! Love them!

    • @SuperSpecies
      @SuperSpecies 2 ปีที่แล้ว

      The whole point of having a dynamic routing protocol is so that routes are able to change.
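On the question in this thread: as far as I understand BGP (RFC 4271), the asker is right that it is incremental. A route stays in a neighbour's table until it is explicitly withdrawn or the session to the announcing router drops, so "stopped announcing" is best read as "sent withdrawals". The toy table below sketches that behaviour. `RoutingTable` and its methods are hypothetical, and the prefix and AS number are documentation values, not Facebook's.

```python
from typing import Dict, Optional


class RoutingTable:
    """Toy view of what one neighbour has learned: prefix -> origin AS."""

    def __init__(self) -> None:
        self._routes: Dict[str, int] = {}

    def announce(self, prefix: str, origin_asn: int) -> None:
        # An announcement installs (or replaces) the route.
        self._routes[prefix] = origin_asn

    def withdraw(self, prefix: str) -> None:
        # A withdrawal removes the route; repeating it is harmless.
        self._routes.pop(prefix, None)

    def origin(self, prefix: str) -> Optional[int]:
        return self._routes.get(prefix)


table = RoutingTable()
table.announce("192.0.2.0/24", 64496)

# No further messages arrive: the route simply stays where it is.
before = table.origin("192.0.2.0/24")

# The origin withdraws the prefix: now nobody knows how to reach it.
table.withdraw("192.0.2.0/24")
after = table.origin("192.0.2.0/24")
print(before, after)
```

Real sessions also exchange periodic keepalives, but those only prove the neighbour is still alive; they do not re-send the routes.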

  • @DavidKutzler
    @DavidKutzler 2 ปีที่แล้ว +3

    This incident is a good argument against the IoT. If a single-point failure can knock out your phone system and lock you out of your own building, it's a real vulnerability.

  • @levyroth
    @levyroth 2 ปีที่แล้ว +29

    For a short while, the world was simply normal. Wish FB stayed offline for good.

    • @DeosPraetorian
      @DeosPraetorian 2 ปีที่แล้ว +1

      And it would be replaced by something else

  • @honestgoat
    @honestgoat 2 ปีที่แล้ว

    Almost 2 million. Congrats guys.

  • @MatthewPiercey
    @MatthewPiercey 2 ปีที่แล้ว

    Brilliant n00b-friendly explanation and hilarious thumbnail 😅.
    This made me think of some kind of thriller movie, where a technician gets trapped in a massive datacenter in an outage like this. So they gotta go all Gordon Freeman to get through the complex's security systems - to get the servers back online in time, before a bomb goes off or something idk 😂

  • @thefirehawk1495
    @thefirehawk1495 2 ปีที่แล้ว +3

    I could actually feel the positivity returning to people around me and people engaging more with each other. Can't we just end Facebook?

    • @kapa1611
      @kapa1611 2 ปีที่แล้ว

      👍
      here is something funny i heard: when the arab spring happened, the shutdown of social media (by the regimes that wanted to undercut the protests) accelerated them. apparently because people started talking to each other more in person xD

  • @philiplee8663
    @philiplee8663 2 ปีที่แล้ว +5

    You make good videos. Looking forward to the next one!

  • @SephirothEGS
    @SephirothEGS 2 ปีที่แล้ว +2

    That map of nodes with the flashing connections reminded me of that scene in Jurassic Park when they look at the map full of failed security systems and they realize that they're screwed. I imagine the people at Facebook had much of the same "Oh crap..." feeling.

  • @philstuf
    @philstuf 2 ปีที่แล้ว

    I have NO idea why there are ANY downvotes to this video. It's accurate, it's informative, and it is entertaining...

  • @MoosesValley
    @MoosesValley 2 ปีที่แล้ว +6

    For a brief period of time Carrier Pigeons made a spectacular come back and became the preferred way for people to communicate in the 21st century ...

  • @yarik12341
    @yarik12341 2 ปีที่แล้ว +4

    I took a networking class last year and what he is talking about made a lot of sense. Thanks cs degree

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      declasse intelejinx railROded ritz cyr compress22 hellier joneseys ai chapshax java class? bowlEN2bird teechez lloyfaz,wikamainz? KhanSciASS!...

  • @MsRope93
    @MsRope93 2 ปีที่แล้ว

    The one video that best explained what really happened with WB.
    Kudos @computerphile

  • @Mar_Ten
    @Mar_Ten 2 ปีที่แล้ว

    Thanks for this interesting video. I knew DNS but needed a bit more context for this!

  • @bcn1gh7h4wk
    @bcn1gh7h4wk 2 ปีที่แล้ว +11

    IT person: "Mark, our servers are down!"
    The Zucker: "Have you tried turning them off and on again?"

    • @JonatasAdoM
      @JonatasAdoM 2 ปีที่แล้ว

      We need to turn it on to let us in so we can shut it off!

  • @Peewee0413
    @Peewee0413 2 ปีที่แล้ว +3

    Nice description of the issue....
    The Facebook Song : Scrub scrub scrub your servers, scrub them till they're clean. If you do it well enough, you'll avoid the guillotine.

  • @rudiklein
    @rudiklein 2 ปีที่แล้ว

    Great video! What is the software you are using to display the network connections?

  • @jmfriedman7
    @jmfriedman7 2 ปีที่แล้ว

    Living in Japan, did not notice any downtime but read about it in online news. Apparently it must have been down from 11:30 pm until about 5 am our time.

  • @HikaruKatayamma
    @HikaruKatayamma 2 ปีที่แล้ว +5

    Would have been nice to add ASN numbers in the explanation. You could still keep it fairly simple and explain how BGP4 pathing works.

  • @dimasveliz6745
    @dimasveliz6745 2 ปีที่แล้ว +13

    I love this channel both because of the content and because of the desk of so many awesomely smart people looking messy as mine :)

  • @vladciobanu3672
    @vladciobanu3672 2 ปีที่แล้ว

    Really informative! Do you think you could make a similarly in-depth video on how Apple’s Private Relay will work/works?

  • @dustinmorrison6315
    @dustinmorrison6315 2 ปีที่แล้ว

    The part you discussed in the end there is called a deadlock in concurrency.
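The circular wait this comment points at can be written down as a dependency graph and checked for cycles. The sketch below uses a plain depth-first search. The step names are illustrative only, loosely based on the situation described in the video and in other comments here (the fix needed physical access, and physical access needed the network), and `find_cycle` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Set


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    visiting: Set[str] = set()  # nodes on the current search path
    done: Set[str] = set()      # nodes already fully explored
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Reached a node already on the path: that closes a cycle.
            return path[path.index(node):] + [node]
        visiting.add(node)
        path.append(node)
        for dep in deps.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in deps:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Each key needs everything in its list before it can happen.
outage = {
    "fix network config": ["physical access"],
    "physical access": ["badge system"],
    "badge system": ["working network"],
    "working network": ["fix network config"],
}
cycle = find_cycle(outage)
print(" -> ".join(cycle) if cycle else "no cycle")
```

Breaking any single edge, for example with a physical key or a locally cached badge list, removes the cycle.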

  • @chadfreestyle4371
    @chadfreestyle4371 2 ปีที่แล้ว +126

    I feel like a cave man who literally walked out of a cave homeless knowing nothing about computer's but I'm learning and I find this really interesting. Thanks.

    • @boratsagdiev6486
      @boratsagdiev6486 2 ปีที่แล้ว +2

      It’s never too late to start coding.

    • @steveclem7873
      @steveclem7873 2 ปีที่แล้ว

      aumheThyKatatNvohMat? oofeeeliangszzz....soapyDOpeMEanzCenTrueyanks?!!apeKink...oofNohTelefonKordaaz,,,

    • @EgonFreeman
      @EgonFreeman 2 ปีที่แล้ว

      Many people don't even _realise_ how much is going on "behind the scenes". Many _professionals_ don't even realise. And it's nobody's fault - the systems we construct are _so complex_ that there often isn't one person who "understands it all" and can solve any and every emergency... It's just _impossible_ to understand _all_ of it. Even this - the event that killed Facebook for _a few hours_ - is just the tip of the iceberg that IT people deal with day-in and day-out. So don't feel bad for not grasping this - you're in the overwhelming majority.

  • @irispounsberry7917
    @irispounsberry7917 2 ปีที่แล้ว +39

    "Facebook know what they're doing when it comes to networking", "they're not some fly by night operation"... which is why there's been chatter about them scrubbing some things from their network in the wake of the negative press they got the previous day. That being said, it definitely sounds like a regular corporate screwup to route everything through their own servers, depending on that constant update/refresh cycle, to secure everything in the IoTs rather than rely on any analog system.

    • @moshet842
      @moshet842 2 ปีที่แล้ว

      Your comment makes no sense.

    • @dr.angerous
      @dr.angerous 2 ปีที่แล้ว +1

      @@moshet842 wtf shut up

    • @abg44
      @abg44 2 ปีที่แล้ว +2

      @@moshet842 No, you just don't understand L2/L3 routing

    • @irispounsberry7917
      @irispounsberry7917 2 ปีที่แล้ว +2

      @@moshet842 There have been guesses that FB needed to scrub something from their system and their logs in light of the stuff said in the news recently. In order to do that, it is speculated that they took their own system off the internet (in broad terms) to do it properly - they know their tech and how to engineer a "mistake" to give themselves the time on the servers to alter stuff deep in the code without outside observation for any future investigation.
      However, they are a regular corporation, likely with regular managers that know little about what the people they manage actually do. So, people who are not tech minded (like regular managers) would be inclined to route everything electronic through their own secure system, because they trust it more than other security methods. This lends credence to FB's given explanation of what happened.

    • @moshet842
      @moshet842 2 ปีที่แล้ว

      @@abg44 Networking is literally what I do.

  • @tsuchan
    @tsuchan 2 ปีที่แล้ว +1

    Just a thought... if the cameraman is going to take part in the discussion, the cameraman needs to have some audio pick-up. I have to mention this because, video after video, the cameraman doesn't seem to have worked it out for himself. Cameraman: we can't hear what you're saying!

  • @ericcartmansh
    @ericcartmansh 2 ปีที่แล้ว +1

    Great point about decentralization at the end

  • @cheguevara8500
    @cheguevara8500 2 ปีที่แล้ว +18

    The guy who pushed the wrong config probably had a very hard time after he realized what happened. I wouldn’t like to be in his shoes. Poor guy.

    • @quintrankid8045
      @quintrankid8045 2 ปีที่แล้ว +1

      What would you do to him if you were the boss of all bosses at the company?

    • @autohmae
      @autohmae 2 ปีที่แล้ว +10

      I wonder if they use a git review process before pushing changes, that would mean it was checked by one or more other people before the update went out on the network to reconfigure the network.

    • @autohmae
      @autohmae 2 ปีที่แล้ว +5

      @@quintrankid8045 First, ask what happened.

    • @globalincident694
      @globalincident694 2 years ago +1

      @@autohmae They must use some sort of review process; they're not idiots.

    • @autohmae
      @autohmae 2 years ago +2

      @@globalincident694 So Facebook released more information: it wasn't a configuration change. They were running a command on all routers to request some information, but the command did something else by accident.

  • @mLyonJE
    @mLyonJE 2 years ago +50

    There's a hint of Facebook's arrogance here.
    "They wrote the book on best practice", people say. They're world leaders and have extensive backup systems, apparently. And so on.
    The thing is, their perception was that they could put all their eggs -- external, internal, backup, every system -- in one notional basket. I run the tiniest of networks and even I think to put some systems "out of band", so if the unforeseen occurs and kills my primary systems, I have a little alternative that I can reach/access/use. Note my use of the word UNFORESEEN. There's your "best practice" -- write the book, follow the book, and know there is more BEYOND said book.

    • @steveclem7873
      @steveclem7873 2 years ago

      finkDratBiYuhMoofiaRitENz2?

    • @tanmaypanadi1414
      @tanmaypanadi1414 2 years ago

      @@glas4849 Nope, they didn't get hacked.

    • @pl4nty
      @pl4nty 2 years ago

      Have you read their blog posts/research papers? The affected system was their backbone. The cost of running a global OOB network just to cover config failures is far greater than the 7 figures they lost due to downtime. It is far more effective to have strong auditing controls (like AWS, Google, Microsoft), which failed in this case. The real issue with their response was the lengthy downtime, which was blamed on security controls.

    • @OlivierChambon1
      @OlivierChambon1 2 years ago

      What is the name of this book?

  • @nicknesler
    @nicknesler 2 years ago +1

    I would love to be in those retro meetings.

  • @JamesJansson
    @JamesJansson 2 years ago

    The "Ferris Bueller's Day Off" thumbnail is awesome.