Could The Internet Send You The WRONG Thing?

  • Published on 15 Sep 2024
  • Visit www.brilliant.... to get started learning STEM for free, and the first 200 people will get 20% off their annual premium subscription.
    Learn about checksums and their many uses - including ensuring data gets from one place to the other without being changed in transit.
    File checksum utilities:
    www.abelhadigi...
    www.quickhash-...
    Leave a reply with your requests for future episodes.
  • Science & Technology

Comments • 633

  • @juaniththomas3591
    @juaniththomas3591 a year ago +2575

    yes my son got sent adult videos after he tried to download homework questions, the internet is scary

    • @pofjiosgjsoges
      @pofjiosgjsoges a year ago +347

      Happens to me all the time. So much wasted data on my mobile plan...

    • @timothytorpy4837
      @timothytorpy4837 a year ago +240

      Sounds more like someone got caught lol

    • @qazhr
      @qazhr a year ago +174

      Umm dude I think we need to talk about what's really going on with your son.

    • @ordinarryalien
      @ordinarryalien a year ago +71

      What was it like? I mean.. uh... I'd like to know so I can protect myself from it.

    • @Seelendrache
      @Seelendrache a year ago +40

      ​@@qazhr depends on how old the son is 😂

  • @miigon9117
    @miigon9117 a year ago +652

    Programmer here: Linus mentioned that the data goes through a CRYPTOGRAPHIC hash. While those can certainly be used to verify file integrity, NON-cryptographic hash algorithms are more commonly used for that purpose since they are generally faster, while the cryptographic ones are generally reserved for, well, cryptography. (Difference: protection against active malicious manipulation vs. just plain transmission damage.)
    What makes a hash function cryptographic (or not)? Basically it's how hard it is for a bad actor to crack it (or produce a collision). MD5 was once considered a cryptographic hash and was popular among websites as a means of storing passwords, but it has since been deprecated and is now regarded as a non-cryptographic hash, good only for file integrity verification, after people started finding fast ways to crack MD5 (producing collisions).

    • @techaddictdude
      @techaddictdude a year ago +11

      NERD alert!

    • @JustPlayerDE
      @JustPlayerDE a year ago +110

      @@techaddictdude you are literally watching ltt, we all are nerds here lol

    • @I3erow
      @I3erow a year ago +19

      dude MD5 is outdated and SHOULD NEVER be used for security related features anymore....

    • @JustPlayerDE
      @JustPlayerDE a year ago +51

      @@I3erow md5 is still fine to check if the file is not corrupted on transfer tho

    • @hubertnnn
      @hubertnnn a year ago +6

      @@I3erow It depends on how secure something should be.
      I keep using md5 for securing image downloads.
      It is fast and simple, and no one is going to waste time cracking an md5 hash to get access (remove watermark) to a $1 picture, it will cost more to do so than just buying the picture itself.

  • @CarlosCabrera-kn1jb
    @CarlosCabrera-kn1jb a year ago +712

    Actually Linus, as a CS Major it’s actually a miracle that information gets from point A to point B, fucking magic man, that’s why those low level devs always have a long beard, they’re magicians.

    • @GSBarlev
      @GSBarlev a year ago +90

      Forget Point A to Point B--the potential that a stray cosmic ray strikes your RAM and flips a bit at just the wrong time is why I'm super psyched that DDR5 has on-die ECC built in.

    • @JazGalaxy
      @JazGalaxy a year ago +31

      Just… for your own information, don’t write a message to professional technologists and say “ as a CS Major..”

    • @akaiappears
      @akaiappears a year ago +49

      ​@@GSBarlev And God said, let this bit be switched: and the bit was switched. And God saw the bit, and it was fucking magic to them; and God divined low level devs into existence

    • @damienchambers3302
      @damienchambers3302 a year ago +1

      ​@Gilad Barlev it does? That's awesome. Does it have any practical applications for the average user though?

    • @CarlosCabrera-kn1jb
      @CarlosCabrera-kn1jb a year ago +13

      @@JazGalaxy chill, it was just a joke my friendo.

  • @demophoon
    @demophoon a year ago +34

    Fun fact, every credit/debit card has a checksum built into the number so computers can quickly determine if users accidentally typoed their number in wrong when paying for things online. Many other numbers which humans are expected to enter manually usually are designed with these sorts of checks in mind like insurance numbers, IMEI numbers and even those lil survey codes you find on receipts

    • @I.____.....__...__
      @I.____.....__...__ a year ago +1

      The keyword being "many". Many also _don't_ have them. In fact, some of the survey-codes on receipts are practically human-readable and you can edit them as you please to fill out multiple surveys. (Not that there's any point; all you're doing is wasting your time giving them market-research for free. I'm not convinced they ever give ANYONE the cash prizes they claim to. 😒)

    • @bettercalldelta
      @bettercalldelta a year ago +1

      the difference is the "checksum" on credit card numbers is quite primitive, google Luhn's Algorithm, it's just the numbers added, multiplied and moduloed, even still, it's good enough to catch common mistakes like wrong and swapped digits
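
For readers who want to see the Luhn check mentioned above in code, here is a minimal Python sketch. The function name `luhn_is_valid` is made up for illustration, and real payment systems layer further validation on top of this. It only shows the "double every second digit, add, take modulo 10" idea.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    # Ignore spaces or dashes that people often type into card numbers.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

With this sketch, `luhn_is_valid("79927398713")` passes, while changing a single digit or swapping the last two digits makes the check fail. One known gap in the algorithm is that swapping an adjacent 0 and 9 goes unnoticed.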

    • @stevensavoie856
      @stevensavoie856 a year ago +1

      I love that the word typoed is a typo.

    • @finadoggie
      @finadoggie a year ago

      Interestingly, Social Security Numbers specifically do not do this

  • @redaceFR
    @redaceFR a year ago +60

    7-zip can generate the checksums of files too ! It's the CRC option in the 7-zip menu in the explorer. It avoids downloading something else to check it or typing a command.

    • @lucasrem
      @lucasrem a year ago

      WHY YOU NEED LINUS ?????
      ads you need ?

  • @JV-pu8kx
    @JV-pu8kx a year ago +111

    Data transmitted with UDP does not get resent. The UDP protocol is used for things like video streaming where an occasional dropped packet won't be missed.

    • @BrianG61UK
      @BrianG61UK a year ago +22

      There is certainly no standard mechanism whereby UDP always gets resent if a packet is corrupted or lost. However, software that uses UDP might be able to tell when packets need to be resent and then do so. Consider, for example, QUIC. QUIC uses UDP, and packets that are lost most definitely do get resent. In fact, consider simple DNS using UDP. Most DNS clients will resend the outgoing packet if no response is received, it might only be resent once before giving up, or get resent but to a secondary DNS server instead, but it also might get resent to the same DNS server if a second one isn't specified.

    • @MrBleach163
      @MrBleach163 a year ago +3

      I'm not sure that in video streaming one packet won't be missed given the complexity of compression algorithms...

    • @BrianMelancon
      @BrianMelancon a year ago +7

      Brian Gregory is correct. It's not the case that using UDP means there are no validation checks. It's just not included in the UDP protocol, and is instead left to the application layer to handle as appropriate to the situation. Almost every application that uses UDP does in fact do some sort of data validation. For example, Wireguard uses UDP. Traffic over Wireguard is encrypted and needs to be 100% accurate.

    • @thoria
      @thoria a year ago +3

      UDP also doesn't strictly require checksumming (over IPv4 the UDP checksum field is optional). With hardware generally always able to do it (see checksum offloading and TOE, the TCP offload engine, for more), it's almost always there, but it's not on *every* packet as Linus claimed, likely because the tangent would just eat up way more time than it's worth.

    • @BrianG61UK
      @BrianG61UK a year ago +2

      @@MrBleach163 Yes. For something like a Skype call it depends, but you'll probably be lucky if there isn't some kind of visible glitch, but the point is, you don't want to wait for retransmission and have the time delay keep increasing to the point where you're waiting ages for the person you're calling to respond to what you say.

  • @danimayb
    @danimayb a year ago +72

    Checksums are redundant pieces of information added to data that allow the receiver to verify whether the data was received correctly. A simple checksum uses only the first seven of the 8 bits in a byte for data. The eighth bit is the sum of the first 7 bits (modulo 2) and acts as a check on them, hence the name 'checksum'. In a nutshell :)
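
The parity scheme described here can be sketched in a few lines of Python. The names `add_parity` and `check_parity` are hypothetical, and placing the check bit in the most significant position is an assumption made for the example; real links may put it elsewhere or use odd parity.

```python
def add_parity(value: int) -> int:
    """Pack 7 data bits plus an even-parity bit into one byte."""
    if not 0 <= value < 128:
        raise ValueError("value must fit in 7 bits")
    parity = bin(value).count("1") % 2  # sum of the 7 data bits, modulo 2
    return (parity << 7) | value        # assumption: parity goes in the top bit


def check_parity(byte: int) -> bool:
    """Return True if the byte holds an even number of 1 bits overall."""
    return bin(byte).count("1") % 2 == 0
```

A single flipped bit makes `check_parity` return False, but two flipped bits cancel out and slip through, which is one reason longer checksums exist.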

    • @southernflatland
      @southernflatland a year ago +4

      But wait, there's more!
      They make 16 bit checksums too!
      Utilized in SNES - JRR Tolkien's Lord of the Rings
      Edit: I should know, I mostly decoded them. Punch in "3P5" multiple times in a row, like 8 times in a row, tell me that don't unlock all the characters...

    • @swimfan6292
      @swimfan6292 a year ago +1

      LT really needs to do more videos like these. For himself and his viewers... Many are clueless

    • @eksmad
      @eksmad a year ago +2

      Checksums as fast as possible .. YOU win! Instead of Linus.

    • @Rudxain
      @Rudxain 11 months ago

      Actually 🤓, the simplest (and fastest) checksum for binary computers is `xorsum`. It's "infinitely"- parallelizable, but doesn't have "avalanche-effect"
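
A rough Python sketch of the XOR checksum idea follows. The function name `xor_checksum` and the one-byte output width are assumptions for illustration; the `xorsum` tool named in the comment may work differently.

```python
from functools import reduce
from operator import xor


def xor_checksum(data: bytes) -> int:
    """XOR all bytes together into a single 8-bit value."""
    return reduce(xor, data, 0)
```

Because XOR is associative and commutative, chunks can be summed independently and combined afterwards (`xor_checksum(a + b) == xor_checksum(a) ^ xor_checksum(b)`), which is what makes it easy to parallelize. The same property means reordered bytes produce the same value, so it cannot detect that kind of damage.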

  • @theyruinedyoutubeagain
    @theyruinedyoutubeagain a year ago +272

    CRCs aren't cryptographic; actually I'm pretty sure most fast checksums are not crypto hardened. Still, great video!

    • @GSBarlev
      @GSBarlev a year ago +33

      I mean yeah, the simplest checksum is just a parity bit.

    • @SelecaoOfMidas
      @SelecaoOfMidas a year ago +8

      I mean, even CRC32 is highly susceptible to collisions (files that are different from each other, but have the same hash value), and SHA-1 ran into that issue too, with the first practical collision published in 2017. Most entities have moved on to hash algorithms like SHA-256 up to SHA-512, depending on the importance of the data and urgency vs compute cost per file.

    • @ShadowSlayer1441
      @ShadowSlayer1441 a year ago +1

      Why would you use anything other than sha256?

    • @lordsponge10
      @lordsponge10 a year ago +1

      @@ShadowSlayer1441 you would use SHA3-256 which is more robust to attacks. Most programs still use SHA2-256.

    • @tr7zw
      @tr7zw a year ago +5

      @@ShadowSlayer1441 Speed? Simplicity?

  • @JV-pu8kx
    @JV-pu8kx a year ago +53

    I've heard that many of the free cloud storage services use checksums to save space. They run the hash and compare to what is already on their servers. If two matching files are uploaded, only one copy gets stored. It does not matter if the files were uploaded to separate accounts, only one copy is actually stored.

    • @DraxTrac
      @DraxTrac a year ago +3

      You know, I've always wondered about that.

    • @GSBarlev
      @GSBarlev a year ago +4

      Pretty sure it happens with Plex / Jellyfin metadata fetchers as well, which is why occasionally you'll get results that aren't just a little bit off, but, like, wildly off.

    • @lPlanetarizado
      @lPlanetarizado a year ago +4

      as a bug bounty guy, that opens options to find bugs, thanks

    • @blahorgaslisk7763
      @blahorgaslisk7763 a year ago +12

      This is called deduplication, and it is a staple feature in large storage systems. However, they should not stop at running a simple hash to decide whether the files are the same. As a next step they check the file length, and if that still matches they check the actual binary data. This has to be done because the checksums of two files can be the same even with different content and even different file sizes.
      Simple checksums like CRC32 are very easy to manipulate. An anime fansub group used to make sure their releases all had a CRC32 checksum that showed the episode number. So episode 1 had the checksum 01010101, episode 2 got hashed as 02020202 and so on. This is (marginally) harder with MD5 and a lot harder with SHA256 or better. But even without malicious intent there are only so many hash values possible in, say, 256 bits, so eventually two files will have the same hash. This means that a hash value can't guarantee that the file is what you think it is. It can only guarantee that the file hash is the same. So use it to check for transmission errors and file integrity, not as proof that the content hasn't been manipulated by a third party.
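
The hash, then length, then byte-for-byte sequence described above can be sketched as a toy in-memory store. The class name `DedupStore` and its methods are hypothetical; a real storage system would work on disk blocks and is far more involved. The point is only that a matching hash selects candidates and the full comparison makes the final call.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: hash first, then length, then full bytes."""

    def __init__(self) -> None:
        # digest -> list of distinct blobs that happen to share that digest
        self._blobs: dict[str, list[bytes]] = {}

    def put(self, data: bytes) -> tuple[str, int]:
        """Store data unless an identical blob exists; return its reference."""
        digest = hashlib.sha256(data).hexdigest()
        bucket = self._blobs.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            # Cheap length check first, then the authoritative byte compare.
            if len(existing) == len(data) and existing == data:
                return digest, index
        bucket.append(data)
        return digest, len(bucket) - 1

    def stored_copies(self) -> int:
        """Number of blobs physically kept."""
        return sum(len(bucket) for bucket in self._blobs.values())
```

Uploading the same bytes twice returns the same reference and keeps one copy, while different content always gets its own slot even in the unlikely event of a digest collision.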

    • @CodeAsm
      @CodeAsm a year ago +2

      @@lPlanetarizado "hash collision" and can be very tricky. I'd probably do it on multiple levels: data chunk wise, file wise, and check metadata (file size, dates, entropy)

  • @SuperFromND
    @SuperFromND a year ago +68

    if you have 7-zip installed (which you really should if you don't, it's amazing), it actually adds all sorts of checksum-generation options to the right-click menu in Windows, it's really handy

    • @talon262
      @talon262 a year ago +9

      And you're not limited on using 7-zip on just ZIP/RAR/other compressed archive format files to pull their checksums... you can use that built-in functionality for pretty much any file.

    • @Gohan1138
      @Gohan1138 a year ago +3

      Winrar gang here

    • @TylerTMG
      @TylerTMG a year ago

      i use breezip from microsoft store

    • @rf8003
      @rf8003 a year ago

      Peazip good as well...

    • @kvbc5425
      @kvbc5425 a year ago

      WINrar 👑

  • @thoria
    @thoria a year ago +35

    I'm kind of surprised there was no mention of block-level CRC for storage media, the checksum that makes it possible for RAID-scrubbing to find faults, and for disks in general to be reasonably certain they're reading back the same values that were written in the first place, something almost everyone takes for granted.

    • @HarpaxA
      @HarpaxA a year ago +1

      He's talking abt Internet Checksum, not RAID

    • @thoria
      @thoria a year ago

      ​@@HarpaxA He, and the writers, are, but a large part of the runtime is spent on offline and local-network file-validation and things like passwords. The fact that this is a design consideration that allows data-integrity issues to be found in RAID when the multi-disk abstraction might otherwise hide problems until way too late is just one application and a way to get people's attention with a topic that seems to draw some number of views (yay, algorithm).
      Block-device-level checksums seem relevant to this topic specifically because there's an emphasis on "how does data reliably get from point A to point B?" and it needs to be stored and retrieved from somewhere. There's nothing about a magnetic head or voltage-assessment that provides assurance that read-mistakes won't happen without a checksum of their own.

  • @tr7zw
    @tr7zw a year ago +26

    Noteworthy that a checksum on the download page being the same as the downloaded file doesn't mean that it hasn't been tampered with. If you're man in the middle-d or the site is compromised enough, hackers could also just replace the hash shown on the site to match the modified file.

    • @thorbear
      @thorbear a year ago +6

      Yeah, comparing checksums for downloaded files when the checksum and file are on the same server feels like security theater, just giving a (false) sense of security without actually adding any security.
      Doesn't the practice come from (and make more sense in) the scenario where a 3rd party file hosting service (or a mirror) is used to store the actual files, while only a link and a checksum is on the website itself, so you can verify that the file you get from the 3rd party is the one intended by the owner of the website?

    • @blahorgaslisk7763
      @blahorgaslisk7763 a year ago +3

      Very important post! A hash is never proof of what the file contains, just that the file you ran the hash algorithm on has the same hash result as what you were told to expect. So use it to verify that the file wasn't corrupted in transmission or changed in some form. But don't rely on the content being what you expect just because the hash matches what's on the site you got it from.

    • @o0Donuts0o
      @o0Donuts0o a year ago

      Then sign your files…

    • @I.____.....__...__
      @I.____.....__...__ a year ago +1

      @@o0Donuts0o File-signing certificates are expensive af. There's no LetsEncrypt for that. 😕

    • @stayfunsteven2207
      @stayfunsteven2207 a year ago

      The first thing that came to my mind when I first heard about checksums. But against other attacks it can be helpful. So it is obviously not useless.

  • @Bacender
    @Bacender a year ago +4

    Hashtab is one of the best checksum tools for Windows. It adds a tab in the properties dialog of a file to let you compare checksums.

  • @VFPn96kQT
    @VFPn96kQT a year ago +2

    There is a big difference between a *cryptographic* checksum and the one used to verify that data arrived correctly, such as the one in TCP/IP.
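
To make the contrast concrete, here is a minimal sketch of the 16-bit ones' complement "Internet checksum" as described in RFC 1071, which is the style of check TCP, UDP and IPv4 headers use. The function name `internet_checksum` is made up, and the sketch covers only the arithmetic. A real TCP stack also includes a pseudo-header in the sum, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the sample bytes `00 01 f2 03 f4 f5 f6 f7` from RFC 1071 this gives `0x220D`, and running the function over the data with that checksum appended gives 0, which is how a receiver verifies a packet. Nothing here is secret or keyed, so anyone who alters the data can recompute a matching value.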

  • @soup5344
    @soup5344 a year ago +8

    personally i prefer the method of looking at the files and going "Yeah that seems about right"

    • @TylerTMG
      @TylerTMG a year ago

      wait THIS ISNT MY 8K TOY STORY 1 VIDEO

  • @delofon
    @delofon a year ago +2

    1:33 If those bad actors could replace the download with a malicious one on some website, it would be of no hassle for them to replace the checksum as well. MITM attacks are guarded against with protocols like SSL. Checksums are not used to validate the security of a file but rather to confirm it was downloaded correctly from the origin (even though TCP handles it too) so that, in the worst case scenario, your PC doesn't break down from an incorrect OS download.

    • @kpcraftster6580
      @kpcraftster6580 a year ago +2

      Yes BUT... often downloads are hosted on a different domain than the checksums. Ideally you download the file from the least suspicious mirror and get copies of the checksum from multiple other sources.

  • @SamarthCat
    @SamarthCat a year ago +2

    4:23 not every service uses TCP, for example, most online games and realtime apps use UDP to reduce latency because packets don't have to arrive correctly.

  • @The_Life
    @The_Life a year ago +16

    Whenever he says "bad actors", I can't help but think actors who just suck at their job doing shady things

    • @Duaality.
      @Duaality. a year ago

      If they're acting at doing their job and they still suck, then I'd argue that still makes them a bad actor

  • @ArdentMoogle
    @ArdentMoogle a year ago +354

    And we're slowly moving to quantum-resistant hash functions, to avoid the issue of quantum computing in the future.

    • @Dinkleberg96
      @Dinkleberg96 a year ago +5

      That will be a problem for security

    • @matthewparker9276
      @matthewparker9276 a year ago +32

      ​@@Dinkleberg96 no it won't. By the time a quantum computer powerful enough to work on decrypting real internet packages exists all important things will be using quantum secure algorithms. It'll be Y2K all over again.

    • @robspiess
      @robspiess a year ago +31

      @@matthewparker9276 Not necessarily. Important long-term data which was encrypted with "good at the time" cryptographic ciphers are being saved for future quantum computers to decrypt. Even though we can't break it now, saving RSA-4096 encrypted "Who_Shot_JFK.docx" and "Herbs_and_Spices_v11.KFC" for computers 30 years from now could cause real national security issues.

    • @triciaf61
      @triciaf61 a year ago

      @@robspiess i cant wait for "Herbs_and_Spices_v11.KFC" to get cracked and cause the USA to fall into utter chaos due to it revealing the real herbs and spices.

    • @WolvenSpectre
      @WolvenSpectre a year ago +1

      @@robspiess I think he thought the first reply was saying that Quantum Resistant Algos were bad for security, and not Quantum Computing will be bad for security. That post was kinda ambiguous.

  • @VivekYadav-ds8oz
    @VivekYadav-ds8oz a year ago +17

    Checksums are only useful if you're expecting errors not malicious intervention. Anybody could just change the source, and then re-hash the source and send that as the checksum. Encryption will be necessary regardless.

    • @monkeyoperator1360
      @monkeyoperator1360 a year ago +1

      not exactly, most websites write the checksum out, so that you run the checksum yourself the file doesn't check itself

    • @evertchin
      @evertchin a year ago +1

      wrong.... this is why we use strong crypto as checksum, it will take you forever to rehash the contents to match the original checksum.

    • @fishyfish2679
      @fishyfish2679 a year ago

      Yeah no, encryption alone does not mean attacker can't change plaintext. E.g. stream ciphers are vulnerable to known plaintext attacks. What you want is an unforgeable checksum, and in the field of cryptography you have two ways for that, digital signatures (software/drivers/official email etc), and message authentication codes (generally instant messaging). It's very common data that is assigned a MAC or digital signature is also encrypted, but unless we're talking about authenticated encryption, integrity and authenticity is provided by algorithms other than the encryption.
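
A message authentication code of the kind mentioned above can be sketched with Python's standard `hmac` module. The helper names `make_tag` and `verify_tag` are hypothetical, and the sketch assumes both sides already share a secret key, which is the hard part in practice.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time to avoid leaking how much matched."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

Unlike a plain checksum, an attacker who changes the message cannot produce a matching tag without knowing the key. Digital signatures give a similar guarantee without a shared secret, but they need a separate library and are not shown here.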

    • @Gramini
      @Gramini a year ago

      @@evertchin OP meant that if you can change the file on someone's server, you probably can also change the displayed checksum on the web page.

  • @HyperGadgets
    @HyperGadgets a year ago +1

    Couple of points:
    - the cryptographic hash function outputs aren't guaranteed to be unique, but are generally designed to avoid collisions.
    - passwords aren't just hashed (or at least they shouldn't be 😅), if they were, then if the Database was leaked, the attacker would be able to tell the simple passwords. Two people with the same password would then have the same hash output. This could also mean the attacker can generate hashes from a list of common passwords and compare against the database to find people with common passwords and hack their accounts.
    To get around this, a "salt" is added. The salt is randomly generated and when combined with the password and then hashed, it will create a new output, even if two users have the same password.
    This is why you should use unique/random passwords, because if the server doesn't salt the passwords, common passwords can be found easily and then anywhere you use that same password is then potentially compromised - even if the other places do salt them.
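
The salting idea can be sketched with the standard library's PBKDF2 function. The helper names and the iteration count below are assumptions chosen for the example, not a recommendation. A production system would normally use a vetted password-hashing library and tune the cost to its own hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumption for the sketch; tune for real deployments


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Two users who pick the same password end up with different stored digests because their salts differ, so a leaked database no longer reveals who shares a password, and precomputed tables of common-password hashes stop being useful.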

  • @Alphalaneous
    @Alphalaneous a year ago +3

    Note that 7zip has a checksum viewer as well, so if you have that, you can view the checksum of a file easily

  • @TheARN44
    @TheARN44 a year ago +72

    I’m surprised that this video didn’t mention google registering the .zip domain.

    • @Bert-og9rk
      @Bert-og9rk a year ago +18

      I thought it was going to bring that up considering the thumbnail.

    • @MrSevenEleven
      @MrSevenEleven a year ago +8

      Why would it?

    • @Zikeji
      @Zikeji a year ago +4

      Given the title and the thumbnail that is exactly what I thought as well. Disappointed lol.

  • @StubbornProgrammer
    @StubbornProgrammer a year ago +5

    Awww I really wanted Linus to mention salting in the password segment. I know it's too much of a tangent for such a short video but it's a neat solution to an unfortunately real security problem.

    • @o0Donuts0o
      @o0Donuts0o a year ago

      I think this comment section has all the salt covered over CRC vs checksum.

  • @dshcfh
    @dshcfh a year ago +1

    On the "Windows Explorer doesn't compute hashes" note;
    It wouldn't take a lot of resources to do that at all. They would only have to add a checksum middleman to the file transfer stream.

    • @jkahgdkjhafgsd
      @jkahgdkjhafgsd a year ago +1

      with the number of cores medium & high-end systems have these days it's not like performance is a concern either (just make it optional)

  • @CoolJosh3k
    @CoolJosh3k a year ago +1

    CRC32 will do for a quick check, but for security it is best to use SHA256 to ensure nothing was tampered with.
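
Both checks are available in Python's standard library, so the comparison can be sketched directly. The function name `digests` is made up for the example. It takes the data as a sequence of chunks to mimic reading a large file piece by piece.

```python
import hashlib
import zlib
from typing import Iterable, Tuple


def digests(chunks: Iterable[bytes]) -> Tuple[str, str]:
    """Return (crc32_hex, sha256_hex) computed over the chunks in order."""
    crc = 0
    sha = hashlib.sha256()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
        sha.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), sha.hexdigest()
```

For example, `digests([b"1234", b"56789"])` yields the CRC-32 value `cbf43926`, the standard check value for the string `123456789`. The CRC is 32 bits and easy to forge on purpose; the SHA-256 digest is 256 bits and is designed so that finding another input with the same value is impractical.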

  • @grantjoseph2730
    @grantjoseph2730 a year ago +1

    It's worth pointing out that the reason TCP's checksums aren't for security is that anyone who could replace the file being downloaded with malware could also just change the checksum to match the malware they inserted. That's why TLS/HTTPS adds keyed integrity checks (message authentication codes) on top, with digital signatures in the handshake proving that you're talking to the server you meant to download from and not an attacker.

  • @phil2of3
    @phil2of3 a year ago +3

    A good hacker that swaps a file for a malicious one on some server would also change the checksum file at the same time

    • @GSBarlev
      @GSBarlev a year ago +2

      Except good opsec is to never store your checksums (or your salt) on the same server as your sensitive data.
      Checksums in my circles also tend to be cryptographically signed via PGP.

  • @nvmuzrowrihk
    @nvmuzrowrihk a year ago +32

    Looking at the thumbnail, I thought this was about the new .zip TLD...

    • @BakersTuts
      @BakersTuts a year ago +1

      _sigh… unzips_

    • @mahdi9064
      @mahdi9064 a year ago +1

      can you give more context ?

    • @TheDakes
      @TheDakes a year ago +2

      ​@@mahdi9064 In short: Google registered .zip (and .mov) tlds for its domain service. This is bad because many programs will automatically convert zip file names into links now, even if sent by a trusted person. So bad actors could now register domains of common file names to host malware.

    • @robertlinke2666
      @robertlinke2666 a year ago

      @@TheDakes then it's probably good google snatched them before any actual malicious actors could. sure i dont trust google, and neither should anyone, but they wont use this to send you to malicious sites

  • @randomgeocacher
    @randomgeocacher a year ago +1

    Checksums vs Hashes vs Keyed Hash (MAC) and signatures could have been more clearly separated / explained. Fitting it into a technique format/speed is a challenge but would add a lot of value / clarity.

  • @neilalcoseba6978
    @neilalcoseba6978 a year ago

    This is still used if you have a slow internet and constant disconnection when doing downloads. Checksum is a way to check if your downloaded file is not corrupted.

  • @ramavabray
    @ramavabray a year ago +1

    I use TeraCopy in windows to handle all file copy and moves because I can turn on its verify option as a default and never have to worry about it again.

  • @mikejetzer4155
    @mikejetzer4155 a year ago +1

    If you want to verify a file that's copied locally (i.e., both the source and destination file are on locally-accessible filesystems), doing a file compare (e.g., the Unix/Linux "cmp" command) should be much faster than doing a checksum, and will tell you exactly where the first different byte appears.
    I'm not a Windows guy, so I don't know how easy this is to do in Windows, but if you're going to get a third-party product to perform your checksums for you, you could probably get a third-party "cmp" program.
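
A cross-platform stand-in for the `cmp` idea can be sketched in Python. The names `first_difference` and `compare_bytes` are hypothetical, and the sketch assumes regular files or in-memory streams where `read` returns full chunks. Unlike `cmp`, the offset it reports is zero-based.

```python
import io
from typing import BinaryIO, Optional


def first_difference(a: BinaryIO, b: BinaryIO, chunk_size: int = 65536) -> Optional[int]:
    """Return the zero-based offset of the first differing byte, or None if equal."""
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a == chunk_b:
            if not chunk_a:
                return None  # both streams ended without a difference
            offset += len(chunk_a)
            continue
        for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
            if x != y:
                return offset + i
        # One stream is a strict prefix of the other.
        return offset + min(len(chunk_a), len(chunk_b))


def compare_bytes(a: bytes, b: bytes) -> Optional[int]:
    """Convenience wrapper for in-memory data."""
    return first_difference(io.BytesIO(a), io.BytesIO(b))
```

For two files on disk, the same function can be called with both opened in `"rb"` mode. When only a yes/no answer is needed, the standard library's `filecmp.cmp(path_a, path_b, shallow=False)` does a full content comparison.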

  • @bladewind0verlord
    @bladewind0verlord a year ago +1

    literally right after he said "make sure they don't get corrupted" at 3:40, my blender simulation used up the last of my RAM and made the video start stuttering and I swear to god I just assumed that it was just a gag for the video

    • @TylerTMG
      @TylerTMG a year ago

      lmao

    • @TylerTMG
      @TylerTMG a year ago

      subbing btw please release vid

  • @6Twisted
    @6Twisted a year ago +1

    Pretty abysmal that Windows doesn't use checksums. I've had a few known corrupted files before and who knows how many unknown corrupted files.

  • @prawny12009
    @prawny12009 a year ago

    One thing you didn't mention is that corrupt file downloads can be deliberately induced by your isp because of "traffic shaping",
    The worst part is that the download would have been faster and use less data/bandwidth if they had simply allowed the download to go unimpeded instead of forcing you try over and over.

  • @amikadm
    @amikadm a year ago +2

    interesting question : could you "reverse" the SHA to get the file back from it ?

    • @Gramini
      @Gramini a year ago

      Absolutely not. Those fancy hashing functions are lossy, so you lose details. SHA1 is 160 bits / 20 bytes, sha256 is 256 bits / 32 bytes. If I give you the hash of my 3 MB file, well, you cannot restore it. That's also why those hashes are used for storing passwords, as they cannot be reversed.

  • @TechX1320
    @TechX1320 a year ago +5

    And yet hash collisions exist. We use this to crack files on some games to mod them

    • @GSBarlev
      @GSBarlev a year ago

      Yup. Since they mentioned Steam, it's worth noting that before SteamOS added the ability to directly change the boot animations, you could still swap in your own custom -Shrek supercut- video on the Steam Deck as long as it was precisely (down to the byte) the same length as the OG ani.

    • @TechX1320
      @TechX1320 a year ago +1

      @@GSBarlev old-school game called combat arms, you can use hash collisions to modify the game files to create exploits like wallhacks.
      Hash collisions don't mean the same byte size. Generally when you perform hash collisions, the file gets bigger

    • @GSBarlev
      @GSBarlev a year ago

      @@TechX1320 True. I'm conflating checksums with hashes. But we're on the subject of file verification anyway, so I think the point is fair.

  • @christopherchappell8881
    @christopherchappell8881 a year ago +4

    Did discover something interesting with MS Teams. Apparently, it is possible to get corrupted files sent out over Teams between users. Colleague of mine had a known good file direct from the manufacturer. They then sent that file via teams to several other users that needed access to the file but couldn't access the direct download. 2 of those it was sent to could not use the firmware file because the device they were updating kept throwing an error saying the file could not be validated. I had them send me their copy through a program that I know does checksums and when I compared the file size just on its face it was smaller than the verified original. So while it seems teams attempts to deliver files, I can say first hand that it's not guaranteed to arrive in one piece.

    • @shadamethyst1258
      @shadamethyst1258 a year ago +1

      I mean, it's MS Teams, I wouldn't expect it to work properly for anything

    • @blahorgaslisk7763
      @blahorgaslisk7763 a year ago

      A quick solution is to archive the file using 7zip and add the checksum to the file name. When the receiver runs the file through 7zip to unarchive it, it will check the checksum, and even if that matches it will still throw a fit when unarchiving if the archive has been changed in any way. This should be enough to catch any unintentional tampering, such as lost or corrupted packets.

    • @o0Donuts0o
      @o0Donuts0o a year ago

      Sooo it couldn’t be corrupted from pc to device requiring firmware? It’s just MS Teams? Lord your diagnosis skills are terrible.

  • @Pixelcrafter_exe
    @Pixelcrafter_exe a year ago +1

    The outputs of a hash function are not necessarily unique, since the set of possible inputs is infinite but the set of outputs is finite. A collision is just highly unlikely to happen.
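
The pigeonhole argument can be made tangible by shrinking the output. The sketch below keeps only the first two bytes of SHA-256 (an artificial truncation chosen for the demo, not something real tools do) and searches counter strings until two of them collide. The names `truncated_hash` and `find_collision` are hypothetical.

```python
import hashlib
from itertools import count
from typing import Dict, Tuple


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of SHA-256: a deliberately tiny hash for the demo."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Return two different inputs that share the same truncated hash."""
    seen: Dict[bytes, bytes] = {}
    for i in count():
        message = str(i).encode("ascii")
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message
    raise AssertionError("unreachable: count() never ends")
```

With 16 bits there are only 65,536 possible outputs, so a collision is guaranteed within 65,537 tries and usually appears after a few hundred. At the full 256 bits the same search is hopeless in practice, which is the "highly unlikely" part.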

  • @zeddesnetos
    @zeddesnetos ปีที่แล้ว +7

    I never got the concept of checksums on a website.. I mean, if a hacker could replace the file, he can also replace the checksum to match the new file.

    • @someguy4915
      @someguy4915 ปีที่แล้ว +2

      Exactly, at best it could stop a hacker that intercepts the real file by a MITM but at that point the hacker could also just modify the website via MITM anyway...
      They hold the same value as a 'Verified', 'Clean' or 'Safe' label next to a download...

    • @rohansampat1995
      @rohansampat1995 ปีที่แล้ว +4

      .... that's not the point, f00l. Checksums are not a way to prevent downloads from malicious actors. The problem is MITM attacks. The checksum is not transmitted data. The verification is run independently so you can attest: I expect this file from Website A, I have this File B, and the checksum of B matches the checksum on Website A.
      If the website is compromised, this will do nothing, but you are now sure that you got the right file from the website. Even if it's compromised, the file is what the website intended. Other tools are needed for security against viruses.

    • @GSBarlev
      @GSBarlev ปีที่แล้ว +5

      That's why the truly paranoid host checksums on different servers than the files themselves.
      You can also cryptographically sign your checksums. So if you're downloading a package and expect it to say, "Linus Torvalds computed this checksum," then when the checksum instead indicates that the signer was "Linus Sébastien" you'll know that something's wrong.

    • @someguy4915
      @someguy4915 ปีที่แล้ว +1

      @@rohansampat1995 And that's the entire point 'f00l'... Any MITM that can intercept and replace a file download from a website, can also modify said website already.
      Unless you know of any website in the entire world that offers checksums on their HTTPS (encrypted) website while sending files over HTTP...
      Maybe some 20 years ago some websites would have HTTP download pages whose files were then offered through FTP, but no websites were showing checksums at that point yet.
      The checksum is not transmitted data? Well tell me then, how did the checksum reach my computer screen from the webserver if it was not transmitted through HTTPS like the file itself?...
      Why even point out that the checksum is calculated on the webserver's end? Obviously it is, but it still needs to be *transmitted* (sent) to the client so that you can see it and, after downloading the file, compare the two checksums...

    • @someguy4915
      @someguy4915 ปีที่แล้ว +2

      @@GSBarlev That would be more valuable and is luckily done by most major developers as long as the PKI signing company/organization is secure.
      But the 'simple' sites where it's just an MD5 of the file that same site offers as a download is completely useless. That doesn't defend against MITM attacks nor does it do anything if the webserver would get compromised.

  • @erice6755
    @erice6755 ปีที่แล้ว +3

    You should mention that there is a difference between UDP and TCP in this instance. Because if we're doing something over UDP it's not gonna bother with resending it, lost is lost at that point.

  • @Raistling
    @Raistling ปีที่แล้ว

    TCP/IP doesn't actually do a checksum in that way.
    Now, it has been a while since I read up on it, but if memory serves, TCP checks on a per-packet basis instead.
    It also uses a kind of "session" number to keep track of a session of communication.
    Sending info from A to B would look something like this:
    A: Sending packets 1-14
    B: received packet 14
    A: sending packets 15-34
    B: received packet 31
    A: sending packets 32-42
    B: received packet 36
    A: sending packets 36-40
    So in addition to having a session token in all this information, A tags every packet it sends with its own number as well. B reads every packet it gets until it has either read all of them or receives one that is not numerically next after the last. So if it gets 1, 2, 3, 5, it stops and sends back that it got packet 3.
    Notice how little data B actually uses by just sending a response of the latest packet it received in a series. This makes sure that TCP is not gonna use tons of data to communicate back and forth.
    But it still does communicate back and forth in order to keep signal integrity.
    UDP/IP on the other hand is not like that.
    UDP is like pouring a bucket of water down the drain. Most of it should arrive sequentially, but some might arrive out of order, or not at all. But it doesn't matter, since the receiver isn't checking it. Video streaming is done like this to keep up with the massive amounts of data being sent, where TCP might lag behind. But it comes at the cost of data sometimes being out of order and a little lag spike here and there.
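
The "B reports the last packet it received in order" rule described above can be sketched as a tiny function. This is a simplified model under the comment's own assumptions: real TCP numbers bytes rather than whole packets, and the name `cumulative_ack` is made up for illustration.

```python
def cumulative_ack(received, last_acked=0):
    """Return the highest packet number N such that every packet from
    last_acked + 1 through N has arrived (a simplified cumulative ACK)."""
    got = set(received)
    ack = last_acked
    # Walk forward only while the next expected packet is present.
    while ack + 1 in got:
        ack += 1
    return ack


# The example from the comment: packets 1, 2, 3 and 5 arrive, 4 is missing.
print(cumulative_ack([1, 2, 3, 5]))  # 3
```

The sender would then resume from the packet after the acknowledged one, which is why a single short reply is enough to cover a whole run of packets.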

  • @stefanos6505
    @stefanos6505 ปีที่แล้ว

    Sometimes at low level shit goes wrong, but TCP also contains an ACK signal: if something does not arrive it will resend it.

  • @hikariyouk
    @hikariyouk ปีที่แล้ว +1

    The TCP/IP stack, along with a lot of other things, relies on CRC-32 (at the Ethernet link layer; TCP's own checksum is a simpler 16-bit sum), which categorically isn't a cryptographic hash (even if it's used as one sometimes).

  • @OutlawJackC
    @OutlawJackC ปีที่แล้ว

    I remember Tom Scott going on about websites that put their checksums on there.
    And he said that if attackers are able to change the file being sent, it wouldn't be too difficult to change the hash on the website to match the hacker's file.

  • @-B.H.
    @-B.H. ปีที่แล้ว

    TeraCopy as a Windows file transfer replacement has been my go-to for years for this.

  • @chuckthetekkie
    @chuckthetekkie ปีที่แล้ว

    And yet we still get corrupted downloads sometimes and have to manually download the file again.

  • @sadravin1
    @sadravin1 ปีที่แล้ว

    Video suggestion: how to clean and maintain a Linux OS. For example, in Windows you can delete temp files and such. How do we do that on Linux? When I use the command line to install apps and frameworks, how do I know how to remove the bloat and leftover files after install? What are the common practices for keeping it clean?

  • @Deadi12
    @Deadi12 ปีที่แล้ว +2

    Thought this was going to be a video on the osi model. This is just as good.

  • @MatthewSuffidy
    @MatthewSuffidy ปีที่แล้ว +1

    Since checksums are a much smaller set of data than the data itself, it is possible for certain permutations of data to produce the same checksum, but improbable. That fact and others means that computers are not necessarily totally reliable but may be 1 in 1 x 10 e 10 reliable per bit or so.

    • @InfernosReaper
      @InfernosReaper ปีที่แล้ว

      Improbable, but inevitable due to the sheer amount of files and limitations of the system, which is why it's a terrible way for companies to check data on people's phones to send to law enforcement agencies

  • @ChitChat
    @ChitChat ปีที่แล้ว

    Hashing is one way encryption and used to digitally sign files. Like for root servers handing out certificates for intermediates.

  • @grayfox8547
    @grayfox8547 ปีที่แล้ว +2

    Linus has been cooking in that sun

  • @Schalari
    @Schalari ปีที่แล้ว

    This is the MOST informative Video I´ve watched so far. Thanks!!

  • @kylejohnson779
    @kylejohnson779 ปีที่แล้ว

    A TQ vid on cryptography, specifically password storage and Rainbow tables would be pretty cool as a sequel to this. Would love to see more security related content

  • @tharsis
    @tharsis ปีที่แล้ว +1

    Going by the thumbnail, here I thought this video was an incredibly speedy response to .zip top-level domains now being a thing, making phishing and tricking people into downloading malicious data stupidly easy.

  • @notenoughmonkeys
    @notenoughmonkeys ปีที่แล้ว

    To address confusion: checksums, CRCs, hashes, etc. are often used interchangeably, but they all have the same basic goal. Can you know, with reasonable confidence, that the file/data you have is the one you actually wanted?
    Simple checksums use very lightweight, insecure algorithms, but their only purpose is detecting simple corruption. There's no security component, meaning that if you were a malicious actor it would be relatively trivial to modify the file and tweak it so that it still has a valid, even identical, checksum.
    When you bring in cryptography, the intent is to prevent that attack vector: whilst you can modify a file, doing it in such a way that leaves the file's hash unchanged is non-trivial.
    Any and all methods of hashing will suffer collisions by the very nature of containing less data than the thing they describe. I.e. you can't uniquely describe a 1 GB file using only 256 bytes of data; if that were possible, we'd all just download the file hash and magically reconstruct the original file from that.
    The essence of the more secure methods is to make collisions a function of chance, not intent.
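
The pigeonhole argument above is easy to see with a deliberately undersized hash. The sketch below truncates SHA-256 to 16 bits (a made-up `tiny_hash`, purely for illustration) and searches for two different inputs that share it. With only 65,536 possible outputs, a collision is guaranteed within 65,537 distinct inputs.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes (16 bits) of SHA-256: a deliberately undersized hash."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Return two different messages with the same tiny_hash."""
    seen = {}  # tiny hash -> first message that produced it
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg


a, b = find_collision()
print(a, b, tiny_hash(a).hex())
```

The two messages still have different full SHA-256 digests. The longer the output, the more work such a search takes, which is the whole point of using 256 bits rather than 16.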

  • @SongStudios
    @SongStudios ปีที่แล้ว +1

    Yeah, I've been getting these weird "ads" or "sponsored segments" for every video I watch.

  • @electricz3045
    @electricz3045 ปีที่แล้ว +2

    1:00 that's wrong, passwords don't get stored as a hash, they get encrypted. Hash and encryption are not the same. A hash is one-directional, so it can't be reversed (thus hashing passwords in the DB won't let users log in anymore, as it can't verify the password's correctness), while with encryption like AES or MD5 user authorization will work.

    • @CarlosCabrera-kn1jb
      @CarlosCabrera-kn1jb ปีที่แล้ว +2

      Most DBs will store the password's hash. The encryption part goes from frontend to backend; the backend then transforms the password into its hashed form and stores it or compares it against the hash saved in the DB. Nothing wrong with that.

    • @GSBarlev
      @GSBarlev ปีที่แล้ว

      ​@@CarlosCabrera-kn1jb Yup. Technically it's possible that two passwords will share the same hash, but the likelihood (assuming good encryption) is far less than the odds that the key to your dad's 1998 Ford Taurus could also have started someone else's car (look it up)

    • @tercmd
      @tercmd ปีที่แล้ว

      1. AES is _encryption_ and MD5 is _hashing._
      2. The same password will produce the same hash so the hashes can just be compared.
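
The "store the hash, then compare hashes at login" flow from this thread can be sketched with Python's standard library. This is an illustration only, assuming PBKDF2 with a random per-user salt; the function names and the iteration count are assumptions, not what any particular site uses.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, derived_key). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def verify_password(password, salt, expected_key, iterations=200_000):
    """Re-derive the key from the login attempt and compare in constant time."""
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)
```

The server stores only the salt and the derived key. At login it repeats the derivation with the stored salt, so the original password never needs to be recovered.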

  • @louisloudogtrottier3310
    @louisloudogtrottier3310 ปีที่แล้ว

    TY for bringing that up.

  • @MrSuspicious0
    @MrSuspicious0 ปีที่แล้ว

    HashTab is another great checksum utility for Windows. It adds a hash tab to the properties of any file, showing its hashes in many common hashing functions. You can paste in your hash and it will verify whether it's correct.

  • @semmu93
    @semmu93 ปีที่แล้ว

    I expected you to talk about error correction codes and how they are used in transmission. Would love to see a video on that from you!

  • @der_rechtsamwald
    @der_rechtsamwald ปีที่แล้ว

    HashTab inserts an extra "hash" tab into the file's properties, where you can also compare hashes.

  • @SkyboxMonster
    @SkyboxMonster ปีที่แล้ว

    I requested a personal data download from a website.
    instead they served me a Zip file that contained a full length movie. How in the fek did a MOVIE end up on that website... and how the fek did I get the movie and not my data download?

  • @shgysk8zer0
    @shgysk8zer0 ปีที่แล้ว +1

    PGP / cryptographic signatures are even better still. Every computer and device should come with PGP/GPG... so useful! Even works for signing email. Anyone can generate a hash, and using HMAC requires sharing the password/key (which makes it easy to fake authenticity). Public key crypto is the only real solution.
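
The HMAC limitation mentioned above (both sides must hold the same secret key) shows up directly in code. The sketch below uses Python's standard `hmac` module; the helper names `make_tag` and `verify_tag` are made up for illustration.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, received_tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), received_tag)
```

Note that `verify_tag` needs the very same `key` that produced the tag, so anyone able to verify can also forge. Public-key signatures avoid this by splitting the key into a private signing half and a public verifying half.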

  • @AlFasGD
    @AlFasGD ปีที่แล้ว

    My computer literally crashed at 4:47 and rebooted on its own, a very creepy coincidence

  • @colt5189
    @colt5189 ปีที่แล้ว

    I had to download a program that would verify a copy/paste, as Windows doesn't do it for some reason. Sometimes a copy/paste gets corrupted, and you don't know until you try to open the file later. That could be really bad if you don't have a backup.

  • @tkanal1
    @tkanal1 ปีที่แล้ว

    It is not only at the TCP/IP layer: even on the link layer, Ethernet frames carry a CRC of the frame...

  • @dobelini303
    @dobelini303 ปีที่แล้ว

    You should do a video on the Border Gateway Protocol (BGP). One of the most fundamental and cool pieces of internet infrastructure that even most software engineers have no idea about!

  • @shanent5793
    @shanent5793 ปีที่แล้ว

    The cryptographic hash of a small file takes more than a trivial amount of time, and that time is constant for files smaller than the hash block size. Large files can take advantage of pipelining and amortize any required context switches. Hashing a gigabyte of data in one file will take much less time compared to hashing the same data divided into 2²⁸ four-byte long files.

  • @Progaros
    @Progaros ปีที่แล้ว

    if you have 7-zip installed, you can right click a file and get "all" hash-sums

  • @00001Htheprogrammer
    @00001Htheprogrammer ปีที่แล้ว

    But TCP already ensures a packet drop/corruption will raise an error, right?
    That means manually checking the full file isn't necessary?
    And if hackers want to tamper with the file, they can also easily change the checksum to the one calculated from the malicious file.

  • @rawl1
    @rawl1 ปีที่แล้ว +1

    Love it when u make 5 minute videos with 1 minute ad

  • @kethernet
    @kethernet ปีที่แล้ว

    Worth noting that TCP and UDP use a small non-cryptographic checksum. It's only 16 bits, not nearly as long as the one the animation showed. That means random collisions, where random bit flips pass the check, are far more likely (but still pretty rare), and since the checksum is part of the packet itself, it doesn't provide meaningful security against intentional changes by a "man in the middle". HTTPS provides end-to-end security that prevents that, but basic TCP and UDP don't.
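
For a feel of how small that 16-bit checksum is, here is a sketch of the ones' complement sum described in RFC 1071, which TCP and UDP build on. It covers only the arithmetic: real TCP and UDP also include a pseudo-header in the sum, which this illustration leaves out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Swapping two 16-bit words leaves the checksum unchanged,
# because addition does not care about order.
print(internet_checksum(b"ABCD") == internet_checksum(b"CDAB"))  # True
```

That order-blindness is one concrete way damaged data can slip past the check, and since anyone can recompute the sum, it offers no protection against deliberate changes.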

  • @finkelmana
    @finkelmana ปีที่แล้ว +1

    You better hope your password is not stored as a plain hash, as rainbow tables solve that problem. Salted hashes... well, that's different.

  • @GabrielAngelfire
    @GabrielAngelfire ปีที่แล้ว +1

    Why do I have to worry if U2 gets hacked or not?

  • @anthonymorris8891
    @anthonymorris8891 ปีที่แล้ว

    There was a forum I used to use that got confused and applied a scantily clad woman with the text SEND NUDES as my profile picture. I named my PFP the same as that one so the server was like, yeah these are the same.

  • @amd2800barton
    @amd2800barton ปีที่แล้ว

    Why have I never thought to use checksums when copying files on my own computer and network? I usually just resorted to verifying that the total byte count was identical.

  • @sorak185
    @sorak185 ปีที่แล้ว +1

    The thumbnail made me think this was going to be about the .zip TLD issue currently going on...

  • @anderstroberg3704
    @anderstroberg3704 ปีที่แล้ว

    Getting a hash for a large file while you are copying it only takes a trivial amount of extra time. You already have the file in memory, so it's just a few extra instructions per byte. What takes time is getting a hash for a file you aren't reading anyway, as file operations are where the time is spent.
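
The "hash it while you are copying it" idea above can be sketched like this. It is a minimal illustration with a made-up function name, `copy_with_hash`, and it assumes SHA-256 as the hash.

```python
import hashlib


def copy_with_hash(src, dst, chunk_size=1 << 20):
    """Copy src to dst and return the SHA-256 hex digest of the bytes read.

    Each chunk is hashed while it is already in memory, so the file is
    read from disk only once.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            digest.update(chunk)
            fout.write(chunk)
    return digest.hexdigest()
```

The returned digest describes what was read from the source. Confirming that the destination was written correctly would still require reading the copy back and hashing it, which is the extra pass that verification tools add.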

  • @I.____.....__...__
    @I.____.....__...__ ปีที่แล้ว

    You can get the wrong file even despite hashes; the files can simply be named incorrectly. For example, P2P sites that have a search function like the donkey network. You do a search, find a file, download it, the client confirms the file was transferred correctly by checking its hash, then you try to open it, and nope, it's the wrong file. Be careful with those, the file could be various different kinds of bad (not just malware…)

  • @L9MN4sTCUk
    @L9MN4sTCUk ปีที่แล้ว

    Resilient File System (ReFS) which comes with Workstation editions of Windows does this checking.

  • @FlyboyHelosim
    @FlyboyHelosim ปีที่แล้ว

    So how does that explain uploaded or downloaded files getting corrupted? This was especially an issue with a slow or unstable internet connection.

  • @Ch40zz
    @Ch40zz ปีที่แล้ว +3

    Checksum != Hash
    A checksum is - wait for it - just a sum of bytes/words forming a number. It's quicker than most hashes but also very insecure, and the result usually fits into one register. Unlike with hashes, a single bit flip will often not have a big impact on the result.
    You also forgot to mention that there are cryptographically secure hashes like SHA-256, BLAKE, etc., which you can generally trust because they are hard to spoof, and cryptographically insecure hashes like MD5, FNV, etc., which are easy to spoof and shouldn't be trusted.
    SHA-1 was already broken some years ago by Google, btw.
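
The difference in how a single bit flip affects a plain sum versus a cryptographic hash can be checked directly. This is a small sketch; `byte_sum` is a made-up 16-bit additive checksum, used only for comparison with SHA-256.

```python
import hashlib


def byte_sum(data: bytes) -> int:
    """A naive checksum: the sum of all bytes, kept to 16 bits."""
    return sum(data) & 0xFFFF


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"hello"
flipped = bytes([original[0] ^ 0x01]) + original[1:]  # flip one bit of the first byte

print(byte_sum(flipped) - byte_sum(original))
print(differing_bits(hashlib.sha256(original).digest(),
                     hashlib.sha256(flipped).digest()))
```

The sum moves by exactly one, while the SHA-256 digests differ in a large share of their 256 bits. That second behavior is what makes a hash hard to "tweak back" after a modification.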

  • @safakkoklu6414
    @safakkoklu6414 ปีที่แล้ว +1

    I am surprised to see you haven't mentioned "torrent" while making a video about checksums. In the torrent protocol every chunk has a checksum. That way, you don't need to download the entire file again if you received corrupted data.
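
The per-chunk idea above can be sketched as follows. The original BitTorrent format hashes each piece with SHA-1; the piece size and helper names below are assumptions chosen for illustration.

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int) -> list:
    """Split data into fixed-size pieces and hash each one with SHA-1."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def corrupted_pieces(data: bytes, expected: list, piece_size: int) -> list:
    """Return the indexes of pieces whose hash does not match the expected list."""
    actual = piece_hashes(data, piece_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


original = bytes(range(256)) * 4           # 1024 bytes, four 256-byte pieces
expected = piece_hashes(original, 256)

damaged = bytearray(original)
damaged[600] ^= 0xFF                       # corrupt one byte inside piece 2
print(corrupted_pieces(bytes(damaged), expected, 256))  # [2]
```

Only the pieces listed need to be fetched again, which is what makes this approach practical for very large downloads.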

  • @riiiiiiiiiiiiiiiiip
    @riiiiiiiiiiiiiiiiip ปีที่แล้ว +1

    Gosh darn it, Colton..

  • @98SE
    @98SE ปีที่แล้ว +1

    I remember when the channel was called "Fast as Possible", god that was quite a long time ago now and I've been watching LMG since 2012!...

    • @TylerTMG
      @TylerTMG ปีที่แล้ว +1

      comrade

  • @youtubegaveawaymychannelname
    @youtubegaveawaymychannelname ปีที่แล้ว

    Love to see a techquickie on Algorithms. What they do, how they're made, what they look like. Etc. If not a techquickie then maybe a full LTT video.

    • @samuelhulme8347
      @samuelhulme8347 ปีที่แล้ว

      Put simply, an algorithm is just a set of instructions to perform a task, like a recipe for baking a cake or a computer program. The algorithm creation process starts with a problem that needs to be solved. A common way to solve problems is "computational thinking", which involves abstraction and decomposition. Decomposition is the process of splitting the problem into smaller tasks. Abstraction is the process of removing unnecessary details from the problem. Once you understand the problem you can design a solution; for a computer program, flowcharts and pseudocode are used. Once that is done, the program can be implemented in a real programming language.

  • @Zyo117
    @Zyo117 ปีที่แล้ว

    I had a problem just the other day where every news article from the CBC I clicked on in Google News led to a completely different unrelated article, also from the CBC. It only happened with them, and only through Google News. Some weird DNS error maybe?

  • @lpprogrammingllc
    @lpprogrammingllc ปีที่แล้ว

    Checksums are also how automatically de-duplicating filesystems for incremental backups work. Each file is stored not by its name, but by its content hash. The file metadata then just records the hash of the content, and any other file with the same content will point to the same physical extent. Tahoe-LAFS leverages this for distributed files between friends, and Freenet uses a similar process to shard and distribute files pseudo-anonymously across the entire Freenet network.

    • @flameshana9
      @flameshana9 ปีที่แล้ว

      Why do backup programs still make duplicates then? I've tried so many and they all do a painfully bad job at it.

    • @BrianG61UK
      @BrianG61UK ปีที่แล้ว

      Best to use a long cryptographic hash for de-duplicating. A long CRC could work too. A simple sum is not really suitable for this, too much chance of collisions.

    • @BrianG61UK
      @BrianG61UK ปีที่แล้ว

      @@flameshana9 In my experience it mainly gets used to avoid re-backing up a file (or block of data) that hasn't changed since last time it was backed up. Not to avoid backing up a file that is a copy of another file also on the source media.
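
The content-addressed storage idea from this thread (store data under its hash, let file names point at the hash) can be sketched with two dictionaries. This is a toy in-memory model with a made-up class name, `DedupStore`, not how any specific backup tool or filesystem is implemented.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # content hash -> bytes
        self.names = {}  # file name -> content hash

    def put(self, name: str, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(key, content)  # store only if this content is new
        self.names[name] = key
        return key

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


store = DedupStore()
store.put("report.txt", b"quarterly numbers")
store.put("report-copy.txt", b"quarterly numbers")
print(len(store.names), len(store.blobs))  # 2 1
```

A long cryptographic hash is used as the key here, in line with the advice above, because two different files landing on the same key would silently replace one with the other.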

  • @Tomatobird8
    @Tomatobird8 ปีที่แล้ว +1

    Lots and lots of great notes in the comments this time. I think some of it could've been easily addressed in the video itself but somehow got overlooked.

    • @JazGalaxy
      @JazGalaxy ปีที่แล้ว

      Sometimes I'm sure they cut useful content for time or clarity of topic.

  • @CodeAsm
    @CodeAsm ปีที่แล้ว +1

    OOof, at first I thought you were gonna talk about Google's new TLD idea that became real: .zip domains.
    I love downloading random zip files from the internet that are not what I thought they were. (Yeah, I got myself a few .zip domains.)

  • @carlanderson5068
    @carlanderson5068 ปีที่แล้ว

    One point you missed: hashes aren't unique as you stated. That's why we can have collisions. After all, a finite set of hash values can't uniquely map an infinite set of inputs. Intentional collisions are currently difficult to produce due to provably hard math, but accidental collisions aren't what these are trying to protect against. Adding in details like the original file size can reduce collisions even more, but doesn't completely eliminate the possibility (speaking mathematically).

  • @jessegames6714
    @jessegames6714 ปีที่แล้ว +1

    Hey guys, I heard that the LTT Store shipping method is too expensive. Insane!!
    You guys are awesome 👌 👏 👍🏻

  • @revcrussell
    @revcrussell ปีที่แล้ว

    TCP/IP _needs_ the checksum because data collision is a real problem.

  • @mort_brain
    @mort_brain ปีที่แล้ว

    A quite tingling theme regarding your recent story =)

  • @Hotrob_J
    @Hotrob_J ปีที่แล้ว +1

    Is it just me, or does it look like Linus got some sun over the weekend?

  • @CasterbalTV
    @CasterbalTV ปีที่แล้ว

    *Teracopy* - It checks checksum after files transfer.

  • @bou222
    @bou222 ปีที่แล้ว

    More of this!!!! plz and thank you

  • @jkahgdkjhafgsd
    @jkahgdkjhafgsd ปีที่แล้ว

    I hate that Windows has no easy checksum copy. I use 7-Zip's checksum feature to check large backups.