It'll take you an hour to explain? Great! Who else wants a 1h computerphile special with Tom Scott? :D
Sounds like a great idea!
3:41 That's an… interesting rocket.
I love this guy. He's like the James Grime of programming. He's just so animated and passionate about what he's talking about, all the time; it's really cool.
1:37 It's not always bad to have a very, very fast hashing algorithm; in fact it's very good to have a fast one when using it for the file-checking stuff explained earlier.
The only time you don't want a fast hashing algorithm is when it deals with security, mainly hashing passwords.
Damn.... someone who truly breaks things down in a way one can understand. Thank you tons! 9 yrs later LOL.
My wife loves Numberphile but often doesn't really understand all the content in a video. She really particularly likes James Grime.
Now I know how she feels! I don't understand much of what Tom Scott says, but he's just so utterly watchable and engaging!
He's my personal hero, hands down the most entertaining guest on Computerphile :)
Tom Scott is the only reason I subscribe to this channel. He has so much enthusiasm that it makes me want to know more about the subject he's explaining.
Just realized the opening title card says "" and the end title says ""....
Very clear, no bullshit introduction to skip. Right to the core. Thanks a lot.
The download verification hash is usually to verify that the download was received without being corrupted during transmission, not really for security.
I was thinking the same thing. Everytime I've seen the MD5 on a download I never thought it implied that it was secure.
You can use hashes when you're distributing a file through various mirrors - you have some confidence that the version on your site is clean, but to ensure that the version on others' sites match your own, you use the hash to verify.
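As a rough illustration of that mirror workflow, here's a minimal PHP sketch. The filename and the "published" digest are invented for the example (the digest happens to be the SHA-256 of an empty file); the point is only the shape of the check.

```php
<?php
// Hypothetical values: the file came from a mirror, the checksum from the
// project's own (trusted) site.
$downloaded = 'installer-1.2.3.tar.gz';
$published  = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855';

// Hash the local copy with the same algorithm the project used.
$actual = hash_file('sha256', $downloaded);
if ($actual === false) {
    exit("could not read the downloaded file\n");
}

// hash_equals() does a constant-time comparison (overkill here, but harmless).
if (hash_equals($published, $actual)) {
    echo "Checksum matches - the mirror gave us the expected file.\n";
} else {
    echo "Checksum mismatch - corrupted download or a tampered mirror.\n";
}
```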
Someone clearly had a bit of fun drawing that cocket... er I mean rocket.
Sometimes websites deny a password reset because the new password is "too similar" to the old one. How do they know this if all they have is a hash?
well, then run!
If it's the same password, they can tell because the previous password hash will be the same as the current hash. If it's "too similar", it is possible that they have stored other information about the password (eg length), although most likely they have just stored it in a form that is convertible to plain text which is really quite bad.
The hash for file downloads is usually used by open source projects, where the executable may be mirrored by countless universities which the software author doesn't have control over. In such a case, it certainly is *not* trivial to compromise both locations.
+Skrapion However, executable hashes are easily manipulated because you can always pad the file with some extra bits that affect the hash but are never actually executed. So if you have a weak hash like MD5 then it's relatively easy (-ish) to make a malicious executable file with the same hash. Of course if you have a good, secure hash like SHA-2 (so far!) then the only way to figure out what bits to use for padding is to try it, check the hash, and keep trying until something matches - i.e., brute force. This takes an impractical amount of time to do, though, as long as the hashing algorithm isn't *too* fast (as Tom mentioned).
Of course this is a different argument. You're talking about whether you can trust the posted hash value to be correct or forged (which would be a problem no matter how secure your hash algorithm is). I'm talking about if the posted hash value is correct, but the hash algorithm itself is weak...
The only thing I'd say to your point is, how many people actually cross-check the posted hash against an independently run mirror/fork? If you do, then I'm sure you're in the minority. ;)
These videos with Tom Scott are my favorite on computerphile. You should do more with him!
That moon rocket looks very similar to.. uhh..
Tom is such a great teacher, I hope that's his job in real life because the world needs more teachers like him
"if you have 50 pigeons into 25 pigeon holes, you have to stuff 2 of the pigeons into 1 of the holes"
Awesome video Tom! In a previous job I used to test slot game software, and for verification of the program we used multiple algorithms like MD5, SHA-1 and SHA-2 to ensure what we approved in the lab is what gets used in the field.
Such a passionate guy.
Kind of a shame this video was mostly an intro to the next episode, but I'll definitely be watching out for it!
The point of publishing hash codes for downloads is still useful for the common case where the download comes from a 3rd party (mirror or torrent), and also to ensure integrity (no transmission errors in the downloaded file or storage media).
Or if you use a download manager to download from multiple sources, to ensure you got all your sources right.
The first audio book I ever listened to is still to come.
You are by far the best of the computerphiles. The sorting guy is good too, but wavey-hair is superb.
Watching this video 9 years after it was uploaded: he was right, SHA-1 is broken now. Still incredibly helpful for getting a high-level view of the topic!
Including a hash with a file to be downloaded is a common practice when the file itself is hosted on various mirrors rather than on the main site. In this case, being able to retrieve the hash from the trusted site, and compare it to the binaries from the potentially untrusted mirrors allows one to be reasonably confident that the mirror didn't send you a modified version of the file.
1:34 Isn't that the first 6 digits of pi?
I would love to see a whole series about computer science, or security, by Tom Scott.
3:51 how is the rocket appearing behind the moon? The rest of the moon is still there
Using hashes to check files you've downloaded is not supposed to be a "security feature". It's meant to provide a simple check that the download itself went through without disruption - in other words, that you actually received the whole file as intended and that the file wasn't broken by a poor connection.
Length of the hash is how you get security, not making a slow hash function. In other words, you should always try to make the hash as fast as possible, but have at least *some* fixed overhead. This means a reverse or "dictionary" attack is stuck at brute force -- if a security system relies on the failure of a brute force attack due to the slowness of the hash function, the system should be considered weak.
The issue with MD5 and SHA0 is that they have mathematical weaknesses that reduce their security. In other words you can, in a sense, pre-match some of the hash bits before you start the dictionary attack, effectively reducing the length of the hash. The "rumors about SHA-1" are almost certainly to do with the relatedness of SHA0 and SHA-1. Probably the weaknesses discovered in SHA0 were generalized, expanded upon, and then applied to SHA-1.
SHA256 is basically safe because it uses an overkill-sized hash space (256 bits). It means all collision searches are supposed to take about 2^129 operations, which the sum total computing power of the earth has yet to even come close to doing. The idea is that even if SHA256 is significantly weakened (say, by 32 bits), there would still be enough space left over that it would still be basically impossible to break.
Longer hashes are always a little more inconvenient than smaller ones. This is why SHA1 existed. It only outputs 160 bits, which, assuming the SHA1 algorithm were not weakened in any way, requires 2^81 operations to break - still too much in practical computing environments. The problem is that attacks which can reduce the number of bits by, say, 52 bits would be sufficient to reduce the problem space to 2^54, which some supercomputers may be able to tackle given enough time.
MD5, which only outputs 128 bits, was easily cracked when its output space was reduced below 100 bits, independently by a French team and a Chinese team in the early 2000s (the French team did not reduce the number of bits as much, but used much more computing power). Nowadays, it is easy to construct MD5 hash collisions using the Chinese team's approach.
The remedies to this are to use updated algorithms such as Keccak (aka SHA3) or to use more bits (like 224, or higher like SHA256 does).
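For anyone wondering where figures like 2^81 and 2^129 come from: they are essentially the birthday bound. Roughly, for an idealized n-bit hash, a generic collision search costs about

$$\sqrt{2^{n}} = 2^{n/2}$$

hash evaluations, so on the order of $2^{64}$ for 128-bit MD5, $2^{80}$ for 160-bit SHA-1 and $2^{128}$ for SHA-256, which matches the numbers above give or take a small constant factor.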
would you recommend SHA 512 over SHA 256 in order to avoid collision searches even if weaknesses in SHA 2 are found?
I think that the "rumors about SHA-1" have to do about that the NSA has built in backdoors
I thought they wrote the hash for 2 reasons: 1. to make sure the mirrors aren't tweaked, or 2. to avoid bit errors (where a 1 goes to 0 or vice versa) caused by physical factors. Doesn't happen too often nowadays though.
Tom Scott for the win with my cyber security assignment
I love seeing Videos with Tom, he is just awesome and the topics are interesting.
I bet most people here won't mind longer videos if stuff gets explained!
Definitely should bring up BCrypt during password hashing. Its "work factor" is a design feature to help with the speed concern mentioned in this video.
that is not true anymore - while bcrypt is currently the "safest" key derivation function, it has one major weak point: it has a very small memory footprint, which allows it to be parallelized - but that is OK, the algorithm is almost 15 years old (and it is based on the 20-year-old Blowfish) - time for a replacement: scrypt is almost ready to go and solves the memory issue by creating a ridiculously large memory footprint
What exactly do you mean by that? A small memory imprint?
Corey Ogburn it means it uses very little amount of memory (RAM)
Corey Ogburn bcrypt is computed in-place and does not use any RAM (besides what the operating system or the process consumes - and that is very little) - scrypt on the other hand consumes lots of RAM, which makes it considerably slower than bcrypt because RAM is "slow" (even if it is GDDR5)
in practical use scrypt is around 2000 to 4000 times slower than bcrypt
Corey Ogburn it means you can't use ASICs to compute it, because RAM in ASICs is very expensive. If some algorithm has a current state that consists of only a few dozen bytes, it can be calculated very cheaply and very, very fast.
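To make the "work factor" point concrete, here's a small PHP sketch using the built-in password_hash(); the cost values and the sample password are just examples. Each +1 on bcrypt's cost roughly doubles the work per hash.

```php
<?php
// Time a single bcrypt hash at increasing cost factors.
$password = 'correct horse battery staple'; // example password only

foreach ([10, 12, 14] as $cost) {
    $start = microtime(true);
    password_hash($password, PASSWORD_BCRYPT, ['cost' => $cost]);
    $ms = (microtime(true) - $start) * 1000;
    printf("cost %d: %.1f ms per hash\n", $cost, $ms);
}
```

A few extra milliseconds per hash barely matter to one user logging in, but they matter enormously to an attacker trying billions of guesses.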
Watching this 9 years later, and what he said about SHA-1 is so funny.
This was a very nice video. My first comment on any video on YouTube!
You explained it significantly better than my lecturer did
Here's a fabulous idea that will extend the life of MD5 and make security much, much harder to circumvent: Use TWO OR MORE hash algorithms and test against multiple signatures. It may be trivial to break one hashing algorithm, but the likelihood of being able to change a file in such a way that multiple hash algorithms still match is infinitesimal, and may even be zero.
MD5 is still used regularly to check file integrity. No one should expect anything but an assurance that the file is intact, but if this is all you need to know, it works beautifully. Secondary systems should be in place to secure the file to be delivered anyway, and if you are checking for malicious modifications via a hash check client-side after the file is delivered, you are doing it all wrong. However, it is very important to make sure the delivered file did not get corrupted, so MD5 in this case is perfect: it's fast and extremely reliable for this job.
Richard Smith None of what you said was really relevant to what he said.
ReturnOfTheBlack Sure it was: he is talking about extending the life of MD5, and it still has a long life ahead of it, just not relating to security.
Checking multiple simple hashes may, and usually does, take more time than checking one really secure, not-yet-breakable hash, such as SHA2 or SHA3. Therefore the best option is still to develop a very secure hash instead of using multiple ones.
Using 2 hash algorithms would be worse. An example:
hash A has a collision, with BOB and FRED both equalling 5.
So you can change BOB to FRED and no one will know.
If hash B has a collision of 5 and 19 both equalling "rabbits", and on hash A JOHN and NICK both equal 19,
then you can replace BOB with FRED, JOHN or NICK and no one will know.
Also it will take twice as long, for a non-exponential increase in security.
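For what the "check more than one digest" idea being debated here looks like in practice - not an endorsement either way, just a sketch with made-up filenames and placeholder digests (the MD5 and SHA-256 of an empty file):

```php
<?php
// Hypothetical downloaded file and the two published digests for it.
$file = 'release.zip';
$published = [
    'md5'    => 'd41d8cd98f00b204e9800998ecf8427e',
    'sha256' => 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855',
];

$allMatch = true;
foreach ($published as $algo => $expected) {
    $actual = hash_file($algo, $file);
    if ($actual === false || !hash_equals($expected, $actual)) {
        $allMatch = false;
        echo "$algo mismatch\n";
    }
}
echo $allMatch ? "all digests match\n" : "verification failed\n";
```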
I always thought it was odd to check the file integrity by trusting the hash from the same site that you downloaded the file from.
I mean, if the site is compromised, why would I trust their "checksum"?
7:08 That's exactly not the reason why that hash is there. It's there to check whether the file downloaded properly.
I remember with those download sites, there were some groups that actually did modify the file to make the CRC or later MD5 spell out something interesting in hex. Like the version number or the episode number of the series.
My first experience with hashing was keying in programs on my Commodore 64 from magazines like Compute!'s Gazette, RUN, or Ahoy in the 1980s. They included a program that would start a wedge (a daemon, of sorts) to compute and display a checksum for each line of code entered. Each line of code printed in the magazines ended with a comment containing a checksum for that line. If the checksum appearing on the screen didn't match the one printed in the magazine, I'd know I entered the line of code incorrectly and I'd need to look for mistakes.
That's how we did it back then, and WE LIKED IT! 😉
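For anyone curious what that kind of per-line checksum amounts to, here's a toy version in PHP. This is not the actual algorithm the magazines used, just a guess at the general idea: reduce each line to a small number you can compare by eye.

```php
<?php
// Toy line checksum: definitely not Compute!'s Gazette's real algorithm,
// just an illustration of the idea.
function lineChecksum(string $line): int
{
    $sum = 0;
    foreach (str_split($line) as $i => $ch) {
        // Weight by position so transposed characters change the result.
        $sum += ord($ch) * ($i + 1);
    }
    return $sum % 256;
}

echo lineChecksum('10 PRINT "HELLO"'), "\n"; // value printed next to the line
echo lineChecksum('10 PRINT "HELO"'), "\n";  // a typo gives a different value
```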
Just wanted to say that CyanogenMod (and many other Android-flashables) use md5 for their downloads. It works very well for me and has nothing to do with security.
Sometimes, I get a file, download it onto my pc, then try to transfer it to my phone. Then when I flash the file, it doesn't work. Some transfer process I used changed the file along the way. I really don't know why. Maybe it's because many of them are .zip's, and perhaps they want to use a different method of compression or something.
When the hash code is shared next to the download, I can verify the file's fidelity by the time it reaches my phone.
3:45 I hope I'm not the only one who sees something completely different in this animation :D
10 years later and he still looks the same
Isn't the last point about verifying file downloads a little off?
The point isn't that the file downloaded is verified to be the same file as the one put on that site. The point is that a file downloaded from 3rd parties (mirror sites etc) is checked to be the same file as a central master copy that the author has certified.
Surely it is common place for people to be storing passwords with an additional, random, salt nowadays which, combined with the hashing method (sha1, sha2, md5 etc..) makes it a lot more secure than just hashing a password directly?
Yes, it's all about buying time for worst case scenario, everyone is vulnerable (as proved by the recent events at major corporations)
***** Best case scenario, you can hash it enough times that it would take years to brute-force it.
***** if done properly, and the password is reasonably secure, it should take years to brute force only one password.
If your database has been compromised and you've let customers/end users know, a year SHOULD be plenty of time.. the biggest weakness in any system is the end user..
Yes, a salt should be used, but only in combination with the right hash algorithms (not MD5 or the SHA family)!
A salt prevents the hash being broken using a precomputed hash-password dictionary (a.k.a. "Rainbow Tables"). But on modern hardware, computing hashes such as an MD5 or SHA1 is often faster done directly than having to look it up in a giant file on disk. Attackers probably won't bother using rainbow tables, and salts will therefore not slow them down.
The key is to not use MD5 or SHA. These hash algorithms are designed to be very fast for message authentication, but they are not suitable for hashing passwords!
When hashing passwords you need a very slow and computationally intensive algorithm (such as scrypt or bcrypt), such that it becomes hard to attack using brute force. A salt is then used to mitigate attacks using precomputed hash tables.
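A tiny sketch of the salting point. The algorithm here is plain SHA-256 purely to show the effect of the salt - as the comment above says, real password storage should use a slow, purpose-built function like bcrypt or scrypt.

```php
<?php
// Two users pick the same password.
$passwordA = 'hunter2';
$passwordB = 'hunter2';

// Without a salt the stored hashes are identical, so one precomputed-table
// lookup (or one crack) exposes every user with that password.
echo hash('sha256', $passwordA), "\n";
echo hash('sha256', $passwordB), "\n";

// With a random per-user salt the stored hashes differ even though the
// passwords are the same, and precomputed tables become useless.
$saltA = bin2hex(random_bytes(16));
$saltB = bin2hex(random_bytes(16));
echo hash('sha256', $saltA . $passwordA), "\n";
echo hash('sha256', $saltB . $passwordB), "\n";
// (The salt is stored alongside the hash so it can be recomputed at login.)
```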
@Tom Scott - Actually MD5 HAS been broken with text documents. It's been done with carefully constructed PDFs that say different things and have the same hash. AFAIK it's based on some APPEND attacks on MD5 and the ability of PDFs to overlay stuff.
If I make a hash algorithm in PHP or JS, how do I hide that algorithm securely from users? I could make a kind of secure hash algorithm, but that is useless if everyone can just read the instructions
Thank you Tom and Computerphile!
hash codes on websites offering a download are also used to make sure the download went well and nothing got corrupted (or involuntarily changed by a machine error or noise)
6:46 I think the hash is not used to verify that the file wasn't manipulated, but just to verify that the file is not damaged.
I'm very much looking forward to the next video, as I greatly disagree about password storing using hashes. Obviously not unsalted and not only one iteration of hashing, but still.
I guess you watched the video, but just in case. In here, he says that hashing algorithms should not be used to store passwords. The alternative I would have suggested when I posted this is exactly what he posted in the next video, or the standard HMAC using a unique salt per user as it's also known.
Okay it's been a while since I last had to code passwords into a database but back then md5 was still perfectly acceptable. It's good to know this is no longer the case and web scripting languages have adapted to provide the extra layer of security. So in PHP where I used to use md5() for passwords I shall now use password_hash()
Thanks :)
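For anyone making the same switch, a minimal before/after sketch; the password is a placeholder and would come from the login form in practice.

```php
<?php
$password = 'example-password'; // placeholder for the submitted password

// Old way (don't do this anymore): fast, unsalted, easy to brute-force.
$legacyHash = md5($password);

// New way: PASSWORD_DEFAULT is currently bcrypt; the salt and cost are
// generated for you and stored inside the returned string.
$storedHash = password_hash($password, PASSWORD_DEFAULT);

// At login, compare the submitted password against what's in the database.
var_dump(password_verify($password, $storedHash)); // bool(true)
```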
8:05 Red has won three times and yellow has won two times.
Please let me explain something: download websites don't use MD5 hash verification to guarantee that the file hasn't been changed by a hacker on their server - they know that if a hacker could do that, he could change the hash as well! The hash is only used to check whether there was a network error while downloading the file that flipped or deleted some bits, essentially corrupting it. This is especially important and widely used when downloading OS images (those things are large and take time to download, so they are vulnerable to network corruption) or when downloading files over the torrent protocol, which downloads the file as chunks that the client glues together again - so it's a check that the network or the client didn't miss any piece of the file.
This video was mentioned in a later video, so I went looking for this one but couldn't find it. Now it just appeared in my What To Watch list. Weird. Good video. I would like to see something more in-depth, perhaps by Numberphile.
When using hashes for file or packet verification, wouldn't using multiple hash types on the same file/packet and comparing all the hash types applied provide much greater reliability? The chances of multiple hash types having overlapping collisions is infinitesimally small with just 2 hash types let alone more.
Thanks for the great videos!
Actually, the use of MD5 on websites for verifying files is to be sure that there weren't errors transferring the file, not to be sure that the file you wanted is that one. (E.g. I want to download a .bin containing the number 50, which is 00110010, but the first bit gets flipped by an error on one of the routers and instead I receive 10110010 - translated, 178. So now, instead of sending them 50 pigeons, I'll send 178.)
Why wasn't I subscribed to this channel before? Love the way you explain.
I love this channel! How did I not find it sooner?
The software or file download that has the hash along with it is actually secure, provided they sign the hash - that is, they run RSA on the hash using the private key of the company. So nobody can change the hash; to change it, they would need the company's private key.
Or share the hash on a company owned HTTPS server while file download could be provided using plain http mirrors.
If the file format supports signing, that is better. Like signing an exe.
***** Agreed. Using HTTPS is a simpler idea, given that it's popular and has many easily available implementations.
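Roughly what "sign the hash" looks like with PHP's OpenSSL bindings - a sketch only, assuming you already have a key pair; the key and file names are invented for the example.

```php
<?php
// Publisher side: sign the release file with the company's private key.
$data = file_get_contents('release.zip');
$privateKey = openssl_pkey_get_private('file://private.pem');
openssl_sign($data, $signature, $privateKey, OPENSSL_ALGO_SHA256);
file_put_contents('release.zip.sig', $signature);

// User side: verify with the published public key. An attacker who swaps
// the file on a mirror can't produce a valid signature without that key.
$publicKey = openssl_pkey_get_public('file://public.pem');
$ok = openssl_verify($data, file_get_contents('release.zip.sig'), $publicKey, OPENSSL_ALGO_SHA256);
echo $ok === 1 ? "signature valid\n" : "signature INVALID\n";
```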
I don't think the hash of the file on a download page is a measure against hackers. I always thought it was supposed to be a way to check if there were any loss in downloading the data.
I love these videos, and I love this guy, Great explanations.
Actually, the idea of putting the hash file alongside the original file is a very good idea, if you are aware of what you are doing.
1- People with that level of security use multiple servers and mirrors, and you should really, really get the two files (the hash and the document or program) from different sources/paths. That is internet security 101.
2- Mostly the hash is used against man-in-the-middle attacks; hashes do little good if your server is compromised (that is not their intended use). Using hashes assumes the source is trusted and the channel is not.
A hash could be very useful, for example, if you are getting the file from a P2P network and you'd like to verify that you got the actual file, not a different one.
And passwords should be encrypted in the database, specifically with algorithms that can be decrypted with a key or salt and such. Hashing the password in the database (with any algorithm, secure or not) forces you to use non-secure authentication methods that expose the password on the client side.
Please, add the numberphile link in the description. I failed to find it easily, had to see the video from scratch again. Thanks.
I was hoping for exactly the details he said would take hours to explain. But then again maybe the mathematics of hashing would fit Numberphile better. Here's to hoping James will tackle this!
I use hash algorithms in image processing to compare images.
If images return the same hash I know they are equal.
Hashes can be very helpful!
I've been watching all the videos. Hope you talk about salting hashes in the future, thx. Really loving these videos.
Been waiting for this one :) Great video Brady.
Brady's not actually the one filming and editing; Sean Riley is (description).
4:10 Could this be used for, say, instead of changing the name of the next lunar astronaut (which MAY get you on the fast track but probably won't - after all, they're bound to notice you're grossly unqualified for such a mission), manipulating troop movement orders in Pakistan? If I could get six or seven armor battalions suddenly ordered to the India-Pakistan border, well, that's bound to get India to respond, which could begin a chain of events that ends in nuclear war.
Even if it is discovered that it was fake orders that started it, it might go out of control before it could be stopped.
Eric Taylor what are you even saying mate
moog500 I'm talking about making a deliberate hash collision, so that the order seems genuine but it isn't.
2:07 I would not mind listening to him for hours
So, here's an analogy that I thought of: it's like the file is an engine and the hash is a photo of that engine. The hash won't work as an engine, and you can't figure out how the engine works from it, but you will be able to see that the engine is the same as the one in the photo (probably).
With the new innovations in compute… how reliable is hashing? I’m not well versed in IT security but it sounds like if computers got too fast for md5… then the AI stuff can probably break any.
Very informative. I was wondering if this also relates to how many schools check for fraud? When you turn in a paper, they somehow check whether it correlates (too much) with some preexisting text - do hashing algorithms play a part in that process?
Hashes would be quite useless in that regard, because changing a single character would make the two papers' hashes completely different. I would assume that they use regular expressions, but I could be wrong.
Good point. I somewhat expect that the method is either very simple or extremely complicated.. could be cool to see on Computerphile.. I suppose that a lot of students are enjoying the channel :)
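A quick demo of the point made a couple of comments up - change one character and the whole digest changes, which is exactly why hashing entire documents is useless for similarity checking:

```php
<?php
// One character of difference, two completely unrelated-looking hashes.
echo hash('sha256', 'The quick brown fox jumps over the lazy dog'), "\n";
echo hash('sha256', 'The quick brown fox jumps over the lazy cog'), "\n";
```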
I forget which game it was, but gaining access to its protected memory region required generating hash collisions until it overflowed. Once you got access, it was just a function call to reset the stored hash with your injected code factored in.
This guy is great, can't wait for the followup!
In a few years SHA-3 will be the standard. Yeah, still waiting.
let that one slip in @ 4:31. Thanks for the great explanation.
You said that adding the hash code to the website is a bad idea, because if the file is stored on the server and someone's able to change it, then they can easily change the hash code as well, pretending that it's safe.
But what about when you're using P2P downloads, rather than a server?
On torrent sites, for example, I can see how having the hash comes in handy!
Someone could tamper with the links and whatnot, but the end file that I'll get won't match the hash on the site, thereby revealing that someone messed with the file.
This is really the kind of video I would love more from on Computerphile. :P (Maybe going a bit more into detail... just slightly but not as superficial as some of the other videos on this channel. - That's also why I like Tom Scott btw :p)
Also those "nostalgia" videos about the times, where most of us probably didn't even exist yet or knew what a computer is about are pretty neat. (Forgot the name of this nice older professor)
Just some feedback - That's all. Bye and thanks for your content, Brady
SHA1 officially broken by Google today lol
Question...
Suppose my system stored both the MD5 and SHA1 hashes of an input X. Individually, MD5 and SHA1 are broken. But is it possible to construct a separate input Y which matches both the md5 and sha1 hashes of input X?
I thought hashes for files on websites (like Microsoft Windows ISO images) are used for you to verify that your download did not corrupt the file.
If my MD5 input is a randomly generated string of 3000 characters and numbers, will that greatly decrease the chance of something else colliding with its hash?
More of Tom! These are the best vids
You know, I've been thinking: Why haven't we switched to gallium phosphide for our CPU and GPU yet? I'm well aware that the material is inappropriate for FET's at the moment, that it is expensive, but I don't get why we can't just switch back to BJT's in order to accommodate the new material- surely TTL is adequate for the modern integrated circuit!
PS: I just looked at the pertinent Wikipedia pages more closely, and it turns out they're experimenting with aluminium oxide for this stuff.
You can store passwords just fine with SHA256, as long as you don't just hash the passwords without any algorithmic changes applied to them. What works best is quite probably storing a password together with a ridiculously long cryptographically secure random salt (say, for example, a 1024-bit salt).
Woo, thanks for this video. I've been wanting to learn more about this, and I've had trouble getting through some articles on this topic.
Uhm,... Nice spaceship design lol.
I think his point about hashes and program verification is off. I've never seen it imply that the program was safe, just that your download wasn't corrupted.
I think the intent of the MD5 hash under a software program was to be used to validate the download was successful, especially back when we were all on dialup.
How about LastPass - how secure is their method of storing & managing passwords?
James talked about it on Numberphile video number 1 on 11.11.11.
If checking the hash on the site from which the file was downloaded is useless for security purposes, what is the best source for the hash? What reputable sites provide these hashes?
I disagree that a hash on a download is pointless. It lets you use mirrors that you don't necessarily trust. You have the hash stored on your hosting (since it doesn't use much bandwidth), and then have the actual download hosted on other hosting (mirrors?). Now, even if one of the download sites are hacked, you can always verify the file against the hash on your hosting, which you know is secure.
This explains that lecture I was sitting in on when I visited Cornell.
Hey guys, thanks for this - it has really helped me to understand the hash algorithm. But in relation to the public and private keys, would this be used in conjunction with the digest to verify digital signatures?
You should note that the checksums published by the application/file author (vendor) can be digitally signed so you'd be able to verify nobody changed the checksum on the site.
Check out the Fedora Linux ISO verification process here: fedoraproject.org/verify
1.) download ISO
2.) import Fedora public key (GPG)
3.) download the ISO checksum file, verify it via the public key
4.) validate the ISO against the checksum
Yes, the GPG key could in theory be changed, but I'm guessing it's under much higher scrutiny than the checksum files. Also, you fetch it via HTTPS so there's no man-in-the-middle attack vector.