Uses of Information Theory - Computerphile

  • Published 28 May 2024
  • Looking at some real world uses of information theory with Dr Tim Muller
    This video was filmed and edited by Sean Riley.
    Computer Science at the University of Nottingham: bit.ly/nottscomputer
    Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments • 134

  • @Amonimus
    @Amonimus 1 year ago +240

    "If you use Log 2, you're a computer scientist. If you use Log 10, you're an engineer."

    • @emuccino
      @emuccino 1 year ago

      If you use natural log, you're a physicist

    • @tpobrienjr
      @tpobrienjr 1 year ago +18

      what if you use ln?

    • @TheDavidlloydjones
      @TheDavidlloydjones 1 year ago +23

      @@tpobrienjr space cadet intellectual.

    • @jhbonarius
      @jhbonarius 1 year ago +11

      @@tpobrienjr then you're a biologist

    • @Kram1032
      @Kram1032 1 year ago +22

      @@tpobrienjr mathematician? - unless you study the general thing. Then you'd use base b

  • @Buzzfly
    @Buzzfly 1 year ago +57

    Sean said that you learn a quarter of the dataset with 1 of the 4 dice results revealed. Later on he points out you learned roughly 2.6 bits of information, out of 10.3. 2.6 divided by 10.3 is 0.252... or almost exactly a quarter. Sounds like Sean was right! Maybe just not expressing the answer in a way that Tim expected.

    • @pirmelephant
      @pirmelephant 1 year ago +16

      Yes, because case 1: log_2(6*6*6*6) = log_2(6^4) = 4*log_2(6)
      and case 2: log_2(6*6*6) = log_2(6^3) = 3*log_2(6).
      So the difference is exactly a quarter (see the sketch after this thread).

    • @Geezon
      @Geezon 1 year ago +4

      @@pirmelephant The entropy of the information is not the information itself; you don't learn a quarter of the information.

    • @pirmelephant
      @pirmelephant 1 year ago +5

      @@Geezon Yes, that's true. I just wanted to show that you learn exactly a quarter of the entropy, not just almost exactly a quarter.

    • @luxhey
      @luxhey 1 year ago +11

      @@pirmelephant Still not true.
      Every increment of one bit doubles the amount of information. So 2.6 bits is not 25.2% of the information in 10.3 bits. Remember, there was a log calculation in there, so linearity is out the window.

    • @scottrothbaum8296
      @scottrothbaum8296 1 year ago +3

      @@luxhey When told the value of the first die, you do in fact learn 25% of the information. Bits of entropy = information. The number of possibilities isn't the information. If I told you that an mp4 of Back to the Future was 2 GB, and I gave you a 500 MB sample equating to the first 1/4 of the movie, you do in fact have 1/4 of the information.
      The number of possible arrangements of the bits went from 2^(1.6*10^10) to 2^(1.2*10^10) when I gave you the first 1/4 of the movie, but your reasoning would suggest that we learned basically none of the information.
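
  A minimal Python sketch of the arithmetic in this thread, assuming four fair six-sided dice:

      import math

      # Entropy of n fair six-sided dice: log2(6^n) = n * log2(6)
      before = 4 * math.log2(6)   # ~10.34 bits with all four dice unknown
      after = 3 * math.log2(6)    # ~7.75 bits once one die is revealed

      learned = before - after    # ~2.58 bits
      print(learned / before)     # 0.25: exactly a quarter of the entropy

      # The *possibilities* tell a different story: revealing one die
      # shrinks the outcome space from 6^4 = 1296 to 6^3 = 216,
      # i.e. 5/6 of the possibilities (~83%) are discarded.
      print(1 - 6**3 / 6**4)      # 0.8333...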

  • @Richardincancale
    @Richardincancale 1 year ago +24

    We owe so much to the work of Turing and Shannon! In the satellite world, discussion of the Shannon limit on information throughput is part of our daily life in optimising power, bandwidth and error-correction codes!

  • @macemoth
    @macemoth 1 year ago +4

    Having a coffee machine on his own desk is truly proof that Dr Tim is a very smart man.

  • @recklessroges
    @recklessroges 1 year ago

    My favourite Dr Muller video so far! Really helps me with designing some solid k-anonymous data sets.

  • @AbelShields
    @AbelShields 1 year ago +12

    9:15 you say "that's not quite right" that you've learned a quarter of the information, but then go on to say that the entropy before is log2(6*6*6*6) = 4log2(6) and after is log2(6*6*6) = 3log2(6), so I'd argue it's kinda right... Unless I'm missing something?

    • @thishandleistaken.
      @thishandleistaken. 1 year ago

      Good point. I'd like someone smarter to clarify

    • @Geezon
      @Geezon 1 year ago +6

      @@thishandleistaken. The point was that it wasn't a set of 4 reduced to 3 by learning one of the dice values; it was a set of 1296 (all possible outcomes) reduced to 216 (possible outcomes for the other 3 dice). 1080 possibilities being removed is not 25% of 1296, it's 5/6, or ~83%.

    • @luxhey
      @luxhey 1 year ago +1

      @@thishandleistaken. Every increment of one bit doubles the amount of information. So 2.6 bits is not 25.2% of the information in 10.3 bits. Remember, there was a log calculation in there, so linearity is out the window.

    • @tomaz2007
      @tomaz2007 1 year ago +1

      "A quarter of the entropy" is right, but that's not a quarter of the information. Think of it like this: how much time does it take to brute-force a 6-bit password compared to an 8-bit one? It's not 25% less, it's 4x less.

  • @electrictommichael
    @electrictommichael 1 year ago +2

    You can massively increase the entropy of the four words if you use more than one language or use names and brands and not just dictionary words

  • @Mutual_Information
    @Mutual_Information 1 year ago +8

    👇Useful intuition from 9:20
    The information of an observation is the (log of) the factor by which the space of possibilities was reduced upon seeing that observation.
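
  That intuition is easy to state in code. A minimal sketch; the function name and the equally-likely-outcomes assumption are mine:

      import math

      def information_bits(before: int, after: int) -> float:
          """Information gained when a space of equally likely
          possibilities shrinks from `before` to `after` outcomes."""
          return math.log2(before / after)

      print(information_bits(1296, 216))  # one die revealed: ~2.58 bits
      print(information_bits(1296, 1))    # all four dice revealed: ~10.34 bits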

  • @wernercoetzee2316
    @wernercoetzee2316 1 year ago +23

    I have the exact same coffee machine; the auto-off is very annoying, especially if it switches off just as you walk up to it to make a cup

    • @TomTheDutchy
      @TomTheDutchy 1 year ago

      You can change the duration it stays on before turning off

    • @MichaelKingsfordGray
      @MichaelKingsfordGray 1 year ago

      Drug addict.

    • @VRchitecture
      @VRchitecture several months ago

      @@TomTheDutchy It isn't about the settings, I guess, but life itself being ill-timed sometimes 🤷🏻‍♂️

  • @Yupppi
    @Yupppi 1 year ago +5

    Question: when you select your "random" words as a human and pick the xkcd method of four words, how does the hacker's algorithm know they were chosen by a human, so that it can limit the guesses to words you know and would use, rather than truly random combinations? How does cracking the password get faster in practice for the hacker? How does he know it's words rather than random characters, and four words of a specific length to combine? Can he snoop the size of the password? I know the cracker can just try every possible combination of 4 words, but does that make it more plausible to crack the password in real life? I understand that the math in the video is theory, but does your password actually get cracked because of this? Or does some website fail to protect the passwords adequately, so they just get leaked instead of cracked?

    • @ninopangolino1762
      @ninopangolino1762 1 year ago +6

      Actually, the hacker doesn't know that the password was built from human-chosen words. But because brute-forcing all the possible lengths/characters is so expensive (in terms of bits of work), the hacker might begin with a dictionary attack, then some modifications of that same attack, like combinations of words, and finally pure brute force: AAA, AAB and so on. So, trying to answer your question: it is faster because you try the word-guessing technique before the classical brute-force attack (see the sketch after this thread).
      Of course, a little bit of social engineering might help...

    • @mariushusejacobsen3221
      @mariushusejacobsen3221 1 year ago +2

      The diceware method uses a very long list of words, very long compared with the number of words a person can think of just off the top of their head. Thus a shorter list of more common words will suffice, or at least serve as a starting point before trying the rest of the list, which in such a case will hit a match quicker.
      This all assumes at least one cracker operating on the assumption that diceware was used to generate the password.
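
  For a rough sense of why attackers try word lists before raw brute force, compare search-space sizes. A sketch; the 2,000-word "common words" figure is an illustrative assumption:

      import math

      common_words = 2_000    # assumed short list of words people actually pick
      full_diceware = 7_776   # the standard diceware list has 6^5 entries
      printable = 95          # printable ASCII characters

      print(4 * math.log2(common_words))   # ~43.9 bits: 4 words, short list
      print(4 * math.log2(full_diceware))  # ~51.7 bits: 4 words, full diceware
      print(12 * math.log2(printable))     # ~78.8 bits: 12 random printable chars

      # Guessing in order of increasing search-space size (short list,
      # longer lists, then raw brute force) pays the cheapest searches first.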

  • @severoon
    @severoon 1 year ago +2

    Is there some kind of measure of information utility that would allow us to mathematically distinguish between the two cases "we know die 1" and "we know dice 1 and 2 are the same"?

    • @entropie-3622
      @entropie-3622 1 year ago +1

      Not a specialist in the field, but you could look at the entropy of projections of the distribution onto sub-selections of dice.
      In practice this could mean having a lower bound for the entropy of each individual die, another bound for the distribution of each pair of dice, another for each triplet, etc.
      In the given example, if you know die 1, the distribution of that die will have no entropy.
      If, however, you only know that dice 1 and 2 have the same result, the individual dice would not see an entropy reduction at all.
      In both cases the pair of dice 1 and 2 together will be subject to an entropy reduction of 1/2.
      So using that metric, the 2nd distribution would perform better than the first (see the sketch below).
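
  A brute-force check of that idea. A sketch assuming fair six-sided dice; it scores the two kinds of partial knowledge by the entropy of sub-selections of dice:

      import math
      from collections import Counter
      from itertools import product

      def entropy(samples):
          """Shannon entropy (bits) of the empirical distribution of samples."""
          counts = Counter(samples)
          total = sum(counts.values())
          return sum(-(c / total) * math.log2(c / total) for c in counts.values())

      outcomes = list(product(range(1, 7), repeat=4))

      know_d1 = [o for o in outcomes if o[0] == 3]      # die 1 revealed (say, a 3)
      d1_eq_d2 = [o for o in outcomes if o[0] == o[1]]  # dice 1 and 2 match

      for name, space in [("know die 1", know_d1), ("d1 == d2", d1_eq_d2)]:
          h_die1 = entropy(o[0] for o in space)
          h_pair = entropy((o[0], o[1]) for o in space)
          print(name, round(h_die1, 2), round(h_pair, 2))

      # know die 1: die 1 has 0 bits left, the pair ~2.58 bits
      # d1 == d2:   die 1 keeps ~2.58 bits, the pair ~2.58 bits
      # (the unconditioned pair has log2(36) ~= 5.17 bits, so both halve it)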

  • @LukeVilent
    @LukeVilent 1 year ago

    What's your name? - Same as grandpa's. - And what's your grandpa's name? - Same as mine!

  • @nicholaswillhite5888
    @nicholaswillhite5888 1 year ago +2

    How much do misspelled words increase entropy?

    • @MichaelKingsfordGray
      @MichaelKingsfordGray 1 year ago

      Not very much.
      Algorithms are adept at shuffling character order.

  • @bixnqq
    @bixnqq 1 year ago

    What if I use words from different languages for my password? Would that somehow increase the strength?

  • @nHans
    @nHans 1 year ago +4

    Glad to hear you got better pens for future videos. The scratching noise doesn't bother me quite as much as the lack of legibility. As others too have pointed out, blue ink from a dried-out pen on blue-ruled paper is a bit hard to read.
    Do you get black Sharpies in the UK? I'm a fan, and believe me, it's _not_ because of Trump!

  • @alexandrecuchi2400
    @alexandrecuchi2400 1 year ago

    OK, but just as a comparison: what if all the dice show different values? How does that change the possibilities? I'll try to calculate it, but I still want to know if I'm right.

  • @swap3034
    @swap3034 1 year ago

    Fantastic, thanks! Made my day.

  • @nahco3994
    @nahco3994 1 year ago +3

    What about increasing the number of languages used in the passphrase? Wouldn't that drastically increase the entropy again?

    • @entropie-3622
      @entropie-3622 1 year ago

      If you know as many words in a second language, using both would increase the entropy by 1 bit per word; I would not consider that a drastic improvement (see the sketch after this thread).
      It should be considered that the 11-bits-per-word entropy already assumes that you are only working with about 2000 possible words anyway, so you could also just increase the selection of English words.
      At the end of the day, the limiting factor is the given person's capacity to remember the password well, even though the words were chosen at random.
      The strategy of building a small story as a mnemonic device might suffer a bit if the words randomly switch languages.

    • @nahco3994
      @nahco3994 1 year ago

      @@entropie-3622 How many languages are there on this planet? How would an attacker know which languages are being used? Do I have to know a language in order to use a word in a passphrase? Let's say my passphrase consists of 5 words, from 5 different languages. So which word lists do you pick?

    • @entropie-3622
      @entropie-3622 1 year ago

      @@nahco3994 Obviously, if you could use any language that exists, it would increase entropy quite a lot, but for most people it would also decrease their capacity to memorize the password quite a lot.
      If you just want a secure password and don't care about the ability to memorize it (usually because you are using a password manager), it is always best to use a completely random string of characters.
      If you care about the ability to memorize the password, most people will be restricted to 1 or maybe 2 foreign languages that they know well enough.
      Relying on the idea that the attacker will not be able to guess the languages used is usually not a good approach to security. Generally, a password should be chosen so that it is secure even if the attacker knows the distribution that was used to create it.
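
  The 1-bit-per-word figure is just the log at work. A sketch, assuming two equal-sized vocabularies of about 2,000 words each, as discussed in the thread:

      import math

      one_language = 2_000
      two_languages = 2 * one_language  # doubling the pool of words

      print(math.log2(one_language))    # ~10.97 bits per word
      print(math.log2(two_languages))   # ~11.97 bits per word: exactly +1 bit,
                                        # because log2(2 * V) == log2(V) + 1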

  • @styleisaweapon
    @styleisaweapon 1 year ago +4

    Can't you post-process some edge sharpening on the blue channel to help with the pen?

    • @Computerphile
      @Computerphile 1 year ago +5

      I did have a play with that, but the paper's stripes were extremely close to the blue, and by the time I'd isolated it his arms/hands looked a very strange colour indeed. Decided to live with it and buy some new pens for the future! Apologies for how terribly faint they are; I originally bought them because they seemed quieter than other pens, and we get complaints about pen squeak -Sean

  • @nHans
    @nHans 1 year ago +1

    So you can derive Information Theory from Probability Theory by taking the log to base 2? 🤔

  • @TomeczekH
    @TomeczekH 1 year ago +1

    The paper looks like a musical staff, but then I noticed it has 6 lines too close to each other. What is this paper for?

    • @MetPhotoDotNet
      @MetPhotoDotNet 1 year ago +3

      This is traditional computer printer paper. It has been around for 50 years or more. The stripes help you read long printouts; in the past, often printouts of computer code.

    • @MichaelKingsfordGray
      @MichaelKingsfordGray 1 year ago

      "Eyeline" tractor-feed paper.

  • @drooplug
    @drooplug 1 year ago

    Can someone explain why the xkcd format is 11 bits per word?

    • @entropie-3622
      @entropie-3622 1 year ago +1

      I would guess that there is an assumption that there are about 2000 "common English words" to choose from uniformly at random.
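
  A 2048-word list would make the number exact, since 2^11 = 2048. A quick check:

      import math

      print(math.log2(2048))      # 11.0 bits per word for a 2048-word list
      print(4 * math.log2(2048))  # 44.0 bits for four words, as in xkcd 936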

  • @ivanontech9156
    @ivanontech9156 1 year ago +6

    ‘’Courage taught me no matter how bad a crisis gets ... any sound investment will eventually pay off."

  • @alanegewarth3456
    @alanegewarth3456 1 year ago

    I propose these 3 chess problems to anyone who can solve them:
    1 - Is it possible, in a chess engine like STOCKFISH NNUE or Google's ALPHA ZERO, to translate the machine-learned parameters into a HUMAN-INTELLIGIBLE ALGORITHM (as simplified as it can get) for how to best evaluate a chess position/choose the best move?
    2 - Can you mathematically prove that there's a maximum number of parameters a chess engine must have to play a perfect game (always play the move that leads to the fastest checkmate OR maintains a drawing position, i.e. NOT LOSING NOR WINNING within the range of the next 50 moves, as per today's chess rules)?
    3 - If there's such a maximum number of parameters, can you find it?
    I'm a mediocre chess player and enthusiast, so it would be nice if you guys made a video about those questions. Thank you in advance!

    • @charon7320
      @charon7320 1 year ago

      James Grime on the sister channel Numberphile has a video on chess games. The thing is that you have more possible moves as the game goes on, until you hit a maximum and the number starts to drop. Theoretically, all possible moves can be simulated, programmed, and the best move chosen in any position based on 'weights', as in language models.

  • @MushookieMan
    @MushookieMan 1 year ago +5

    Intuitively, conditional entropy appears when you are solving a sudoku or a minesweeper game.

  • @4guysandacontroller
    @4guysandacontroller 1 year ago +1

    I don't get it, but it sounds neat

  • @jhbonarius
    @jhbonarius 1 year ago +3

    My Magnifica S also startles people when it auto-cleans.
    Anyhow, even after doing my PhD in signal processing on the edge of information theory (using stiff nonlinear ODEs in particle filters), I still find it very confusing. It requires such a different way of thinking.

    • @LetoDK
      @LetoDK 1 year ago +1

      It's pretty simple, actually: if people aren't used to the behaviour of your coffee machine, the sudden onset of sound can come as a surprise. Doesn't really take a PhD to understand, but whatever

  • @briankuhns9769
    @briankuhns9769 1 year ago +2

    It was 1/4 of the entropy. Why did the guy correct him?

    • @luxhey
      @luxhey 1 year ago +1

      It wasn't, though. For every increment of 1 bit, the amount of information doubles. Remember, we did a log calculation, so linearity is gone.

  • @thishandleistaken.
    @thishandleistaken. 1 year ago

    Very relevant, I was studying information theory

  • @froop2393
    @froop2393 1 year ago +3

    xkcd 538 shows us that strong passwords don't help in all circumstances.
    Classic example of Theory != Real World

  • @tarebf
    @tarebf 1 year ago +1

    His conclusion is incorrect because not every disease is equally probable, and thus the information gain cannot be compared to dice rolling. But very interesting nonetheless.

  • @meguellatiyounes8659
    @meguellatiyounes8659 1 year ago +1

    How do package managers solve dependencies?

    • @code-dredd
      @code-dredd 1 year ago

      Directed Acyclic Graphs (see the sketch after this thread)

    • @meguellatiyounes8659
      @meguellatiyounes8659 1 year ago

      @@code-dredd Thanks a lot,
      I just wanted videos on it.

    • @code-dredd
      @code-dredd 1 year ago +1

      @@meguellatiyounes8659 There's at least one on this channel, but you have to search. You might need to look at the Git videos, which rely on the same data structure.
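
  A minimal sketch of the idea: model packages as a directed acyclic graph and install in topological order. The package names here are hypothetical; real managers layer version solving on top:

      from graphlib import TopologicalSorter  # Python 3.9+

      # Each package maps to the set of packages it depends on.
      deps = {
          "app": {"web", "db"},
          "web": {"ssl"},
          "db":  {"ssl"},
          "ssl": set(),
      }

      # static_order() yields dependencies before their dependents,
      # so "ssl" is installed first and "app" last.
      print(list(TopologicalSorter(deps).static_order()))
      # e.g. ['ssl', 'web', 'db', 'app']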

  • @zyswies3612
    @zyswies3612 1 year ago

    KL divergence my friend

  • @seaveredge
    @seaveredge 1 year ago

    I was thinking, hmm, familiar accent… apparently he did his master's at the same university where I'm currently working, haha.

  • @Lion_McLionhead
    @Lion_McLionhead 1 year ago

    This isn't the running channel.

  • @irwainnornossa4605
    @irwainnornossa4605 1 year ago

    0 belongs before 1, not after 9.

  • @sveu3pm
    @sveu3pm 1 year ago

    If you calculate entropy in n bits, then a bit is 2 options, so it's 2 to the power n combinations, not your set size to the power n.

  • @roberttalada5196
    @roberttalada5196 1 year ago

    My passwords are made up of only invisible Unicode control characters

  • @YouPlague
    @YouPlague 1 year ago

    permutations

  • @Geezon
    @Geezon 1 year ago

    For those questioning why it isn't 25% learned from knowing one die: the question isn't what the entropy of the information learned is.
    Knowing one die isn't 1/4 of the information; it removes 5/6 of the total possibilities for 4 dice rolls.
    Learning the one die pins it to 1 of its 6 values, leaving the other 3 dice with 216 combinations, out of the set of 1296 possibilities for all 4 dice.
    So you learned about 83%, by discarding 1080 of the 1296 possibilities.

    • @entropie-3622
      @entropie-3622 1 year ago

      "Information" in this context is the same as entropy; no one uses the term "information" to describe "number of possibilities".
      It seems to just have been an oversight by the lecturer, trying to steer the conversation to the "number of possibilities" view of things while somewhat disregarding Sean's correct intuition.

  • @timothynash4585
    @timothynash4585 1 year ago

    Crypto wallet recovery phrases, like MetaMask's, are a whole bunch of random words together

  • @JohnSmith-pu8rd
    @JohnSmith-pu8rd 1 year ago

    Nf3

  • @theshuman100
    @theshuman100 1 year ago

    you can acquire information.
    but it may not be useful

  • @thishandleistaken.
    @thishandleistaken. 1 year ago +3

    This makes me wonder: Is information theory all about entropy? Is there anything else to learn in this field?

    • @code-dredd
      @code-dredd 1 year ago

      We don't know what we don't know. It wasn't until the relatively recent development of modern cryptography that a new application for prime numbers was found, since the difficulty of finding prime factors lies at its foundation.

  • @weksauce
    @weksauce 1 year ago +1

    Saddens me that the likes were 128 and the dislikes were 2, and I would have to un-power-of-two them to like this video :(

  • @MikeHunt-rw4gf
    @MikeHunt-rw4gf 1 year ago

    Algorithm.

  • @odysseus9672
    @odysseus9672 1 year ago +1

    The password doesn't have entropy. The method for generating the password has a lot of entropy.

  • @segmentationviolator
    @segmentationviolator 1 year ago +7

    The video is 3 minutes old but there is a comment 4 minutes old. Sus.

    • @intron9
      @intron9 1 year ago +1

      sussy

    • @blazerorb
      @blazerorb 1 year ago

      Google has negative latency, dontcha know

  • @Jkauppa
    @Jkauppa 1 year ago

    how much information is in consecutive uniform numbers, i.e. the sum of log(1..n)?

    • @Jkauppa
      @Jkauppa 1 year ago

      can you derive the real gamma function from the sum of the discrete (integer) logarithm sequence?

    • @Jkauppa
      @Jkauppa 1 year ago

      but the left side p_i is always the same (1), while the right side is always +1 from the previous

    • @Jkauppa
      @Jkauppa 1 year ago

      what is that measurement then

    • @Jkauppa
      @Jkauppa 1 year ago

      what is log(x+y)=log(x)+?, or log(n+1)=log(n)+?

    • @Jkauppa
      @Jkauppa 1 year ago

      then the log sum will be something of the form n*log(1) + (n-1)*log(?) + ... as a +1 composite
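
  The sum in the first question is the log of a factorial, which is where the gamma function comes in, since n! = Γ(n + 1). A sketch using Python's math.lgamma (natural log of the gamma function):

      import math

      n = 10

      # Bits in n consecutive uniform choices with 1, 2, ..., n options:
      # sum of log2(k) for k = 1..n, which equals log2(n!)
      log_sum = sum(math.log2(k) for k in range(1, n + 1))

      # Gamma connection: log2(n!) == lgamma(n + 1) / ln(2)
      via_gamma = math.lgamma(n + 1) / math.log(2)

      print(log_sum, via_gamma)  # both ~21.79 bits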

  • @likebot.
    @likebot. 1 year ago +1

    My password is WomanManCameraTV

    • @likebot.
      @likebot. 1 year ago

      ... and the last 5 digits of Pi.

    • @strehlow
      @strehlow 1 year ago

      Mine is *********************

  • @Ziferten
    @Ziferten 1 year ago

    Guys seriously buy some new pe.... oh.

  • @dialecticalmonist3405
    @dialecticalmonist3405 1 year ago

    He didn't say whether the dice were 6-sided. Why would he assume they were?

  • @biomatrix8154
    @biomatrix8154 1 year ago

    There are actually 95 different enterable password characters (including the space).

  • @JavadZahiri
    @JavadZahiri 1 year ago +1

    Please don't use this type of paper again. It's very hard to read.

  • @c-LAW
    @c-LAW 1 year ago +2

    Entropy = nonsense

    • @HapChandler876
      @HapChandler876 1 year ago +1

      What do you mean?

    • @nHans
      @nHans 1 year ago +2

      ​@@HapChandler876 Meaning, they failed their thermodynamics and/or information theory course! 🤣

    • @dembro27
      @dembro27 1 year ago +1

      @@HapChandler876 Perhaps "nonsense" in the sense that entropy represents possibility space, which could be equated to noise; useless information; nonsense.

  • @shreeek7086
    @shreeek7086 1 year ago

    4th :O

  • @shreeek7086
    @shreeek7086 1 year ago +1

    First ✌

  • @shreeek7086
    @shreeek7086 1 year ago

    Second too lol

  • @shreeek7086
    @shreeek7086 1 year ago

    3rd 😴

  • @shreeek7086
    @shreeek7086 1 year ago

    5th!!! Bye