This was exactly the baby step I needed to get me on my way with entropy. Far too many people try to explain it by going straight to the equation. There's no intuition in that. Brilliant explanation. I finally understand it.
Sean Walsh feel the same way.
With great knowledge comes low entropy
Hahaaa, love it!!!
lol
@@SerranoAcademy Repetition (redundancy) is dual to variation -- music.
Certainty is dual to uncertainty -- the Heisenberg certainty/uncertainty principle.
Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics.
Randomness (entropy) is dual to order (predictability) -- "Always two there are" -- Yoda.
And low entropy is easier to rig
You win
How does one make something so complicated into something so intuitive that others can finally see the picture? Your explanation itself is an amazing feat.
Luis, you are such an incredibly gifted teacher and so meticulous in your explanations. Thank you for your hard work.
An amazing teacher is an invaluable thing.
Excellent explanation, very clear and concise! I have always pondered the significance of the log in cross-entropy loss function. The explanation (particularly: "products are small and volatile, sums are good") completely clears this up.
I have been scared of delving into entropy in detail for so long because the first time I studied it, it wasn’t a good experience. All I want to say is THANK YOU!!!!!! I should have been supplementing the udacity ND lesson videos with these since the beginning.
Great work. Compared to my textbook, you explained it 100 times better. Thank you.
Actually, there is something wrong here. Entropy and information in information theory represent the same thing: how much information we get after decoding the random message. So, in the case of the balls in the box, if all are the same color, we gain no information after decoding the message, since the probability of it being red = 1; hence low entropy and low information.
I wish I had this lecture during my college examinations... still, it's nice to finally understand the intuition behind the formulas I already knew.
Teaching should be like this, from practice to theory, not the other way around!
I'm studying Decision Tree (Machine Learning Algorithm) and it uses Entropy to efficiently build the tree. I finally understand the details. Thank you!!
Excellent! Great explanation. Enjoyable video (except YT’s endless, annoying ads). Thank you for composing and posting.
At 13:44 it's not 0.000488 but 0.00006103515! There is a computation error. The entropy is correct: 1.75.
Thank you for the correction! Yes, you're right.
Thank you so much. This was the only video in youtube that clarified all my doubts regarding the topic of entropy.
Good video! Minor correction of calculations: at 5:50, the probability of getting the same configuration is 0.25. This is because there are only 4 possible configurations of the balls (there is only one blue ball, and only four slots, so only 4 places the blue ball can be). This can also be calculated by selecting red balls first multiplying 0.75 * 0.66667 * 0.5 = 0.25.
Similarly, at 6:58, the probability is 1/6 because there are 6 possible configurations. We can calculate the probability by multiplying (2/4) * (1/3) = (2/12) = (1/6) ~= 0.166667.
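These corrected values are easy to check in code. A small sketch using binomial coefficients, just verifying the comment's arithmetic:

```python
from math import comb, isclose

# 3 red + 1 blue in 4 slots: the blue ball can occupy any of 4 positions,
# so one particular arrangement has probability 1/4.
p_one_blue = 1 / comb(4, 1)
print(p_one_blue)  # 0.25

# 2 red + 2 blue: choose 2 of the 4 slots for the blue balls, 6 arrangements.
p_two_blue = 1 / comb(4, 2)
print(p_two_blue)  # 0.1666...

# Same results from drawing the balls without replacement, as in the comment:
assert isclose(0.75 * (2/3) * 0.5, p_one_blue)
assert isclose((2/4) * (1/3), p_two_blue)
```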
I find sum(p*log(p^-1)) more intuitive.
Inverse p (i.e., 1/p) is the ratio of total samples to this sample. If you ask perfect questions, you'll ask log(1/p) questions. Entropy is then the sum of these values, each multiplied by its probability, which is how much it contributes to the total entropy.
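This reading of the formula can be sketched directly in code (the function name is mine, not from the video):

```python
import math

def entropy(probs):
    """Sum of p * log2(1/p): each outcome's question count,
    weighted by how often that outcome occurs."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0, one perfect yes/no question
print(entropy([0.25] * 4))   # 2.0, two questions
print(entropy([1.0]))        # 0.0, nothing to ask
```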
This was one of the best explanations on entropy. Thanks
2nd time I found this video and loved it both times. Much better description than the prof at the uni I am at!!!
You made a mistake/approximation by saying the entropy is equal to the number of questions needed in order to find out which letter it is. If I take a scenario with only three letters, all equiprobable, the entropy is about 1.59, but the average number of questions needed to find out the correct letter is about 1.66.
Your presentation gives a great way to gain an intuitive feeling for entropy, but maybe you should include a small disclaimer on this point.
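The discrepancy described here is easy to reproduce; a short check, assuming the natural questioning strategy of splitting one letter off first (the best any yes/no tree can do with three outcomes):

```python
import math

# Three equiprobable letters, say A, B, C.
H = math.log2(3)  # entropy, about 1.585 bits

# Best yes/no strategy: "Is it A?" resolves A in 1 question (p = 1/3);
# otherwise "Is it B?" resolves B or C in 2 questions total (p = 2/3).
avg_questions = (1/3) * 1 + (2/3) * 2  # = 5/3, about 1.667

# The entropy is a lower bound on the average, not always the exact value:
print(H, avg_questions)
```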
Luis, you have a great way of explaining. At times, I like your videos more than even those of some highly rated professors.
Easy and Great explanation! Thank you very much, Luis
Thank you. Very clear, sir. I have been struggling to wrap my head around this and you just made it easy. Thank you.
Thanks for the relationship between knowledge and entropy; that was very helpful. Your explanation of statistics is also good! Though I am only halfway through the video at this point, I will finish it!
Thanks
Great video! Now I understand what Claude Shannon discovered and how useful and essential maths are in Computer Science.
Great clarity. Have never got this idea about the Shannon Entropy. Thank you. Great work!
Wow! Awesome. So many books, encyclopedias, and biographies of Shannon just to understand what you clearly explained here! Thank you!
Can you make a part 2 with the full proof, not just the intuition behind the formula? Your explanation's amazing & would love to see a part 2.
I love the explanation of the negative sign in the entropy equation, which many people wonder about.
I needed this video to get me up to speed on entropy. Great job Luis!
Thank you for explaining the entropy concept first and then reaching the final equation step by step. It is a really good and simple way.
What a great explanation! I wish I had a teacher like you, Luis; everything would be way easier! Thanks a lot.
Confession: I was a math kiddy; I know to use it but I often missed the deeper meaning and intuition. Your videos are turning me into a math hacker.
Great explanation. But I think what's still missing is an explanation of why we use log base 2... didn't quite get that.
In the last minute of the video, he explains that using Log base 2 corresponds to the level of a decision tree, which is the number of questions you'd have to ask to determine a value.
Your explanation is perfect. Even though I am not good at listening to English, I can understand everything :)
It's very helpful for me to introduce the concept of entropy to students. Thank you for your clear presentation of entropy.
The best explanation about Shannon entropy that I have ever heard. Thanks!
This video is helping to keep me floating in my Data Science course; thank you so much for your time!
Syntropy is dual to increasing entropy -- The 4th law of thermodynamics!
Thesis is dual to anti-thesis -- The time independent Hegelian dialectic.
Schrodinger's cat: Alive (thesis, being) is dual to not alive (anti-thesis, non being) -- Hegel's cat.
Syntropy is the process of optimizing your predictions to track targets or teleological physics.
Teleological physics (syntropy) is dual to non teleological physics (entropy, information).
Hi Luis, nice to meet you. I am reading the Deep Learning book by Ian Goodfellow, and I needed to watch your video to understand chapter 3.13, information theory. Thanks very much.
Thanks a lot Luis, just had an exam about this Wednesday and your video helped me a lot to understand the whole concept.
Luis, you really are a great communicator. Looking forward to your other explanations.
Wow... I wish more people could teach like you. This is so insightful.
You are the best. Such a great explanation. Better than lots of textbooks.
I think that the Huffman compression you use at the end of the video gets near the entropy value but is not exactly the same.
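This observation can be demonstrated with a small Huffman sketch (my own illustration, not code from the video): when the probabilities are exact powers of 1/2, the average code length equals the entropy; otherwise it sits strictly above it.

```python
import heapq

def huffman_lengths(probs):
    """Build a Huffman tree; return {symbol_index: code_length}."""
    # Heap entries: (total probability, unique tiebreaker, {symbol: depth so far})
    heap = [(p, i, {i: 0}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; their symbols get one bit deeper.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Powers of 1/2: the average code length hits the entropy exactly.
probs = [1/2, 1/4, 1/8, 1/8]
lengths = huffman_lengths(probs)
avg = sum(probs[s] * n for s, n in lengths.items())
print(avg)   # 1.75

# Not powers of 1/2: the average stays strictly above the entropy.
probs2 = [1/3, 1/3, 1/3]
avg2 = sum(probs2[s] * n for s, n in huffman_lengths(probs2).items())
print(avg2)  # 5/3, about 1.667, vs. entropy log2(3), about 1.585
```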
Excellent presentation for an otherwise complex concept.
Hi Luis,
splendid, spectacular, excellent!
Wow, thank you, man.
I needed that information!
There are many ways to teach the same stuff!
That number of question stuff is great! It's good to have more than one way to measure something!
Very good explanation - hope to hear more of your videos
Although this is a good description of information-theoretic entropy, the phase-change analogy used at the beginning doesn't describe thermodynamic entropy very well. The reason ice melting constitutes an increase in entropy in this case is that the ice is in an open thermodynamic system with its environment: heat has been transferred from the room (a closed system) to the ice. It is this irreversible movement of heat from the room that constitutes the increase in entropy, since the average temperature of the room and the ice has decreased and will continue decreasing until it reaches a stable equilibrium. Indeed, we would not arrive at gas if there were not sufficient potential energy in the room. While Boltzmann entropy is similar, the similarity lies in the fact that this transfer of heat, understood on a macro level, is a translation of the probability of this energy distribution on a micro level. Entropy is then a measure of the extent to which the particles are in a probable microstate.
At 13:34 the product does not equal 0.000488. It is approximately 0.000061035. You are missing the last 1/8 factor.
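A quick check of the corrected number. I am assuming the example uses letter probabilities 1/2, 1/4, 1/8, 1/8 and a typical 8-letter sequence whose letter counts match those probabilities (4, 2, 1, and 1 occurrences); if the video's setup differs, the idea still holds:

```python
import math

# Probability of one typical 8-letter sequence: 4 A's, 2 B's, 1 C, 1 D.
product = (1/2)**4 * (1/4)**2 * (1/8)**1 * (1/8)**1
print(product)  # 0.00006103515625 (= 2**-14), not 0.000488

# Per-letter "average" probability, and the entropy it implies:
per_letter = product ** (1/8)
print(-math.log2(per_letter))  # 1.75, matching the entropy in the video
```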
very lucid explanation - excellent, intuitive build-up to Shannon's theorem from scratch
So, after watching the video, the entropy of giving you a thumbs up and subscribing to your channel was 0, i.e., great explanation!
This is the best explanation I have come across for a long time, Can you please answer how can we use entropy to find the uncertainty of a naive Bayesian classifier with let's say 4 feature variables and a binomial class variable?
Thank you so much for such an easy explanation... respect from India...
Really, you have given us outstanding information.
I watched it straight through. Very good.
Best explanation I found so far
Brilliant lecture! I learn so much with this explanation. Thanks from Brazil :)
It's always hard to understand the equations, but you made it so simple :-)
Hi. Thanks a million times for simplifying a very complicated topic. Kindly find time and post a simplified tutorial on MCMC (Markov chain Monte Carlo). I am overwhelmed by your unique communication skills. God bless you.
Help us smash Markov chain Monte Carlo
That moment when you realize you don't need to search for another video because you got it from the first time.
What I'm trying to say is Thank You!
I know this may be easier for others to understand, but could you show an explanation of the actual symbols of this formula, and an example with numbers plugged in, to see which numbers go where? I am not familiar with log other than that it's related to exponents. The minus sign is also unfamiliar to me.
Very nice video. Insightful, intuitive, and very well explained. Thank you!
Easy and excellent explanation! Please do one for loss and cost functions as well (convex).
At 552 secs, sequencing and entropy: milk that is perfectly "random" in the coffee vs. separated milk and coffee. Remember, the average number is the hardest to get due to movement or variance.
So the average person is the hardest thing to be.
Great video!! Thank You. Would be great to add some explanation for information gain (as for example used for feature selection)
Hi Luis, thanks for your explanation. I guess you're wrong at minutes 6:29 and 7:41. I think P(winning) for bucket 1 should be 0, since there were no blue balls in the bucket; the expected outcome of the game should be R, R, R, B.
Am I right?
This can be taught to a 9-year-old kid (provided he/she understands basic math operations; the log2 part can be explained in another video :D). I mean, excellent explanation!
Very clever explanation of mighty ENTROPY.
Great video! I really liked the intuitive approach. My professor's was waaaay messier.
Thank you very much for this beautiful and clear explanation!
Your explanation was crystal clear. If possible, share some real-world examples of data mining where entropy and the Gini index are used.
Don't we have to match the sequence that we started the game with (RRRB)? If so, 4 red balls would give 1*1*1*0, because there isn't a blue ball in that bucket?
Luis, Thank you so much for this brilliant elucidation of information theory & entropy. Merely as an avocation, I have been toying around with a pet evolutionary theory about belief systems and societies. In order to test it - if that is even possible - I felt I needed to develop some sort of computer program as a model. Since I have very little programming experience and only mediocre math skills, I have been teaching myself both (with a lot of help from the web). It was purely by accident that I stumbled upon Claude Shannon and information theory, and I immediately became fascinated with the topic, and have a hunch that it may somehow be relevant to my own research. Regardless, I am now interested in it for its own sake. I had an ephemeral understanding of how all the facets (probability, logs, choices, etc.) were all related mathematically, but it wasn't until after watching your video that I believe I fully grok the concept. At one point early on, I found myself shouting, "if he brings up yes/no questions, I know I understand this!" And then you did. It was such a wonderful moment for someone who finds math so challenging, and it is greatly appreciated! I shall check out your other videos later. You're a very good teacher!
For your work, I would look into some of the work by Loet Leydesdorf.
@@Faustus_de_Reiz Thank you! I shall.
Hi Serrano, do you have a complete playlist on information theory?
For the first time in my life I understand the real meaning of entropy.
In the third sequence, I can ask if it is a vowel or a
consonant, but... if it is not a vowel I still have to ask at least 2 questions...
Easy to understand because of the excellent explanation. Congratulations on the video and thank you very much for sharing.
That was highly intuitive, thank you, sir, I appreciate the effort behind this.
Lovely explanation...Superb
Question about information, entropy and the similarity to observer effect and collapse of wave-function.
I like your description of information at 2:30: "information = how much do I know about the ball I am picking?" I hold it in my hand, not looking at it, but I have information about it based on entropy.
But how much do I know about the ball when I open my hand and LOOK at it? Obviously, when I actually observe it, I know ALL about it, and this is maximum information, so the entropy becomes zero? (I also recall that entropy is a measure of 'lack of information'.)
So is this process of 'observation leading to zero entropy' valid? If not, why not? And if so, is it linked in any way to an observation in quantum physics causing a wave function to collapse?
It's like a wave function collapse. You don't know what will happen in the future; that's what probability is for. Once you know what happened in the past, it is solidified. All possibilities must collapse when the present passes over them.
The exception could be Schrödinger's cat. But Schrödinger used that as an example of a stupid argument, so it may not be that useful outside of quantum mechanics.
Is there a construction or characterization or description of how to ask the smartest questions every time?
A very quick way to learn about entropy
very helpful for the mathematically challenged
Great explanation, greetings from Brazil!
this is a really great explanation, thanks so much for sharing mate!
Thank you for the very good video. Easiest to understand so far.
Wow, another great and insightful presentation. Really helps to build intuition.
Thank you, I clearly understand entropy and IG, BUT the ball is GREEN!!!
Superb step by step explanation
Great vid, I definitely learned something. It could be improved by shortening it a bit, not going so deeply into the details of the "dummy" examples. For instance, for the example with four red balls at 6:25, just say the probability of guessing all four balls correctly is 100%. No one needs an explanation of why that is. Another example is the tree of questions: saying "now, only one more question is needed" at 16:23 is enough; we don't need to go into details about those last questions and their outcomes.
But the approach was very nice, I liked it. Just some tips for improvement for future vids.
As you said, the game is to get red, red, red, blue, but in the first case we have all reds. Hence the probability of winning in the first case should be 1*1*1*0 (probability of blue in the first bucket). How did you calculate 1*1*1*1? Please explain.
If Shannon were alive, he would enjoy seeing such a perfect explanation for his theory. Many thanks.
Best instructor there is! Thanks
I have learned so much from your teaching. Thank you.
And the entropy is the number of bits needed to convey the information.
十分谢谢! Thank you very much, Luis.
Mr. Luis Serrano III, great job on neural networks and Claude Shannon entropy.
Excellent job, Luis! Plain and simple: the log base 2 gives the number of bifurcations to arrive at the answer, and the probability of the answer serves to temper down the chaos introduced into the system by very rare events. Genius!
Superb explanation. I like your teaching style. Thank you very much :-)