We love you from the bottom of our hearts. Everything is soooooooooo clear, unlike other videos on YouTube. I am trying to learn data analytics on my own from YouTube for free, because I want a new skill your standard deviation
8:00 I am a bit confused here. How did you go from *φ(x) = λ f(x)* to *φ(√(x^2 + y^2)) = λ f(√(x^2 + y^2))*? Also, according to your argument, *φ(x) = λ f(x)* is only true for points on the x axis, since *φ(x) = φ(√(x^2 + 0^2))*.
+xoppa09 All these functions are of a single variable, no matter what name you give to that variable: call it 'x', call it '√(x^2 + y^2)', or 'y', or whatever. The observation we make when we see φ(x) = λf(x) is that whenever something is evaluated by φ, this is the same as it being evaluated by f times some multiplier λ. It probably would have been better if I had written φ(•) = λf(•), or dropped the argument and written φ = λf instead of φ(x) = λf(x), to emphasize this.
Class teachers often omit the derivation of the normal distribution. I always wondered how the bell curve formula is derived. Here is the answer. Thanks a lot.
The formula measures how much the data vary from the mean value. The square treats values below the mean (negative deviation) and above the mean (positive deviation) alike.
So the formula for variance is the average of the squared deviations, i.e. the sum of [X-Xbar]^2 weighted by how often each value occurs. In a continuous setting we integrate rather than sum, so we integrate [X-Xbar]^2 multiplied by the PDF. As Xbar, the mean, is equal to 0 here, we end up integrating [X]^2*pdf from negative infinity to positive infinity. Hope this helps man. en.wikipedia.org/wiki/Variance#Absolutely_continuous_random_variable
So now, 26 years after I first encountered this equation in college, I finally know where it came from. A cool thought experiment coupled with some “cosmetics” which (true to their name) conceal its true identity.
I am not sure if the equation written at 21:37 is correct; isn't φ(x) the probability distribution function and not f(x)? Why is f(x)dx being integrated?
It is correct, but only because it doesn't matter what he chooses to represent the particular probability function (be it a distribution or a density function, for discrete or continuous data respectively), or the proportionality constant "lambda". I suppose if you really wanted to be pedantic and make sure that everything he is saying is correct, then after the line of work:
f(x) = lambda*e^(Ax^2)
we would then say:
=> phi(x) = (lambda^2)*e^(Ax^2)
Now redefine f(x) and our constant "lambda" (which, remember, does not depend on how we choose to name it), such that:
f(x) = f(x)/(lambda)
=> f(x) = phi(x)
and thus, from here on, our f(x) represents the probability function which we want. Also, redefine lambda:
lambda = lambda^(1/2) (remember: both are still constants)
=> phi(x) = lambda*e^(Ax^2)
and thus we may now return to his line of working, where f(x) represents this particular probability function:
f(x) = lambda*e^(Ax^2)
allowing us to set the integral from negative infinity to positive infinity of f(x) equal to 1.
Well, good job in explaining this, but it leaves a big question unanswered (as do most "derivation" videos of the Gaussian distribution). Namely, you explained how to derive f(x), the pdf of the x coord of the dartboard, and this is "a" bell curve. We haven't shown this is also "the" Gaussian bell curve of the central limit theorem - it's conceivable that f(x) only roughly looks like the Gaussian but is not identical. How do we show f(x) is the Gaussian of the CLT?
Great explanation! I followed everything except for the part where you introduced the integral for variance (around 28:00). Could someone clarify where the integral of x^2 * f(x) * dx comes from?
Basically, the variance of a random variable X is defined as the expected value of the squared deviation from the mean of X: E((X-mean)^2) = variance, where E is the expected value. The expected value is basically the same as the weighted mean, only the weighted mean is for discrete values of x, and the expected value is for continuous functions of x that can be integrated. The formula for the weighted mean is the sum of all the occurring values multiplied by their weights (numbers of occurrences), divided by the total weight (total occurrences of all values). In the case of expected values of continuous probability density functions, the weights are not the numbers of occurrences of each value but the probabilities of each value (f(x)*dx), and since the probabilities add up to 1, the division by the total weight drops out, leaving only the integral of x multiplied by the probability of x over all values (E(X) = Integral of x*f(x)dx). Since the mean of the normal distribution function here is 0, the squared deviation from the mean is just x^2, so E(X^2) = Integral of x^2*f(x)dx. I hope it's a bit clearer, although I think it was quite confusing, so you should read this through several times and really focus on every part if you want to understand it :D.
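The integral described in the reply above can be checked numerically. This is only a minimal sketch: the value `sigma = 1.5`, the helper name `integrate`, and the cutoff at 12 standard deviations are assumptions made for illustration, not something taken from the video.

```python
import math

def f(x, sigma):
    """Zero-mean normal density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(g, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b] (hypothetical helper)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

sigma = 1.5                      # arbitrary illustrative value
lo, hi = -12 * sigma, 12 * sigma  # the tails beyond this range are negligible

# Integral of f(x) dx: the total probability.
total_probability = integrate(lambda x: f(x, sigma), lo, hi)
# Integral of x^2 f(x) dx: the variance when the mean is 0.
variance = integrate(lambda x: x * x * f(x, sigma), lo, hi)

print(total_probability)
print(variance, sigma ** 2)
```

If the reasoning in the reply holds, the first number should come out very close to 1 and the second very close to `sigma ** 2`.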
😮😮😮👏👏👏👌 awesome... for years this has been troubling me, how to derive the normal distribution equation... trust me, I had nightmares... 😅😅😅 I am too bad at just mugging up... thanks a ton... I used to wonder how we are able to connect two independent sample spaces through the mysterious Z... now it's quite clear. 1. The sample space has to behave like a normal distribution phenomenon 2. Anyway the probability density curve will have area 1... and that helps me understand the t distribution even better... thank you...
Thanks for this - very well explained. However, one question at the back of my mind is that if you consider a non-zero mean, wouldn't you need to adapt the definition at 27:18 to be $$ \int (x - \mu)^2 f(x) dx $$ ? And wouldn't this complicate things a lot? Don't get me wrong, I understand the intuition behind just shifting the mean, it just seems like a potential snag that I would get caught on in an exam.
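One way to see why the shifted mean does not complicate things is a change of variable. The sketch below assumes the non-zero-mean density is just the zero-mean density from the video translated by μ, written here as f_0(x - μ), where f_0 and σ² (its variance) are names introduced only for this illustration.

```latex
\begin{align*}
\operatorname{Var}(X)
  &= \int_{-\infty}^{\infty} (x-\mu)^2 \, f_0(x-\mu)\, dx \\
  &= \int_{-\infty}^{\infty} u^2 \, f_0(u)\, du
     \qquad (u = x - \mu,\ du = dx) \\
  &= \sigma^2 .
\end{align*}
```

Under that assumption, the integral with (x - μ)^2 reduces to exactly the zero-mean integral already worked out, so no new integration is needed.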
Both the Gaussian integral and the integral of the PDF follow the same pattern { int.(Ae^(-x^2)) }, but one of them integrates to one and the other integrates to sqrt(pi). I understand the Error Function and its derivation. What's the intuitive understanding behind the relationship?
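A possible way to connect the two is a rescaling of the variable. The sketch below assumes the standard form of the normal PDF with mean 0 and standard deviation σ; the substitution variable u is introduced only for this illustration.

```latex
\begin{align*}
\int_{-\infty}^{\infty} e^{-u^2}\, du &= \sqrt{\pi}, \\
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)}\, dx
  &= \frac{\sigma\sqrt{2}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2}\, du
     \qquad \Bigl(u = \tfrac{x}{\sigma\sqrt{2}},\ dx = \sigma\sqrt{2}\, du\Bigr) \\
  &= \frac{1}{\sqrt{\pi}} \cdot \sqrt{\pi} = 1 .
\end{align*}
```

Read this way, the PDF is the same bell shape stretched horizontally by σ√2 and then divided by the area of the stretched curve, which is why it integrates to 1 rather than √π.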
In the actual multivariate normal, x and y can be correlated (linearly dependent) so that instead of all points in a circle being equally likely, it's all points in an ellipse. Does the math still work out in that case?
Does anyone know which book the second further-reading item comes from? ------ Edit: For everyone wondering, the book is called "Probability theory - the logic of science" by E. T. Jaynes.
But wait, our original mission statement was that the probability of finding a dart across all r (anywhere on the board) is equal to 1, so it's actually the integral from r=0 to r=inf, so shouldn't our distribution have double the lambda you found? (Comment made at 21 minutes)
I don't see how a function that has y-intercept of 2 and a positive range could ever have an integral less than 2, therefore I doubt the fact that saying h^2 = -pi lambda ^2 is actually correct, since that substitution was made given the requirement that the integral was equal to 1, despite it necessarily being greater than 2.
+Magic Gonads But remember I'm integrating f over all x-coordinates (or y) so the limits of integration are as stated. You could do a double integration of phi from r=0 to r=inf and theta=0 to theta=2*pi, setting that equal to 1 and you get the same relation between the constants I have in the video. I didn't go down that route because I wouldn't use multivariable calculus in a video where I can use single-variable instead. I'm not sure I get your intuition behind your second question about the integral being less than 2. The y-intercept being 2 is only one point on the function and for considering the total area, you should also consider how quickly the function dies to zero to the left and to the right.
With the second point, I realised that the area being 2 dimensional allows it to be less than the length, so as long as it dips to near 0 before |x|=1 then it's plausible, and then your solution would map that exact path. But shouldn't this function, regardless of what theta you use as the r, be the same. So along the positive x axis you would map every positive value of r, so using inf and -inf (due to symmetry) doubles the area under the curve. It's not possible to find a dart a negative distance away from the bulls eye since you're considering theta irrelevant. You don't need to use y.
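The two normalizations discussed in this thread can be written side by side. This is a sketch under the assumptions stated elsewhere in the comments: f(x) = λe^{Ax^2} with A = -h^2, and φ(r) = λf(r) = λ^2 e^{-h^2 r^2}.

```latex
\begin{align*}
\text{Polar, over the whole board:}\quad
\int_{0}^{2\pi}\!\!\int_{0}^{\infty} \lambda^2 e^{-h^2 r^2}\, r\, dr\, d\theta
  &= 2\pi\lambda^2 \cdot \frac{1}{2h^2} = \frac{\pi\lambda^2}{h^2} = 1
  \;\Longrightarrow\; h^2 = \pi\lambda^2, \\
\text{Single variable, over all } x\text{:}\quad
\int_{-\infty}^{\infty} \lambda e^{-h^2 x^2}\, dx
  &= \frac{\lambda\sqrt{\pi}}{h} = 1
  \;\Longrightarrow\; h^2 = \pi\lambda^2 .
\end{align*}
```

In the polar form r is a distance and runs from 0 to infinity, with the factor r dr dθ accounting for the area. In the single-variable form x is a coordinate rather than a distance, so it runs over both signs. Both routes give the same relation between the constants, with no doubling.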
Please help me, I didn't understand why a function that gives the probability at an (x,y) coordinate, when multiplied by dA (a little area), gives us the probability in that area.
Because φ is the probability density function, so its units are (probability)/(area) (or probability per unit area). Therefore, the probability of a dart landing in a small area dA around some point is φ*dA, because you need to get rid of the area units at the bottom.
i don't understand how you can say at 8:46 that f(sqrt(x^2+y^2)) is equal to f(x)f(y) just by analysing a single case (if you put sqrt(x^2+y^2)=x this is true just in a single case, and it's the case you described, y=0)
That was done because he wanted to get rid of the function φ and express the equation in terms of the function f only. He used the specific case when y=0 and defined the constant λ as equal to f(0). He said that if φ(sqrt(x^2+y^2))=f(x)f(y) is true in general, then it must be true in a specific case, when y=0. So φ(x)=λf(x) for any value of the variable x, and it follows that φ(sqrt(x^2+y^2))=λf(sqrt(x^2+y^2))=f(x)f(y). You see at the end that λf(sqrt(x^2+y^2))=f(x)f(y), with no φ involved.
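The relation in the reply above can be spot-checked numerically for the exponential form that the derivation ends up with. This is a sketch only: the values of `LAM` and `A` and the helper name `max_relative_mismatch` are hypothetical choices made for illustration.

```python
import math

LAM = 0.7   # hypothetical value for the constant λ = f(0)
A = -1.3    # hypothetical negative constant in the exponent

def f(x):
    """One-dimensional density candidate f(x) = λ e^(A x^2)."""
    return LAM * math.exp(A * x * x)

def phi(r):
    """Radial function defined through φ = λ f."""
    return LAM * f(r)

def max_relative_mismatch(points):
    """Largest relative gap between φ(sqrt(x^2 + y^2)) and f(x) f(y) over the given points."""
    worst = 0.0
    for x, y in points:
        lhs = phi(math.hypot(x, y))
        rhs = f(x) * f(y)
        worst = max(worst, abs(lhs - rhs) / abs(rhs))
    return worst

# Points on and off the x axis, so the check is not limited to the y = 0 case.
points = [(0.3, 0.0), (0.3, 1.1), (-2.0, 0.5), (1.7, -1.7)]
print(max_relative_mismatch(points))
```

The printed mismatch should be at the level of floating-point rounding, which suggests the identity is not restricted to points with y = 0 for this family of functions.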
Lots of assumptions, but worth it!! However, in a multidimensional scenario, x^2 + y^2 != r^2, so I think the Gaussian distribution might need improvement.
Thank god someone cares to explain this equation that just floats around in the math realm with no explanation from teachers other than, "here!"
Come on, it is not an equation floating around. There are many derivations in books. Actually this derivation is rather too long.
@@univuniveral9713 The best derivations are the easiest to understand imo
@@wurttmapper2200 True
Just because you were too lazy to seek out a proof. Hey, but too easy to blame your teachers. Dufus.
@@donegal79 the job of a good teacher is to point a student in the right direction and not just hand wave complex topics.
Plus, he did go and look for the proof as is evident in him watching this video.
This video made me so happy, best 35 minutes of my day for certain, maybe my whole week. Cheers for this; if you see this comment 3 years later, you're a legend for taking the time to explain this so clearly.
What I love about this presentation, unlike some others I have seen, is that it does not skip over any steps. Each step is very clear and easy to follow. Wonderful job, sir. Thank you.
This video is absolutely brilliant! I've always wanted to know how the normal distribution curve was derived, and your explanation was perfect! Thanks so much!!! Math is beautiful
"Maths is beautiful" you're so right. 😁
@Ask why? ? differential equations. It's the maths for working out when more than one thing is changing at the same time. As opposed to normal calculus which only changes one thing. Like how far a ball goes if you throw it, if you change both the angle and how hard you throw.
J. I can’t really link your comments to your profile photo... it’s so illogical
@@stevenson720 I know my question is about 11 months too late, but regarding your reply, i.e. "differential equations ... for working out when more than one thing is changing...", etc., isn't that what multivariable calculus is actually for?
I suppose it really depends on whether we're talking about ordinary diff equations or partial diff equations, since ODE's deal with one independent variable, while PDE's are for multiple independent variables. I'm no math major, just an independent learner and lover of maths, so my response might not be 100% accurate, but if it's not, anyone may feel free to correct me.
This is fantastic! Everything is explained and paced so well; no other video online has derived the normal distribution so clearly as you have.
This video is a brilliant illustration of Einstein's famous sentence: "Everything should be made as simple as possible, but not simpler". The derivation is beautiful, elegant and crystal clear. Thanks so much for sharing your knowledge!
"We are more likely to find a dot near the bulls eye." You've obviously never seen my wife play darts.
+FriendEd
More likely to strike someone else in the eye...amirite?
wait what are you doing here
It was a good joke though...
But the last equation, the general one covers that too.
Bad at throwing darts?? The value of σ is larger. Or if you're good at throwing darts but your shots cluster somewhere other than the bull's eye, your μ will be different.
So never lose faith in maths, especially in general maths.
It’s beautiful to observe that the number of YouTube “likes” decreases when the video is of an educational nature and not about useless make-up tutorials etc. This itself IS a proof that the curious people actually wanting to understand, and therefore watching this helpful video completely, are from the “other side” of the Gaussian Distribution. ;-) Thanks for the fantastic job!!
Dude this was a great video, keep up the great work!! I love how at 3 am in the morning I am binging on your videos, goes to show that you have skill
I'm so grateful that I found this absolute gem of YouTube, keep posting videos. You probably inspired thousands of people to be more interested in math/science
I'm from Brazil and I just found this class right here. I asked my professor how to derive this formula and she did not know. This was one of the most impressive classes that I ever saw. Thank you so much!!!!
My physics professor from Greece pronounced it "φ"
wrong, i'm pretty sure it is pronounced "φ"
@@yerr234 What's funny is that my professors here in Germany, even though one is from Russia and the other is a German, both pronounce it "φ".
You're all funny😂
It's φ not φ
Beautiful derivation of a ubiquitous formula: the hat trick comes at 22:48! To complete the discovery of this treasure, I am now deep diving to revise my Euler integral. Thanks so much!
Wow, this was awesome. I'm reading E. T. Jaynes's Probability Theory. In Chapter 7, he performs this derivation, but as is often the case, he assumes the reader is as fluent as he is with functional analysis. This video really helped me fill in the gaps. Can't wait to watch the rest of your videos. Thanks a bunch!
I have always wondered about this formula. Your explanation is the most concise and understandable one even to a novice like me. Thanks a million.
Great vid. Cleared up much of the mystery of where normal distribution comes from. Have forgotten most of my calculus but was still able to follow along!
Fantastic explanation, step by step and does not assume the viewer knows any particular mathematical derivation. best video i've seen on the derivation of the normal distribution!
One of the best videos on the internet, love it!
I am incredibly new to statistics and have never actually taken a course, but I have taken physics and engineering courses that apply physics; it's pretty neat to see that the Dirac delta function is a limiting case of the normal distribution in which lambda approaches infinity, which I realized as soon as you showed how lambda transforms the shape of the graph. Very cool video!
Brilliant!!!! This is a MUST for any student of statistics. The way statistics are normally taught... many unanswered assumptions... which are answered by this video.
I love this video, this derivation is spectacular, I love it when mathematics links various seemingly unrelated concepts with each other and yields this beauty. That being said, I also think Gauss' proof in the article you provide is far easier and more accessible to students like myself.
This is the most clear and complete derivation of the Normal distribution I've seen.
Thanks for sharing
I just started a Patreon if you appreciate the work done on this channel: www.patreon.com/Mathoma
Thanks for viewing the channel!
Great works in this video! After watching this video, I just can't appreciate enough the original inventor of this function Carl Friedrich Gauss!
I don't know what to say but you just saved my life. I have been looking for a proper derivation and not the 5 mins ones for months. Thank you thank you thank you so so so much.
Amazing, finally someone that explains completely and holistically how to derive the Gaussian density function. Thank you!
Really well explained. I'm relatively new to this type of thinking and it was illuminating! On the fly definitions of new functions that lead you in any direction you'd like, seems really powerful, but also like a puzzle. Thanks for making this video!
I'm a software developer. I'm pretty comfortable with discrete math and terrible with calculus. But your explanation is so clear that I was able to understand most of it.
Thank you for such high quality content. This video deserves more views!
I'm subscribed (which happens rarely!).
Man, what an incredible video! Loved your derivation.
Wow, amazing explanation that cleared up everything about the normal distribution for me! Your calm way of teaching is very clear and enjoyable, thank you!
Thank you! Most simple and well demonstrated video about the subject that I've seen. Congratulations!
For the transition from y = 0 to the general case, the 1D equation can be generalized to 2D due to radial symmetry, which makes the x axis equivalent to any other line going through (0,0).
Regarding the number of dimensions, a minimum of two is necessary to specify coordinate independence and radial symmetry, which together give the form of an exponential.
Lovely, unique video. Thanks!
This is beautiful; it helped me tremendously in my design of experiments class. Simply, thank you! I wouldn't have made it without you.
Absolutely brilliant! You make Mathematics look like what it's meant to be, simple. Thank you for this great video.
Seriously Mathoma... thank you so much sir. Thank you. You are doing God's work. You will ride shiny and chrome in Valmatha. Excellent video.
It took me 2 days to understand the concept behind this topic for my statistics class. This video cleared everything up for me. Thx!!
I have been looking for a video like this for so long, a clear derivation from scratch of the normal distribution
Thank you for the brilliant derivation from nearly the First Principles. Thank you indeed.
I deeply wish there were a "History of Mathematics" YT channel that also gave the reasons for the clever decisions taken at crucial steps in deriving historically important equations. These steps have nothing to do with computation; they are just a "Leap of Intelligence", because of which mathematics has prospered for so long. It began with Pythagoras's proof of root2 being an irrational number.
best video for derivation of gaussian distribution ever.
Awesome explanation...I have been trying to understand the function behind normal curve all this while and this is so beautifully explained...thanks a ton
i was so confused how the "2" comes in the formula
now i finally understand
thanks :)
It's a very good video. Thank you. I just wish the frequent commercials weren't as loud.
Amazing... This video really helps to understand the Gaussian distribution a lot better. Thank you.
Amazingly satisfying video; I've enjoyed your videos for a long time and this was particularly good. Everything is explained at just the right level and the derivation was so logical it all felt obvious by the end. Thank you for putting so much effort into your videos!
+Will Price
That's very kind of you to say. My pleasure.
I actually enjoyed watching this video. I expected to learn, but i never expected to enjoy the derivation of PDF. This was fun!
this is a fantastic lecture and literally reveals mathemagic of normal distribution curve, i am going to see this again and again and...
We need great explainers like you... Awesomely explained.
This is an outstanding video. You have explained all the details with clarity. Thank you!
0:42 Thought Experiment: Dart Board
1:57 Probability Density Function; Phi
This was wonderful, thank you, I'm excited to see the rest of your channel now!
Yes. This video Is perfect. I've been looking for this for years. You are wonderful. Thank you so much. Excellent explanation!!!
I like how you're writing the integral signs! Anyway this is a super clear video, thank you so much! I'm reading Maxwell's 1859 paper and I wasn't sure where the Ce^Ax^2 came from.
you nailed it with this video tho! it's so cool to see that this derivation is actually an insight derived from a multivariable case.
wow i am watching your video at 12/25/2020 and i suppose your video is my Christmas gift. so beautiful explanation.
In your opinion, what looks nicer when written out: Gaussian distribution or the Schrödinger equation? And in general, what formula do you think is most aesthetically pleasing? Also, if you haven't seen it yet, look up Gauss's signature. It's one of the best things ever.
Thank you for taking your time and explaining it beautifully!
Excellent explanation, and a channel I'm looking forward to digging deeper! I'm a fellow medical student, with an engineering background so these videos really do intrigue me. Got a few questions...
- What field within medicine interests you? I wonder if there's any area which is more conducive to your kind of conceptual understanding and deductive reasoning... I kind of put that part of my brain away the last few years, and only use it as a hobby like watching these videos haha
- The video's derivation makes sense. And there's a sort of beauty of setting up the normal distribution from a darts analogy (i.e. probability falling off with distance). A rotational symmetry also makes sense. But what's not intuitive to me is why it is a 2D darts setup, and not a 1D or 3D darts setup? Not asking for a derivation, but just curious for any intuitive insights here.
+GypsyNulang
I have the intuition that I'll become a pathologist someday although I've been told I have a surgeon's personality. I like pathology because the hours are regular and I'd have time to teach too.
We could do a 3D board if you wanted but I like the 2D example because we have experience with that. Maxwell in his derivation works in three dimensions, thinking of gas diffusing from a central source in all three directions. The derivation is conceptually the same, save for a few constants: it rests on statistical independence of all coordinates and dependence only on distance from the origin.
Ah thanks yea the multiple axes demonstrate the statistical independence
I have a neat trick for resolving 14:00. Replace the unknown g(.) by (u∘h)(.), where h(.) is the squaring function.
This video is quite interesting. I got bored one afternoon and searched for the derivation of the normal distribution because I was learning probability questions involving normal distributions in school, where we use a table to find the values for the standardized normal distribution (μ=0, σ=1). This is really good, except that I lack the attention span to understand most of this.
Hey, great idea for a video. I found in undergrad that my stats prof didn't really care to elaborate on this, and so stats ended up being my least favorite math class. Turns out you need to understand this stuff to do my actual job, so this is much appreciated.
Terrific derivation! The only potentially ambiguous part is at 21:07 when "A" is written as the total area under the curve, whilst "A" is still written at the top right as "A = -h^2". Probably not confusing to anyone who understands the rest of the derivation, but it still bothers me a bit to have "A" on the screen twice, with two different usages. 😉
Thanks so much for the clarity of the video.
We love you from the bottom of our hearts. Everything is soooooooooo clear, unlike other videos on YouTube. I am trying to learn data analytics on my own from YouTube for free because I want a new skill your standard deviation
8:00 I am a bit confused here. How did you go from
*φ(x) = λ f(x)*
to
*φ(√(x^2 + y^2)) = λ f(√(x^2 + y^2))*?
Also, according to your argument, *φ(x) = λ f(x)* is only true for points on the x axis,
since *φ(x) = φ(√(x^2 + 0^2))*
+xoppa09
All these functions are of a single variable, no matter what name you give to that variable: call it 'x', call it '√(x^2 + y^2)', or 'y', or whatever. The observation we make when we see φ(x) = λf(x) is that whenever something is evaluated by φ, this is the same as it being evaluated by f times some multiplier λ. It probably would have been better if I had written φ(•) = λf(•), or dropped the argument and written φ = λf instead of φ(x) = λf(x), to emphasize this.
Crystal clear explanation, thank you Sir for the great work!
Class teachers often omit the derivation of the normal distribution. I always wondered how the bell curve formula is derived. Here is the answer. Thanks a lot.
Beautiful! Thank you so much. I wish this was around when I went through my MS.
27:21 I am unable to get the intuition about the variance integral part - how the formula came up?
The formula checks how much the data vary from the mean value. Squaring treats values below the mean (-ve deviation) and above the mean (+ve deviation) the same way.
So the formula for variance is the average of the squared deviations, i.e. Sum([X-Xbar]^2)/N. In a continuous setting we integrate rather than sum, so we integrate [X-Xbar]^2 multiplied by the PDF. As Xbar, the mean, is equal to 0, we end up integrating X^2*pdf from negative infinity to positive infinity.
Hope this helps man.
en.wikipedia.org/wiki/Variance#Absolutely_continuous_random_variable
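A minimal numerical sketch of the reply above, assuming a zero-mean normal pdf and a plain midpoint Riemann sum. The names `normal_pdf` and `integrate` and the value of `sigma` are made up for illustration. It checks that integrating x^2 * pdf over a wide interval gives back σ^2:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, steps=200_000):
    """Midpoint Riemann sum of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

sigma = 1.5  # illustrative value, not from the video
# Variance = integral of (x - mean)^2 * pdf(x); the mean is 0 here,
# so the integrand reduces to x^2 * pdf(x).
variance = integrate(lambda x: x * x * normal_pdf(x, 0.0, sigma), -20.0, 20.0)
total_area = integrate(lambda x: normal_pdf(x, 0.0, sigma), -20.0, 20.0)
print(variance, sigma ** 2)  # the two numbers should agree closely
print(total_area)            # should be very close to 1
```

The interval [-20, 20] stands in for (-infinity, +infinity); the tails beyond it are negligible for this sigma.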
So now, 26 years after I first encountered this equation in college, I finally know where it came from. A cool thought experiment coupled with some “cosmetics” which (true to their name) conceal its true identity.
Oh, that's how you get e^-(x^2)
+Mi Les
Indeed.
This was truly beautiful. Thank you so much for the great content!
14:15 How can squaring remove the root? Shouldn't it be sqrt(x^4+y^4) if
x-->x^2 and y-->y^2?
He said Exponentiating them.
this is so well presented, thank you so much for this
Clear clean well described well paced, excellent. Thank you.
I am not sure if the equation written at 21:37 is correct; isn't φ(x) the probability distribution function and not f(x)? Why is f(x)dx being integrated?
It is correct, but only because it doesn't matter what he chooses to represent the particular probability function (be it a distribution or density function for discrete or continuous data respectively), or the proportionality constant "lambda".
I suppose if you really wanted to be pedantic and make sure that everything he is saying is correct, then after the line of work:
f(x) = lambda*e^(Ax^2)
we would then say:
=> phi(x) = lambda*f(x) = (lambda^2)*e^(Ax^2)
Now redefine f(x) and our constant "lambda" (which, remember, does not depend on how we choose to name it), such that:
f(x) = lambda*f(x)
=> f(x) = phi(x)
and thus, from here on, our f(x) represents the probability function which we want.
Also, redefine lambda:
lambda = lambda^2
(remember: both are still constants)
=> phi(x) = lambda*e^(Ax^2)
and thus, we may now return to his line of working where f(x) represents this particular probability function:
f(x) = lambda*e^(Ax^2)
Allowing us to set the integral from negative infinity to positive infinity of f(x) equal to 1.
Absolutely! I enjoy this fresh look of the derivation of this class of Gaussian functions. I like the way you explained
Great video!! Helped me immensely in understanding where the normal dist. pdf came from. Thnx a lot😆😆
30:51 Why does that satisfy normalization condition? Could you explain?
Thank you for this video! Also, you have a very steady hand when drawing the normal distribution. I can't do it that well.
Well, good job in explaining this, but it leaves a big question unanswered (as do most "derivation" videos of the Gaussian distribution). Namely, you explained how to derive f(x), the pdf of the x coord of the dartboard, and this is "a" bell curve. We haven't shown this is also "the" Gaussian bell curve of the central limit theorem. It's conceivable that f(x) only roughly looks like the Gaussian but is not identical. How do we show f(x) is the Gaussian of the CLT?
You are awesome bro..
Finally someone cared about proof..
Great explanation! I followed everything except for the part where you introduced the integral for variance (around 28:00). Could someone clarify where the integral of x^2 * f(x) * dx comes from?
Basically, the variance of a random variable X is defined as the expected value of the squared deviation from the mean of X: E((X-mean)^2) = variance, where E is the expected value. The expected value is basically the same as the weighted mean, only the weighted mean is for discrete values of x, and the expected value is for continuous functions of x that can be integrated. The formula for the weighted mean is the sum of all the occurring values multiplied by their weights (number of occurrences), divided by the total weight (total occurrences of all values). In the case of expected values of continuous probability density functions, the weights are not the number of occurrences of each value but the probabilities of each value (f(x)*dx), and since the probabilities add up to 1, the division by the total weight drops out, leaving only the integral of x multiplied by the probability of x over all values (E(X) = Integral of x*f(x)dx). Since the mean of this normal distribution is 0, the squared deviation from the mean is just x^2, so E(X^2) = Integral of x^2*f(x)dx. I hope it's a bit more clear, although I think it was quite confusing, so you should read this through several times and really focus on every part if you want to understand it :D.
😮😮😮👏👏👏👌 awesome... for years this has been troubling me how to derive normal distribution equation... trust me I had nightmares... 😅😅😅 I am too bad with just mugging up... thanks a ton... I used to wonder how we are able to connect two independent sample spaces through the mysterious Z... now it's quite clear. 1. The sample space has to behave like a normally distributed phenomenon 2. Anyway the probability density curve will have area 1... and that helps me understand the t distribution even better... thank you...
4:20 (nice) wouldn't that also rotate the box by θ' - θ?
Thanks for this - very well explained. However, one question at the back of my mind is that if you consider a non-zero mean, wouldn't you need to adapt the definition at 27:18 to be $$ \int (x - \mu)^2 f(x) dx $$ ? And wouldn't this complicate things a lot?
Don't get me wrong, I understand the intuition behind just shifting the mean, it just seems like a potential snag that I would get caught on in an exam.
Yes, even I have this doubt.
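For the non-zero mean case, a short worked substitution may help. This is a sketch, assuming the shifted density is $f(x-\mu)$, where $f$ is the zero-mean density from the video and $u$ is just a dummy variable:

```latex
\begin{aligned}
\operatorname{Var}(X)
  &= \int_{-\infty}^{\infty} (x-\mu)^2 \, f(x-\mu)\, dx \\
  &= \int_{-\infty}^{\infty} u^2 \, f(u)\, du
     \qquad (u = x - \mu,\ du = dx) \\
  &= \sigma^2 .
\end{aligned}
```

So the definition does use $(x-\mu)^2$, but the substitution turns it back into the same zero-mean integral, and nothing new has to be computed.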
Mathoma, could you include your bibliographic sources in the video description? Thanks.
Both the Gaussian integral and the integral of the PDF follow the same pattern { int.(Ae^(-x^2)) }, but one of them integrates to one and the other integrates to sqrt(pi). I understand the error function and its derivation. What's the intuitive understanding behind the relationship?
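One way to see the relationship, sketched under the assumption that the PDF in question is the usual normal density with mean $\mu$ and standard deviation $\sigma$: the PDF is the same $e^{-x^2}$ shape, stretched horizontally and then divided by whatever area the stretched curve has.

```latex
\begin{aligned}
\int_{-\infty}^{\infty} e^{-x^2}\,dx &= \sqrt{\pi}, \\
\int_{-\infty}^{\infty} e^{-(t-\mu)^2/(2\sigma^2)}\,dt
  &= \sigma\sqrt{2}\int_{-\infty}^{\infty} e^{-x^2}\,dx
   = \sigma\sqrt{2\pi}
   \qquad \Bigl(x = \tfrac{t-\mu}{\sigma\sqrt{2}}\Bigr), \\
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\,dt &= 1 .
\end{aligned}
```

The $\sqrt{\pi}$ never goes away; it is absorbed into the constant in front so that the total area comes out to 1.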
I understand the formula now but I guess I don't know where a normal distribution came from. Like how do they figure out where the dots would be.
Thanks for this! Really, really helpful! Will you make videos about other distributions?
In the actual multivariate normal, x and y can be correlated (linearly dependent) so that instead of all points in a circle being equally likely, it's all points in an ellipse. Does the math still work out in that case?
Does anyone know which book the 2nd further reading comes from?
------
Edit: For everyone wondering, the book is called "Probability theory - the logic of science" by E. T. Jaynes.
But wait, our original mission statement was that the probability of finding a dart across all r (anywhere on the board) is equal to 1, so it's actually the integral from r=0 to r=inf, so, shouldn't our distribution have double the lambda you found?
(Comment made at 21 minutes)
Or rather, the coefficient should be twice lambda
I don't see how a function that has y-intercept of 2 and a positive range could ever have an integral less than 2, therefore I doubt the fact that saying h^2 = -pi lambda ^2 is actually correct, since that substitution was made given the requirement that the integral was equal to 1, despite it necessarily being greater than 2.
+Magic Gonads
But remember I'm integrating f over all x-coordinates (or y) so the limits of integration are as stated. You could do a double integration of phi from r=0 to r=inf and theta=0 to theta=2*pi, setting that equal to 1 and you get the same relation between the constants I have in the video. I didn't go down that route because I wouldn't use multivariable calculus in a video where I can use single-variable instead.
I'm not sure I get your intuition behind your second question about the integral being less than 2. The y-intercept being 2 is only one point on the function and for considering the total area, you should also consider how quickly the function dies to zero to the left and to the right.
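The polar double integration mentioned in this reply can be sketched as follows, assuming $\varphi(r) = \lambda f(r) = \lambda^2 e^{A r^2}$ with $A < 0$, which is how the notation is quoted elsewhere in this thread. The extra factor of $r$ in $dA = r\,dr\,d\theta$ is why integrating $r$ from $0$ to $\infty$ does not simply double the constant:

```latex
\begin{aligned}
\int_{0}^{2\pi}\!\!\int_{0}^{\infty} \lambda^2 e^{A r^2}\, r\, dr\, d\theta
  &= 2\pi\lambda^2 \left[ \frac{e^{A r^2}}{2A} \right]_{0}^{\infty}
   = -\frac{\pi\lambda^2}{A} = 1
   \quad\Longrightarrow\quad A = -\pi\lambda^2, \\
\int_{-\infty}^{\infty} \lambda\, e^{A x^2}\, dx
  &= \lambda\sqrt{\frac{\pi}{-A}} = 1
   \quad\Longrightarrow\quad A = -\pi\lambda^2 .
\end{aligned}
```

Both routes give the same relation between the constants, under the stated assumption about $\varphi$ and $f$.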
With the second point, I realised that the area being 2 dimensional allows it to be less than the length, so as long as it dips to near 0 before |x|=1 then it's plausible, and then your solution would map that exact path.
But shouldn't this function, regardless of what theta you use as the r, be the same.
So along the positive x axis you would map every positive value of r, so using inf and -inf (due to symmetry) doubles the area under the curve. It's not possible to find a dart a negative distance away from the bulls eye since you're considering theta irrelevant. You don't need to use y.
Also something else: what would e^(-pi*x) actually correspond to in terms of b^(-x)? What would the base be?
Excellent derivation, very intuitive. I needed it for understanding Gaussian regression.
Please help me, I didn't understand why a function that gives the probability density at an (x,y) coordinate, when multiplied by dA (a little area), gives us the probability in that area.
5:50 why independent f(x)*f(y)?
Hi! Pablo, from Spain.. :) Min. 2:34, Why phi times dA? Thanks in advance!
Because φ is the probability density function, so its units are (probability)/(area) (or probability per unit area). Therefore, the probability of a dart landing in a small area dA around some point is φ*dA, because multiplying by the area cancels the area units in the denominator.
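A rough simulation sketch of the "density times area" idea, assuming the darts' x and y coordinates are independent standard normals (σ = 1). The patch location, its size, and the names `phi`, `n_darts`, `simulated`, and `predicted` are all made up for illustration:

```python
import math
import random

def phi(x, y):
    """2D density of two independent standard normal coordinates."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

random.seed(0)
n_darts = 400_000
# Small square patch of the board centred at (x0, y0) with side length `side`.
x0, y0, side = 0.5, -0.3, 0.2
half = side / 2.0

hits = 0
for _ in range(n_darts):
    x = random.gauss(0.0, 1.0)
    y = random.gauss(0.0, 1.0)
    if abs(x - x0) <= half and abs(y - y0) <= half:
        hits += 1

simulated = hits / n_darts           # fraction of darts landing in the patch
predicted = phi(x0, y0) * side ** 2  # density at the centre times patch area
print(simulated, predicted)          # the two should be close, not identical
```

The match is only approximate: the simulation has sampling noise, and φ is not exactly constant across the patch. Shrinking the patch makes φ*dA a better approximation but needs more darts to measure.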
How did this derivation ensure an inflection point 1 standard deviation from the mean?
How do you intuit the idea that
P(x1)P(x2)P(x3)...P(xn) = P(0)^(n-1) • P(sqrt(x1^2 + x2^2 + ... + xn^2))
I don't understand how you can say at 8:46 that f(sqrt(x^2+y^2)) is equal to f(x)f(y) by just analysing a single case (if you put sqrt(x^2+y^2)=x, this is true in just a single case, and it's the case you described, y=0).
Can it be explained by treating y like a parameter?
That was done because he wanted to get rid of the function φ and express the equation in terms of the function f only. He used the specific case when y=0 and defined the constant λ as equal to f(y=0). He said that if φ(sqrt(x^2+y^2)) = f(x)f(y) is true in general, then it must be true in a specific case, when y=0. So, φ(x)=λf(x) for any variable x, and it follows that φ(sqrt(x^2+y^2))=λf(sqrt(x^2+y^2))=f(x)f(y). You see at the end that λf(sqrt(x^2+y^2))=f(x)f(y) with no φ involved.
Lots of assumptions, but worth it!! However, in the case of a multidimensional scenario, x^2 + y^2 != r^2, so I think the Gaussian distribution might need improvement.
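A quick numerical check of the relation in this reply, as a sketch. It assumes the solution form f(x) = λ e^(Ax^2) quoted elsewhere in the thread; the values of `lam` and `A` are arbitrary illustrative choices, not the video's constants:

```python
import math

lam = 0.8   # hypothetical value of lambda = f(0)
A = -1.3    # any negative constant works for this check

def f(x):
    """Candidate solution f(x) = lambda * exp(A * x^2)."""
    return lam * math.exp(A * x * x)

def phi(r):
    """Radial density, phi(r) = lambda * f(r)."""
    return lam * f(r)

# Compare f(x) * f(y) with phi(sqrt(x^2 + y^2)) at several points,
# including points with y != 0.
points = [(0.0, 0.0), (0.3, 0.0), (0.3, -1.2), (2.0, 0.7)]
for x, y in points:
    r = math.hypot(x, y)
    print(x, y, f(x) * f(y), phi(r))
```

The y = 0 case is only used to pin down how φ relates to f. Once f has this exponential form, the product rule holds at every point, because the exponents add: Ax^2 + Ay^2 = A(x^2 + y^2).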