Thanks for watching! Checkout the description for the MEDIUM article (published in Towards Data Science) that accompanies this video. Hopefully that should answer questions. Also please follow here and on medium for fun updates like this!
Tell me how do I use intuition vs probability to predict outcome of my 5 lottery deep training model? 😂
Could you please explain why we used mean and standard deviation when attempting to calculate the likelihood?
I can’t speak for every case. But in linear regression, we assume the distribution of the labels follows a normal distribution, and the normal distribution can be characterized by a mean and standard deviation. If you substitute this into the “maximum likelihood estimation”, the math will simplify to optimizing the residual sum of squares (which is proportional to the mean squared error) to compute the coefficients in the linear regression hypothesis.
I explain this in the full probability and likelihood videos too, if that helps
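Not from the video, but a minimal sketch (with made-up data and a hypothetical known sigma) of the point in this reply: under a Gaussian assumption, the mean that maximizes the log-likelihood is the same one that minimizes the residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=1000)  # simulated labels (hypothetical data)

sigma = 2.0  # assume the standard deviation is known and fixed

def gaussian_log_likelihood(mu):
    # log of the product of normal densities over all observations
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2))

def rss(mu):
    # residual sum of squares for the same candidate mean
    return np.sum((y - mu) ** 2)

candidates = np.linspace(3.0, 7.0, 401)  # grid of candidate means
best_ll = candidates[np.argmax([gaussian_log_likelihood(m) for m in candidates])]
best_rss = candidates[np.argmin([rss(m) for m in candidates])]

# Both criteria select the same mu (close to the sample mean), because the
# log-likelihood is a constant minus RSS / (2 * sigma**2)
print(best_ll, best_rss)
```

The same argument, carried through symbolically for all the regression coefficients, is what turns maximum likelihood into least squares.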
Thank you so much. Such a good and simple explanation sir.
It was probably a subject that I had been trying to clarify in my head for a month and could not. Maybe because I'm a little detail-oriented. Thanks to you, brother, I understood the subject. Thanks to YouTube, you have a brother on the other side of the world. Thank you very much.
Thank you. I'm studying mathematics and statistics in college, and I really like this video. My professor told me that the most important thing in statistics is to understand the basic logic first, using a simple or daily-life example: know what you want and what you need to do. The second most important thing is to remember the notation, read the books, and study on my own. I really like the first part of the video; that's the key and core idea of the likelihood function. Why did I watch this video? 😂 Because I wanted to refresh the idea. When doing harder problems with only notation and symbols, I get lost.
Finally, my 2-3 hour search through many videos on likelihood ends here. Thanks, man...
This is great! However, it's really important not to confuse the probability density function p(x) with the probability of x. For one thing, p(x) can be larger than 1!
You also take the logarithm of both sides because that leads to nice properties when differentiating (because log is strictly increasing, it maintains the property that if x1 < x2, then l(x1) < l(x2)). Addressing arithmetic underflow is definitely a useful added benefit too.
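A tiny sketch (hypothetical density values, not from the video) of the underflow benefit this comment mentions: multiplying many small densities collapses to zero in floating point, while summing their logs stays well-behaved.

```python
import math

# 1000 i.i.d. observations, each with a (hypothetical) density value of 1e-4
densities = [1e-4] * 1000

# The direct product is 1e-4000, which underflows to exactly 0.0 in float64
product = 1.0
for p in densities:
    product *= p

# Summing logs instead keeps the same maximizer and a representable value
log_likelihood = sum(math.log(p) for p in densities)

print(product)         # 0.0
print(log_likelihood)  # roughly -9210.34
```

Since log is strictly increasing, comparing log-likelihoods orders candidate parameters exactly as comparing likelihoods would, without the underflow.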
This is such a clear explanation. Great job my dude
It's great that you thought of making a video comparing probability and likelihood. However, I think that in the initial graphs the y-axis does not represent probability values. They are probability-density values at various x.
Very well done, clear and concise!
Thank you! My first time trying this style out. So I’m glad it turned out well :)
Seriously, one of the best explanations !
one of the best explanations on youtube! well done sir!
Thanks a ton for watching
Nicely explained! I got a better understanding of this. Could you also include some examples that give a feel for the calculations?
This is the best explanation of likelihood function. thank you so much for the video.
Thank you so much. This video solved so many things for me.
The mean values are not well selected. Most of the samples are distributed around 200k, so the means should also be around 200k.
Thank you! My confusion went away after watching this. Thumbs up.
You are very welcome. Thanks for watching !
Thank you so much!! You made complicated concepts so easy to understand!!! Thanks again!
Super welcome and also very glad to hear :D
Another 🔥video! This man has an insane brain
Thanks Shashank! I’m just happy it’s useful 🙂🙂
THANK YOU SO MUCH , YOU ARE A LEGEND
This is very well explained, thank you!
Thank you for watching!
Thank god, I clicked the videoooo
Thanks man, people out there really like to make easy things difficult. Thank you, OG.
Awesome stuff! Just to clarify: logistic regression uses the binomial distribution; let's not confuse viewers with link functions and sigmoids.
Aren't sigmoids a whole family of functions that have certain properties?
Very good explanation of MLE. Amazing
Nice introduction! Very clear and helpful, thanks. My only nitpick would be that, when you change to logarithms, maybe "L proportional to P" (i.e. "L = kP") should become "log L = log k + log P" - not a proportionality anymore, but a constant offset. The idea of monotonicity is still maintained.
Yep, good catch. I think that's technically correct. I guess when making this type of video, teaching on the spot, sometimes details like this slip my mind
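In symbols, the nitpick in this exchange is small but exact:

```latex
L = kP \quad\Longrightarrow\quad \log L = \log k + \log P
```

The proportionality becomes a constant additive offset; since \(\log k\) does not depend on the parameters, the location of the maximum is unchanged.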
Hi sir, can you do a video on why we use Bayesian inference and how to use it?
Thanks! Great explanation at the beginning (up to about minute 8, which is how far I have gotten). Aren't your example choices of mu and sigma off by a factor of more than 1000, though? Just want to make sure I am clear about it.
Thanks. It is such a nice explanation of the topic. Everything is explained well
Thanks so much for the compliment! And I am glad you liked it :)
Nice video! New topic...👍 Please make a video on ML binary classification for time series forecasting using the likelihood equation. Waiting for the next video!
Great job man. Thanks so much!
You were talking about sigma and the mean, and everything was clear until you started talking about theta. Where did the sigma and mean go? Are we training the model to make predictions on the model parameters or the distribution parameters? Thanks though.
Great explanation sir! Thx a lot!
wonderful, thanks for your clear explaining, pretty good
You are very welcome
Thank you so much that was really helpful
you are honestly #1
You are too kind :)
Thanks a lot!
I think you should include the keywords Maximum Likelihood and Log Likelihood Ratio in your title to reach a wider audience.
Yea. I’ll keep this in mind. Thanks for the tip. Maybe I’ll change this title soon
Nice explanation
awesome video. thank you!
I got both benefit and enjoyment from this, thank you
It should be probability density on the y-axis, not probability, since X is a continuous random variable.
Yep! Going to make some videos around probability theory soon to clear this up. Good catch!
@CodeEmporium Yes please, more probability theory videos are what we need
Simply amazing!
Excellent! Thanks :)
Great video dude
Super explanation. Thanks
Welcome! Thanks for watching:)
IID? I thought it was independent but non-identically distributed, given that our data may come from different parameter values
Minute 8:11: I wonder if, for a better illustration of L(mu, sigma), mu should be around 200k and above, so that the mean matches the x-axis.
Thanks for the video
Bro I got 4 ads watching this video. I hope this guy is making bank off of these videos
Is it possible to find the true probability distribution? It looks like in the real world we only ever see the likelihood, because we can't observe the whole population. Is that right?
Thx, life saver
Good stuff.
Observations y1, y2, ..., yn form a joint probability? I didn't get that part.
With X values in the six figures, how can mu be a double-digit number?
Great explanation! Thanks, man. By the way, what Blackboard App are you using in this video?
Thank you! The app is called “Explain Everything”
Very nice review. Thanks.
You are very welcome!
Should have clarified that housing prices in practice are not independent. Perhaps use a better example.
Another great video
Thank youu
Probably a stupid question, but P(y1, y2, y3, ...) is written as P(y1)·P(y2)·P(y3)... Isn't P(y1, y2, y3, ...) a function, while taking the product P(y1)·P(y2)·P(y3)... gives me a number? And these two are the same thing?
P(y1, y2, y3) is the probability that the first random variable (RV) has the value y1 AND the 2nd RV has the value y2 AND the 3rd RV has the value y3. This is a number.
Now, if these RVs are independent of each other, then yes, you can write it out as the product P(y1)P(y2)P(y3). This too is a product of 3 numbers, which gives us a number. If they aren't independent RVs, you are going to have to use Bayes' Rule to write it out in a more complex equation. Ultimately, though, the outcome is still some real number
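A tiny sketch of the function-versus-number distinction in this reply, using hypothetical fair-coin probabilities (not values from the video): the joint PMF is a function, but evaluating it at specific outcomes yields a single number, which factorizes under independence.

```python
# Joint PMF of two independent fair coin flips: a FUNCTION from outcomes to numbers
def joint_pmf(y1, y2):
    pmf = {"H": 0.5, "T": 0.5}  # marginal PMF of one flip
    return pmf[y1] * pmf[y2]    # product form is valid only under independence

# Evaluating the function at specific values (y1=H, y2=H) yields one number
joint = joint_pmf("H", "H")
print(joint)  # 0.25
```

So P(y1, y2, ...) as a function and the product P(y1)·P(y2)·... agree once you plug in concrete observed values: both give the same real number.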
Great video.
Thanks a ton!
Why do we use a PDF with well-fitted parameters instead of a histogram?
I don't see any math explanation in this other than showing the equations, but it's a good theoretical explanation. Sorry to comment this, but I would appreciate seeing the actual math and its explanation. Thanks.
Thanks for commenting! This was my first time teaching this way, with a whiteboarding strategy. I have tried to improve in later videos (hopefully they have turned out better)
Nice
Thank you!
Waiting
Not much longer now :)
Writing in red and green on a black background is very hard to read for colour-blind people
Yea. I didn’t think it would look this dark. In future videos , I try to correct this. :)
7:52
Smarter version of Aziz Ansari!
Next Level Explanation , Subscriber+=1 :)
Welcome aboard! Thanks a ton!
At no point in this video did you ever state what likelihood actually is, only what it is proportional to. I recognize you're trying to educate but this is a very poor job, similar to the article you wrote on this subject.
Fake accent nothing else
God damn you explain so much better than my college prof.
Thanks a ton ! Hope you enjoy the rest of these videos :)