Thank you very much, I am a probability student in France and this helps a lot.
You're most welcome, Oscar! Inspired by feedback like yours on this video, I've recently been releasing a comprehensive series of video tutorials on all of the foundational subjects for data science and machine learning, including probability theory. You can read all about it in GitHub (www.github.com/jonkrohn/ML-foundations) or jump straight to the TH-cam playlist (th-cam.com/play/PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a.html)
Thanks, Jon. This was quick and straight to the point.
You're welcome, Hassan! Glad the pacing was right for you :)
Very good explanation of how to implement the central limit theorem in Python. I'm starting to learn data science and I've seen that some courses lack a statistics foundation, which leads to gaps in how to implement statistics in Python: hypothesis tests, probabilities, chi-square tests, and so on.
As part of my 8-subject Machine Learning Foundations series, subject 5 is on Probability and subject 6 is on Statistics. We will cover all of these topics!
More detail on the series here: github.com/jonkrohn/ML-foundations
(Growing) TH-cam playlist here: th-cam.com/play/PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a.html
You are making super awesome videos. You made it so simple to understand using notebook. Thanks for creating such content. Please make videos on NLP. Keep growing.
So glad you're enjoying the videos, Swaviman!
Have you already seen this two-hour NLP tutorial I have on TH-cam? th-cam.com/video/rqyw06k91pA/w-d-xo.html
I'm focusing on ML Foundations videos right now (linear algebra, calculus, stats, computer science, etc.) but will definitely get to more NLP videos after that.
Great explanation of what is a core foundational concept in ML and stats in general - one of the best I've seen. Easy to follow, and Jon's Canadian accent is eerily soothing!
Haha thank you and thank you, Amr!
Awesome video, I just have one doubt: what is the logic behind writing _ = sns.distplot()? We could call sns.distplot() directly. Is there any specific meaning to the _ = prefix?
Yep! Try removing it. It's cosmetic: if you don't include the "_ =" prefix, Jupyter displays the plotting call's return value (the Axes object) as an ugly line of output alongside the desired plot. Assigning it to the throwaway variable _ suppresses that.
Thank you, Jon. But according to the theory, shouldn't the sampling be done with replacement? You did it without replacement. Does it make any difference, with or without replacement?
Ah, great question, Kashif! As long as the sample size is very small relative to the source distribution, I don't think it makes a difference. Check out the discussion here: www.researchgate.net/post/To-ensure-independence-in-central-limit-theorem-we-need-sample-size-to-be-less-than-10-of-the-population-size-if-sampling-without-replacement-Why#:~:text=Finance%20and%20Economics-,To%20ensure%20independence%20in%20central%20limit%20theorem%2C%20we%20need%20sample,size%20if%20sampling%20without%20replacement.
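To see this empirically, here is a minimal NumPy sketch (assuming NumPy is installed; the log-normal population is made up for illustration) comparing sampling with and without replacement when the sample size is tiny relative to the population:

```python
import numpy as np

rng = np.random.default_rng(42)

# A decidedly non-normal "population": 10k draws from a skewed
# log-normal distribution.
population = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

def sample_mean_distribution(replace, n=10, trials=2_000):
    """Means of `trials` samples of size n, drawn with or without replacement."""
    return np.array([
        rng.choice(population, size=n, replace=replace).mean()
        for _ in range(trials)
    ])

with_repl = sample_mean_distribution(replace=True)
without_repl = sample_mean_distribution(replace=False)

# With n at 0.1% of the population, the two sampling distributions of
# the mean are practically indistinguishable: both center on the
# population mean with similar spread.
print(abs(with_repl.mean() - without_repl.mean()))
print(abs(with_repl.std() - without_repl.std()))
```

Both printed differences come out tiny relative to the spread of the sampling distribution itself, which is why the with/without-replacement distinction washes out at small sampling fractions.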
Really well explained. Thanks
You're most welcome, Kisholoy :)
Very well explained and demonstrated with clear Python code.
Thanks, Katie! If you like that tutorial, you might also like my ML Foundations series, which I've begun publishing in the past couple of weeks and will cover over a hundred of the most important subjects for deeply understanding ML, one of which is the Central Limit Theorem: th-cam.com/play/PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a.html
Great explanation of the central limit theorem, and thank you for providing the code as well.
Thank you, Zeeshan! You're most welcome :)
Thank you, just what I needed
Yay, you're welcome :D
Confidence interval with examples in Python! plz
You bet, Dmitrii! That's on its way. It'll be covered in the "Statistics" playlist that I hope to begin releasing on TH-cam (and Udemy) in 2023.
In the meantime, it's available via oreilly.com and the code is open-source here; search for "confidence interval" in this Jupyter notebook: github.com/jonkrohn/ML-foundations/blob/master/notebooks/6-statistics.ipynb
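As a quick preview, a normal-approximation confidence interval takes only a few lines of NumPy. A sketch with made-up data (this is not the notebook's code, just an illustration of the idea):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=15.0, size=50)  # toy sample

n = data.size
mean = data.mean()
sem = data.std(ddof=1) / np.sqrt(n)  # standard error of the mean

# 95% CI using the normal critical value 1.96 (a t critical value
# would be slightly wider at n = 50)
lower, upper = mean - 1.96 * sem, mean + 1.96 * sem
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

The central limit theorem is exactly what justifies this construction: the sample mean is approximately normal even when the underlying data aren't.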
This is great, thanks! Could you also do an example with a real-world dataset? Like using the theorem to solve a real problem instead of with generated numbers. Thank you!
Yep, you bet, Simen! The Theorem on its own can't be used to solve any real-world problems that I'm aware of, but it is a critical concept underlying countless statistical and machine learning models (which can themselves be used to solve real-world problems). Eventually, I will record my own "Probability and Statistics for Machine Learning" videos at home for release on TH-cam, which will contain many examples with real-world data. In the meantime, the professionally-recorded version of this same content is available via O'Reilly: learning.oreilly.com/videos/probability-and-statistics/9780137566273/
Thank you so so much for this vid
You're most welcome, David! Hopefully you've already noticed that I've been releasing tons of math videos lately, particularly on linear algebra and calculus. More probability and stats videos to follow after that, including several around the central limit theorem!
How do you use it in machine learning? Is it for outlier detection?
I can think of these. Can you confirm my understanding?
1. It is used to check whether our sample size is sufficient for modeling or whether we need more data.
2. Do we need to remove outliers to make our data normally distributed?
@@jheel-patel 1: yes absolutely.
2: Not necessarily as you might be able to use a "Box-Cox Transformation" to transform toward normal without outlier removal... however, I do recommend removing outliers -- or at least investigating them to check if anything's fishy with your data collection -- before transforming toward normal.
I've been releasing a "Machine Learning Foundations" series of courses in recent months (see github.com/jonkrohn/ML-foundations/). I've published the first course -- Intro to Linear Algebra -- on TH-cam and am now filming the second course. The fifth and sixth courses (on probability and statistics, respectively) will directly address your questions related to the Central Limit Theorem in quite a lot of detail.
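To make the Box-Cox suggestion concrete, here is a minimal SciPy sketch (assuming SciPy is installed; the right-skewed data is simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=2_000)  # right-skewed data

# Box-Cox searches for the power lambda that makes the data most
# normal; it requires strictly positive inputs.
transformed, fitted_lambda = stats.boxcox(skewed)

# For log-normal data the fitted lambda should land near 0,
# which corresponds to a plain log transform.
print(f"lambda ~ {fitted_lambda:.2f}")
print(f"skew before: {stats.skew(skewed):.2f}, after: {stats.skew(transformed):.2f}")
```

Note that Box-Cox reduces skew but does not neutralize genuine outliers, which is why investigating them first (as suggested above) is still worthwhile.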
What are some other uses of it in data science?
@@JonKrohnLearns Thanks for the quick response. Appreciate it 🙌
@@jheel-patel The Central Limit Theorem underlies all of probability and statistics, as well as much of machine learning. It is perhaps the single most fundamental concept underlying all of data science. Almost every class of predictive model depends on it. I will explain more in subjects 5 and 6 of my ML Foundations series (github.com/jonkrohn/ML-foundations). Those videos should be released on TH-cam early next year; I'll send updates via my email newsletter, which you can sign up for on jonkrohn.com :)
First of all, sorry to contact you through this channel... I did an inferential statistical analysis on stroke data available on Kaggle, in order to select the variables for a predictive model. I used the chi-square test and an independent-means test. It's available on my GitHub: github.com/cmapereira/stroke_analysis. If you could take a look someday, I would appreciate it.