Gaussian Naive Bayes, Clearly Explained!!!

  • Published Jun 16, 2024
  • Gaussian Naive Bayes takes care of all your Naive Bayes needs when your training data are continuous. If that sounds fancy, don't sweat it! This StatQuest will clear up all your doubts in a jiffy!
    NOTE: This StatQuest assumes that you are already familiar with...
    Multinomial Naive Bayes: • Naive Bayes, Clearly E...
    The Log Function: • Logs (logarithms), Cle...
    The Normal Distribution: • The Normal Distributio...
    The difference between Probability and Likelihood: • Probability is not Lik...
    Cross Validation: • Machine Learning Funda...
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying my book, The StatQuest Illustrated Guide to Machine Learning:
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    TH-cam Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    1:00 Creating Gaussian distributions from Training Data
    2:34 Classification example
    4:46 Underflow and Log() function
    7:27 Some variables have more say than others
    Corrections:
    3:42 I said 10 grams of popcorn, but I should have said 20 grams of popcorn given that they love Troll 2.
    #statquest #naivebayes
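
The workflow the video walks through - fit one Gaussian per feature per class from the training data, then classify by comparing prior-times-likelihood scores - can be sketched with scikit-learn's GaussianNB. This is a minimal sketch, not the video's actual data: the snack numbers below are invented for illustration.

```python
# Minimal Gaussian Naive Bayes sketch with invented snack data
# (feature names mirror the video's example; the values are made up).
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Columns: grams of popcorn, ml of soda pop, grams of candy
X = np.array([
    [24.3, 750.7, 0.2],    # loves Troll 2
    [28.2, 533.2, 50.5],   # loves Troll 2
    [19.0, 800.0, 25.0],   # loves Troll 2
    [2.1,  120.5, 90.5],   # does not love Troll 2
    [4.8,  110.9, 102.3],  # does not love Troll 2
    [3.3,  130.1, 75.6],   # does not love Troll 2
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = loves Troll 2

clf = GaussianNB()  # fits one Gaussian (mean, variance) per feature per class
clf.fit(X, y)

# Classify a new person: 20 g popcorn, 500 ml soda pop, 25 g candy
print(clf.predict([[20.0, 500.0, 25.0]]))  # → [1]
```

Internally the model does exactly what the video describes: it sums the log prior and the per-feature Gaussian log-likelihoods for each class and picks the larger total.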

Comments • 462

  • @statquest  4 years ago +19

    NOTE: This StatQuest is sponsored by JADBIO. Just Add Data, and their automatic machine learning algorithms will do all of the work for you. For more details, see: bit.ly/3bxtheb BAM!
    Corrections:
    3:42 I said 10 grams of popcorn, but I should have said 20 grams of popcorn given that they love Troll 2.
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @phildegreat 3 years ago

      website not working?

    • @statquest  3 years ago

      @@phildegreat Thanks! The site is back up.

    • @anirbanpatra3017 1 year ago

      8:15 There's a minor error in the slide: 'help use decide'.
      You really are a great teacher. Wish I could meet you in person some day.

  • @rohan2609 3 years ago +124

    Four weeks ago I had no idea what machine learning was, but your videos have really made a difference in my life. They are all so clearly explained and fun to watch. I just got a job, and I mentioned some of the things I learned from your channel. I am grateful for your contribution to my life.

    • @statquest  3 years ago +14

      Happy to help!

    • @lowerbound4803 2 years ago +2

      Congratulations!!

    • @rimurusama9695 1 year ago

      That is a HUGE help my friend, congrats.. !!

  • @mildlyinteresting1925 4 years ago +69

    I've been following your channel for over 6 months now, sir. Your explanations are truly amazing..

    • @statquest  4 years ago +2

      Thank you very much! :)

  • @raa__va4814 1 year ago +23

    I'm at the point where my syllabus doesn't require me to look into any of this, but I'm just having too much fun learning with you. I'm glad I took this course up and found your videos.

  • @tassoskat8623 4 years ago +57

    This is by far my favorite educational TH-cam channel.
    Everything is explained in a simple, practical and fun way.
    The videos are full of positive vibes right from the beginning with the silly song intro. I love the catchphrases.
    Statquest is addictive!

    • @statquest  4 years ago +2

      Thank you very much! :)

  • @minweideng4595 3 years ago +8

    Thank you Josh. You deserve all the praise. I have been struggling with a lot of these concepts in traditional textbooks, as they tend to "jump" quite a lot. Your channel brings all of them to life vividly. This is my go-to reference now.

    • @statquest  3 years ago +2

      Awesome! I'm glad my videos are helpful.

  • @amirrezamousavi5139 2 years ago +2

    What little I know about machine learning could not have been learned without your tutorials. Thank you very much.

    • @statquest  2 years ago

      Glad I could help!

  • @mohit10singh 3 years ago +3

    I am a beginner in the machine learning field, and your channel helped me a lot. I went through almost all of the videos. Very nice way of explaining. I really appreciate you making these videos and helping everyone. You just saved me... Thank you very much...

    • @statquest  3 years ago

      Thank you very much! :)

  • @samuelbmartins 2 years ago +3

    Hi, Josh.
    Thank you so much for all the exceptional content on your channel.
    Your work is amazing.
    I'm a professor of Computer Science and ML in Brazil, and your videos have been supporting me a lot.
    You're an inspiration to me.
    Best.

    • @statquest  2 years ago +1

      Muito obrigado!

  • @leowei2575 7 months ago +2

    WOOOOOOW. I watched every video of yours recommended in the description of this video, and now this one. Everything makes much more sense now. It helped me a lot to understand the Gaussian Naive Bayes algorithm implemented in and available from scikit-learn. Just awesome. Thank you!!!

    • @statquest  7 months ago +1

      Wow, thanks!

  • @sakhawath19 3 years ago +7

    If I recall all the best educators on TH-cam, you always come out on top! You are a flawless genius!

    • @statquest  3 years ago

      Thank you! 😃

  • @yuxinzhang4228 3 years ago +5

    It's amazing! Thank you so much!
    Our professor had us teach ourselves Gaussian naive Bayes, and I absolutely didn't understand her slides with their many, many math equations. Thanks again for your vivid videos!!

    • @statquest  3 years ago

      Glad it was helpful!

  • @georgeruellan 3 years ago +3

    This series is helping me so much with my dissertation, thank you!!

    • @statquest  3 years ago +1

      Awesome, and good luck with your dissertation!

  • @tianhuicao3297 3 years ago +9

    These videos are amazing!!! Truly a survival pack for my DS class👍

  • @TheVijaySaravana 2 years ago +2

    I have watched over 2-3 hours of lectures about Gaussian Naive Bayes. Now is when I feel my understanding is complete.

  • @sudhashankar1040 3 years ago +1

    This video on Gaussian Naive Bayes has been very well explained. Thanks a lot.😊

    • @statquest  3 years ago +1

      Most welcome 😊

  • @pinesasyg9894 2 years ago +2

    Amazing knowledge with incredible communication skills.. the world will change if every student has such a great teacher.

  • @Godofwarares1 1 year ago +10

    This is crazy. I went to school for Applied Mathematics, and it never crossed my mind that what I learned was machine learning. As ChatGPT came into the limelight I started looking into it, and almost everything I've seen so far is basically what I learned before, just in a different context. My mind is blown that I assumed ML was unattainable for me when it turns out I've been doing it for years.

  • @jiheonlee4065 4 years ago +2

    Thank you for another excellent StatQuest!~

  • @Adam_0464 3 years ago +1

    Thank you, you have made the theory concrete and visible!

  • @sairamsubramaniam8316 3 years ago +1

    Sir, this playlist is a one-stop solution for quick interview preparation. Thanks a lot, sir.

    • @statquest  3 years ago

      Good luck with your interviews! :)

  • @qbaliu6462 1 month ago +1

    This channel has helped me so much during my studies 🎉

    • @statquest  1 month ago

      Happy to hear that!

  • @anje889 1 year ago +1

    The content is excellent, and I love your intro quite a lot (it's super impressive for me), btw. Thank you for doing this. As a beginner, some concepts are literally hard to understand at first, but after watching your videos things are a lot better than before. Thanks :)

    • @statquest  1 year ago

      I'm glad my videos are helpful! :)

  • @argonaise_jay 2 years ago +1

    One of the best channels for learners that the world can offer..

  • @tcidude 3 years ago

    Josh. I love your videos. I've been following your channel for a while. Your videos are absolutely great!
    Would you consider covering more Bayesian statistics in the future?

    • @statquest  3 years ago

      I'll keep it in mind.

  • @WorthyVII 1 year ago +2

    Literally the best video ever on this.

  • @chonky_ollie 1 year ago +1

    Your videos are more helpful than my Machine Learning lectures were. Man, you are the Gigachad of Machine Learning.

  • @MrRynRules 2 years ago +1

    Damn, your videos are so good at explaining complicated ideas!! Like holy shoot, I am going to use this multiple-predictors idea to figure out the ending of Inception. Was it a dream, or was it not a dream!

  • @akashchakraborty6431 3 years ago +1

    You have really helped me a lot. Thanks Sir. May you prosper and keep helping students who can't afford paid content :)

    • @statquest  3 years ago

      Thank you! :)

  • @ahhhwhysocute 3 years ago +1

    Thanks for the video!! It was very helpful and easy to understand.

    • @statquest  3 years ago

      Glad it was helpful!

  • @liranzaidman1610 3 years ago +5

    How do people come up with these crazy ideas? It's amazing. Thanks a lot for another fantastic video.

    • @statquest  3 years ago

      Thank you again!

  • @chenzhiyao834 2 years ago +1

    You explained this much more clearly than the lecturer in my ML lecture.

  • @konmemes329 2 years ago +1

    Your video just helped me a lot!

    • @statquest  2 years ago

      Glad it helped!

  • @rogertea1857 3 years ago +1

    Another great tutorial, thank you!

  • @heteromodal 3 years ago +1

    Thank you Josh for another great video! Also, this (and other vids) makes me think I should watch Troll 2, just to tick that box.

    • @statquest  3 years ago

      Ha! Let me know what you think!

  • @haofu1673 3 years ago +2

    Great video! If people were willing to spend time on videos like this rather than TikTok, the world would be a much better place.

    • @statquest  3 years ago

      Thank you very much! :)

  • @Vivaswaan. 4 years ago +1

    The demarcation of topics in the seek bar is useful and helpful. Nice addition.

    • @statquest  4 years ago +1

      Glad you liked it. It's a new feature that TH-cam just rolled out, so I've spent the past day (and will spend the next few days) adding it to my videos.

    • @anitapallenberg690 4 years ago +2

      @@statquest We really appreciate all your dedication to the channel!
      It's 100% awesomeness :)

    • @statquest  4 years ago

      @@anitapallenberg690 Hooray! Thank you! :)

  • @ADESHKUMAR-yz2el 3 years ago +1

    I promise I will join the membership and buy your products when I get a job... BAM!!!

    • @statquest  3 years ago

      Hooray! Thank you very much for your support!

  • @AmanKumar-oq8sm 3 years ago

    Hey Josh, thank you for making these amazing videos. Please make a video on "Bayesian Networks" too.

    • @statquest  3 years ago

      I'll keep it in mind.

  • @meysamamini9473 3 years ago +1

    I'm having a great time watching your videos ❤️

  • @hli2147 3 years ago +2

    This is the only lecture that makes me feel not stupid...

  • @camilamiraglia8077 3 years ago

    Thanks for the great video!
    I would just like to point out that, in my opinion, if you are talking about log() when the base is e, it is easier (and more correct) to write ln().

    • @statquest  3 years ago +1

      In statistics, programming and machine learning, "ln()" is written "log()", so I'm just following the conventions used in the field.

  • @MinhPham-jq9wu 2 years ago +1

    So great, this video was so helpful.

    • @statquest  2 years ago +1

      Glad it was helpful!

  • @tagoreji2143 1 year ago +1

    Thanks so much, Sir, for the very valuable information.

  • @nzsvus 4 years ago

    BAM! Thanks, Josh! It would be amazing if you could make a StatQuest about A/B testing :)

    • @statquest  4 years ago +1

      It's on the to-do list. :)

  • @Mustafa-099 2 years ago +1

    Hey Josh, I hope you are having a wonderful day. I was searching for a video on the "Gaussian mixture model" on your channel but couldn't find one. I have a request for that video, since the concept is a bit complicated elsewhere.
    Also, btw, your videos enabled me to get one of the highest scores on a test conducted recently at my college. All thanks to you, Josh, you are awesome.

    • @statquest  2 years ago +1

      Thanks! I'll keep that topic in mind.

  • @RFS_1 3 years ago +1

    Love the explanation. BAM!

  • @auzaluis 4 years ago +1

    The world needs more Joshuas!

  • @samuelschonenberger 1 year ago +2

    These gloriously weird examples really are needed to understand a concept.

  • @diraczhu9347 2 years ago +1

    Great video!

  • @patrycjakasperska7272 7 months ago +1

    Love your channel

    • @statquest  7 months ago

      Thanks!

  • @vinaykumardaivajna5260 1 year ago +1

    Awesome as always

    • @statquest  1 year ago

      Thanks again! :)

  • @rrrprogram8667 4 years ago +1

    Thanks for the awesome video..

  • @prashuk-ducs 15 days ago

    Why the fuck does this video make it look so easy and make 100 percent sense?

  • @sayanbhowmick9203 3 months ago +1

    Great style of teaching, and thank you so much for such a great video. (Note: I have bought your book "The StatQuest Illustrated Guide to Machine Learning.") 😃

    • @statquest  3 months ago +1

      Thank you so much for supporting StatQuest!

  • @ahmedshifa 3 months ago

    These videos are extremely valuable, thank you for sharing them. I feel that they really help to illuminate the material.
    Quick question though: where do you get the different probabilities, like for popcorn, soda pop, and candy? How do we calculate those in this context? Do you take the soda a person drinks and divide it by the total soda, and the same with popcorn and candy?

    • @statquest  3 months ago

      What time point are you asking about (in minutes and seconds)? The only probabilities we use in this video are whether someone loves or doesn't love Troll 2. Everything else is a likelihood, which is just a y-axis coordinate.

  • @worksmarter6418 3 years ago +1

    Super awesome, thank you. Useful for my Intro to Artificial Intelligence course.

    • @statquest  3 years ago

      Glad it was helpful!

  • @yuniprastika7022 3 years ago +1

    Can't wait for your channel to go BAAM! worldwide!!

  • @justinneddie9437 2 years ago +1

    Well, the little intro made me cry laughing. I don't know why... awesome.

  • @mukulsaluja6109 3 years ago +1

    Best video I have ever seen

  • @WillChannelUS 4 years ago +2

    This channel should have 2.74M subscribers instead of 274K.

    • @statquest  4 years ago

      One day I hope that happens! :)

  • @therealbatman664 1 year ago +1

    Your videos are really great!! My prof made it way harder!!

  • @har_marachi 2 years ago +1

    😅😅😅😅 It's the "Shameless Self Promotion" for me... Thank you very much for this channel. Your videos are gold. The way you just know how to explain these hard concepts in a way that 5-year-olds can understand... To think that I just discovered this goldmine this week.
    God bless you 😇

    • @statquest  2 years ago

      Thank you very much! :)

  • @ArinzeDavid 2 years ago +1

    Awesome stuff for real

  • @haneulkim4902 3 years ago

    Amazing video! Thank you so much!
    One question: what if the distribution of candy or another feature does not follow a normal distribution?

    • @statquest  3 years ago

      Just use whatever distribution is appropriate, and you can mix and match distributions for different variables.

  • @Theviswanath57 3 years ago +3

    In the Stats playlist, we used the notation P(Data | Model) for probability and L(Model | Data) for likelihood;
    here we are writing the likelihood as L(popcorn=20 | Loves), which I guess is L(Data | Model).

    • @statquest  3 years ago +2

      Unfortunately the notation is somewhat flexible and inconsistent - not just in my videos, but in the field in general. The important thing is to know that likelihoods are always the y-axis values, and probabilities are the areas.

    • @Theviswanath57 3 years ago +1

      @@statquest Understood; somewhere in the playlist you mentioned that likelihood is relative probability, and I guess that neatly summarizes how likelihood and probability relate.

    • @radicalpotato666 1 year ago

      I had exactly the same question when I started writing the expression in my notebook. I am more acquainted with the L(Model | Data) notation.
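
The "likelihoods are y-axis values, probabilities are areas" point from the thread above can be checked numerically. A small illustration with invented numbers (the mean and standard deviation below are placeholders, not the video's values):

```python
# Likelihood vs. probability for a continuous distribution:
# a likelihood is the curve's height at a point; a probability is an
# area under the curve between two points.
from scipy.stats import norm

dist = norm(loc=20, scale=5)  # e.g. grams of popcorn (invented parameters)

likelihood = dist.pdf(20)                   # y-axis height at x = 20
probability = dist.cdf(25) - dist.cdf(15)   # area between x = 15 and x = 25

print(likelihood)   # ~0.0798 (can exceed 1 for small scales)
print(probability)  # ~0.683 (always between 0 and 1)
```

Note that a likelihood can be larger than 1 (try `scale=0.1`), which is one way to see it is not itself a probability.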

  • @jacobwinters6648 3 days ago

    Hello! Does it matter if the data in one of the columns (say popcorn) is not normally distributed? Or should the assumption be that we will have a large enough sample size to use the central limit theorem?
    Thanks for all of your videos! I love them and can't wait for your book to be delivered (I ordered it yesterday).

    • @statquest  3 days ago +1

      It doesn't matter how the data are distributed. As long as we can calculate the likelihoods, we are good to go. BAM! :) And thank you so much for supporting StatQuest!!! TRIPLE BAM!!! :)

  • @peteerya7839 2 years ago

    Hi,
    Your video is amazing!!! I have a quick question. When you said to use cross validation to help us decide which feature - popcorn, soda pop or candy - classifies best, I think the training data can only help decide the prior probability, and then we use the testing data to do the confusion matrix comparisons, all conditioned on each scenario, right? For example, we would have three confusion matrices, for popcorn, soda pop and candy, based on the test data. What do you think?

    • @statquest  2 years ago

      That sounds about right.

  • @MrElliptific 4 years ago

    Thanks for this super clear explanation. Why would we prefer this method for classification over a gradient boosting algorithm? When we have too few samples?

    • @statquest  4 years ago

      With relatively small datasets it's simple, fast and super lightweight.

  • @YesEnjoy55 8 months ago +1

    Thanks so much, it's great!

    • @statquest  8 months ago

      You're welcome!

  • @roymillsdixton7941 1 year ago

    A nice video on the Gaussian Naive Bayes classification model. Well done! But I have a quick question for you, Josh. I only understand that the limit of ln(x) as x approaches 0 is negative infinity. How is the natural log of a really small number very close to zero assumed to be equal to -115 and -33.6, as in the case of L(candy=25|Loves Troll 2) and L(popcorn=20|does not Love Troll 2) respectively? What measure was used to determine these values?

    • @statquest  1 year ago

      log(1.1*10^-50) = -115 and log(2.5*10^-15) = -33.6
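
The numbers in the reply above check out with Python's natural log, and a quick experiment also shows why the video switches to logs in the first place (the underflow example values are my own):

```python
# Verify the log values from the reply, then demonstrate underflow.
import math

print(round(math.log(1.1e-50), 1))  # -115.0
print(round(math.log(2.5e-15), 1))  # -33.6

# Multiplying many tiny likelihoods underflows 64-bit floats to 0.0,
# but summing their logs stays perfectly representable.
tiny = [1e-100] * 5
print(math.prod(tiny))                  # 0.0 (1e-500 underflows)
print(sum(math.log(t) for t in tiny))   # about -1151.3
```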

  • @sheebanwasi2925 3 years ago +1

    Hey Josh, thanks for making such an amazing video. Keep up the work. I just have a quick question if you don't mind.
    I can't understand how you got the likelihood, e.g. L(soda = 500 | LOVES). How are you calculating that value?

    • @statquest  3 years ago

      We plugged the mean and standard deviation of soda pop for people that loved Troll 2 into the equation for a normal curve and then determined the y-axis coordinate when the x-axis value = 500.
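
In Python, the step described above is a single call to the normal density function (the Python analogue of R's dnorm()). The mean and standard deviation here are placeholders, not the video's actual summary statistics:

```python
# Python equivalent of R's dnorm() for computing a likelihood:
# the y-axis height of a normal curve at a given x-axis value.
from scipy.stats import norm

mean_soda, sd_soda = 500.0, 100.0  # invented mean and sd for soda pop
likelihood = norm.pdf(500.0, loc=mean_soda, scale=sd_soda)
print(likelihood)  # height of the curve at soda = 500
```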

  • @jacquelineb.2468 3 years ago

    Thanks for creating this helpful video! Is your sample data available somewhere? I would love to calculate things by hand for practice!

    • @statquest  3 years ago

      Thanks! Unfortunately, the raw data is not available :(

  • @deepshikhaagarwal4125 1 year ago +1

    Thank you Josh, your videos are amazing! How do I buy study guides from StatQuest?

    • @statquest  1 year ago

      See: statquest.gumroad.com/

  • @mohammadelghandour1614 2 years ago

    Great work! In 8:11, how can we use cross validation with Gaussian Naive Bayes? I have watched the cross validation video, but I still can't figure out how to employ cross validation to know that candy makes the best classification.

    • @statquest  2 years ago

      To apply cross validation, we divide the training data into different groups. Then we use all of the groups, minus one, to create a Gaussian naive Bayes model, and use that model to make predictions on the last group. Then we repeat, each time using a different group to test the model.

  • @taetaereporter 1 year ago

    Thank you for your service T.T

  • @jonathanjacob5453 5 months ago +1

    Looks like I have to check out the other quests before getting to this one 😂

    • @statquest  5 months ago

      :)

  • @taotaotan5671 4 years ago

    Hi, Josh. Thanks for this clear explanation. Since naive Bayes can be applied with a Gaussian distribution, I guess it could also be applied with other distributions, like the Poisson distribution, right? Then a question: how do you determine the distribution of a feature? I believe this is quite important for building a reasonable model.
    Thanks again for the nice video.

    • @statquest  4 years ago

      One day (hopefully not too long from now), I'm going to cover the different distributions, and that should help people decide which distributions to use with their data.

  • @arjunbehl9771 3 years ago +1

    Great stuff : )

  • @jianshue9240 3 years ago

    Thanks dude

  • @mahadmohamed2748 2 years ago

    Thanks for these great videos! Quick question: in other resources the likelihood is actually the probability of the data given the hypothesis, rather than the likelihood of the data given the hypothesis. Which one is correct, or is it fine to use either?

    • @statquest  2 years ago +1

      Generally speaking, you use the likelihoods of the data. However, we can normalize them to be probabilities. This does not offer any advantages and takes longer to do, so people usually omit that step and just use the likelihoods.

  • @kicauburungmania2430 3 years ago

    Thanks for the awesome explanation. But I have a question: can GNB be used for sentiment analysis?

    • @statquest  3 years ago

      Presumably you could use GNB, but I also know that normal NB (aka multinomial naive Bayes) is used for sentiment analysis.

  • @shailukonda 4 years ago +1

    Could you please make a video on Time Series Analysis (ARIMA model)?

    • @statquest  4 years ago +1

      One day I'll do that.

  • @linianhe 1 year ago +1

    Dude, you are awesome

  • @kartikmalladi1918 1 year ago

    I've seen the cross validation video, and the main thing it does is consider different training and test sets within a dataset. In this video, are you saying that cross validation helps with accurate prediction, and that the percentage contributions/coefficients identify the decisive factor as candy? Thanks

    • @statquest  1 year ago

      Cross validation can be used for all sorts of comparisons.

  • @shichengguo8064 3 years ago

    Looks like it also works when both multinomial and Gaussian predictors exist in the prediction dataset.

    • @statquest  3 years ago

      Yes, you are correct. And thanks for supporting StatQuest!

  • @janeli2487 2 years ago

    Hi @StatQuest, thanks for another awesome video. I have a question about the posterior probability. Isn't p(class)*p(observed data|class) just the joint probability, and if we want to get p(class|observed data), don't we need to divide the joint probability by p(observed data)? Thanks in advance!

    • @statquest  2 years ago

      Yes, but since all dividing by p(observed data) does is scale the values, and all we want to know is which one is larger, we rarely do it for Naive Bayes.

    • @janeli2487 2 years ago +1

      @@statquest Got it! Thanks!
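
A numeric check of the point in that exchange, with invented priors and likelihoods: dividing by p(observed data) rescales both class scores but never changes which one is larger.

```python
# Unnormalized scores (what Naive Bayes actually compares) vs. true
# posteriors (after dividing by p(observed data)). All numbers invented.
prior_loves, prior_not = 0.6, 0.4
lik_loves, lik_not = 0.020, 0.005   # hypothetical likelihoods of the data

score_loves = prior_loves * lik_loves   # 0.012
score_not = prior_not * lik_not         # 0.002

p_data = score_loves + score_not        # the denominator we usually skip
post_loves = score_loves / p_data       # ~0.857
post_not = score_not / p_data           # ~0.143

print(score_loves > score_not)  # True
print(post_loves > post_not)    # True — same decision either way
```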

  • @itsfloofles 3 years ago

    Hey Josh! Thanks so much for this informative video. I'm currently working on something that needs exactly what you describe here. However, there is one thing I cannot find an answer to, however hard I try. I can't quite translate my issue to the example you give in the video, but I'll try to make it as basic as possible.
    Let's say that we have a couple of boxes that contain a random bunch of items. Some boxes contain similar items to others, and beforehand I make some observations on this. I then create some PDFs, for example P(coin | box1).
    Let's say that I observed coins in a couple of the boxes, but some boxes never had a coin in them.
    If we have a total of 4 boxes, our prior would be 0.25 for each box.
    I observe a coin in the box that I have in front of me and calculate the likelihoods for each box.
    What do I do if I do not have a PDF for a box? Because in that case the final likelihood will be off, right?
    For example:
    Box 1: log(0.25) + log(L(coin=1|box1))
    Box 2: log(0.25) + log(L(coin=1|box2))
    Box 3: log(0.25) + log(there is no likelihood!)
    Box 4: log(0.25) + log(there is no likelihood!)
    The log of 0 doesn't exist, but if I simply disregard that part of the sum then the log-likelihood for Boxes 3 & 4 would be the highest (while intuitively this doesn't make sense!). What do I do? If the probability of a coin in boxes 3 & 4 is 0, then what is the likelihood?

    • @statquest  3 years ago

      When there is no value, you can always just use a very small number. For example, in "regular" or "multinomial" naive Bayes, we add "pseudo counts" when something is missing. This shows what I am talking about: th-cam.com/video/O2L2Uv9pdDA/w-d-xo.html
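
A sketch of the workaround described in that reply, using the boxes example from the question. The epsilon value is my own choice, not from the video; the idea is simply that flooring a zero or missing likelihood keeps the log defined while heavily penalizing that class, rather than silently favoring it:

```python
# Floor zero/missing likelihoods with a tiny epsilon so log() is defined.
import math

EPSILON = 1e-9  # hypothetical floor for missing likelihoods

def log_score(prior, likelihoods):
    """Sum of log prior and log likelihoods; None or 0.0 entries
    (no PDF for this box) are replaced by EPSILON."""
    total = math.log(prior)
    for lik in likelihoods:
        total += math.log(lik if lik else EPSILON)
    return total

print(log_score(0.25, [0.30]))   # box with a coin PDF
print(log_score(0.25, [None]))   # box without one — heavily penalized
```

This mirrors what pseudo-counts do in multinomial naive Bayes: the missing-evidence class gets a very low score instead of an undefined or accidentally winning one.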

  • @dipinpaul5894 4 years ago +1

    Excellent explanation. Any NLP series coming up? Struggling to find good resources.

    • @statquest  4 years ago +4

      I'm working on Neural Networks right now.

    • @ragulshan6490 4 years ago +1

      @@statquest it's going to be BAM!!

  • @rogertea1857 3 years ago

    Hi Josh, could you please make a video on the Gaussian Mixture Model and the Bayesian Gaussian Mixture Model?

    • @statquest  3 years ago

      I'll keep that in mind.

  • @sejongchun8350 4 years ago +2

    Troll 2 is an awesome classic, and should not be up for debate. =)

  • @user-bz8nm6eb6g 4 years ago +3

    Can you talk about kernel estimation in the future?? Bam!

    • @statquest  4 years ago +2

      I will consider it.

  • @rajatshrivastav 3 years ago

    Hey Josh!!
    This channel is the best for all the algorithms in machine learning. Now, how does Naive Bayes work when the dataset has a column that gives the brand preference of soda (Coke, Pepsi, Thumbs Up) for those who love Troll 2 or not? In general, how do we deal with categorical variables?
    Thanks in advance!

    • @statquest  3 years ago

      In this case, you can mix and match distributions. Specifically, use the histogram approach for "Soda" that is described in the other Naive Bayes video ( th-cam.com/video/O2L2Uv9pdDA/w-d-xo.html ) and add it as just another distribution to this one.

    • @rajatshrivastav 3 years ago

      @@statquest Yeah, the histogram method with frequency counts would work very well for the soda preference, since it has a limited number of discrete levels (3).
      Thank you
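
The mix-and-match approach from that thread can be sketched by hand: a frequency table supplies the likelihood for the categorical brand column, a Gaussian supplies it for a continuous column, and because naive Bayes assumes independence the log-likelihoods simply add. All frequencies, means and priors below are invented for illustration:

```python
# Mixing a histogram-style (categorical) likelihood with a Gaussian one
# in a single naive Bayes score. All numbers are made up.
import math
from scipy.stats import norm

# Frequency-table likelihoods for soda brand, per class
brand_lik = {
    "loves":     {"Coke": 0.6, "Pepsi": 0.3, "Thumbs Up": 0.1},
    "not_loves": {"Coke": 0.2, "Pepsi": 0.5, "Thumbs Up": 0.3},
}
popcorn_params = {"loves": (24.0, 4.0), "not_loves": (4.0, 2.0)}  # mean, sd
prior = {"loves": 0.5, "not_loves": 0.5}

def score(cls, brand, popcorn):
    # log prior + log(categorical likelihood) + log(Gaussian likelihood)
    mean, sd = popcorn_params[cls]
    return (math.log(prior[cls])
            + math.log(brand_lik[cls][brand])
            + math.log(norm.pdf(popcorn, loc=mean, scale=sd)))

print(score("loves", "Coke", 22.0) > score("not_loves", "Coke", 22.0))  # True
```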

  • @riqardumilos4212 10 months ago

    Hi again! I successfully made a Gaussian Naive Bayes model using data from a survey I made, thanks to you.

    • @statquest  10 months ago +1

      Those seem like a good place to start.

  • @sugarbeybiii101 2 years ago

    Hi Josh, as always, thanks so much for the very informative video!!! Quick question: how did you calculate the likelihoods? :D

    • @statquest  2 years ago

      I plugged the x-axis coordinate, the mean and the standard deviation into the dnorm() function in R.

    • @atravellingstudent6692 2 years ago

      @@statquest I am confused. I thought pdf != likelihood, but the videos suggest otherwise.

    • @statquest  2 years ago

      @@atravellingstudent6692 The PDF is just the curve for a continuous distribution. If we integrate that curve between two points, we get the probability of something happening between those two points. If we look at the y-axis value for a specific point, we get the likelihood. For details, see: th-cam.com/video/pYxNSUDSFH4/w-d-xo.html

  • @Steve-3P0 3 years ago +1

    +5000 for using an example as obscure and as obscene as Troll 2.

  • @jackyhuang6034 4 years ago

    Josh, can you summarize the important supervised and unsupervised algorithms for interview preparation? Can we buy them in bundles at a cheaper price (affordable for students)? Like $80 - $120. Thanks

    • @statquest  4 years ago

      I'll look into that.

  • @johnel4005 3 years ago +1

    BAM! Someone is going to pass the exam this semester.

  • @alanamerkhanov6040 6 months ago +1

    Hi, Josh. Troll 2 is a good movie... Thanks

    • @statquest  6 months ago

      bam!

  • @ajeelahmedabbasi 3 years ago

    Hey Josh, big fan. I'm having trouble understanding how you calculated likelihoods such as L(popcorn=20|Loves) and the other ones like that. Besides that, though, great job. I owe you so much, thanks a lot.

    • @statquest  3 years ago +1

      I'm just calculating the y-axis coordinates on the normal distributions. For example, if I have a normal distribution with mean = 3 and standard deviation = 1, the y-axis coordinate that corresponds to an x-axis value = 2 can be found (in the R programming language) with the following command: dnorm(x=2, mean=3, sd=1)

    • @ajeelahmedabbasi 3 years ago +1

      @@statquest Wow, thank you so much for replying! I honestly didn't think you would, since it's been a while since you made this video haha.
      Ahhh okay, I kinda get it now. Thank you so much once again!