Logistic Regression in R, Clearly Explained!!!!

  • Published on 31 Jul 2024
  • This video describes how to do Logistic Regression in R, step-by-step. We start by importing a dataset and cleaning it up, then we perform logistic regression on a very simple model, followed by a fancy model. Lastly we draw a graph of the predicted probabilities that came from the Logistic Regression.
    The code that I use in this video can be found on the StatQuest GitHub:
    github.com/StatQuest/logistic...
    For more details on what's going on, check out the following StatQuests:
    For a general overview of Logistic Regression:
    • StatQuest: Logistic Re...
    The odds and log(odds), clearly explained:
    • Odds and Log(Odds), Cl...
    The odds ratio and log(odds ratio), clearly explained:
    • Odds Ratios and Log(Od...
    Logistic Regression, Details Part 1, Coefficients:
    • Logistic Regression De...
    Logistic Regression, Details Part 2, Fitting a line with Maximum Likelihood:
    • Logistic Regression De...
    Logistic Regression Details Part 3, R-squared and its p-value:
    • Logistic Regression De...
    Saturated Models and Deviance Statistics, Clearly Explained:
    • Saturated Models and D...
    Deviance Residuals, Clearly Explained:
    • Deviance Residuals
    For a complete index of all the StatQuest videos, check out:
    statquest.org/video-index/
    If you'd like to support StatQuest, please consider...
    Buying The StatQuest Illustrated Guide to Machine Learning!!!
    PDF - statquest.gumroad.com/l/wvtmc
    Paperback - www.amazon.com/dp/B09ZCKR4H6
    Kindle eBook - www.amazon.com/dp/B09ZG79HXC
    Patreon: / statquest
    ...or...
    YouTube Membership: / @statquest
    ...a cool StatQuest t-shirt or sweatshirt:
    shop.spreadshirt.com/statques...
    ...buying one or two of my songs (or go large and get a whole album!)
    joshuastarmer.bandcamp.com/
    ...or just donating to StatQuest!
    www.paypal.me/statquest
    Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
    / joshuastarmer
    0:00 Awesome song and introduction
    0:29 Load and format data
    3:54 Dealing with missing data
    5:03 Verifying that the data is not imbalanced
    6:44 Logistic regression with one independent variable
    12:48 Logistic regression with many independent variables
    15:13 Graphing the predicted probabilities
    #statquest #logistic
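The first steps listed in the chapters (handling missing data, then checking that the data is not imbalanced) can be sketched roughly as follows. This is a minimal Python illustration, not the R code from the video: the rows, the column meanings, and the use of "?" as the missing-value marker are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical rows: (sex, chest_pain_type, heart_disease).
# "?" is assumed to mark a missing value in this made-up dataset.
rows = [
    ("M", "typical", "healthy"),
    ("F", "atypical", "unhealthy"),
    ("M", "?", "unhealthy"),
    ("F", "typical", "healthy"),
    ("M", "atypical", "unhealthy"),
    ("F", "atypical", "healthy"),
    ("M", "typical", "unhealthy"),
    ("?", "typical", "healthy"),
]

# Step 1: drop every row that contains a missing value.
complete = [r for r in rows if "?" not in r]

# Step 2: cross-tabulate the outcome against one categorical predictor (sex)
# to check that every combination is represented before fitting a model.
crosstab = Counter((sex, outcome) for sex, _, outcome in complete)

for (sex, outcome), n in sorted(crosstab.items()):
    print(sex, outcome, n)
```

If any cell of the cross-tabulation were empty or close to it, a coefficient for that level could not be estimated reliably, which is the reason for checking before fitting.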

Comments • 640

  • @statquest
    @statquest  3 years ago +25

    Here's the link to the code: github.com/StatQuest/logistic_regression_demo/blob/master/logistic_regression_demo.R
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

    • @falaksingla6242
      @falaksingla6242 2 years ago

      Hi Josh,
      Love your content. Has helped me to learn a lot & grow. You are doing an awesome work. Please continue to do so.
      Wanted to support you but unfortunately your Paypal link seems to be dysfunctional. Please update it.

  • @wei2674
    @wei2674 4 years ago +25

    Thank you so much Josh for all these videos! I got an A+ in most of my stat courses quite a few years ago when I was doing my MSc in Biostat, but it took me quite some time to come up with a better understanding of a few concepts. You just summarized and presented these ideas and more in a few minutes! You are a genius and on top of that, you are so kind to share all this work with everyone for free! With my limited vocabulary, all I can say is THANK YOU! It makes me feel the world is a beautiful place with beautiful mind and soul. I love your song “hello”, it reminds me of the day I met my daughter and brought happy tears to my eyes :)

    • @statquest
      @statquest  4 years ago +1

      Thank you so much!!! I'm really glad you like my videos and my music. :)

  • @i8thelastmoa360
    @i8thelastmoa360 4 years ago +5

    Your videos cover everything in my course and I wish I found you sooner! So much detail and clear explaining in such little time

  • @SurrenderPink
    @SurrenderPink 4 years ago +5

    Josh, it’s Saturday morning here and I’m enjoying a cup of Bam! learning R from the best teacher on the planet. I’m so grateful and appreciative of your efforts to share your considerable talents with us!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @MuctaruKabba
    @MuctaruKabba 4 years ago +39

    Your videos never disappoint, Sir. I have gone through many of them and think you've earned the right to brand the phrase: "clearly explained" because your explanations are indeed very clear. I am building a better explanation of statistics thanks to you. I appreciate you and hope you continue to pass on the knowledge.

    • @statquest
      @statquest  4 years ago +4

      Wow, thanks!

  • @marielledelcarmencaballero5017
    @marielledelcarmencaballero5017 2 years ago +1

    Your videos are great! It's also so nice of you to take the time to reply to so many of the comments here!

  • @japhethernandezvaquero204
    @japhethernandezvaquero204 4 years ago +3

    Nice channel to land on! Happiest discovery of my 2020! Great job!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @solalstenou6474
    @solalstenou6474 5 years ago +1

    What is great about your videos is that even if I forget my headphones I am able to follow along in a computer room full of other students! Thank you so so so much!!!! From the University of Bordeaux

    • @statquest
      @statquest  5 years ago

      Solal Sténou Thank you!! :)

  • @chasti5754
    @chasti5754 3 years ago +12

    I just wish one day all this information actually stays and sticks in my mind... thank you though! Your videos are amazing!

    • @statquest
      @statquest  3 years ago +1

      Thanks for watching!

  • @alexandergeorgiev2631
    @alexandergeorgiev2631 3 years ago +2

    You are an absolute life saver. My data science paper is due in two days and now I have my pretty log graph and I understand this better. DOUBLE BAM!!!!!

  • @yashilagovender5134
    @yashilagovender5134 2 years ago +1

    Thank you so much for this video! I've been suffering with the coding for my project but this really helped. You're a star!

  • @alhaque7556
    @alhaque7556 2 years ago +1

    Thank you so much! I've a stat project to do in R with logistic Regression and this simplified the coding portion so much!

  • @emilyblythe7708
    @emilyblythe7708 5 years ago +88

    where have you been my whole thesis! thank you!!

    • @statquest
      @statquest  5 years ago +9

      Hooray! I'm glad to help! :)

    • @amandacampos3037
      @amandacampos3037 3 years ago +1

      I feel the same!! hah

  • @holeman1
    @holeman1 3 years ago +20

    This 89-year-old guy says BAM!! So clearly explained, indeed. DOUBLE-BAM!!!!

    • @statquest
      @statquest  3 years ago +3

      BAM!!! And thank you for your support!!!!

  • @565-FENRIR
    @565-FENRIR 2 years ago +2

    I really enjoyed the clear way you explained this topic. Many thanks for the teaching!!!

    • @statquest
      @statquest  2 years ago

      Thank you very much!!!

  • @dodgecarlincila879
    @dodgecarlincila879 3 years ago +3

    I was just here for the logistic regression but bam!! I would be watching all of your videos. As a ds learner using r, double bam!!!, your videos will surely help big time! Bambambam! 👌😅
    Thank you. 🙂

    • @statquest
      @statquest  3 years ago

      Awesome! Thank you!

  • @nathanielchristian7027
    @nathanielchristian7027 5 years ago +3

    Your simple English explanation of the meaning of "Intercept" in the output from 8:30 to 8:38 of this video was something I could not find after searching for 2 hours. Thank you!

    • @statquest
      @statquest  5 years ago +2

      Awesome!!! Now that you have that concept down, a lot of other stuff in statistics should make more sense. (At least I hope!) :)

  • @riteshpatel1984
    @riteshpatel1984 5 years ago +3

    Hi Josh, thanks for your videos, they are very easy to understand. Really appreciate your efforts. I believe I speak for many:
    because of you, many people are able to understand with the utmost clarity, and you cover all the small details with super ease. Keep up the noble work. Cheers 👍
    Would it be possible for you to put up a video on model evaluation, i.e. determining the cutoff and model performance?
    Thanks

    • @statquest
      @statquest  5 years ago +1

      Thank you! :)

  • @burrohq
    @burrohq 3 years ago +1

    You sir deserve a promotion 👏 thanks for this incredibly helpful video

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @meniz4659
    @meniz4659 4 years ago +16

    You will surely be in my thesis acknowledgments. Thank you for making our lives relatively easier but truly more intelligible. BAAAAAM!!

    • @statquest
      @statquest  4 years ago

      Thanks so much! :)

  • @zahraab1027
    @zahraab1027 4 years ago +5

    "one last shameless self promotion" got me 😂😂😂.....that's why I love your videos, u make learning stats fun

    • @statquest
      @statquest  4 years ago +1

      Hooray! Thank you! :)

  • @maheshkumar-vv5fp
    @maheshkumar-vv5fp 4 years ago +2

    good looking white background...
    graphs are beautiful...
    whatever you say, you write it on screen....
    your sound and sound system, very good..
    the way you explain things, CLEARLY EXPLAINS everything..
    and loved that music part and BAM!!!
    and here, i have something to say about your work..
    and that is VERY BIG BAM !!!... good luck.. keep growing..

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @Mel22Brasil
    @Mel22Brasil 3 years ago +2

    It must be so much fun working with you! Thank you for this tutorial. =)

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @farhadwaseel9981
    @farhadwaseel9981 4 years ago +6

    I recommend all the videos by stat quest with Josh Starmer. Thank you for your good explanations.

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @wei2674
    @wei2674 4 years ago +8

    Both my husband and I learned so much from ur video. ( inspired by the top comment), whenever you come to Toronto let us know for a few free accommodation in our Asian restaurant/bubble tea surrounded neighborhoods (north York center)!
    Thx again!
    Xin

    • @statquest
      @statquest  4 years ago +2

      Hooray!!! That would be awesome. I will dream of the day I can visit you in Toronto. :)

  • @chrischukwu2956
    @chrischukwu2956 3 years ago +3

    You are an amazing teacher. God bless you!

    • @statquest
      @statquest  3 years ago

      Thank you! 😃

  • @LoizidesGeorge
    @LoizidesGeorge 4 years ago +22

    So helpful, thanks!
    Whenever you come to Cyprus let me know for a few free accommodations in our mountainous region, Marathasa!
    Thx again!
    Γ

    • @statquest
      @statquest  4 years ago +7

      Wow! That sounds awesome!!!

    • @LoizidesGeorge
      @LoizidesGeorge 4 years ago +3

      @@statquest
      oh yes!
      I owe you a lot - you saved me so many hours!
      Γ

  • @mutuamutunga
    @mutuamutunga 4 years ago +2

    This has been extremely helpful. Thank you!

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @danee593
    @danee593 5 years ago +2

    Josh you are amazing, thank you!

  • @daviddevega4433
    @daviddevega4433 3 years ago +2

    Thank you very much for all this stuff. You have saved me from failing my exams. Amazing quality channel. Unbelievable the low number of likes. Very appreciated channel, at least for me. Thanks again.

    • @statquest
      @statquest  3 years ago

      Wow, thanks!

  • @joseluismanzanares3662
    @joseluismanzanares3662 5 years ago

    Clear as water. Super BAM!!! Thanks for sharing

  • @nl7247
    @nl7247 1 year ago +1

    Thanks for also showing how to wrangle data and explore missing data in a simple helpful way ❤

    • @statquest
      @statquest  1 year ago +1

      My pleasure 😊

  • @N0o0x0e0r
    @N0o0x0e0r 5 years ago +1

    This channel has helped me a lot understanding statistics! Could you please make a video explaining the linear mixed model too?

    • @statquest
      @statquest  5 years ago

      Yes! However, it might be a while before I get to it.

  • @ricardot4722
    @ricardot4722 5 years ago +2

    I am impressed, you are talented, thanks for sharing your knowledge.

    • @statquest
      @statquest  5 years ago

      Thank you! :)

  • @danieltrodler4340
    @danieltrodler4340 4 years ago +1

    Great content and incredible value. Thank you so much

  • @marcelomurilloquesada8400
    @marcelomurilloquesada8400 4 years ago +1

    Hi, I really like your videos, every topic is as clear as water after watching it. I've watched this one and also the three videos about logistic regression's details. If you want to go further in this topic, you could do a video explaining emmeans package for R. Many people, including me, would understand post hoc tests for glm using emmeans, if someone like you explained it. Thank you!

  • @skandagurunathanr4795
    @skandagurunathanr4795 4 years ago

    Great salute! If you can, please post a video on all machine learning models with a large dataset example implementation in r with clear intuition and mathematics statistics behind it. Thanks.

  • @goodsuggestionbutno6783
    @goodsuggestionbutno6783 2 years ago

    Hoooray! We made it to the end of an exciting journey through logistic regression! Hope you have a nice day, and thank you for helping us understand the output for logistic regression in R, which really can't be understood thoroughly without watching all the logistic + odds videos!

    • @statquest
      @statquest  2 years ago

      Yep, that is correct. That's why I made all those other videos first - the output is jam packed with stuff.

  • @paulshannon9708
    @paulshannon9708 5 years ago

    You really are wonderful for explaining this in a way morons like me can understand, this is so incredibly helpful. Thank you so much!

  • @danielromero-alvarez5392
    @danielromero-alvarez5392 4 years ago +1

    you are just the best! Thanks for doing this!

    • @statquest
      @statquest  4 years ago +1

      Thank you! :)

  • @mohamedhijazi8460
    @mohamedhijazi8460 4 years ago +2

    You're the man! thanks for everything!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @critiquessanscomplaisance8353
    @critiquessanscomplaisance8353 4 years ago +3

    I won't forget you in the acknowledgments sir haha!!! Great job!

    • @statquest
      @statquest  4 years ago

      Thank you very much! :)

  • @saulesparza7911
    @saulesparza7911 5 years ago +1

    This video is amazing! Thanks!!!

    • @statquest
      @statquest  5 years ago

      Thank you! :)

  • @at4652
    @at4652 6 years ago +5

    Great tutorials, I started with your PCA video and have been hooked on your other videos since then. Could I request a video on the various types of probability distributions and when to use them?

    • @statquest
      @statquest  6 years ago +2

      Those are all in the works. I wish I could work 2 or 4 times faster than I can. I've wanted to cover the major probability distributions for over a year, but got sucked down a machine learning path and now feel spread pretty thin. However, these will happen eventually! :)

    • @TimothyChenAllen
      @TimothyChenAllen 5 years ago +1

      StatQuest with Josh Starmer could you make a video on how to work 2 to 4 times faster? :-)

    • @statquest
      @statquest  5 years ago +1

      As soon as I figure that out, I'll make a video on it! ;)

    • @weilianglim1764
      @weilianglim1764 5 years ago

      BAM!!!

  • @tansutazegul8297
    @tansutazegul8297 1 year ago +1

    incredibly brilliant tutorial!

  • @fahmiidris4499
    @fahmiidris4499 3 years ago +2

    super dangg! Good explanation, bro!

    • @statquest
      @statquest  3 years ago

      Thank you! :)

  • @tuanlong9238
    @tuanlong9238 5 years ago +1

    And...BAM, thanks for sharing, your video is really useful :D

  • @kedwards127
    @kedwards127 5 years ago +2

    This is so helpful thank you!!

  • @KayYesYouTuber
    @KayYesYouTuber 4 years ago +1

    Your videos are awesome. Thank you very much.

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @mdhasibreza5161
    @mdhasibreza5161 2 years ago

    All of your videos are great and fun to learn from! Could you please upload a tutorial on mediation analysis using STATA and R (using the mediation package)?

    • @statquest
      @statquest  2 years ago

      I'll keep that in mind.

  • @andreatulli356
    @andreatulli356 3 years ago +1

    Great video!!! Thank you so much!

  • @da2015
    @da2015 5 years ago +6

    These videos are so amazing!
    Do you have a suggestion for a book that explains Logistic Regression to newbies? The videos are super awesome, but extra references may help too. Hopefully you will write your own book soon!
    Thanks!

    • @shnibbydwhale
      @shnibbydwhale 4 years ago +5

      I know this is probably 10 months too late, but the book “Introduction to Categorical Data Analysis” by Alan Agresti is a great book. Does a really good job explaining logistic regression and is pretty light on the math.

  • @sheilaserrano1039
    @sheilaserrano1039 6 years ago +3

    Thaaaanks! very useful and clear!

    • @statquest
      @statquest  6 years ago

      Hooray! I'm glad you like it! :)

  • @mathieufen2239
    @mathieufen2239 4 years ago +1

    SO clear!! Thanks!!

  • @Fsp01
    @Fsp01 2 years ago +2

    Doing a masters program on analytics and this video made more sense than all the lectures combined on logistic regression. thank you

  • @BruceWayne-oc7dn
    @BruceWayne-oc7dn 2 years ago +1

    It's 1:11 AM and what I am doing is DOUBLE BAM. Thank you for this awesome video. U are a hero.

  • @AOLFlyersNewsletters
    @AOLFlyersNewsletters 4 years ago +1

    Thanks Josh - you are our saviour!

  • @raghavendral882
    @raghavendral882 5 years ago +2

    BAM, spot on, thanks for such a video.. my journey with logistic regression and R has started.

  • @christelleleitzingerphd7491
    @christelleleitzingerphd7491 4 years ago +1

    Awesome! Thank you so much! Please could you do a video about conditional logistic regression like clogit in R with result interpretation and how it works when using adjusted parameters.

    • @statquest
      @statquest  4 years ago +1

      I'll keep that in mind.

  • @ericaleverson9430
    @ericaleverson9430 3 years ago +1

    You are so good!! Thank you!

  • @ca177
    @ca177 4 years ago +2

    YOU RAWK !! Awesome explains on ML concepts..

    • @statquest
      @statquest  4 years ago

      Thank you! :)

  • @vidyaammu1687
    @vidyaammu1687 3 years ago

    Thanks for the video. Your video made it look so simple. Could you upload a video on how to get risk ratios in a multiple logistic regression model?

    • @statquest
      @statquest  3 years ago

      I'll keep that in mind.

  • @wilfredoa.tovarhidalgo9385
    @wilfredoa.tovarhidalgo9385 2 years ago +1

    Excellent!!!! Thank you very much.

  • @kayizaisma6288
    @kayizaisma6288 4 years ago +4

    Great job bro.
    Gratitude for your help. You also have where to stay if you come to Uganda (Africa).

    • @statquest
      @statquest  4 years ago

      Thank you very much!!! :)

  • @yutassmilehealsme6572
    @yutassmilehealsme6572 3 years ago +2

    THANK YOU! somehow I couldn't find any websites explaining this

    • @statquest
      @statquest  3 years ago

      Glad you found it.

  • @temjim
    @temjim 4 years ago +5

    Hi, Josh. I cannot thank you enough for these videos... Would also be good to have a similar video in Python..

    • @statquest
      @statquest  4 years ago +1

      Great suggestion!

    • @aishwaryadas3681
      @aishwaryadas3681 2 years ago

      @@statquest where's the video sir in python sir?

  • @TiNa-uo3ks
    @TiNa-uo3ks 2 years ago +1

    Thank You. SOOOOOOOOOooOOOoo Helpful

  • @JanoschGonzalez
    @JanoschGonzalez 5 years ago

    Excellent!!!

  • @JRO_Lyrics
    @JRO_Lyrics 2 years ago +1

    great
    work done here

  • @wa5561
    @wa5561 2 years ago +1

    Thank you for saving my study. Not gonna lie, this video made me cry. I was about to drop out because of statistics, but this saved my project.

  • @RajeshSahu-ey8kw
    @RajeshSahu-ey8kw 4 years ago +1

    U are a genius...and ur teaching style too...hurray!!!! and Bamm!!!!

    • @statquest
      @statquest  4 years ago +1

      Wow, thank you!

  • @jives.
    @jives. 3 years ago +1

    lets goooo StatQuest

  • @sofiaalfonso9883
    @sofiaalfonso9883 3 years ago +1

    Sir, you are a savior

  • @woopwoopsoupsoup678
    @woopwoopsoupsoup678 1 year ago +1

    This man is a legend

  • @hajer3335
    @hajer3335 6 years ago +2

    Thank you so much for this effort really appreciate
    We need a stat quest on three topics:
    1-Chi-square test,
    2- The Hosmer-Lemeshow goodness of fit test for logistic regression.
    And 3- Iteratively reweighted least squares (IRLS) by using Newton's method.
    If you don't mind :) of course.
    Can you tell us about the title of next video?!

    • @statquest
      @statquest  6 years ago +1

      The Chi-Square test is on the list. I've looked into the Hosmer-Lemeshow fit... Can you tell me what you think about the limitations? Specifically those mentioned in the Wikipedia article about it? en.wikipedia.org/wiki/Hosmer%E2%80%93Lemeshow_test#Limitations_and_alternatives
      And iteratively reweighted least squares is also on the list. However, up next are some basic statistics videos and then videos on lasso, ridge, and elastic-net regression.

    • @hajer3335
      @hajer3335 6 years ago +1

      The Hosmer-Lemeshow statistic was introduced to avoid a problem with the Pearson chi-squared statistic: when observations are grouped by the values of the x variables, the Pearson chi-squared goodness-of-fit test cannot be readily applied if there are only one or a few observations for each possible value of an x variable, or for each possible combination of values of the x variables.
      (A sample with a sufficiently large size is assumed. If a chi-squared test is conducted on a sample with a smaller size, then the chi-squared test will yield an inaccurate inference.)
      So in the Hosmer-Lemeshow statistic, the observations are grouped by expected probability. But there is very little guidance on selecting the number of subgroups. The number of subgroups, g, is usually calculated using the formula g > P + 1. For example, if you had 12 covariates in your model, then g > 12. How much bigger than 12 g should be is essentially left up to you. Small values for g give the test less opportunity to find mis-specifications. Larger values mean that the number of items in each subgroup may be too small to find differences between observed and expected values. Sometimes changing g by very small amounts (e.g. by 1 or 2) can result in wild changes in p-values. As such, the selection of g is often confusing and arbitrary. Also, the test doesn’t take overfitting into account and tends to have low power. For these reasons, the Hosmer-Lemeshow test is no longer recommended.
      Am I right? Are these enough reasons to no longer use the HL test?
      I have another question. (Overfitting happens when your sample size is too small. If you put enough predictor variables in your regression model, you will nearly always get a model that looks significant.
      While an overfitted model may fit the idiosyncrasies of your data extremely well, it won’t fit additional test samples or the overall population. The model’s p-values, R-squared and regression coefficients can all be misleading. Basically, you’re asking too much from a small set of data.)
      If I have a small sample, is there any problem with using maximum likelihood to fit the model and McFadden's pseudo-R squared? Is there any rule for choosing the sample size for a regression?
      Sorry for the many questions, it is my first year in biostatistics. :)
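The grouping step described in this comment can be sketched as below. This is a minimal Python illustration under stated assumptions, not code from the video: the function name `hosmer_lemeshow`, the equal-size grouping rule, and the toy inputs are all made up for the example. It sorts observations by predicted probability, splits them into g groups, and sums (observed - expected)^2 / (n * mean_p * (1 - mean_p)) over the groups. In the usual formulation the result is compared with a chi-squared distribution with g - 2 degrees of freedom; that step is left out because the standard library has no chi-squared CDF.

```python
def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow statistic for 0/1 outcomes y and predicted probabilities p.

    Hypothetical helper for illustration; groups are formed by rank so that
    they are of nearly equal size.
    """
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        if not group:
            continue
        size = len(group)
        observed = sum(yi for _, yi in group)  # observed number of events
        expected = sum(pi for pi, _ in group)  # expected number of events
        mean_p = expected / size
        denom = size * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


# Toy inputs: the same data scored with different numbers of groups.
y = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9]
for g in (2, 3, 4):
    print(g, round(hosmer_lemeshow(y, p, g), 3))
```

Running the loop with different values of g on the same data is a quick way to see the sensitivity to the choice of g that the comment describes.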

    • @statquest
      @statquest  6 years ago +1

      These are all great questions. You are correct about the HL test and you are correct about overfitting. There are, however, lots of tricks you can use to compensate for overfitting (lasso regression, ridge regression, elastic net regression etc.)
      One way to test to see if you have a model that is "overfit" is to use cross validation.
      As for a minimum number of samples for logistic regression - people often say "10 samples per level of each discrete variable". It's a general rule of thumb and it doesn't always apply. However, again you can use cross validation to verify if you have enough samples or not. Cross validation is a very practical tool!
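The cross validation check suggested here can be sketched in a few lines. This is a minimal Python illustration, not the approach from any StatQuest video: the data are synthetic, the model has a single numeric predictor, and the helper names (`fit_logistic`, `k_fold_accuracy`) and settings (learning rate, number of steps, k = 5) are assumptions chosen for the example. The model is fit on k - 1 folds and scored on the held-out fold, so a model that only memorizes its training data shows up as poor held-out accuracy.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=500):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def k_fold_accuracy(xs, ys, k=5, seed=0):
    """Return the held-out classification accuracy for each of k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_logistic([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(
            (sigmoid(b0 + b1 * xs[i]) >= 0.5) == (ys[i] == 1) for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Synthetic data: the probability of y = 1 rises with x.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(100)]
ys = [1 if rng.random() < sigmoid(2 * x) else 0 for x in xs]

scores = k_fold_accuracy(xs, ys)
print(scores, sum(scores) / len(scores))
```

A large gap between accuracy on the training folds and accuracy on the held-out folds would be the warning sign for overfitting or for too few samples.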

    • @hajer3335
      @hajer3335 6 years ago +1

      Thank you, Mr Josh, for answering me, I need to study more about Cross-validation.

    • @hajer3335
      @hajer3335 6 years ago +1

      Sorry l have more than one account 🙈🙊

  • @iselacr5747
    @iselacr5747 3 years ago

    Hi, I love the way you explain all these things! I have a couple of questions. I see that the predictors need to be coded: if they are dichotomous, for example, they are assigned 1 and 0 (male / female in the example). So:
    - How should we proceed with polytomous predictors?
    - What results of the model should be reported in a scientific article?
    Thank you in advance and keep making great content!

    • @statquest
      @statquest  3 years ago

      1) For all categorical data (with 2 or more classes), just make sure you are storing it in a factor.
      2) That depends on the journal. I would look at other articles in that journal to figure it out.
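What storing a predictor with more than two classes in a factor amounts to can be sketched as follows. This is a rough Python illustration of treatment (dummy) coding, which is R's default contrast for unordered factors; the helper name `dummy_code` and the chest pain values are made up for the example. One level becomes the reference and every other level gets its own 0/1 column.

```python
def dummy_code(values):
    """Treatment-code a categorical variable.

    Hypothetical helper: the first level in sorted order is the reference
    and gets no column; each remaining level gets one 0/1 indicator column.
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return reference, others, rows


# Made-up values for a four-level predictor.
chest_pain = ["typical", "atypical", "non-anginal", "typical", "asymptomatic"]
reference, columns, rows = dummy_code(chest_pain)

print("reference level:", reference)
print("indicator columns:", columns)
for value, row in zip(chest_pain, rows):
    print(value, row)
```

Each indicator column then gets its own coefficient, interpreted as the difference in log(odds) from the reference level. Note that Python's `sorted` orders by code point, which can differ from R's locale-aware ordering when levels mix upper and lower case.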

  • @kingfisher65
    @kingfisher65 10 months ago +2

    amazing. thank you man!

    • @statquest
      @statquest  10 months ago

      Thanks!

    • @familians
      @familians 10 months ago

      You may like this video too:
      Another great video about logistic regression in JMP
      th-cam.com/video/9yN_yjGAJZE/w-d-xo.htmlsi=jUwEZUDobBudE8AE

  • @geetikapanda7152
    @geetikapanda7152 3 years ago +1

    The more I watch your videos the more I wish I had a teacher like you in my school days..
    Do we have a video on the chi-square test?

    • @statquest
      @statquest  3 years ago

      Not yet. :( But one day we will.

  • @jhndrmwn
    @jhndrmwn 3 years ago +1

    Love this

  • @katere89
    @katere89 5 years ago +6

    Hi Josh, thanks for this amazing tutorial. Would you be able to add something on interactions between predictors and on random effects? I am trying to run a mixed-model logistic regression and have three-way interactions, but I'm not entirely sure how to deal with them. Thanks so much :)

  • @mrangelepic1
    @mrangelepic1 5 years ago +1

    Hi Josh,
    Thank you very much for this great video! :)
    Could you please do a video on how AIC works and how to select the relevant parameters for the logistic regression model out of the parameters that are given in a data table?

    • @statquest
      @statquest  5 years ago

      AIC is on the to-do list. Since you asked for it, I'll bump it up a little closer to the top.

    • @565-FENRIR
      @565-FENRIR 5 years ago +1

      Double BAM!!! That sounds great! Awesome videos, all of them are so helpful for understanding logistic regression!

    • @statquest
      @statquest  5 years ago

      @@565-FENRIR Hooray! Thank you! :)

  • @CarlosDullius
    @CarlosDullius 5 years ago +1

    I really love the music kkkk
    Congrats man, you are amazing o/

  • @mihaelawassilko7414
    @mihaelawassilko7414 5 years ago +3

    Hi Josh, thank you for the very informative tutorial. Do you have any videos on multilevel modelling?

  • @ss11996
    @ss11996 5 years ago

    Hi, I am having a little trouble understanding how a factor variable (a string) can be used as input to a logistic regression model, which is mathematical.

  • @MB-nc9rq
    @MB-nc9rq 3 years ago +1

    Great video, thanks so much Josh! After the 4th minute you mention how to address the NA samples. Can you teach us the RANDOM FOREST method, if we don't want to get rid of our NA samples (e.g. in multivariate cases, where the rows include other useful info)? Thanks!

    • @statquest
      @statquest  3 years ago

      I cover the random forest method in this video: th-cam.com/video/6EXPYzbfLCE/w-d-xo.html (the theory is here: th-cam.com/video/sQ870aTKqiM/w-d-xo.html )

  • @thomasbaker26
    @thomasbaker26 3 years ago

    Excellent video, very clear and easy to follow! Do you have any videos that show how to do best subsets and cross validation with logistic regression on R? I know you have a video that explains the concept of cross validation but I am looking for a video like this that goes through it step-by-step for logistic regression on R. Same thing for how to run all possible models (best subsets) using logistic regression on R. I have found one by another youtuber for linear regression but not for logistic.

    • @statquest
      @statquest  3 years ago

      Not yet. :(

    • @thomasbaker26
      @thomasbaker26 3 years ago +1

      @@statquest Wow thank you for the quick reply! That's alright, if you do make any videos like that, I'll be among the first to watch them! :)

  • @flownoth437
    @flownoth437 2 years ago +1

    this is gold!

  • @BulLiT2401
    @BulLiT2401 3 years ago +1

    Love your videos. Could you do one on mixed logistic regression?

    • @statquest
      @statquest  3 years ago

      I'll keep that in mind.

  • @sitendurocks
    @sitendurocks 4 years ago

    at the end where you make the graph , you could have used the broom package and augment function to create the data frame to compute the fitted and actual values.

  • @rachcastellino
    @rachcastellino 4 years ago +1

    you are a godsend

  • @mariyapak428
    @mariyapak428 2 years ago +1

    Josh, joining all the folks here in thanking you! I have a question: around minute 9:05 you talk about the odds of being unhealthy for a female. How do we know that these are the odds of being unhealthy vs being healthy? I feel I am floating when it comes to the intercept, reference categories, and baseline categories. Thanks a lot!

    • @statquest
      @statquest  2 years ago +1

      R orders factors ("healthy" vs "unhealthy") in alphabetical order. So that means "healthy" is first, and the default, and "unhealthy" is the difference from that. Likewise, "sexF" and "sexM" are ordered alphabetically, so "sexF" is the default value and "sexM" is the difference from that.
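The role of the default levels can be checked by hand for a model with a single two-level predictor, because its maximum likelihood estimates have a closed form. The sketch below is a Python illustration with hypothetical counts (not the numbers from the video): the intercept equals the log(odds) of "unhealthy" in the reference group "F", and the "sexM" coefficient equals the log of the odds ratio between "M" and "F".

```python
import math

# Hypothetical counts of patients by sex and outcome (made up for illustration).
counts = {
    ("F", "healthy"): 70,
    ("F", "unhealthy"): 25,
    ("M", "healthy"): 90,
    ("M", "unhealthy"): 110,
}

# "healthy" sorts before "unhealthy", so the model describes the odds of "unhealthy".
# "F" sorts before "M", so females are the reference group.
log_odds_f = math.log(counts[("F", "unhealthy")] / counts[("F", "healthy")])
log_odds_m = math.log(counts[("M", "unhealthy")] / counts[("M", "healthy")])

intercept = log_odds_f           # log(odds) of being unhealthy for females
sex_m = log_odds_m - log_odds_f  # log(odds ratio), males relative to females

print("intercept:", round(intercept, 3))
print("sexM coefficient:", round(sex_m, 3))
print("odds ratio (M vs F):", round(math.exp(sex_m), 3))
```

Swapping which level comes first would flip the sign of the corresponding estimate, which is why it helps to know the level order before reading the output.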

  • @ThinkwithLex
    @ThinkwithLex 8 months ago +2

    A small request, you have done a lot already, a big thank you for that. Is it possible to make a video on Logistic regression in Python ?

    • @statquest
      @statquest  8 months ago +1

      I'll keep that in mind.

    • @ThinkwithLex
      @ThinkwithLex 8 months ago +1

      @@statquest thank you so much

  • @chaparro1097
    @chaparro1097 5 years ago

    BAM that's a spotless explanation, thanks a lot

  • @bmf_onlineedu1649
    @bmf_onlineedu1649 4 years ago +1

    nice work

  • @bellahuang8522
    @bellahuang8522 2 years ago +1

    me binge watching Josh's videos before midterm... anyone else? lmao

    • @statquest
      @statquest  2 years ago

      Good luck! :)

  • @paulriggsy99
    @paulriggsy99 3 years ago +1

    Thank you!!!

  • @stefanyramos1250
    @stefanyramos1250 3 years ago +1

    Awesome!!!!!

  • @philippaknecht9247
    @philippaknecht9247 5 years ago +2

    Hi Josh
    I find your videos very informative and they help me a lot with my bachelor's thesis. Because you turn some variables into "factors" while others stay "numeric", I think I can ask you a question that I can't find an answer to anywhere on the internet. I am doing a logistic regression with NBA regular season games to find out whether being eliminated from the playoffs has an effect on a team's winning probability (to find out if they "tank" = lose intentionally). As a measure of the team's current strength I use its current winning percentage (games won over games played), and this variable is updated after every game. I was wondering if I can treat this variable as "numeric"? Or what type would you use for this winning percentage? The regression also includes the opponent's winning percentage, whether the game is on the home court or not, whether the team is statistically eliminated or in the playoffs, and whether the opponent is statistically eliminated or in the playoffs. It is the same regression some researchers ran back in 2002 to test the same thing, but no one has done it recently. I hope you understand my question and very much hope that you can and are willing to help me. Thank you very much and have a great day!

    • @statquest
      @statquest  5 years ago

      For logistic regression, it will be easier to understand what the estimated coefficients mean if you multiply the percentage of games won by 100. When you do this, you can use these values as "numeric" and the coefficient will tell you how much the log(odds) of the outcome changes for every 1 percentage point change in that variable. For more details on interpreting the coefficients, check out th-cam.com/video/vN5cNN2-HWE/w-d-xo.html
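The effect of rescaling a numeric predictor can be sketched as follows. This is a Python illustration with made-up numbers: the intercept and the coefficient are hypothetical, not estimates from any real NBA data. Multiplying the predictor by 100 divides its coefficient by 100 and leaves the fitted probabilities unchanged, and the coefficient is on the log(odds) scale, so each one-point step multiplies the odds by a constant factor while the change in probability depends on the starting point.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted values, for illustration only.
intercept = -1.5
b_proportion = 3.0              # coefficient when win rate is coded 0 to 1
b_percent = b_proportion / 100  # same model when win rate is coded 0 to 100

# Same prediction under either coding of a 50% win rate.
p_a = sigmoid(intercept + b_proportion * 0.50)
p_b = sigmoid(intercept + b_percent * 50)

# Constant multiplicative change in the odds per percentage point.
odds_ratio_per_point = math.exp(b_percent)

# The change in probability per percentage point varies with the baseline.
deltas = []
for win_pct in (20, 50, 80):
    p_now = sigmoid(intercept + b_percent * win_pct)
    p_next = sigmoid(intercept + b_percent * (win_pct + 1))
    deltas.append(p_next - p_now)
    print(win_pct, round(p_now, 4), round(p_next - p_now, 4))

print("odds ratio per percentage point:", round(odds_ratio_per_point, 4))
```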

    • @philippaknecht9247
      @philippaknecht9247 5 years ago +1

      Thank you very much for your help!!! I appreciate it a lot! I'm glad it's not a complicated solution... :D

  • @stevengao8527
    @stevengao8527 3 years ago +1

    Bruh this is dope

  • @kiwonlee553
    @kiwonlee553 4 years ago

    Hello, thanks a lot for your detailed explanation. Though would there be another video on how to apply cross validation on your regression?

    • @statquest
      @statquest  4 years ago

      I only have one video on cross validation: th-cam.com/video/fSytzGwwBVw/w-d-xo.html