Mastering Hypothesis Testing for Data Science Interviews: Binomial, Z-test, and T-test

  • Published on 8 Jul 2024
  • This video is part 1 of hypothesis testing problems in data science interviews.
    Part 2 of hypothesis testing problems in data science interviews:
    • A/B Testing Analysis M...
    🟢Get all my free data science interview resources
    www.emmading.com/resources
    🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
    🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
    🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
    🔵 Data Science Resume Checklist www.emmading.com/data-science...
    ✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
    // Comment
    Got any questions? Something to add?
    Write a comment below to chat.
    // Let's connect on LinkedIn:
    / emmading001
    ====================
    Contents of this video:
    ====================
    00:00 Intro
    00:34 Three types of questions
    02:18 When to use binomial test, z-test and t-test
    05:09 t-distribution vs z-distribution
    06:50 Testing proportions
    09:26 What's in the next video

Comments • 76

  • @stella123www
    @stella123www 3 years ago +17

    best hypothesis testing video I've ever seen on youtube, thank you for producing great content!

  • @liuauto
    @liuauto 3 years ago +1

    This video will save tons of effort before taking a stats course and diving into any details. Have not seen such a helpful diagram ever before.

  • @cming6108
    @cming6108 3 years ago +1

    so much appreciation for every content you upload!!!

  • @user-nz5oi8pd5m
    @user-nz5oi8pd5m 2 years ago +1

    always spoiled by Emma's concise and clear explanation.

  • @deepadas4585
    @deepadas4585 3 years ago +2

    Love your videos, Emma! Very insightful and to-the-point explanation.
    I would love to see some domain-specific analytics interview case studies like- supply chain analytics, e-commerce analytics.

  • @CodeEmporium
    @CodeEmporium 3 years ago +1

    This is good detail. Love it

  • @vincenttan6303
    @vincenttan6303 3 years ago

    Good stuff! Clearer than the textbook and even lecturers.

  • @oliviazhang2922
    @oliviazhang2922 2 years ago

    You are absolutely the best Emma!! Thank you!!!

  • @jeoffleonora4612
    @jeoffleonora4612 3 years ago

    Great video as always! Thanks Emma!

  • @brothermalcolm
    @brothermalcolm 3 years ago

    Perfect, just what I need, subscribed!

  • @sirvachjumani7215
    @sirvachjumani7215 3 years ago

    Really useful content for interviewers.

  • @hameddadgour
    @hameddadgour 1 year ago

    Great explanation and very informative! Thank you!

  • @kangxinwang3886
    @kangxinwang3886 3 years ago +2

    this is just good period

  • @starbuststream3219
    @starbuststream3219 1 year ago

    Very informative video for job interview preparers!

  • @chuchuzhu333
    @chuchuzhu333 3 years ago

    Thank you so much!

  • @norilouis
    @norilouis 1 year ago

    This is SO helpful and I really appreciate your content Emma!

    • @emma_ding
      @emma_ding 1 year ago

      I'm so glad to hear you found it helpful, Louis! Thanks so much for watching. 😊

  • @dallalstreet1775
    @dallalstreet1775 3 years ago

    Thanks Emma! Wonderful video

  • @j33vn
    @j33vn 2 years ago +7

    Great content as always, Emma. An intuitive way to think about not using a t-test for estimating a population proportion is that for Bernoulli data there is only one unknown: the population proportion. Once we know it, the variance is simply p(1-p). But in the case of estimating a population mean, there are two unknowns: the population mean and the population standard deviation. The heavier tails of the t-distribution capture the extra uncertainty caused by this additional unknown. Khan Academy explains this in more detail for anyone interested. Thanks!

    • @emma_ding
      @emma_ding 2 years ago

      Great observation Jeevan! Thank you for sharing!

    • @user-bn6tc4vv6l
      @user-bn6tc4vv6l 10 months ago +1

      Hi, which video/modules from Khan Academy explain this? thanks
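
The point made in this thread, that a hypothesized proportion p already fixes the Bernoulli variance at p(1-p), can be sketched as a one-sample z-test for a proportion. This is only an illustration using the Python standard library; the function name `one_sample_proportion_ztest` and the click counts are hypothetical and do not come from the video.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_ztest(successes: int, n: int, p0: float):
    """Two-sided z-test of H0: p == p0 for Bernoulli data.

    Under H0 the variance of a single observation is p0 * (1 - p0),
    so no separate variance estimate (and no t-distribution) is needed.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 120 clicks out of 1,000 impressions, H0: p = 0.10
z, p_value = one_sample_proportion_ztest(120, 1000, 0.10)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation behind this sketch is usually considered reasonable only when both n*p0 and n*(1-p0) are not too small.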

  • @Leon71
    @Leon71 3 years ago

    Thank you very much!

  • @cliffrunner
    @cliffrunner 1 year ago

    this is a great video! thanks a lot!

  • @nattapatjuthaprachakul9859
    @nattapatjuthaprachakul9859 3 years ago

    Thank you so much

  • @lydiamai6861
    @lydiamai6861 3 years ago

    Hi Emma, although I have not learnt this far, I enjoyed the video thanks to your clear and structured explanation. Thanks.

    • @emma_ding
      @emma_ding 3 years ago

      Happy to hear that! Thanks, Lydia!

  • @anamikadas9445
    @anamikadas9445 3 years ago +4

    Love your videos Emma! For Bernoulli variables, would a chi-squared test also work? Is one method preferred over another in practice?

  • @datasciencepreparationhub9933
    @datasciencepreparationhub9933 2 years ago

    Good explanation!

  • @sinhamohit
    @sinhamohit 2 years ago

    Timestamps
    Top quality content
    No funky intro music
    No repetitive sentences
    No begging for likes and subscribe
    Actually gets started when says "Let's get started"
    Earning subscribers the right way!

    • @emma_ding
      @emma_ding 2 years ago

      Thanks Mohit for the summary! :)

  • @csousa3608
    @csousa3608 1 year ago

    Great video! I would love to see a video about hypothesis testing but applied to a case of use when you have to apply A/B/n testing.

    • @emma_ding
      @emma_ding 1 year ago

      Great suggestion! In fact, I have a video on the topic you suggested th-cam.com/video/6uw0A3aKwMc/w-d-xo.html, hope it helps! :)

  • @racoonYY109
    @racoonYY109 3 years ago +1

    Hi Emma, what is the difference between a z-test and a binomial test when comparing the CTR of two groups?

  • @hiapple6060
    @hiapple6060 1 year ago

    Hi Emma, what test should I use if the metric follows a Bernoulli distribution, and with very different sample sizes in each group, say, 10000 observations in control and 1000 in treatment? In this case, should I use z-test with the pooled standard error or Welch's t-test?
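
For readers following this question, the pooled two-proportion z-test it mentions can be sketched as below, with an unpooled variant for comparison. This is not an answer from the video; the function name `two_proportion_ztest` and the conversion counts are hypothetical, and only the group sizes (10,000 and 1,000) come from the comment.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1, n1, x2, n2, pooled=True):
    """Two-sided z-test of H0: p1 == p2 for two independent Bernoulli samples."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # Under H0 both groups share one proportion, estimated from all data.
        p = (x1 + x2) / (n1 + n2)
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Unpooled: each group keeps its own variance estimate.
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 1,000 of 10,000 control users convert,
# 120 of 1,000 treatment users convert.
z_pooled, p_pooled = two_proportion_ztest(1000, 10_000, 120, 1_000)
z_unpooled, p_unpooled = two_proportion_ztest(1000, 10_000, 120, 1_000, pooled=False)
print(f"pooled:   z = {z_pooled:.3f}, p = {p_pooled:.4f}")
print(f"unpooled: z = {z_unpooled:.3f}, p = {p_unpooled:.4f}")
```

Neither formula requires equal group sizes; the imbalance enters only through the 1/n1 and 1/n2 terms, so the smaller group dominates the standard error.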

  • @tekingunasar4189
    @tekingunasar4189 2 years ago

    Hi! Great video. I am a little bit confused by the flow chart though, because it references knowing some information about the population distribution, particularly when in the flow chart we check whether or not the population distribution is normal. I am confused by this because if we were to know that the population distribution is normal, wouldn't that make hypothesis testing redundant? I know that this is actually not the case, and that I am misunderstanding something, but I'm not sure what exactly that is.

  • @zhihaoxu756
    @zhihaoxu756 2 years ago +3

    Hi Emma, thank you very much for making these videos. They are indeed very helpful! However, I have a question regarding the difference between the Z-test and the Binomial test. For a small sample, i.e. when np

    • @xiaofeichen5530
      @xiaofeichen5530 1 year ago

      I think she means calculating directly the probability of k successes in n trials using the binomial pmf Pr(X=k)=(n choose k)p^k(1-p)^(n-k)
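
The pmf quoted in this reply can be turned into an exact binomial test by summing the probabilities of outcomes at least as extreme as the observed one. The sketch below is a minimal one-sided version using only the standard library; the function names and the coin-flip numbers are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_upper(k: int, n: int, p0: float) -> float:
    """One-sided exact p-value: Pr(X >= k) under H0: p == p0."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))


# Hypothetical data: 8 successes in 10 trials, H0: p = 0.5
p_value = binom_test_upper(8, 10, 0.5)
print(f"exact one-sided p-value = {p_value:.4f}")
```

Because it uses the binomial distribution directly, this approach does not rely on a normal approximation, which is why it suits small samples.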

  • @cql8878
    @cql8878 2 years ago

    I love your videos Emma! But by far this one is the hardest one to follow among yours :(

    • @emma_ding
      @emma_ding 2 years ago

      Thanks for the feedback! Could you be specific which part is hard to follow? Thanks!

  • @rioache1081
    @rioache1081 3 years ago

    4:11 There is a lot of arguing on the stats forums about the assumption of normality for t-test. And many of the comments state that for t-statistic to have a t-distribution the population has to follow the normal distribution (so t-test does actually require normality of population). What's your opinion on that topic?

  • @plttji2615
    @plttji2615 2 years ago

    I'm quite confused about whether I should use a z-test when testing the conversion rate, because some websites mention a t-test. Could you please explain this?

  • @shrutigupta5104
    @shrutigupta5104 2 years ago +1

    Hi Emma, thanks for making informative videos. My question is: how did you choose a sample size of 30 as the marker for differentiating a small sample size from a large one?

    • @Fawk3s1
      @Fawk3s1 2 years ago

      It is a convention in statistics. Basically, if n > 30 you can lean on the central limit theorem, which says that the sampling distribution of the sample mean is approximately normal when n is large; 30 is a common rule of thumb, not a hard threshold.
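
The rule of thumb in this reply can be checked with a small simulation: draw many samples of size 30 from a clearly non-normal population and look at how the sample means behave. This is only a sketch; the Exponential(1) population, the seed, and the function name `sample_means` are assumptions chosen for illustration.

```python
import random
from statistics import stdev


def sample_means(n: int, trials: int, rng: random.Random):
    """Means of `trials` samples of size n from an Exponential(1) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]


rng = random.Random(0)  # fixed seed so the run is reproducible
means = sample_means(n=30, trials=5000, rng=rng)

# Theory: the sample mean has expectation 1 and standard deviation 1 / sqrt(30).
print(f"mean of sample means  = {sum(means) / len(means):.3f}")
print(f"stdev of sample means = {stdev(means):.3f} (theory: {1 / 30 ** 0.5:.3f})")
```

The population here is strongly right-skewed, yet the sample means cluster around 1 with a spread close to the theoretical value, which is the behavior the n > 30 convention relies on.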

  • @racoonYY109
    @racoonYY109 3 years ago

    Also, why do we have pooled and unpooled variance scenarios for the t-test, while for the z-test for two proportions we always use pooled?

  • @akshat175
    @akshat175 3 years ago +1

    Hey Emma, your videos are super useful and simple to follow. Is there a place I can access your slides as well for quick review of the key concepts? This comment would hold for all your videos and not just this one..

    • @emma_ding
      @emma_ding 3 years ago +2

      Sorry, there are no slides; it's all part of the video editing. But I'll definitely consider providing them in the future if it helps!

  • @navishagarwal1736
    @navishagarwal1736 3 years ago

    Hey Emma! Thanks for another great video.
    I have watched the video a few times now but the part on "testing proportions" seems to be going over my head. Possibly because I do not have some basics necessary here.
    Any suggestions on recommended reads?

    • @emma_ding
      @emma_ding 3 years ago

      For resources about stats, you can find some in my blog post: towardsdatascience.com/how-i-got-4-data-science-offers-and-doubled-my-income-2-months-after-being-laid-off-b3b6d2de6938. For A/B testing specifically, this book is a good read: www.amazon.com/Trustworthy-Online-Controlled-Experiments-Practical/dp/1108724264

  • @ramanadeepsingh
    @ramanadeepsingh 19 days ago

    Great video...what happens when sample-size is less than 30 and population distribution is not normal. What kind of tests are used in practice?

  • @bcws
    @bcws 7 months ago

    Does the Slutsky theorem apply here? Slutsky theorem only applies when one number converges in distribution to a random element and the other converges in probability to a constant.

  • @ishpandey7886
    @ishpandey7886 3 years ago

    Thanks a ton.... I never found such videos... You are really helping the community...
    I just have a question if the size

    • @emma_ding
      @emma_ding 3 years ago +1

      Yes, it just won't be a t-test or Z-test. You can Google "hypothesis test non normal distribution" to find more details.

    • @ishpandey7886
      @ishpandey7886 3 years ago

      @@emma_ding Thanks... Would love to get one end-to- end hypothesis problem with code... That would be really helpful...

  • @appledotted
    @appledotted 3 years ago +1

    I had a tech screen with a fin-tech company today. They asked me to walk through the math behind testing normality with skewness. (Quite odd)
    I got a bit stuck on how to convert the skewness into a p-value. I mentioned that normally we have CLT that we can do normal approximation like for Binomial and Poisson Distribution, but I am not sure about skewness. Then I said maybe we can try bootstrapping to simulate the sampling distribution to get the variance of skewness if the distribution is unknown. (Not sure if this is a correct approach)
    I tried to find online resources about this after the interview, but somehow none of them go in-depth to talk about this part. Do you happen to have some insight?
    P.S. Really like your videos, very concise and instructive. :)

    • @appledotted
      @appledotted 3 years ago

      Just thought about this again: I think we can simulate a normal distribution over and over again with the same n, see what proportion of those simulations have a skewness more extreme than our observed data, and use that proportion as the p-value.
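
The simulation idea in this reply can be sketched as a Monte Carlo test. The version below is an illustration only, not a method from the video: the function names, the number of simulations, the add-one correction, and the example data are all assumptions. It simulates standard normal samples because sample skewness does not change under shifting or rescaling.

```python
import random


def skewness(xs):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2**1.5


def skewness_mc_pvalue(data, n_sims=2000, seed=0):
    """Two-sided Monte Carlo p-value for H0: the data come from a normal distribution."""
    rng = random.Random(seed)
    n = len(data)
    observed = abs(skewness(data))
    extreme = 0
    for _ in range(n_sims):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(skewness(sim)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_sims + 1)


# Hypothetical data: 100 draws from a right-skewed Exponential(1) population.
data_rng = random.Random(1)
skewed = [data_rng.expovariate(1.0) for _ in range(100)]
print(f"skewness = {skewness(skewed):.3f}, p-value = {skewness_mc_pvalue(skewed):.4f}")
```

This compares the observed skewness against its simulated null distribution at the same n, so no closed-form variance for the skewness statistic is required.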

  • @bluestacheandego
    @bluestacheandego 3 years ago

    Hi! Thanks for the videos! I see you've got O'Reilly textbooks behind you. Do you recommend them? If so, how do you study from them? Thanks

    • @emma_ding
      @emma_ding 3 years ago

      Haha, interesting question! Depends on what you are interested in, two books I highly recommend - Practical Statistics for Data Scientists (if you are interested in learning statistics in practice) and Designing Data-Intensive Applications (if you are interested in software engineering).

  • @YK-mh3mp
    @YK-mh3mp 1 year ago

    For general distribution other than normal distribution, I think it is theoretically wrong to use t-test. It is not only for proportions.

  • @thegreatlazydazz
    @thegreatlazydazz 3 years ago +1

    Can you give some material which discusses why, theoretically, we cannot use t-tests for binomial proportions?

    • @emma_ding
      @emma_ding 3 years ago +1

      Here you go: stats.stackexchange.com/questions/90893/why-use-a-z-test-rather-than-a-t-test-with-proportional-data

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    If a company has defined more than 2 stages in conversion, so not just no-click/click, but like 1. Open product page 2. Add to checkout 3. Open payment confirmation page ... it won't follow a Bernoulli distribution anymore since there are more than 2 outcomes. Are there tests for this, or do we still have to use Bernoulli and treat outcomes as "reached stage x vs not reached stage x"? How does the latter case affect the analysis?

    • @emma_ding
      @emma_ding 3 years ago +1

      In those cases, you can simplify the problem with "conditions": given that users passed all previous stages, the behavior of entering or not entering the next stage follows a Bernoulli distribution. This will make testing a lot easier.
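
The conditioning idea in this reply can be sketched as a sequence of step-by-step conversion rates, each treated as its own Bernoulli proportion among the users who reached the previous stage. The stage names follow the question above; the counts and the function name `conditional_rates` are hypothetical.

```python
from math import sqrt

# Hypothetical funnel: number of users who reached each stage, in order.
funnel = [
    ("open_product_page", 10_000),
    ("add_to_checkout", 2_500),
    ("open_payment_confirmation", 1_000),
]


def conditional_rates(stages):
    """Rate of reaching each stage given the previous one, with its standard error."""
    results = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        p = n / prev_n  # Bernoulli proportion among users who reached prev stage
        se = sqrt(p * (1 - p) / prev_n)
        results.append((f"{prev_name} -> {name}", p, se))
    return results


for step, p, se in conditional_rates(funnel):
    print(f"{step}: rate = {p:.3f}, standard error = {se:.4f}")
```

Each step can then be compared between control and treatment with a test for proportions, with the caveat that the sample size shrinks at every stage.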

  • @shirleygui6533
    @shirleygui6533 2 years ago

    Great video! But there is a small point that confused me: if the sample size is large enough, then according to the CLT it follows the normal distribution (the variance can be calculated from the sample data), so should we use a z-test instead of a t-test because we "know" the variance? Is my logic correct? Thank you

    • @irisyao8691
      @irisyao8691 2 years ago

      I have the same question: if the sample size is > 30, can we use a z-test with the sample variance even though we don't know the population variance?

  • @diazjubairy1729
    @diazjubairy1729 3 years ago

    What's the difference between hypothesis test and a/b test ?

    • @jimbocho660
      @jimbocho660 2 years ago

      An A/B test is one type of hypothesis test.

  • @maryamomar4106
    @maryamomar4106 2 years ago

    I love you.

    • @emma_ding
      @emma_ding 2 years ago

      I'm glad you find the content so loveable! Thank you Maryam.

  • @sssam844
    @sssam844 1 year ago

    could you please attach the subtitles as well? I find your videos fantastic and helpful but I have difficulty understanding the pronunciation of some words

    • @emma_ding
      @emma_ding 1 year ago

      Sure thing! Thanks for the suggestions. I've added subtitles to my most recent videos, and will add more!

  • @nagrajkaranth123
    @nagrajkaranth123 2 years ago

    Sis please cover all the interview questions of data science

    • @nagrajkaranth123
      @nagrajkaranth123 2 years ago

      Great, sis! I subscribed to your channel. Please help me clear the data science interview, sis.

    • @TheElementFive
      @TheElementFive 2 years ago

      Sis?

  • @vivekambastha2273
    @vivekambastha2273 3 years ago

    Maybe a good topic, but the presentation of the topics is not good; there are also some pauses when switching topics.

    • @emma_ding
      @emma_ding 3 years ago

      Thanks a lot for the feedback! I'll pay more attention to pauses in the future!