AB Testing 101 | Fmr. Google Data Scientist Explains How to Calculate the Sample Size

  • Published on Jul 8, 2024
  • 🚀 Looking for a comprehensive AB testing course? Visit datainterview.com/
    👉 Join the Data Scientist Interview Bootcamp: www.datainterview.com/bootcam...
    ====== ✅ Details ======
    🤔 Ever wondered how to calculate the sample size in an AB test?
    A former data scientist at Google explains how to calculate the sample size step by step. The lesson covers the formulas, parameters, and exercises that should give you the intuition for how the sample size is calculated.
    Dan, the host, was a data scientist formerly at Google and PayPal. He launched datainterview.com/ to help candidates like you eliminate frustrations about the data science interview process and increase your success.
    As an interview coach, Dan has helped several clients land their dream jobs in IC and managerial DS roles at top companies such as Google, Meta, and Amazon. Message him at Dan@DataInterview.com for help!
    👍 Make sure to hit the like button, and check out datainterview.com/
    ====== ⏱️ Timestamps ======
    0:00 Sample Size Formula
    2:16 Significance Level (Alpha)
    7:15 Statistical Power (1 - Beta)
    12:43 Variance
    17:41 Demonstration
    ====== 📚 Other Useful Contents ======
    1. Principles and Frameworks of Product Metrics | YouTube Case Study
    Link: / principles-and-framewo...
    2. How to Crack the Data Scientist Case Interview
    Link: / crack-the-data-scienti...
    3. How to Crack the Amazon Data Scientist Interview
    Link: / crack-the-amazon-data-...
    ====== Connect ======
    📗 LinkedIn - / datainterview
    📘 Medium - / datainterview
  • Entertainment

Comments • 33

  • @kiwidamien
    @kiwidamien 7 months ago +5

    The power is not the probability of detecting the effect if it exists. The way the power is calculated, it is the probability of detecting an effect if the effect size is exactly equal to the stated minimum effect size.
    To get “the probability of detecting an effect if one exists” you need to integrate over a prior of the different effect sizes.
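
To make the distinction in this comment concrete, here is a minimal Python sketch (not from the video). It assumes a two-sided, two-sample z-test with equal variances and equal group sizes. The values of `sigma`, the MDE, and the discrete prior over effect sizes are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def sample_size(mde, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test with equal variances."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / mde**2)


def power_at_effect(true_delta, sigma, n_per_group, alpha=0.05):
    """Probability of rejecting H0 when the true difference is true_delta."""
    se = sigma * math.sqrt(2 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_delta / se
    upper = 1 - STD_NORMAL.cdf(z_crit - shift)  # statistic above +z_crit
    lower = STD_NORMAL.cdf(-z_crit - shift)     # statistic below -z_crit
    return upper + lower


# Hypothetical inputs, chosen only for illustration.
sigma, mde = 1.0, 0.2
n = sample_size(mde, sigma)

power_at_mde = power_at_effect(mde, sigma, n)
power_at_half_mde = power_at_effect(mde / 2, sigma, n)

# Hypothetical discrete prior over the true effect size: (effect, probability).
prior = [(0.05, 0.3), (0.10, 0.4), (0.20, 0.2), (0.30, 0.1)]
prior_averaged_power = sum(p * power_at_effect(d, sigma, n) for d, p in prior)

print(f"n per group: {n}")
print(f"power at the MDE: {power_at_mde:.3f}")
print(f"power at half the MDE: {power_at_half_mde:.3f}")
print(f"prior-averaged power: {prior_averaged_power:.3f}")
```

The nominal 80% holds only when the true effect equals the MDE. A smaller true effect gives lower power, so averaging over a prior that puts weight on smaller effects gives a lower overall detection probability.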

  • @Aidan_Au
    @Aidan_Au 1 year ago +4

    Dan is back with another jam-packed useful AB Testing course!

  • @chineduezeofor2481
    @chineduezeofor2481 7 months ago +1

    This is so detailed. Thank you for this!

  • @jaylambert4700
    @jaylambert4700 1 year ago +2

    I thought this was an outstanding tutorial, thank you so much

  • @Iol4up
    @Iol4up 7 months ago +1

    This is GOLD!!

  • @SiddhantSethi02
    @SiddhantSethi02 1 year ago

    Loved the explanation, man. This is the first video I have seen that explains where sigma and delta come from. I have had such a hard time reasoning about where the parameters come from when we have not even started the test. Thanks for the good work. :)

  • @twtw5201
    @twtw5201 3 months ago

    This is the only video one needs to demystify power analysis. Thank you.

  • @stella123www
    @stella123www 9 months ago

    This is a fantastic video; it helped me clear up the confusion I had with power analysis. Though I knew the famous formula of 16 sigma squared / delta squared, I had no idea that the pooled variance = 2 * control sample variance. Thanks for the detailed video!
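
The factor of 2 mentioned in this comment can be checked with a small simulation. This is a sketch, not the video's code. It assumes both groups are independent, normally distributed, and share the same standard deviation and size. Under those assumptions the variance of the difference in sample means should be close to 2 * sigma^2 / n. The values of `sigma`, `n`, and `trials` are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

sigma = 2.0      # hypothetical per-user standard deviation
n = 50           # hypothetical users per group
trials = 10_000  # number of simulated experiments

diffs = []
for _ in range(trials):
    control = [random.gauss(0.0, sigma) for _ in range(n)]
    treatment = [random.gauss(0.0, sigma) for _ in range(n)]
    diffs.append(statistics.fmean(treatment) - statistics.fmean(control))

simulated_var = statistics.variance(diffs)
theoretical_var = 2 * sigma**2 / n  # sigma^2 / n + sigma^2 / n

print(f"simulated variance of the difference:   {simulated_var:.4f}")
print(f"theoretical variance of the difference: {theoretical_var:.4f}")
```

The two printed values should agree up to simulation noise, which is where the 2 in the numerator of the sample size formula comes from.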

  • @hayzelyeom7245
    @hayzelyeom7245 1 year ago +2

    Thank you for the amazing AB test lecture! I have one question. How can I project the effect of this AB test from the entire product's view (e.g. calculating the sitewide impact of the observed significant lift)?

  • @csousa3608
    @csousa3608 1 year ago

    Great video, thank you for sharing. In the case of A/B/n testing, could the formula that you shared in the video be adapted and used?

  • @PhiNguyen-iz9go
    @PhiNguyen-iz9go 1 year ago

    8:49
    Does the distribution of the test statistic under the alternative hypothesis have the same shape as the distribution of the test statistic under the null hypothesis?

  • @elinatugaeva6884
    @elinatugaeva6884 10 months ago

    Thank you for the explanation! I have a question on the chicken-and-egg problem: if we cannot calculate the variance of the difference of 2 means, how can we calculate the pooled variance for proportions? We also do not know the success rate of the 2nd sample, as we have not yet run an experiment.
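
One common way around this chicken-and-egg problem is sketched below; it is an assumption of this note and not necessarily the video's approach. The control rate comes from historical data, and the treatment rate is assumed to be the control rate lifted by the relative MDE, so both variances are known before the test runs. The function name `sample_size_proportions` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_proportions(p_control, relative_mde, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of two proportions (normal approximation)."""
    # The treatment rate is assumed from the MDE, not observed.
    p_treatment = p_control * (1 + relative_mde)
    delta = p_treatment - p_control
    # Sum of the two Bernoulli variances, one per group.
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / delta**2)


# Hypothetical inputs: 10% baseline conversion, 20% relative lift as the MDE.
n_per_group = sample_size_proportions(p_control=0.10, relative_mde=0.20)
print(f"users needed per group: {n_per_group}")
```

Because the treatment variance is derived from the assumed lift, the result is only as good as the baseline estimate and the chosen MDE.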

  • @lfengstone
    @lfengstone 4 months ago

    Great lecture, and thanks for sharing. But why is the two-sample pooled variance for proportions the sum of the two samples' variances? Should it be 2 * variance 1, for a reason similar to its mean counterpart?

  • @hokage5619
    @hokage5619 1 year ago +1

    In the case of a one-tailed test, will Z(1 - a/2) change to Z(1 - a)?
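
Under the usual normal approximation, a one-tailed test places all of alpha in a single tail, so the critical value becomes the (1 - alpha) quantile. The sketch below compares the two cases; the `sample_size` helper and the inputs `delta = 0.2` and `sigma = 1.0` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for a two-sample z-test with equal variances."""
    # Two-sided tests split alpha across both tails; one-sided tests do not.
    tail_prob = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_prob)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


n_two_sided = sample_size(0.2, 1.0, two_sided=True)
n_one_sided = sample_size(0.2, 1.0, two_sided=False)

print(f"two-sided: {n_two_sided} per group")
print(f"one-sided: {n_one_sided} per group")
```

The one-sided design needs fewer users because its critical value is smaller (about 1.645 instead of about 1.96 at alpha = 0.05), at the cost of not detecting effects in the opposite direction.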

  • @askanimohankrishnaiitb
    @askanimohankrishnaiitb 8 months ago +3

    Hi, I think the definition you gave for the type II error (beta) at around 8:05 is actually the definition of power. Could you clarify that?

    • @mayankanand507
      @mayankanand507 4 months ago

      What he mentioned is the probability of rejecting the null hypothesis when the alternative hypothesis is true, and that is the area under the curve of the alternative hypothesis for all Z

  • @juancruzguillen8288
    @juancruzguillen8288 15 days ago

    How would you do it if you wanted to perform an A/B/C test?

  • @yeqinzhang
    @yeqinzhang 9 months ago

    How do you answer this interview question: what if we cannot collect that large a sample, what should we do?

  • @user-ew3oe6wl3v
    @user-ew3oe6wl3v 7 months ago

    17:34 Why is (15.68 * sample variance) / delta squared ≈ (16 * population variance) / delta squared? Is it because the sample variance is almost equal to the population variance?
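
The 15.68 in this question can be reproduced with a few lines of arithmetic. The sketch assumes alpha = 0.05 (two-sided) and 80% power, the conventional defaults. Rounding 15.68 up to 16 is a convenience, and in practice the sample variance is used as an estimate of the unknown population variance.

```python
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96
z_beta = NormalDist().inv_cdf(0.80)           # about 0.84

# Multiplier in n = multiplier * sigma^2 / delta^2
rounded_multiplier = 2 * (1.96 + 0.84) ** 2    # using rounded z-values
exact_multiplier = 2 * (z_alpha + z_beta) ** 2  # using unrounded z-values

print(f"with rounded z-values: {rounded_multiplier:.2f}")
print(f"with exact z-values:   {exact_multiplier:.2f}")
```

Either multiplier is slightly below 16, so the rule of thumb errs on the side of a slightly larger sample.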

  • @christiansetzkorn6241
    @christiansetzkorn6241 a month ago

    Should the variance not be multiplied by 2?

  • @ashisranjanlahiri
    @ashisranjanlahiri 1 year ago +2

    The video is good. It would be better to explain what beta is before jumping into power.

  • @PureMoss
    @PureMoss 2 months ago

    Am I mistaken, or is the description of the Type II error at 8:05 incorrect? He says Beta is "the probability of rejecting a null hypothesis when the alternative hypothesis is true." But isn't Beta/Type II error the probability of *not* rejecting a null hypothesis when the alternative is true? Genuinely trying to clarify to make sure I have a proper understanding.
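
The standard definitions match this comment: beta is the probability of failing to reject the null when the alternative is true, and power is 1 - beta. The simulation below is a sketch under assumed values. It takes the z-statistic to be normal with unit variance under the alternative, and `sigma`, `delta`, and `n` are hypothetical, with `n` picked to give roughly 80% power.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
sigma, delta, n = 1.0, 0.2, 393  # hypothetical; n chosen for roughly 80% power
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma * math.sqrt(2 / n)  # standard error of the difference in means

trials = 20_000
rejections = 0
for _ in range(trials):
    # Under H1 the z-statistic is approximately normal with mean delta / se.
    z = random.gauss(delta / se, 1.0)
    if abs(z) > z_crit:
        rejections += 1

power = rejections / trials  # P(reject H0 | H1 is true)
beta = 1 - power             # P(fail to reject H0 | H1 is true), Type II error

print(f"power: {power:.3f}")
print(f"beta:  {beta:.3f}")
```

With these inputs the rejection rate should land near 0.8 and the non-rejection rate near 0.2, up to simulation noise.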

  • @farsikogama6114
    @farsikogama6114 8 months ago

    14:37 and 17:25 are the answer we are looking for 😄

  • @ChangKaiHua300
    @ChangKaiHua300 1 year ago

    Hi sir, in the example I see you use MDE = 20%. I am confused: shouldn't it normally be 80% to 90%? Is using 10% or 20% practical in the real world?

    • @lucasbraga461
      @lucasbraga461 14 days ago

      Hi @ChangKaiHua300, I think you're confusing MDE with statistical power. MDE is the minimum detectable effect, that's the lift between control and treatment and 10% is already usually good enough, 20% is quite reasonable. However the statistical power's default in the industry is 80%, because we want to keep a type II error of maximum 20% (that's beta).
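
To illustrate the difference between the MDE and power in this reply, the sketch below holds power at 80% and varies only the relative MDE. The baseline mean, standard deviation, and the `sample_size` helper are hypothetical, and the test is assumed to be a two-sided, two-sample z-test with equal variances.

```python
import math
from statistics import NormalDist


def sample_size(baseline_mean, relative_mde, sigma, alpha=0.05, power=0.80):
    """Per-group n for detecting a relative lift over the baseline mean."""
    delta = baseline_mean * relative_mde  # absolute difference to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


# Hypothetical metric: baseline mean of 10 with a standard deviation of 5.
baseline_mean, sigma = 10.0, 5.0

n_mde_20 = sample_size(baseline_mean, 0.20, sigma)
n_mde_10 = sample_size(baseline_mean, 0.10, sigma)
n_mde_20_power_90 = sample_size(baseline_mean, 0.20, sigma, power=0.90)

print(f"MDE 20%, power 80%: {n_mde_20} per group")
print(f"MDE 10%, power 80%: {n_mde_10} per group")
print(f"MDE 20%, power 90%: {n_mde_20_power_90} per group")
```

Because delta appears squared in the denominator, halving the MDE roughly quadruples the required sample, while raising power from 80% to 90% increases it more modestly.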

  • @harsharangapatil2423
    @harsharangapatil2423 3 months ago +1

    Why does everyone just start writing the equation? Where is the deeper intuition?

  • @e.i.l.9584
    @e.i.l.9584 1 year ago +1

    Hey, been loving your channel! I also have a similar background in college as you! I was wondering: would you recommend a master's in AI or in statistics & data science in order to become a data scientist and/or machine learning engineer?
    Stats would give me a European Master of Statistics (EMOS) and R knowledge.
    AI is more focused on Python.
    What would give better opportunities down the road? Honestly, it would be easier to get higher grades in stats than in AI, since AI is a killer master's where I study.
    My background is a double bachelor's in neuroscience and psychology, with a specialization in stats, after which I knew it was what I really liked. I did a minor in data science and AI, studied mathematics on an exchange, and took comp science and (discrete) math courses as extracurriculars.
    My goal is to work at a big tech firm, but I'm unsure what gives better opportunities.

    • @DataInterview
      @DataInterview 1 year ago +1

      Hey, thanks for the post! Honestly, it really boils down to what you are interested in. It seems to me that you are mostly interested in the development and application of AI, in which case I think computational neuroscience would be a perfect track. A combination of neuroscience, stats, and computer science may help you in the near term and the long term. Any internships you could snatch would be great for building a portfolio. Invest heavily in learning how to code, the math, and the application of the latest algos like transformers, ChatGPT and so forth.

    • @e.i.l.9584
      @e.i.l.9584 1 year ago

      @@DataInterview thank you!

    • @e.i.l.9584
      @e.i.l.9584 1 year ago

      @@DataInterview I actually really want to go more towards machine learning engineering or data science. Would computational still be best then?

  • @zhaoyanzhi741
    @zhaoyanzhi741 6 months ago +1

    Very good and helpful explanation, but why is the pooled variance multiplied by 2 instead of using the variance itself directly, as described at en.wikipedia.org/wiki/Pooled_variance?
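
A possible way to reconcile the two views, written out as a short derivation. It assumes independent groups of equal size $n$ with a common variance $\sigma^2$; the notation ($s_p^2$, $\bar{X}_T$, $\bar{X}_C$, $\delta$) is introduced here for illustration. The pooled variance estimates the per-observation variance $\sigma^2$, while the sample size formula needs the variance of the difference in means, which is where the factor of 2 appears.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \;\approx\; \sigma^2
  && \text{(pooled estimate of the per-observation variance)} \\[4pt]
\operatorname{Var}\!\left(\bar{X}_T - \bar{X}_C\right)
  &= \frac{\sigma^2}{n} + \frac{\sigma^2}{n} \;=\; \frac{2\sigma^2}{n}
  && \text{(variances of independent means add)} \\[4pt]
n &= \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\delta^2}
  && \text{(per-group sample size)}
\end{aligned}
```

So the Wikipedia definition and the formula in the video are not in conflict: the first line gives $\sigma^2$, and the second line explains why it is doubled before being divided by $\delta^2$.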