A/B Testing Analysis Made Easy: How to Use Hypothesis Testing for Data Science Interviews!

Emma Ding

64,889 views

Published on 25 Jan 2025

Comments • 117

@emma_ding 3 years ago ⁺¹⁵
Correction:
Thanks Yidan Shang -- At 9:12, the Spool calculated should be 1.06 instead of 1.099.
Thanks Ruby Jiang -- At 7:38, the mean of treatment is 1.7 instead of 2. The subsequent calculations should be changed accordingly.
@bettergreta464 1 year ago
Hi Emma, thanks for this. I have a question regarding 9:12: why don't we use SS divided by N? That's the formula for standard deviation, I think.
@Quasar_Energy 3 years ago ⁺⁴⁰
What I love about your channel is that you don't charge $300 to unemployed job seekers for this information.
@jiercc3138 3 years ago ⁺¹²
I think this series is really great. Each video looks short, but it distills the key points very well, with the right level of detail, and it is very goal oriented: it is made for interviews. Thank you!
@sijunjiang9744 1 year ago ⁺³
Hi Emma, thank you for your valuable video. In the video at 5:57, dmin = 0.05, but elsewhere dmin = 0.01. I am a little confused about the value of dmin. Why was it changed from 0.05 to 0.01? Is the value chosen arbitrarily, or can it be calculated by some formula? Thanks
@CityInvisible 2 years ago ⁺²
My hiring manager actually recommended this series of videos. Super helpful for someone who doesn't have much business experience. Thank you!
@Crtg17 2 years ago ⁺²
Hi Emma - Thank you so much for sharing! I have learnt a lot. I would like to clarify the formula you used to calculate the SE in the confidence interval of two sample proportions (4:12). The formula you used is the SE in the test statistic of the two-proportion Z-test, but the SE for the CI should be different: sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2). Please correct me if I am wrong here. Thanks
@ys2660 2 years ago
You are right.
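The distinction raised in this thread can be sketched in a few lines. The snippet is a minimal illustration, not the video's own calculation: the counts are hypothetical, and the function names `pooled_se` and `unpooled_se` are made up for this example. The pooled SE assumes both groups share one proportion (the null hypothesis of the z-test), while the unpooled SE is the one commonly used for the confidence interval of the difference.

```python
import math


def pooled_se(x1, n1, x2, n2):
    """SE under H0 (p1 == p2), used in the two-proportion z-test statistic."""
    p_pool = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))


def unpooled_se(x1, n1, x2, n2):
    """SE without assuming p1 == p2, used for the CI of p2 - p1."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical counts (NOT the numbers from the video).
x_c, n_c = 100, 1000  # control: successes, users
x_t, n_t = 130, 1000  # treatment: successes, users

d_hat = x_t / n_t - x_c / n_c                # observed difference in proportions
se_test = pooled_se(x_c, n_c, x_t, n_t)      # goes into the z statistic
se_ci = unpooled_se(x_c, n_c, x_t, n_t)      # goes into the confidence interval
z_stat = d_hat / se_test
ci = (d_hat - 1.96 * se_ci, d_hat + 1.96 * se_ci)  # two-sided 95% CI

print(f"d_hat={d_hat:.4f}, z={z_stat:.3f}, CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

With large, similarly sized groups the two standard errors are usually very close, so the choice rarely changes the conclusion, but the formulas are not the same.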
@zbear404 3 years ago ⁺⁵
I recommended you to all my classmates. Excellent work and presentation of what is needed!
@johnstephen399 3 years ago ⁺²
At 5:05, should you not be using the Z-score for alpha = 0.025 instead of 0.05 since you're using a two-tailed test?
@emma_ding 3 years ago
For two-sample tests, usually we test if one is greater than the other. One-tailed tests would make the most sense.
@korkutkaynardag9147 1 year ago
Why do we use the same z value used for the p-value to calculate the confidence interval? Should we not choose the z value for the confidence interval based on the 0.01 practical significance boundary (6:50)?
@yidanshang1382 3 years ago ⁺³
Hey Emma, great video and love the decision flow chart. One quick question: at 9:12, why is Spool = 1.099? I calculated the pooled variance = 1.13, and the square root of that would be 1.06 instead of 1.099.
@emma_ding 3 years ago
Thanks for pointing out the mistake.
@hieification 3 years ago ⁺³
The CI in the second case is coming inside the practical boundary for me. Am I missing something? CI for d = 0.633 ± 2.2018 (multiplying 2.002*1.0998), so the range is -1.56 to 2.83. Really helpful video. Thanks Emma!
@emma_ding 3 years ago ⁺²
Yep, I believe you are missing a multiplier 0.258 for the margin of error (time 9:18), then you'll get the CI 0.0648 to 1.201. Hope it helps! Let me know if you have other questions :)
@kuipan5968 3 years ago ⁺²
@emma_ding Hey, I have the same question. Where is 0.258 coming from? CI = d +/- Z*SE, right? Here d = 0.633, Z = 2.2018, and SE = 1.0998.
@lakshmank 3 years ago ⁺¹
@emma_ding Hi Emma, thanks for the video. I have the same question as well. Isn't the width of the CI = Z*SE? Why are you multiplying Z*SE again by 0.258? It seems there is a mistake in the video's calculation of the CI boundaries.
@hezhaojiang3525 3 years ago
@emma_ding Isn't the width of the CI = Z*SE? Why are you multiplying Z*SE again by 0.258?
@pushinhuang2872 2 years ago
Same questions here
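One way to see where the 0.258 in Emma's reply likely comes from: for a two-sample test of means, the standard error of the difference is the pooled standard deviation multiplied by sqrt(1/n_c + 1/n_t), so 1.0998 on its own is a standard deviation rather than the SE. The sketch assumes 30 users per group, which is an inference (sqrt(1/30 + 1/30) ≈ 0.258, and df = 58 is consistent with the quoted t value of 2.002), not a number stated in this thread. It uses the values quoted in the comments; the pinned correction (treatment mean 1.7, Spool 1.06) would change the result.

```python
import math

d_hat = 0.633    # difference in means, as quoted in the thread
t_crit = 2.002   # t critical value, as quoted in the thread
s_pool = 1.0998  # pooled SD as quoted (the pinned correction says 1.06)
d_min = 0.05     # practical significance boundary quoted for this example
n_c = n_t = 30   # ASSUMPTION: inferred from the 0.258 multiplier, not stated

multiplier = math.sqrt(1 / n_c + 1 / n_t)  # the "0.258" factor
se = s_pool * multiplier                   # standard error of the difference
margin = t_crit * se                       # margin of error
ci = (d_hat - margin, d_hat + margin)

print(f"multiplier={multiplier:.3f}, SE={se:.4f}, margin={margin:.4f}")
print(f"CI=({ci[0]:.4f}, {ci[1]:.4f}), lower bound above d_min: {ci[0] > d_min}")
```

Under these assumptions the interval lands close to the 0.0648 to 1.201 range given in the reply, which suggests the missing step in the commenters' calculations is the sqrt(1/n_c + 1/n_t) factor.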
@TiantianGao 1 year ago
Great content!!! The best explanation of z-test and T-test on YouTube! Great examples!!! Feel very lucky to find you here🙏! Thank you!!!
@emma_ding 1 year ago
Thanks so much for your kind words! Happy to help. 😊
@firesongs 2 years ago
5:22 How do we know that the center of the CI is 0.012?
@jiercc3138 3 years ago
One question, Emma. At 7:43, the mean of the treatment is 2. However, calculating directly from the data array you gave, the treatment mean I got is 1.7. This can be validated by the sum of squares: if you use 2, the SS of the treatment group at 9:10 will actually be 37 instead of 34.3. Could you double-check? Also, could you let us know why the difference of means at 9:22 is 0.633? It would be 0.6 if you use 2 - 1.4, and 0.3 if you use 1.7 - 1.4. Thanks!
@arojitdas8256 1 year ago
All your videos are a gold mine.
Keep up the good work
@tonghooooo1383 3 years ago ⁺¹
Hi Emma, thanks for making these great videos. I have a quick question about the term you use at 9:00. Should it be pooled standard deviation instead of pooled standard error here?
@AIVenturePulse 1 month ago
Hi Emma, thank you for the video! Quick question: at 9:30, when looking at the t table, why do we use 0.975 rather than 0.95?
@jasonsj 2 years ago
Thanks Emma for the video! Very helpful! Is it a typo at 9:10? Should it be "pooled" standard deviation? Because it is the formula to calculate the SD instead of the SE. Thanks!
@sooryaprakash6390 3 years ago
Quality of content is top notch! Thanks for making these videos. Looking forward to learning more from you.
@viviangong6760 1 year ago
Great explanation Emma! Nice work! For the second case, can you show the formula and calculation for how you determined that the lower bound is more than Dmin 0.05?
@star_7776 1 year ago
Thank you Emma, I am learning a lot, God bless you! I finally feel I understand this topic.
@emma_ding 1 year ago
I'm so happy to help! 😊
@TiantianGao 1 year ago ⁺¹
Hi Emma,
I have a question about the sample size. Do the control and treatment groups have to have exactly the same sample size? For example, it's 1000 users here for both. Can I have a control group of 1023 users and a treatment group of 1048 users? Will this affect our result? Thank you!
Thank you for all the great content!!! It significantly helped me understand hypothesis testing better! ❤
@vijayjayaraman5990 3 months ago
How did you determine the practical significance boundary?
@dantongzhu1310 3 years ago ⁺¹³
Hi Emma, how did you calculate the CI in the second example exactly when assuming similar variance for both control and treatment? I got the margin of error = t-score*SEpool = 2.002*1.0999 = 2.202. Then with \hat{d} = 0.6, which would then give a very big CI that includes the entire [-dmin, dmin] = [-0.05, 0.05]. But it seems like in your slides the CI is strictly on the right side of [-dmin, dmin]. I'm very confused and would appreciate some help! Otherwise, your videos have been super super helpful!! Thanks.
@paramawasthi24 3 years ago ⁺⁴
The margin of error would be t-score*Spool*(1/nc + 1/nt)^(1/2), which comes to ~0.51; adding and subtracting that from d-hat (0.6) keeps the interval above the practical significance boundary of 0.05.
@chloehuajingjiang9128 1 year ago
@paramawasthi24 Can I ask why we can't use the Spool we calculated using (SS/df)^(1/2) = 1.0999?
@yueleji8892 3 years ago
Thank you sooooo much Emma! I have trouble with combining the hypothesis and A/B testing knowledge together, your video saved me!!!
@yueleji8892 3 years ago
Also, a quick question: would you explain the difference between the practical significance boundary and the minimum detectable effect? Thank you!
@hermit597 3 years ago
@yueleji8892 Wouldn't that be the same? Since practical significance measures the effect size, to me it makes sense for it to be the minimum detectable effect - established before the test is conducted.
@wuru6097 3 years ago
Hey Emma, at 5:52, when calculating the margin of error, the Z you used is 1.96, which comes from the statistical significance level instead of the practical significance boundary. Could you please confirm if this is what's supposed to be used? Thank you!
@jithendrayenugula7137 3 years ago ⁺³
Really great video thanks 😋 I appreciate the effort 🙏
@tangled55 3 years ago
Hi Emma, around 2:28, where you say "Bernoulli" population, I think, to be clear, you want to make a designation. Bernoulli deals with the data which only has ONE trial and two possible outcomes, but the Binomial is the collection of Bernoulli trials for the same event (multiple trials of the same event). So since we're doing a hypothesis test on the number of successes in multiple trials, the assumption is that successes follow a binomial and not a Bernoulli, right?
@ristyping 2 years ago ⁺¹
Yes, I was thinking the same thing. The population cannot be Bernoulli because it has n trials, each with two possible outcomes. Even in this case, simply having n > 1 for both samples indicates that it is Binomial.
@cococnk388 2 years ago
The concept does not change…
@pro100olga 3 years ago ⁺¹
Thanks a lot for your channel! It helped me to prepare and get a job offer! :)
@leizhang1699 3 years ago ⁺²
Hi Emma, really appreciate that you made all these great videos, which are very helpful. I was wondering if you could make some videos about how to handle take-home challenges such as those from Lyft and Airbnb. Any information will be highly appreciated! Thanks
@emma_ding 3 years ago ⁺¹
Take home challenge would be an interesting topic. Stay tuned!
@hello-pd7tc 3 years ago
so so so helpful! Thank you Emma
@elderpinzon7686 3 years ago ⁺⁴
Did anyone else try to reproduce the results of the two-sample test of means? I get that the mean of the treatment is 1.7 (not 2.0 like in the video). This changes the conclusion, the result is not statistically significant.
I think it's a mistake since my calculation for the pooled standard error (which uses the standard deviation of the treatment) matches perfectly
@elderpinzon7686 3 years ago
Using ttest_ind from scipy.stats I confirm my result. I think there is a typo somewhere in the video
@ey2392 3 years ago
agree
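For anyone who wants to reproduce this kind of check by hand, here is a minimal sketch of the pooled-variance two-sample t statistic. It also shows where the sum of squares (SS) and the pooled standard deviation discussed elsewhere in the comments come from. The two arrays are hypothetical stand-ins, not the data from the video, and `pooled_t` is a name chosen for this example. `scipy.stats.ttest_ind(treatment, control)` with its default `equal_var=True` should report the same statistic.

```python
import math


def pooled_t(control, treatment):
    """Two-sample t statistic assuming equal variances in both groups."""
    n_c, n_t = len(control), len(treatment)
    m_c = sum(control) / n_c
    m_t = sum(treatment) / n_t
    # Sum of squared deviations from each group's own mean (the "SS").
    ss_c = sum((x - m_c) ** 2 for x in control)
    ss_t = sum((x - m_t) ** 2 for x in treatment)
    df = n_c + n_t - 2                               # degrees of freedom
    s_pool = math.sqrt((ss_c + ss_t) / df)           # pooled standard deviation
    se = s_pool * math.sqrt(1 / n_c + 1 / n_t)       # standard error of the difference
    return (m_t - m_c) / se, df, s_pool


# Hypothetical posts-per-user samples (NOT the arrays from the video).
control = [1, 2, 0, 1, 3, 1, 2, 0, 1, 3]
treatment = [2, 1, 3, 2, 0, 4, 1, 2, 3, 2]

t_stat, df, s_pool = pooled_t(control, treatment)
print(f"t={t_stat:.4f}, df={df}, s_pool={s_pool:.4f}")
```

Dividing the combined SS by the degrees of freedom (n_c + n_t - 2) rather than by N gives the usual unbiased sample estimate, and the result is a standard deviation; it only becomes a standard error after the sqrt(1/n_c + 1/n_t) factor is applied.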
@cococnk388 2 years ago
Hello miss,
In some books, they use a permutation test for hypothesis testing when analysing an A/B experiment's results.
Can you tell us when to use a permutation test versus the tests you presented in your video (binomial, t and z tests)?
Thanks
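A permutation test is usually described as an alternative for cases where one does not want to rely on the normal or t approximation: shuffle the group labels many times and count how often the shuffled difference is at least as extreme as the observed one. The following is a minimal sketch with hypothetical data and a hypothetical function name `permutation_test`, not a method taken from the video.

```python
import random


def permutation_test(control, treatment, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(control) + list(treatment)
    n_t = len(treatment)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        perm_t, perm_c = pooled[:n_t], pooled[n_t:]
        diff = sum(perm_t) / len(perm_t) - sum(perm_c) / len(perm_c)
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float ties
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical samples (NOT data from the video).
control = [1, 2, 0, 1, 3, 1, 2, 0, 1, 3]
treatment = [2, 1, 3, 2, 0, 4, 1, 2, 3, 2]

observed, p_value = permutation_test(control, treatment)
print(f"observed difference={observed:.2f}, permutation p-value={p_value:.3f}")
```

With large samples the z-test or t-test and a permutation test tend to agree; the permutation approach is mostly of interest for small samples or unusual metrics.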
@maheshchandra5717 3 years ago ⁺⁴
Hey Emma, just a quick question, have you heard of Strata Scratch? Is it a good platform to practice interview-style SQL and Python coding questions? Are the questions actually asked in those companies which are tagged? Would love to hear your thoughts.
@emma_ding 3 years ago ⁺¹
I'm not familiar with the platform. I have only used LeetCode and HackerRank.
@kylehuang7926 3 years ago ⁺³
I use both - Nate's very good at explaining SQL and Emma is good at statistical and product sense questions
@ankityadav-eq7fe 2 years ago
How did we get the practical significance boundary?
@Han-ve8uh 3 years ago
Thanks for showing all pooling formulas and concepts in one place. At 1:16 and 7:50 you mentioned practical significance boundaries of 0.01 and 0.05. How does the experimenter come up with these values? Is it from calculating business costs and revenues?
You talked about checking the CI, which reminds me of something confusing I read in point 4 on hookedondata.org/guidelines-for-ab-testing/. Could you comment on why the cases she cites are possible? (A CI that is wide and close to 0 vs. a CI that is tiny and far away.)
What I don't understand is, assuming the same p-value (not sure if this assumption is required for this discussion), how can a CI be tiny (I think she means narrow) AND far away simultaneously? CI width depends on the standard error, so a narrow CI means a narrower sampling distribution of the statistic, and the centre of the CI should be closer to the centre of the null hypothesis sampling distribution compared to a CI with a larger width. Is my reasoning correct that a narrower CI would have a centre that is closer to the null centre? How should I understand what Emily is saying there, then? She seems to say the narrower CI can have a centre that is further away.
@emma_ding 3 years ago
Typically, those values are given during interviews. If not, you can discuss with the interviewer.
@SuperLOLABC 3 years ago ⁺³
Great video as always Emma! I have a question, is it alright to schedule a Technical phone screen 3 weeks out and the on-site interview a whole month after the technical phone screen? Is it possible to get rejected by the recruiter if I ask for such far out interview dates?
@emma_ding 3 years ago
You can totally discuss it with the recruiter. No worries. Your recruiter wants to help you do your best. :)
@dogugunozkaya4605 1 month ago
Isn't the confidence interval between d-t*SE and d+t*SE in the second example? In that case it must be between 0.05-2.002*1.0998 and 0.05+2.002*1.0998, which calculates to -2.15 and +2.25.
@nelsonchou1023 2 years ago
Hi Emma, I'm analysing a conversion A/B test result. I wonder how to account for the issue that a change of conversion is due to the different directions of numerator (checkout sessions) and denominator (homepage sessions) ? E.g. the homepage sessions reduced while the checkout only increased slightly or no change at all? Can we conclude that the treatment group actually performs better? Thanks.
@dwardster 3 years ago ⁺¹
Do interviewers ever ask to calculate test stats or confidence intervals? In that case can we look up or ask for the formulas?
@emma_ding 3 years ago
Hi there, good question! It would really depend on the interviewer, but I'd suggest remembering the equations for commonly used hypothesis tests, e.g. one-sample and two-sample z-tests and t-tests.
@dwardster 3 years ago
@emma_ding thank god for remote interviews 😉
@allenlu3021 3 years ago ⁺¹
Hey Emma great video, really helping me understand the process of AB testing! From watching all of your series on this topic, one thing I'm having trouble understanding is the relationship between MDE and practical significance. I understand MDE is used to calculate sample size such that the sample size calculated is the sample size needed to detect statistical significance at the magnitude of our MDE. In my mind I thought the point of the MDE is such that we can have our null hypothesis be "variant is not larger than control by given MDE (if MDE is positive)"; however, the case seems to be null hypothesis is just control metric != treatment metric. Is the MDE used later on then as the practical significance boundary you mentioned in this video then and it doesn't have anything to do with determining statistical significance beyond helping estimate our initial sample size?
@namandoshi4478 3 years ago
Did you find an answer to this?
@rantao1593 2 years ago
One question - do we also look at the p-value to decide if there is statistical significance, or do we only need to compare the test statistic with the critical z-score?
@cccspwn 3 years ago
What is the difference between using the practical significance boundary and minimum detectable effect?
@anaspatankar6999 3 years ago
Why does the difference in sample proportions (d) follow a normal distribution? Is it because the sample size is large enough?
@rash_mi_be 3 years ago
Hi
How did you find critical Z score in the first part as 1.96?
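The 1.96 is the standard normal quantile that leaves 2.5% in each tail, i.e. the critical value for a two-sided test at alpha = 0.05; a one-sided test at the same alpha uses about 1.645. This also relates to the questions in the comments about why the 0.975 column of the table is used. A quick check with Python's standard library, assuming alpha = 0.05 as in the thread:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided test: alpha is split across both tails, so look up 1 - alpha/2 = 0.975.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96

# One-sided test: all of alpha sits in one tail, so look up 1 - alpha = 0.95.
z_one_sided = std_normal.inv_cdf(1 - alpha)      # about 1.645

print(f"two-sided critical z: {z_two_sided:.3f}")
print(f"one-sided critical z: {z_one_sided:.3f}")
```

The same logic applies to the t table: a two-sided test at alpha = 0.05 reads the 0.975 column for the relevant degrees of freedom.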
@shashikantprajapati7364 1 year ago
Hi @emma_ding, thanks for creating such great and informative videos on A/B testing. I did not understand how you calculated SS (the sum of squares for the control and treatment groups).
@akankshakumar731 3 years ago
Can you do a video on the chi-squared test? Here, for example, the click-through probability would be of the characteristic type.
@travissun6753 3 years ago
Hi Ding, how can we find the most accurate flow chart of statistical tests? There are many versions of them on Google.
@JoelPrinceVarghese 3 years ago
In your example, what is the timeframe for the average number of posts? If I just want my average posts per user to go up, how do I decide the timeframe to run the test? Also, say I wanted average posts per day to go up and I have data for the same users across multiple days, how do I check my hypothesis then? Hope this makes sense.
@Sn-nw6zb 3 years ago
Pretty good explanations with examples. Thank you. Do all companies use t-tests and z-tests, or do they use bootstrap tests by running simulations?
@jasonwong8315 3 years ago ⁺¹
Where did the practical significance boundary come from? Rule of thumb?
@emma_ding 3 years ago ⁺²
Typically, those values are given. If not, you can discuss them with the interviewer during the interview.
@jonsings 3 years ago
@emma_ding Thanks for clarifying, I also had this question.
@ashwinmanickam 2 years ago
8:00 Case 2 T - test
@SegunAdelowo-t7p 3 years ago
Hello Emma,
Thanks for all your helpful content :)
I attempted the steps you gave in the video but was unable to solve this question.
In Loan application analysis task. What is the best way to solve this Hypothesis testing?
What's the effect of owning a car on the likelihood of a loan application being accepted?
Own_car attribute is Yes(1) or No(0)
loan_application_accepted attribute is True(1) or False(0)
The dataset has 1000 records
own_car value count:
No 598
yes 405
loan_application_accepted count:
False 703
True 300
own_car value count where loan_application_accepted is True:
No 187
yes 113
Null Hypothesis: Owning a car doesn’t affect loan application acceptance
Alternate Hypothesis: Owning a car does affect loan application acceptance
@SegunAdelowo-t7p 3 years ago
How should this Hypothesis test be computed: What's the effect of owning a car on the likelihood of a loan application being accepted?
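One possible approach to this question, using only the counts posted in it, is a two-sample test of proportions (a chi-squared test of independence on the 2x2 table is a closely related alternative). This is a sketch under the assumption that applicants are independent observations; the variable names are illustrative and the counts are taken as posted, even though they sum to 1003 rather than 1000.

```python
import math
from statistics import NormalDist

# Counts as posted in the question.
n_no, n_yes = 598, 405      # applicants without / with a car
acc_no, acc_yes = 187, 113  # accepted applications in each group

p_no = acc_no / n_no        # acceptance rate without a car
p_yes = acc_yes / n_yes     # acceptance rate with a car

# Pooled proportion under H0: owning a car does not affect acceptance.
p_pool = (acc_no + acc_yes) / (n_no + n_yes)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_no + 1 / n_yes))

z = (p_yes - p_no) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided, matching the stated H1

print(f"p_no={p_no:.3f}, p_yes={p_yes:.3f}, z={z:.3f}, p-value={p_value:.3f}")
```

With these counts the z statistic stays well inside ±1.96, so at alpha = 0.05 this test would not reject the null hypothesis that car ownership has no effect on acceptance.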
@dadaunion 2 years ago
Hey Emma @Data Interview Pro, thanks for the great video. I am quite confused about the CI for the second case. The practical significance boundary is 5%, but the CI you calculated is in actual units (number of posts). Just wondering how they can be compared.
@jonglee8162 3 years ago
Hi Emma, thanks for the video! How do we know that we have a large sample size?
@emma_ding 3 years ago
For sample size, you can refer to the diagram in the part 1 video. th-cam.com/video/IY7y-t30UJc/w-d-xo.html
@shreyaschaturvedi1933 3 years ago
Excellent video! I had one question though: what do you do if you are dealing with unequal variances in your control and treatment groups? I'm assuming you can't calculate the SE using the pooled formula you have shown, right? Would appreciate some advice on this!
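When the two groups cannot be assumed to share a variance, a common choice is Welch's t-test, which keeps the variances separate and adjusts the degrees of freedom (the Welch-Satterthwaite approximation). The sketch uses hypothetical data and a hypothetical helper `welch_t`; it is not from the video. In SciPy the same test is available as `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(control, treatment):
    """Welch's t statistic and approximate degrees of freedom."""
    n_c, n_t = len(control), len(treatment)
    v_c, v_t = variance(control), variance(treatment)  # sample variances (n - 1)
    se = math.sqrt(v_c / n_c + v_t / n_t)              # unpooled standard error
    t_stat = (mean(treatment) - mean(control)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v_c / n_c + v_t / n_t) ** 2 / (
        (v_c / n_c) ** 2 / (n_c - 1) + (v_t / n_t) ** 2 / (n_t - 1)
    )
    return t_stat, df


# Hypothetical samples with visibly different spread (NOT data from the video).
control = [1, 2, 0, 1, 3, 1, 2, 0, 1, 3]
treatment = [0, 5, 1, 6, 2, 0, 4, 1, 3, 2]

t_stat, df = welch_t(control, treatment)
print(f"Welch t={t_stat:.3f}, approximate df={df:.2f}")
```

The resulting degrees of freedom are generally not an integer and fall below n_c + n_t - 2, which makes the test somewhat more conservative than the pooled version.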
@xinyuechang6062 2 years ago
I am very confused by the practical significance boundary. Why is dmin = 0.01 in example 1 and 0.05 in example 2?
@sandeepgupta2 3 years ago
Amazing tutorial!!!
@harryfeng4199 2 years ago
This is a BLESSING. Thank you so much.
@emma_ding 2 years ago
You are welcome 😊
@pushkarajpalnitkar1695 3 years ago
Great video! Coding rounds are also big part of the interview process. Can you please make some videos on that too? That will be great. Thank You.
@emma_ding 3 years ago ⁺²
Stay tuned!
@sinarashidian9888 3 years ago
Thanks for going through these problems step by step. In interviews, are we supposed to use built-in libraries (for instance scipy) for these questions or implement everything from scratch? The first shows we are familiar with libraries; the latter shows we know the math :) I am not sure which way is best.
@emma_ding 3 years ago
Good question! In most cases using built-in libraries would be good enough, but I'd suggest always checking the requirements with the interviewer before you dive into coding anything.
@wclin3872 3 years ago
Thank you for sharing this! I have a question: in example 2, we don't know the population variance, but the sample size is large. Can we use the sample standard deviation to estimate the population standard deviation (large sample size) so that we can use a Z-test? Thanks!
@myworldAI 3 years ago
Hi, I have two samples of supermarket customers who bring their own plastic bags.
In one sample the supermarket provides free plastic bags; in the other the supermarket charges a fee for plastic bags. What kind of statistical test should I use? Can I use a two-sample test of proportions? Thanks
@emma_ding 3 years ago
Yes you can.
@myworldAI 3 years ago
@emma_ding Thank you very much👍👍👍👍👍
@karundeep07 3 years ago
Hey Emma,
Thanks for this amazing video.
Just wanted to know: why did we pick 0.01 as the practical significance level? We could have picked any other value as well, like 0.02, 0.03 or 0.04 (anything < 0.05, the statistical significance level α).
Does the practical significance level need to be 0.01?
@emma_ding 3 years ago
In practice, a company would pick the practical significance level that makes the most sense. 0.01 is just an example to show you the difference between statistical and practical significance. Hope it helps!
@adooby001 3 years ago
@emma_ding Is practical significance similar to the MDE? In my past roles, I've seen the MDE that was agreed on during experimental design serve as the practical significance.
@jonsings 3 years ago
Super helpful!
@jamesy6213 3 years ago
Thank you so much! You explain this far better than my school teachers!!
@cococnk388 2 years ago
Thanks 😊
@WashingMykale 3 years ago ⁺²
Just to confirm, in the 2-sample test of means, were you doing a 2-sided test? Because I saw that you looked up the t-score under the 0.975 column for alpha = 0.05.
@mrakashgupta 3 years ago
+1 to that - though I saw in one of the Khan Academy videos that, for a one-tailed test with alpha = 0.05, we need to refer to the 97.5% column against the required df to get the critical T value for comparison. @Emma, please share your thoughts.
@cynsunn 3 years ago ⁺¹
I can't thank you enough
@rubyjiang8836 3 years ago ⁺³
Mean of treatment is 1.7 not 2. Then I think the second example is not significant...
@ey2392 3 years ago
Agree
@zahramovahedinia1896 1 month ago
👌❤️
@yij9010 1 year ago
Did you put on a programmer plaid shirt? 😂
@alexisdamnit9012 1 year ago
Coming from a statistics background, this is some weird notation. That aside, I think she's doing well, but the notation just throws me off. Data science is a strange mess
@flying3152 2 years ago
Super helpful!!!
@emma_ding 2 years ago
Glad you think so!
