Applied Statistics Interview Question | Google Data Scientist Interview

  • Published on Jul 2, 2024
  • 👉 Ace Data Science & ML Engineer Interviews with www.datainterview.com/pricing
    👉 Join the Data Scientist Interview Bootcamp: www.datainterview.com/bootcam...
    What are the benefits of enrolling in the Premium content? 👇
    ✍️ Learn from 30+ Hours of Video and Text-Based Courses created by interview experts who worked in top companies like Google and Meta!
    📚 200+ Actual Interview Questions + Detailed Solutions - Get practice questions seen in actual interviews, with detailed solutions written by engineers from top companies (e.g. Google & Meta)
    📝 Cover Core Areas in Technical Interviews including AB testing, product sense, applied statistics, machine learning, business case, SQL, data science coding and much more!
    ⭐ Become an SQL Pro with the Interactive Pad: 100 SQL questions, from easy to hard, asked at top companies, plus highly optimized solutions.
    🎥 Watch Mock Interview Videos with real candidates and an interviewer at top companies.
    💭 Join the Private Chat Group to practice interview questions with peers and instructors. And, network with peers for your next job!
    Join premium prep on 👉 www.datainterview.com/pricing

Comments • 24

  • @DataInterview 1 year ago +1

    Want more questions like this? Join the prep community on www.datainterview.com/ 🚀

  • @gupnir 1 year ago +6

    This question was asked in my Walmart interview. I wish you had released this video earlier.

  • @AngelofWar16 1 year ago +12

    We could use the binomial distribution; it would be more accurate. We would compute the probability of getting 30 or fewer heads plus the probability of getting 70 or more heads, and then compare it with the 0.05 threshold.
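
A minimal sketch of the exact binomial test described in this comment, assuming the question is 70 heads in 100 flips (the figures used elsewhere in this thread) and a fair-coin null of p = 0.5. The function names are illustrative, and the "at least as far from the expected count" rule for the two-sided tail is one common convention that relies on the symmetry of the null here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_two_sided_p(heads, n, p0=0.5):
    """Sum the null probabilities of every outcome at least as far from
    the expected count n * p0 as the observed count is."""
    expected = n * p0
    distance = abs(heads - expected)
    return sum(
        binom_pmf(k, n, p0)
        for k in range(n + 1)
        if abs(k - expected) >= distance
    )


# Assumed data: 70 heads out of 100 flips.
p_value = exact_two_sided_p(70, 100)
print(f"exact two-sided p-value: {p_value:.3g}")
print("reject fairness at 0.05:", p_value < 0.05)
```

For 70 of 100 the sum covers the outcomes 0..30 and 70..100, which matches the two tails named in the comment.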

    • @sssam844 1 year ago

      This is how I learned statistics at a German uni.

    • @pravinborate1500 11 months ago

      Yes, I think so. This is what came to mind when I saw the question.

    • @robertwilsoniii2048 11 months ago

      Exactly. I can't believe people are paid 6 figures to do z tests...

    • @robertwilsoniii2048 11 months ago

      It feels like a practical joke. I thought they'd be doing stuff like mixed model regression and hardcore generalized modeling.

    • @kylerasmussen4921 11 months ago +1

      Remember, we are learning this for interviewing. A "z-test" computes p-values from the cumulative distribution function of the normal distribution. Since the binomial distribution converges to the normal by the CLT, it's much easier to use. The alternative is to use the cumulative distribution function of the binomial distribution, which changes with N, which makes it much harder to do in practice.

  • @user-wr4yl7tx3w 1 year ago

    More videos like this, please. Great format.

  • @nu940 1 year ago

    Thanks, this is a good explanation of the problem

  • @user-wr4yl7tx3w 1 year ago +1

    It would be great to also see the results under a Bayesian approach.
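
One possible Bayesian treatment, sketched under assumptions that are not in the video: a uniform Beta(1, 1) prior on the heads probability and the data 70 heads in 100 flips. With that prior the posterior is Beta(1 + heads, 1 + tails), and the sketch estimates the posterior probability that the coin favors heads by sampling. The function names are illustrative.

```python
import random


def posterior_params(heads, n, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for a Beta(prior_a, prior_b) prior."""
    return prior_a + heads, prior_b + (n - heads)


def prob_heads_favored(heads, n, draws=50_000, seed=0):
    """Monte Carlo estimate of P(p > 0.5 | data) under the Beta posterior."""
    rng = random.Random(seed)
    a, b = posterior_params(heads, n)
    return sum(rng.betavariate(a, b) > 0.5 for _ in range(draws)) / draws


# Assumed data: 70 heads out of 100 flips, uniform prior.
a, b = posterior_params(70, 100)
posterior_mean = a / (a + b)
print(f"posterior: Beta({a:.0f}, {b:.0f}), mean {posterior_mean:.3f}")
print(f"P(p > 0.5 | data) ~ {prob_heads_favored(70, 100):.4f}")
```

A different prior would shift these numbers, so the choice of Beta(1, 1) should be stated explicitly in an interview answer.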

  • @shritejchavan6222 3 months ago

    What if we calculate the confidence interval for the population proportion, using the standard error of the estimate, sqrt(0.7*0.3/100), and check whether 0.5 lies in the resulting confidence interval?
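
The confidence-interval check proposed in this comment can be sketched as follows, assuming a 95% level and the normal-approximation (Wald) interval. The function name is illustrative. Note that this interval uses the observed proportion 0.7 in the standard error, whereas a z-test against 0.5 uses the null proportion, so the two approaches are close but not identical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(heads, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = heads / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)              # sqrt(0.7 * 0.3 / 100) here
    return p_hat - z * se, p_hat + z * se


# Assumed data: 70 heads out of 100 flips.
low, high = wald_interval(70, 100)
print(f"95% CI: ({low:.4f}, {high:.4f})")
print("0.5 inside the interval:", low <= 0.5 <= high)
```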

  • @xinyili7450 1 year ago +1

    Could you give a more detailed answer for why a z-test is used here? Many thanks!

    • @DataInterview 1 year ago +1

      Hi! Though the outcome is binary, you can represent it as a proportion. When your sample size is large enough, by the CLT, the sampling distribution of the proportion becomes approximately normal.
      The z-test assumes that the sampling distribution is normal. In this case, that condition is satisfied, as mentioned above.
      So that's why the z-test for a proportion works here.
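
A minimal sketch of the one-proportion z-test described in this reply, assuming 70 heads in 100 flips and a two-sided test against p0 = 0.5. The function name is illustrative; the standard error is computed under the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(heads, n, p0=0.5):
    """Return the z statistic and the two-sided p-value."""
    p_hat = heads / n
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error assuming the null is true
    z = (p_hat - p0) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed data: 70 heads out of 100 flips.
z, p_value = one_proportion_z_test(70, 100)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3g}")
```

With these numbers the standard error is 0.05, so the observed proportion sits four standard errors above 0.5.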

  • @jaspreetsingh-nr6gr 1 year ago

    I was expecting MLE parameter estimation, but that is also sort of Bayesian. Do you agree or disagree with that? It does rely on Bayesian principles.

    • @heyman620 1 year ago +2

      It's based on the CLT (central limit theorem). I am not sure how you intend to use MLE for it. Maybe I don't understand enough, but it seems like the MLE for that kind of task is just the sample mean. If you can generate more data, the law of large numbers would let you estimate mu directly!

    • @kylerasmussen4921 11 months ago

      @heyman620 The LLN just says that the sample mean converges to the expected value asymptotically if the process is stationary. The CLT approximation is already reasonable by 100 data points.
      That being said, MLE is a statistical method for estimating the parameters of the data's distribution, but it doesn't by itself make statements like "biased". You still need to do hypothesis testing, whether that be chi-square or otherwise.

    • @heyman620 10 months ago

      @kylerasmussen4921 Remember that in the end, all you test is whether the means of two groups are different, and you can do it a gazillion ways. When you have a small amount of data, you can make some assumptions regarding the distribution, e.g. assume it's normal.
      That being said, you don't have to; I believe you can actually use Chebyshev's inequality to mimic the test. But assuming normality makes a lot of sense in this setup. What I think is that all you need is the means, the variance, and a way to know how likely the difference is to be an estimation error. I guess I just stated it implicitly, but you are right. Given infinite data and finite variance you don't need a test though, i.e., LLN :). Very nice comment, thanks!
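
To make the Chebyshev idea concrete, here is a small sketch assuming 70 heads in 100 flips and a fair-coin null. Chebyshev's inequality bounds P(|p_hat - p0| >= k * sd) by 1/k^2 for any distribution with finite variance, so it gives a distribution-free upper bound on the two-sided p-value. The function name is illustrative.

```python
from math import sqrt


def chebyshev_p_bound(heads, n, p0=0.5):
    """Upper bound on the two-sided tail probability under the null."""
    sd = sqrt(p0 * (1 - p0) / n)   # sd of the sample proportion under the null
    k = abs(heads / n - p0) / sd   # deviation measured in standard deviations
    if k <= 1:
        return 1.0                 # the inequality is uninformative for k <= 1
    return 1 / k**2


# Assumed data: 70 heads out of 100 flips, so k = 4 and the bound is 1/16.
bound = chebyshev_p_bound(70, 100)
print(f"Chebyshev bound on the p-value: {bound:.4f}")
print("bound below 0.05:", bound < 0.05)
```

The bound of 1/16 sits above 0.05, which illustrates how conservative Chebyshev is compared with assuming normality for the same data.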

  • @robertwilsoniii2048 11 months ago +1

    Why not just do a Chi-square goodness of fit test?
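
A sketch of the chi-square goodness-of-fit test suggested here, assuming observed counts of 70 heads and 30 tails against expected counts of 50 and 50. With two categories there is one degree of freedom, and for that case the chi-square tail probability can be written with the complementary error function, so no external library is needed. The function name is illustrative and only handles the one-degree-of-freedom case.

```python
from math import erfc, sqrt


def chi_square_gof_df1(observed, expected):
    """Goodness-of-fit statistic and p-value for exactly two categories."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this sketch only covers two categories (df = 1)")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 1, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Assumed data: 70 heads and 30 tails versus a fair 50/50 split.
stat, p_value = chi_square_gof_df1([70, 30], [50, 50])
print(f"chi-square = {stat:.1f}, p-value = {p_value:.3g}")
```

The statistic here is 16, the square of the z statistic of 4 from the one-proportion z-test, so with two categories the two tests give the same two-sided p-value.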

  • @heyman620 1 year ago

    It's a nice solution, but I think that once you figure out you can use the normal distribution, talking about "computing the z value" is a little 3rd grade. I think what's more important is knowing the assumptions, i.e., independence, and understanding that this test is, in fact, based on the CLT (this mean is sampled from the distribution of the means!).
    Honestly, sorry, I am not sure I would give you a perfect score for the interview, since you just used a statistical test blindly (a pass for sure).

    • @KumarHemjeet several months ago

      How would you solve this problem then?

    • @heyman620 several months ago

      It's so much better to just simulate it... You can assume normality because of the convergence in distribution to the normal in this setup, which stems from the CLT. However, this convergence happens as n → ∞, and here n is fixed at 100.
      Instead, you can simulate with P = 0.5 and get a better estimate. Just count how many of the outcomes in your simulation have at least 70 heads; let K be this number and N the number of simulations. Your p-value is K/N (the null hypothesis is that P = 0.5, and you count the simulated outcomes that are at least as extreme as the one observed).
      That's a form of parametric bootstrapping (a Monte Carlo test under the null).
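
The simulation described in this reply can be sketched as below, assuming 100 flips per simulated experiment and the one-sided "at least 70 heads" count that the comment uses. The function name, the default number of simulations, and the fixed seed are illustrative choices.

```python
import random


def simulated_p_value(heads, n, num_sims=10_000, p0=0.5, seed=0):
    """Estimate P(at least `heads` heads in n flips) under the null by simulation."""
    rng = random.Random(seed)
    at_least_as_extreme = 0  # K in the comment
    for _ in range(num_sims):
        sim_heads = sum(rng.random() < p0 for _ in range(n))
        if sim_heads >= heads:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_sims  # K / N


# Assumed data: 70 heads out of 100 flips.
p_value = simulated_p_value(70, 100)
print(f"simulated one-sided p-value: {p_value}")
```

Because 70 or more heads is very rare for a fair coin, a run of this size may well report K = 0. That only says the p-value is small relative to 1/N; a far larger N, or the exact binomial sum, is needed to pin down the actual value.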

    • @heyman620 several months ago

      @KumarHemjeet Like, what would you do if I told you it was 11 of 13? Would you still assume normality?
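
To illustrate the 11-of-13 question, this sketch compares the exact one-sided binomial tail with the normal approximation (without a continuity correction) for a fair-coin null. The function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(heads, n, p0=0.5):
    """Exact P(at least `heads` heads in n flips) under the null."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(heads, n + 1)
    )


def normal_upper_tail(heads, n, p0=0.5):
    """Normal approximation to the same tail, no continuity correction."""
    z = (heads - n * p0) / sqrt(n * p0 * (1 - p0))
    return 1 - NormalDist().cdf(z)


exact = exact_upper_tail(11, 13)    # (78 + 13 + 1) / 2**13 = 92 / 8192
approx = normal_upper_tail(11, 13)
print(f"exact tail:  {exact:.5f}")
print(f"normal tail: {approx:.5f}")
```

At n = 13 the normal approximation noticeably understates the exact tail probability, which is the concern behind the question.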