Facebook Data Scientist Mock Interview - Segment Influencers

  • Published on 28 Jun 2024
  • Get access to mock interviews and courses on data science and ML engineer interviews on datainterview.com.
    This is a 30-minute mock interview based on a phone screen for a product data science role at Facebook. The interview was conducted by an ex-PayPal data scientist who currently works in FAANG.
    Parts:
    0:00 Statistics Questions
    9:50 SQL Questions
    20:00 Case Questions
    #datascience #machinelearning #dsinterview

Comments • 111

  • @DrEhrfurchtgebietend
    @DrEhrfurchtgebietend 2 years ago +15

    Watching her struggle with a simple SQL question really made me feel better

  • @nanfengbb
    @nanfengbb 2 years ago +75

    This is by far the best DS mock interview I have seen on youtube, in terms of the authenticity of the coding environment, interview structure/flow/time control and interaction between the interviewee and interviewer. Thank you for making this! It would be great if you can provide feedback/comments to the interviewee at the end so we can learn what areas she did great/where she can further improve. The interviewee is amazing too! I am curious, what is her experience (YOE and job title)? Thanks.

  • @datahat642
    @datahat642 3 years ago +2

    The case study has been worked through in detail. An additional important feature could be whether any other influencers follow the particular user under consideration.

  • @tamabebe5551
    @tamabebe5551 3 years ago +36

    Hello, I don't know why people are being so cold, you did great on the interview.

  • @junyanyao6977
    @junyanyao6977 3 years ago +15

    The case study probably wants to follow this structure: 1. Why do you want to distinguish influencer accounts? [Let's say it's for better ad targeting, or to use this information in a recommender system, etc.] 2. What kind of data is available to us (account contextual information and behavioral information)? 3. Clarify which features can be helpful (you can talk about some classification models here, but it should mainly be feature insights). 4. Clarify which features are most important (from product sense and machine learning points of view, e.g. permutation importance, Gini importance). 5. Summarize it.

  • @XXX-cn7gj
    @XXX-cn7gj 3 years ago +1

    Interview was fine. It's a mock for a reason, and they all tend to differ here and there. Practicing is better than not practicing at all!

  • @jlh530i1
    @jlh530i1 3 years ago +3

    ... a friend of mine was asked to write an algorithm for search autofill during the case portion of their interview

  • @torinojuve1
    @torinojuve1 3 years ago +19

    Hi - was a Facebook DS and I gave many interviews. This is nothing like the Facebook DS interview.

    • @FuyangLiu
      @FuyangLiu 3 years ago +1

      So how do the real ones differ from this?

    • @asthasrivastava9564
      @asthasrivastava9564 3 years ago +3

      Can you please share the real experience, please?

  • @ASOT666
    @ASOT666 2 years ago +1

    amazing, super helpful!

  • @MegaAntimason
    @MegaAntimason 3 years ago +38

    The first SQL answer is incorrect: you can't filter on rank yet, you have to create a subquery.

    • @luhan5129
      @luhan5129 2 years ago

      agree

    • @browser1232
      @browser1232 2 years ago

      Ugh, came to the comments just to say this. That was pretty bad.

    • @ipvikas
      @ipvikas 2 years ago

      Correct MySQL query is:
      select user_name, ROW_NUMBER() over w as 'Rank'
      from Messages
      window w as (partition by date order by message_sent/message_received desc)
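
The subquery approach suggested in this thread can be sketched as follows. This is a minimal, hedged example rather than the interview's official answer: it assumes the task is "for each day, return the user with the highest sent/received ratio", it assumes a `Messages` table with the columns named in the quoted queries, and the sample rows are made up. It runs the SQL through Python's built-in `sqlite3` module, which needs an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

# Hypothetical schema, inferred from the queries quoted in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Messages (date TEXT, user_name TEXT, "
    "message_sent INTEGER, message_received INTEGER)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?, ?)",
    [
        ("2021-01-01", "ann", 10, 5),
        ("2021-01-01", "bob", 9, 3),
        ("2021-01-01", "cat", 4, 0),  # zero received: ratio is undefined
        ("2021-01-02", "ann", 2, 4),
        ("2021-01-02", "bob", 6, 6),
    ],
)

# The window function is computed in the inner query; the outer query can
# then filter on its alias, which a WHERE in the same SELECT cannot do.
TOP_PER_DAY = """
SELECT date, user_name, ratio
FROM (
    SELECT date,
           user_name,
           1.0 * message_sent / message_received AS ratio,
           ROW_NUMBER() OVER (
               PARTITION BY date
               ORDER BY 1.0 * message_sent / message_received DESC
           ) AS rnk
    FROM Messages
    WHERE message_received > 0   -- drop rows with an undefined ratio
) AS ranked
WHERE rnk = 1
ORDER BY date
"""

top_rows = conn.execute(TOP_PER_DAY).fetchall()
for row in top_rows:
    print(row)
```

Multiplying by `1.0` avoids integer division, which would otherwise truncate the ratio in engines such as SQLite and PostgreSQL. `ROW_NUMBER()` breaks ties arbitrarily; `RANK()` would keep all tied users.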

  • @simonhafner4750
    @simonhafner4750 2 years ago +3

    Thanks a lot for sharing! May I ask which level this mock interview is meant for?

  • @pvss2000
    @pvss2000 3 years ago +1

    For the influencer versus non-influencers, could you do something where first you identify those who actually have content that has products that are being 'advertised', then you correlate the presence/views of that video with sales of that product. If correlation reaches above a certain point then they are an influencer.

  • @reanschwarzer1026
    @reanschwarzer1026 3 years ago +37

    The third question, about the confidence interval in logistic regression, is kind of misleading and challenging from the interviewee's perspective. More clarification would help, such as whether it is in the logit format or the probability format. First, the question asks whether the log-odds (logit) could be 0. I think it is possible: log(p/(1-p)) can definitely be zero when p = 1-p. Then you jumped to the confidence interval of the odds ratio, which is kind of tricky if you treat the odds ratio and the log odds as the same thing (the odds ratio does not take the log). The odds ratio format should be exp(beta), so when 1 is included in the CI, beta could be zero since exp(0) = 1, and you cannot reject the null hypothesis, meaning the beta coefficient is not significant.

    • @qifeizhang4834
      @qifeizhang4834 3 years ago +2

      Your comment is very helpful!

    • @sujaykha
      @sujaykha 3 years ago +2

      Yup, exactly what I thought! Thanks to the uploader for great content though 👌
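
The relationship between the two scales described in this thread can be illustrated with a short sketch. The coefficient and standard error below are made-up numbers, and the interval is a standard Wald interval (estimate plus or minus 1.96 standard errors), which is an assumption about how the interval in the interview was built.

```python
import math

def wald_ci(beta_hat, std_err, z=1.96):
    """Approximate 95% Wald interval on the log-odds (coefficient) scale."""
    return beta_hat - z * std_err, beta_hat + z * std_err

# Made-up estimate and standard error, for illustration only.
beta_hat, std_err = 0.20, 0.15

lo, hi = wald_ci(beta_hat, std_err)
# exp() is monotonic, so exponentiating the endpoints gives the odds-ratio CI.
or_lo, or_hi = math.exp(lo), math.exp(hi)

contains_zero = lo <= 0.0 <= hi       # "not significant" on the log-odds scale
contains_one = or_lo <= 1.0 <= or_hi  # the same statement on the odds-ratio scale

print(f"log-odds CI:   ({lo:.3f}, {hi:.3f}), contains 0: {contains_zero}")
print(f"odds-ratio CI: ({or_lo:.3f}, {or_hi:.3f}), contains 1: {contains_one}")
```

Because exp(0) = 1, the two checks always agree. An odds-ratio interval can never contain 0, since exp() is strictly positive.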

  • @brothermalcolm
    @brothermalcolm 3 years ago +16

    I feel like this is not the typical fb style interview, but I definitely learned something useful here!

    • @orangethemeow
      @orangethemeow 2 years ago +1

      It doesn't seem like an analytic role. FB prep session mentions that ML is not required for the analytics track.

    • @huanchenli4137
      @huanchenli4137 a year ago

      @orangethemeow Most DS roles at FB are just DA or BI, not real DS

  • @hotmilkritata
    @hotmilkritata 2 years ago +1

    Like the stat questions

  • @xiaowenkang9598
    @xiaowenkang9598 months ago

    👍thank you so much

  • @bcws
    @bcws 8 months ago

    Isn't the Beta of the logistic regression the change in Y (or log odds in this case) given a 1 unit change in X?
    If so, then it is possible for Beta to be 0 (or 0 to be in beta's confidence interval) as that implies a 1 unit change in x does not have any change in log odds. However, if we want to look at odds, then we need to take the exponential of Beta, in which case it is not possible for the confidence interval of exponential of Beta to contain 0.
    The confidence interval here is not referring to the log odds, but the change in log odds given a change in x.

  • @ni12907
    @ni12907 3 years ago +9

    Hey the font size is too small, can you please post the questions somewhere?

    • @DataInterview
      @DataInterview 3 years ago +1

      Sure thing. Noted for the next video.

  • @PremiumTrackerSilverStacker
    @PremiumTrackerSilverStacker 2 years ago +4

    I don't think she answered the log odds question correctly. A CI for the log odds is insignificant if it includes 0. A CI for the odds is insignificant if it includes 1.

  • @vnpikachu4627
    @vnpikachu4627 3 years ago +13

    For the first SQL question, you have to create a subquery, or use HAVING instead of WHERE.

    • @weiyangshi4729
      @weiyangshi4729 2 years ago

      Would filtering using HAVING work here? I thought SELECT is executed after HAVING. Correct me if I'm wrong!

  • @AniltonNeto
    @AniltonNeto 3 years ago +6

    13:04 is bad, cuz the result for the division is undefined, in this case, you change NULLIF(field, 1) instead :-P and filter zero values :)

    • @StraightCrossing
      @StraightCrossing 3 years ago +5

      I would prefer to filter the data so there just isn't any null or 0, with WHERE message_received > 0

    • @vvalk2vvalk
      @vvalk2vvalk 3 years ago

      @StraightCrossing My thoughts exactly.

    • @orangethemeow
      @orangethemeow 2 years ago

      @StraightCrossing Same. Then we don't have to worry about those 0s
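
The two ways of handling a zero denominator mentioned in this thread behave differently, which a small sketch can show. The table layout and rows are hypothetical, and the example uses `NULLIF(message_received, 0)`, the usual form of the guard, which is an assumption about what the first comment intended. It uses Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and made-up rows, for illustration only.
conn.execute(
    "CREATE TABLE Messages (user_name TEXT, "
    "message_sent INTEGER, message_received INTEGER)"
)
conn.executemany(
    "INSERT INTO Messages VALUES (?, ?, ?)",
    [("ann", 10, 5), ("cat", 4, 0)],
)

# Option 1: NULLIF turns a zero denominator into NULL, so the row is kept
# but its ratio becomes NULL.
with_nullif = conn.execute(
    "SELECT user_name, 1.0 * message_sent / NULLIF(message_received, 0) "
    "FROM Messages ORDER BY user_name"
).fetchall()

# Option 2: the WHERE filter removes zero-denominator rows entirely.
filtered = conn.execute(
    "SELECT user_name, 1.0 * message_sent / message_received "
    "FROM Messages WHERE message_received > 0 ORDER BY user_name"
).fetchall()

print(with_nullif)
print(filtered)
```

Which option fits depends on whether users with no received messages should still appear in the result. Note that how NULL ratios sort in an ORDER BY differs between database engines.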

  • @techsavy5669
    @techsavy5669 2 years ago +3

    What was the experience in years for interviewer & interviewee ?

  • @cooldudesheks
    @cooldudesheks 3 years ago

    Thanks for such insightful content!
    I have a clarification question on the 3rd stat problem. You asked whether the log-odds, i.e. the logit value, can be 0 or not. Since the logit scale runs from -infinity to +infinity, the log-odds can take the value 0, can't it? She answered that it cannot be 0 and has a minimum of 1.
    I would appreciate it if you could clarify whether that was the right answer or whether I am missing something here. Thanks again! 👍

    • @neethualan6543
      @neethualan6543 2 years ago

      The interviewer asked what happens if the CI of the log odds contains zero. However, the answer was based on odds = 1 (there is no association between the independent and dependent variable). When log odds = 0, there is no statistical significance. The answer is correct, as odds = 1 means log odds = 0. Still, either the question or the answer should’ve been clearer.

  • @ajitkirpekar4251
    @ajitkirpekar4251 2 years ago +4

    Thank god it wasn't expected to derive the MLE. Also, I am a bit surprised FB expects someone to remember the OLS matrix equations for beta coefficients. I mean, it was lasered into my brain sure, but I am not sure that's proof of anything other than I happened to commit it to memory. I also happened to commit the equations for generalized method of moments, but that's also not proof of anything.

    • @joelwillis2043
      @joelwillis2043 2 years ago

      Well, she commuted the solution but it's not commutable. Her matrix product is not compatible. If you can't derive it from the residual sum of squares you probably don't understand anything from calculus.

    • @tuanseattle
      @tuanseattle 2 years ago

      Yeah, I thought the answer would simply be OLS (because we do not do it by hand...). But it looked like the equations need to be remembered lol

    • @joelwillis2043
      @joelwillis2043 2 years ago

      @tuanseattle Again, there is literally nothing to remember. Just take the derivative of the residual sum of squares, set it to 0, and solve. It is a very simple calculation. The analog of what you learned in 1st-semester calculus.

    • @stanislavdidenko8436
      @stanislavdidenko8436 a year ago

      @joelwillis2043 I can derive it, but it would take me an hour or so, sitting with pen and paper. It is not trivial, because you are dealing with matrix forms and at some point you have to move from partial derivatives to the gradient-form solution. It is not an interview-format task to derive it. I am a mid-level DS with 3 years of experience. There has not been a single day in my career where this skill was needed.
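
For reference, the derivation being debated in this thread can be sketched in matrix form as follows, using the thread's own X' notation for the transpose. It assumes X'X is invertible (no perfectly collinear columns).

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= (y - X\beta)'(y - X\beta)
  = y'y - 2\beta' X'y + \beta' X'X \beta \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta}
  &= -2X'y + 2X'X\beta = 0 \\
X'X\hat{\beta} &= X'y \\
\hat{\beta} &= (X'X)^{-1} X'y
\end{aligned}
```

The order of the factors matters: X is n-by-p and y is n-by-1, so (X'X)^{-1} X'y is well defined while a reordered product generally is not.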

  • @jaeen7665
    @jaeen7665 2 years ago +1

    Dang coefficient would've gotten me off the bat. Idda said run regression and print the summary...whoops.

  • @vvalk2vvalk
    @vvalk2vvalk 3 years ago +16

    Thank you for the video. Pretty informative.
    This shows imposter syndrome is real.
    I do understand that there were follow-up interviews and further rounds, but it does give much more confidence, given that it is a SENIOR interview at FACEBOOK.
    I am now actually considering to try out Data Scientist path some time in the future.

    • @DataInterview
      @DataInterview 3 years ago +3

      A lot of people have imposter syndrome to some degree even those with many years of industry experience. I've been a data scientist for 5 years (2 years non-tech and 3 years in tech), and I still experience the syndrome at times. But, over time, you experience it less as you gain more experience.

    • @naraendrareddy273
      @naraendrareddy273 a year ago

      Oh thank you for letting me know this is a senior DS interview. I'm trying to become a junior DS first.

  • @chemtech7
    @chemtech7 3 years ago +12

    I have never been asked these types of statistics questions, or to derive formulas or coefficients, in a data science interview.

    • @redcloud6975
      @redcloud6975 3 years ago +7

      Y’all getting interviewed?😂😭

    • @Hephasto
      @Hephasto 3 years ago +3

      What questions do you get then?

    • @jimbocho660
      @jimbocho660 2 years ago +1

      @Hephasto Brief explanation of the idea behind ensembling of models; advantages and disadvantages of decision trees; the hyperparameters of a random forest classifier; detecting and explaining multicollinearity; basic probability, especially simple conditional probability calculations; how to regularize neural networks; basic SQL and so on.

    • @Tusharchitrakar
      @Tusharchitrakar 5 months ago

      @jimbocho660 But these questions seem way easier than ones that need deeper mathematical insight. I guess it depends on the company

  • @adamdreier
    @adamdreier 3 years ago

    That function in JavaScript is annoying me, please use ES6 arrow function for binding.

  • @mehmetedex
    @mehmetedex 3 years ago +4

    her keyboard I imagine made of keys made of ten inch springs with wooden top :D

    • @LouisChiaki
      @LouisChiaki 3 years ago

      It sounds like a very expensive mechanical keyboard!

    • @dsgarden
      @dsgarden 3 years ago +2

      Dude you need a shave asap, will make you 1000 yr younger

  • @toshb1384
    @toshb1384 3 years ago +8

    3:15 - isn’t (X’X)^(-1)X’y derived from the maximum likelihood estimate? I thought the correct answer would be stochastic gradient descent.

    • @DataInterview
      @DataInterview 3 years ago +5

      The equation is the Least Squares Method that provides an unbiased estimation of a regression parameter. Maximum likelihood estimation is a different parameter estimation technique that maximizes the likelihood of a model given data. You can use SGD to run MLE. But, unlike the least-squares method, the maximum likelihood estimation does not always lead to an unbiased estimation of a regression parameter.

    • @DataInterview
      @DataInterview 3 years ago +1

      Hope this clarifies your question :)

    • @toshb1384
      @toshb1384 3 years ago +1

      @DataInterview thanks for the response. I guess what I’m trying to say: isn’t MLE the same thing as least squares? You can derive the least squares solution directly from maximum likelihood estimation, and you get the same solution. stats.stackexchange.com/questions/143705/maximum-likelihood-method-vs-least-squares-method

    • @mrblahblihblih
      @mrblahblihblih 3 years ago +1

      yeah I think that's true to use SGD for non closed form, but deriving the beta coefficients from least squares and from MLE should actually give you the same, (X’X)^(-1)X’y

    • @DataInterview
      @DataInterview 3 years ago +3

      @Tosh B and @wjyu_, thanks for the comments. It's actually not correct to assume that the MLE is the same as the Least Squares Method. Even the author of the Stack Exchange answer notes that it's "equivalent" under certain conditions. That is, it provides the same solution under certain conditions (in this case the estimation of parameters in a linear model). But, just because the solutions are the same, it doesn't mean that the methods are the same. The Least Squares Method minimizes the distance between the target and projection vectors with no stochastic assumptions. MLE, on the other hand, estimates parameters by maximizing a likelihood function such that the observed data is most likely. Additionally, the Least Squares Method leads to unbiased estimations of model parameters. However, MLE can sometimes lead to biased estimations.
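
The point both sides of this thread agree on, that the two methods give the same coefficients for a linear model, can be checked numerically. The sketch below is a hedged illustration on made-up data: it assumes Gaussian errors with constant variance, in which case the negative log-likelihood equals the residual sum of squares up to a scale factor and a constant, so gradient descent on the mean squared residual is gradient descent on the Gaussian likelihood objective. It uses a one-feature model with an intercept to stay in pure Python.

```python
def ols_closed_form(xs, ys):
    """Solve the 2x2 normal equations X'X b = X'y for intercept and slope."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X'X; must be non-zero
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    return intercept, slope

def gaussian_mle_gd(xs, ys, lr=0.05, steps=5000):
    """Gradient descent on the mean squared residual, which is the Gaussian
    negative log-likelihood up to scaling and an additive constant."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = -2.0 * sum(resid) / n
        g1 = -2.0 * sum(r * x for r, x in zip(resid, xs)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Made-up toy data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

ols = ols_closed_form(xs, ys)
mle = gaussian_mle_gd(xs, ys)
print("closed form:", ols)
print("gradient descent:", mle)
```

The agreement holds for the coefficients under this error model only. With a different likelihood (for example, logistic regression) there is no closed form and the estimates come from an iterative optimizer.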

  • @oliesting4921
    @oliesting4921 3 years ago

    Hardly see anything...dark and font too small

  • @user-ox6wk4je2m
    @user-ox6wk4je2m 9 months ago

    On the first question: do we really need a maximum likelihood estimate to get the beta coefficients for a regression problem? I think it will only be used in classification, right? Would gradient descent be the correct answer?

    • @ecotrix132
      @ecotrix132 4 months ago

      OLS, MLE, and gradient descent are different ways

  • @md.imrulhasan8757
    @md.imrulhasan8757 2 years ago

    done

  • @maddoo23
    @maddoo23 a year ago

    Um, the expression for beta is wrong (first question). It's:
    beta = (X'X)^(-1)X'Y

  • @pal999
    @pal999 3 years ago +17

    It would be helpful to post the correct answers at some point in the future

    • @konataizumi5829
      @konataizumi5829 3 years ago +13

      They never do. It sucks.

    • @user-ie2qq2ik9x
      @user-ie2qq2ik9x 3 years ago +1

      @konataizumi5829 at least some kind of grade for the interviewee would be pretty informative

    • @kristofmeszaros4924
      @kristofmeszaros4924 3 years ago +2

      I may be wrong but the sql question seemed pretty straightforward, shouldn't the solution just be
      select m.user_name , m.date, max(m.message_sent/m.message_received) as "Ratio"
      from Messages as m
      where m.message_received > 0
      group by m.user_name , m.date
      order by Rate desc

    • @huzuvettin
      @huzuvettin 3 years ago

      @kristofmeszaros4924 Looks much better to me tbh. Except the "rate" should be "ratio" as aliased earlier, right?

    • @huzuvettin
      @huzuvettin 3 years ago

      Exactly. I get that even watching these interviews teaches us something, but without the correct answers, what should we take as ground truth, am I right? B-)

  • @genuinebasilnt
    @genuinebasilnt 3 years ago +13

    I read the title *A Facebook data scientist mocks interviews*

  • @ariss3304
    @ariss3304 3 years ago

    I’m going with a reverse engineering path into college, please tell me I don’t have to learn these things.

    • @ariss3304
      @ariss3304 3 years ago +1

      Specifically the beta coefficient part

    • @DataInterview
      @DataInterview 3 years ago +1

      If someone asked me these topics when I started learning stats 7 years ago, I would have been frightened myself. But, years of diligent studies, and working in multiple DS jobs helped me develop confidence. I'm confident you will go through a similar experience as well. Here's a video with a commentary that should provide a more "gentle" introduction to the interview in DS: th-cam.com/video/lthBkTN8Vpk/w-d-xo.html&lc=UgxNhFz1NZlAPWn1pI54AaABAg

  • @naraendrareddy273
    @naraendrareddy273 a year ago

    WTF? I didn't know they would go so deep into statistics. Multivariate regression? Derive the Beta coefficient? Wow, I'm stumped right at the beginning. :(

  • @jeremythompson-seyon5463
    @jeremythompson-seyon5463 3 years ago +1

    Where do I start if I want to learn the skills needed to go into data science? I just started a statistics class and I've been really interested in the modeling and practical applications. I only barely understand the basics of R and SQL, to give you an idea of where my knowledge is. Thanks for the video

    • @DataInterview
      @DataInterview 3 years ago

      I would start with Kaggle. Emulate worked out examples provided by the community members on the site.

    • @caremacosta1435
      @caremacosta1435 3 years ago +1

      Depends on the knowledge you already have.
      Learn Python
      High school math is enough
      I would recommend the book Machine Learning for Absolute Beginners; it is not that long and it summarizes basic concepts very well
      Kaggle has many courses for you to learn the basics of machine learning, SQL, data visualization, etc.
      And practice, practice, practice

  • @CommentaryCentral
    @CommentaryCentral 3 years ago +2

    This is the sort of stuff we covered on the MSc Data Science course in the UK, I can't believe it's a senior level interview

    • @Cooldownman197
      @Cooldownman197 3 years ago +14

      You just covered not implemented

    • @DataInterview
      @DataInterview 3 years ago +28

      Some of these are covered at the undergraduate level as well but, when you are under pressure, and a breadth of topics is covered in 30 minutes, it’s a much different experience than getting a homework assignment.

    • @CommentaryCentral
      @CommentaryCentral 3 years ago

      @DataInterview yeah, I'm sure you are right

    • @tuanseattle
      @tuanseattle 2 years ago

      "Covered" does not mean it is not hard, because people forget things that they do not use frequently. University covers a lot of material

  • @superfreiheit1
    @superfreiheit1 3 years ago +1

    Can't see anything, too small. Zoom in

    • @DataInterview
      @DataInterview 3 years ago

      Thanks Joe. Duly noted for the next video.

    • @superfreiheit1
      @superfreiheit1 3 years ago

      @DataInterview Can you see something on the video?

  • @oaasal
    @oaasal 3 years ago +6

    Is that a junior level interview?

    • @DataInterview
      @DataInterview 3 years ago +3

      Senior

    • @oaasal
      @oaasal 3 years ago +3

      @DataInterview That sounds easier than I thought. Maybe I should change my job.

    • @DataInterview
      @DataInterview 3 years ago +1

      @oaasal Doesn’t hurt :) Do note that this emulates a phone screening. On-site and case studies are another set of challenges. Often much more challenging than phone screening. If you are interested in prep content, make sure to check out www.topds.io

    • @orangethemeow
      @orangethemeow 2 years ago +2

      @DataInterview This domain doesn't exist anymore :(

    • @DataInterview
      @DataInterview 2 years ago +1

      @orangethemeow Go to datainterview.com

  • @phyrajkumarverma4412
    @phyrajkumarverma4412 10 months ago

    Hi, I also want to give my mock interview.
    Could you take it please?
    I am doing my graduation and currently, I am in 3rd year of computer science.
    I want to be good in data science

  • @zakarie
    @zakarie 2 years ago +1

    Actually, the confidence interval is not interpreted as the chance that the true value falls in the interval. The accurate interpretation is that there is a 95% probability that the random interval covers the true value.
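
The "random interval" reading above can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are drawn from a normal distribution with a known standard deviation, the interval is for a sample mean rather than a regression coefficient, and all constants are made up. The true mean stays fixed while the interval changes from sample to sample.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Made-up simulation settings, for illustration only.
TRUE_MEAN, SIGMA, N, TRIALS = 5.0, 2.0, 30, 2000
half_width = 1.96 * SIGMA / N ** 0.5  # known-sigma 95% interval half-width

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    # The interval is random; the true mean is a fixed number.
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        covered += 1

coverage = covered / TRIALS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

Across repeated samples, roughly 95% of the intervals should cover the true mean. Any single computed interval either contains it or does not.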

  • @sirongzeng4096
    @sirongzeng4096 3 years ago

    Anyone want to come and join a group of mock interview for data analyst? I'm looking for people to mock together, in aspects of coding, behavioral questions, and resume. Thanks!

    • @sirongzeng4096
      @sirongzeng4096 3 years ago

      Or if there is a group or Slack channel, please let me know! Thanks!

    • @md.imrulhasan8757
      @md.imrulhasan8757 2 years ago

      I want to join... Can you please include me ?

  • @ipvikas
    @ipvikas 2 years ago

    SQL #1: Correct MySQL query is:
    select user_name, ROW_NUMBER() over w as 'Rank'
    from Messages
    window w as (partition by date order by message_sent/message_received desc)

  • @sourabhsharma9830
    @sourabhsharma9830 2 years ago

    That is not the confidence interval, that is the credible interval. Confidence interval means that 95% of the time the estimated beta coefficient will predict the correct result, which is “y”.
    To get 95% confidence in the beta coefficient we need to use Bayesian parameter estimation, which will give you a posterior distribution of the beta coefficient with a 95% credible interval.

  • @miraarora8142
    @miraarora8142 3 years ago +5

    solution for 1st SQL Question:
    select
    t_date,
    user_name
    from messages
    where message_received != 0
    group by t_date, user_name
    order by sum(message_sent)::float/sum(message_received) desc
    limit 1;

    • @srk312
      @srk312 2 years ago +8

      for each day...one row..not one row overall

  • @beaglesnlove580
    @beaglesnlove580 3 years ago

    Lol these questions are a joke. I broke into FB. Least squares, MLE or gradient descent. Ans:
    Logistic regression, or some classifier.
    Confidence intervals: these are estimates of regression variables.
    Presence of 0: you have to do a t-test on the individual variable