The Meta engineering guy is on point. Every stage of the pipeline has so many nuances that this could have been a 2hr+ video. Maybe consider doing a podcast version and putting together a guest list of viewers who can submit questions? @Exponent
As a current master's student in data science actively job hunting, I must say this mock interview is incredible. Thank you so much, Vikram! Where can I find more of your content?
Thanks for your feedback! You can check out the full interview course at bit.ly/4bUEPbF to see more of Vikram's mock interviews.
This is perfect content even for people who just want to exercise their mental faculties in full-stack ML design. The whole thought process, from concept to concrete algorithms, is laid out very transparently.
The best ML design interview I have seen so far!
The best machine learning mock interview on YouTube!!!
For candidate generation, I would propose a funnel model in which I first use simple algorithms, like logistic regression, a decision tree, or the ANN approach he used, to quickly narrow the search space to 1/1000th, and then apply more advanced techniques to refine it. I would use a two-tower network to rank my candidates.
The two-tower model is for learning the embeddings. During serving we use the learned embeddings from the two towers to locate the approximate nearest neighbors to the viewer's embedding. In reality we will have several parallel paths to generate candidates; here we show just one in the interest of time. Some of the candidate generation sources include: collaborative filtering (either two-tower or matrix factorization), content filtering (keyword/interest matching), popular feeds, viral feeds, connected-content feeds (content from socially connected creators), etc.
The deeper ranking model typically uses thousands of features (windowed aggregates, embedding aggregates, embeddings from pre-trained text/image/video models, etc.). These models are compute- and memory-intensive to run, so we only want to run them on a select thousand (or so) items for a specific viewer. This keeps serving latency low. The deep ranking model typically has multiple heads (multi-task) with several predictors (like, comment, share, etc.). The individual predictions are weighted to produce a single score, and reverse-sorted scores give the candidate post list to show the viewer.
Does that help clarify?
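To make the retrieve-then-rank flow above concrete, here is a minimal NumPy sketch. The tower outputs, head weights, and the brute-force `retrieve` function are all illustrative stand-ins (a production system would use a real ANN index such as FAISS or ScaNN and a learned multi-task ranker), not the actual Meta implementation:

```python
import numpy as np

# Offline: item embeddings come from the trained item tower and are
# indexed for approximate nearest-neighbor (ANN) lookup.
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(100_000, 64)).astype(np.float32)  # stand-in for item-tower outputs

def retrieve(viewer_emb, k=1000):
    """Brute-force stand-in for an ANN index: return the k items
    whose embeddings score highest against the viewer embedding."""
    scores = item_emb @ viewer_emb
    return np.argpartition(-scores, k)[:k]

def deep_rank(viewer_emb, cand_ids):
    """Stand-in for the heavy multi-task ranker: one predicted
    probability per engagement head (like, comment, share)."""
    logits = item_emb[cand_ids] @ viewer_emb          # placeholder for thousands of features
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.stack([p, 0.5 * p, 0.25 * p], axis=1)  # toy like/comment/share heads

# Online: retrieve ~1k candidates, run the expensive ranker only on those,
# weight the head predictions into one score, and reverse-sort.
viewer_emb = rng.normal(size=64).astype(np.float32)  # viewer-tower output at request time
cands = retrieve(viewer_emb, k=1000)
head_weights = np.array([1.0, 2.0, 3.0])             # e.g., value shares > comments > likes
final_score = deep_rank(viewer_emb, cands) @ head_weights
feed = cands[np.argsort(-final_score)]               # post list shown to the viewer
```

The key structural point is that the expensive `deep_rank` call only ever sees ~1k items, while retrieval covers the full catalog with cheap dot products (or, in production, an ANN index).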
One comment: the two-tower network should also be categorized as collaborative filtering.
Very educational! Loved it. Keep on bringing more ML interviews. :)
he's cracked. great job to you both
just out of curiosity, do you think the performance is good enough to pass a senior level MLSD interview?
Could you guys please make these interviews easier to understand? They seem to confuse more than they clarify.
hire this guy because Meta's recommendation engine is completely broken
That was both remarkable and educational! Excellent session, folks 👍🏾👍🏾 Thanks for sharing.
Thank you so much for this video. I have a question. Once the two-tower model is trained, for candidate generation the item embeddings are computed offline, the user embedding is computed on the fly from the user features, and that user embedding is matched against the item embedding vectors with kNN. Is that correct? If so, since the output of the two-tower model is binary, where would I get the embeddings from? From a layer before the sigmoid?
Yes. Just apply the user tower or the item tower on its own; the embedding is the tower's output, i.e., the layer just before the dot product and sigmoid.
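A minimal PyTorch sketch of this, with all dimensions and layer sizes invented for illustration. Training uses the sigmoid output as the binary target; serving only ever calls the individual towers:

```python
import torch
import torch.nn as nn

class TwoTower(nn.Module):
    """Toy two-tower model; feature and embedding sizes are made up."""
    def __init__(self, user_dim=32, item_dim=48, emb_dim=16):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
        self.item_tower = nn.Sequential(nn.Linear(item_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, user_x, item_x):
        u = self.user_tower(user_x)   # user embedding
        v = self.item_tower(item_x)   # item embedding
        logit = (u * v).sum(-1)       # dot product
        return torch.sigmoid(logit)   # binary engagement probability (training target)

model = TwoTower().eval()
with torch.no_grad():
    # Offline: run ONLY the item tower over the catalog and index the outputs.
    item_emb = model.item_tower(torch.randn(1000, 48))  # shape (1000, 16)
    # Online: run ONLY the user tower on the current user's features...
    user_emb = model.user_tower(torch.randn(1, 32))     # shape (1, 16)
# ...and do (approximate) nearest-neighbor search against the item embeddings.
top_items = (item_emb @ user_emb.squeeze(0)).topk(10).indices
```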
Where is the learning cycle? What about monitoring? When do we retrain? Do we automate it, and how?
As I understand it, it is just one model for both candidate selection and ranking, so why go to the same model twice? We generate post embeddings asynchronously. Is approximate nearest neighbour search faster than taking the dot product with all items?
Also, a 0.5 ROC-AUC is not only a random prediction; a constant prediction for all values also yields 0.5.
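On the ANN question: yes, for large catalogs approximate search (IVF, HNSW, etc.) answers each query by touching only a small fraction of the items, which is far faster than an exhaustive dot product, at some cost in recall. A minimal sketch using FAISS, with the catalog size, cluster count, and probe count all made up for illustration:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n, k = 64, 200_000, 100
xb = np.random.rand(n, d).astype(np.float32)  # item embeddings
xq = np.random.rand(1, d).astype(np.float32)  # viewer embedding

# Exact search: dot product against every item, O(n * d) per query.
flat = faiss.IndexFlatIP(d)
flat.add(xb)
D_exact, I_exact = flat.search(xq, k)

# Approximate search: IVF partitions the items into nlist clusters and
# probes only nprobe of them per query, skipping most of the catalog.
quantizer = faiss.IndexFlatIP(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 1024, faiss.METRIC_INNER_PRODUCT)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 8  # speed/recall trade-off
D_approx, I_approx = ivf.search(xq, k)
```

With `nprobe = 8` out of 1,024 clusters, each query scans roughly 1% of the catalog, which is where the speedup comes from; raising `nprobe` trades speed back for recall.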
This was a great mock interview. Thanks for sharing it.
The candidate has a lot of knowledge but is racing to complete his explanations (understandably so, given the time limit of the mock interview). A lot of knowledge is being shared, but at this fast pace the interviewer may not be able to keep up unless they are on the same wavelength. I would ask him to slow down a bit, or to check with the mock interviewer whether she will allow him a little more time to explain, given that this is being recorded for a YouTube audience.
Great explanation overall, though, and it helped tremendously. Thank you
Also, things like extra post-processing considerations and production-level deployment discussions are the mark of a senior-level engineer (E6), and it is very clear that this person had them. That discussion alone can make the difference between a mid-level and a senior position.
Can someone explain the label part in the two-tower model?
Great interview. I have always been confused: in an ML system design interview, should we focus more on the ML data/training/eval pipeline, or on the inference pipeline (which is more of a traditional system design)?
In an ML system design interview, you typically need to cover the entire process, including the problem statement, data engineering, modeling, and deployment. It's important to address both the data/training/evaluation pipeline and the inference pipeline. To decide where to focus, take cues from your interviewer or ask them directly for guidance.
What tool is he using to write??
The tool is "Whimsical"!
What level would the candidate pass with this answer? Senior? Staff?