Make sure you're interview-ready with Exponent's machine learning case interview course: bit.ly/488VWnC
Overall a great interview; a few other points he could have added to the discussion:
- A minute or two on offline metrics like AUC, F1 score, precision & recall, etc. (a quick sketch follows this list)
- A bit on making sure there are systems in place to alert on any failures in production
- A bit on the mechanics of A/B testing
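To make that first point concrete, here is a minimal scikit-learn sketch of those offline metrics; the labels, scores, and 0.5 threshold below are made-up placeholders, not anything from the video:

```python
from sklearn.metrics import roc_auc_score, f1_score, precision_score, recall_score

# Hypothetical ground-truth labels and model outputs for a batch of recommendations.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = user engaged, 0 = did not
y_score = [0.9, 0.3, 0.7, 0.6, 0.2, 0.4, 0.8, 0.1]   # model's predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # threshold at 0.5 for class metrics

print("AUC:      ", roc_auc_score(y_true, y_score))   # ranking quality, threshold-free
print("Precision:", precision_score(y_true, y_pred))  # of predicted positives, how many were real
print("Recall:   ", recall_score(y_true, y_pred))     # of real positives, how many we caught
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```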
This is very well thought out by Sid. One question: why are there no diagrams drawn on a whiteboard for this? Usually, interviews like this ask you to draw a system design flow chart or diagram of some type.
Use a two-tower network for users and items and turn them into floating-point embeddings. Normalize the continuous features like age, then concatenate them to the user and item embeddings. Then use softmax approximations to reduce the time complexity.
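A minimal PyTorch sketch of what that comment describes; the vocabulary sizes, embedding dimension, and the single `age` feature are illustrative assumptions, and the softmax approximation (e.g. in-batch negatives) is only noted in a comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, n_users=10_000, n_items=50_000, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        # Continuous features (e.g. normalized age) are concatenated to the user ID embedding.
        self.user_tower = nn.Sequential(nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.item_tower = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, user_ids, age_norm, item_ids):
        u = torch.cat([self.user_emb(user_ids), age_norm.unsqueeze(-1)], dim=-1)
        u = F.normalize(self.user_tower(u), dim=-1)                       # unit-length user vector
        v = F.normalize(self.item_tower(self.item_emb(item_ids)), dim=-1) # unit-length item vector
        return (u * v).sum(-1)  # dot-product score; train with a sampled-softmax-style loss

model = TwoTower()
scores = model(torch.tensor([1, 2]), torch.tensor([0.3, -1.2]), torch.tensor([10, 20]))
print(scores.shape)  # torch.Size([2])
```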
Seems like she was trying to lead him down the path of segmenting engagement metrics to tie it back to refining the model to meet stakeholder priorities/business outcomes. If you are performing quite well at keeping existing users engaged with your recommendations, incremental further improvements there are not going to impact revenue as much as preventing churn, where you need to focus on new and low-engagement users. Instead, try to build separate recommenders for those groups so you don't need to degrade the performance of your power-user recommendation engine in search of preventing churn.
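A sketch of the routing idea in that comment, with a hypothetical engagement threshold and placeholder recommenders (none of this is from the video):

```python
from dataclasses import dataclass

@dataclass
class User:
    user_id: int
    sessions_last_30d: int  # hypothetical engagement signal

def popular_items_fallback() -> list[int]:
    return [101, 102, 103]  # placeholder: globally popular items for churn-risk users

def personalized_ranking(user: User) -> list[int]:
    return [user.user_id % 100, 7, 42]  # placeholder for the power-user engine's output

def recommend(user: User) -> list[int]:
    # Route each segment to its own recommender so tuning one
    # segment's model never degrades the other's.
    if user.sessions_last_30d < 3:
        return popular_items_fallback()
    return personalized_ranking(user)

print(recommend(User(user_id=5, sessions_last_30d=1)))   # low-engagement path
print(recommend(User(user_id=5, sessions_last_30d=40)))  # power-user path
```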
Just starting the video, but I feel that labeling recommendations as negatives based simply on not clicking can lead to issues with recommenders. I think you need to distinguish between soft negatives, like not clicking, and some sort of hard negative (like explicit user feedback, or clicking away very quickly). From a stakeholder perspective, getting people to expand the number of different artists/songs they listen to is beneficial, and you can expect users to need to see a recommendation a few times before it is given a negative label and used for your latent embedding.
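A sketch of how that soft/hard-negative labeling might look in practice; the event fields and thresholds (5-second skip, 3 impressions, 30-second listen) are assumptions, not anything stated in the video:

```python
def label_impression(event: dict) -> int | None:
    """Return 1 (positive), 0 (hard negative), or None (too ambiguous to train on yet)."""
    if event["clicked"] and event["listen_seconds"] >= 30:
        return 1  # engaged: clear positive
    if event["explicit_dislike"] or (event["clicked"] and event["listen_seconds"] < 5):
        return 0  # thumbs-down or immediate skip: hard negative
    if not event["clicked"] and event["times_shown"] >= 3:
        return 0  # repeatedly ignored: only now treat as a negative
    return None   # a single unclicked impression is a soft negative; don't label it yet

print(label_impression({"clicked": False, "times_shown": 1,
                        "listen_seconds": 0, "explicit_dislike": False}))  # -> None
```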
As an aspiring data scientist, this video was so useful for understanding the thought process and how to approach a problem ❤❤
Sid was great. I hope he gets a raise at FanDuel so he can upgrade from a closet to a studio/1bed.
Would've liked to see him talk through the evaluation of the model during training.
Great video with very nice insights. Learned a lot from this interview. Keep up the good work!!!
Just curious: when subscribing to the event-based system (the one that sends out JSON objects), why do we need to dump that into object storage of some sort? Why not just have our data pipeline listen to that event-based system (Kafka, maybe)?
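For context, here is roughly what the direct-consumption setup described in that question would look like with kafka-python (the topic name, broker address, and event fields are assumptions). The usual argument for also landing events in object storage is cheap long-term retention and replay for backfills, since Kafka topics typically keep data for a bounded window:

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic and broker for the JSON interaction events.
consumer = KafkaConsumer(
    "user-interaction-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value  # a dict parsed from the JSON payload
    # ... feed the event straight into the feature/training pipeline ...
    print(event)
```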
Decent video but it mainly caters to junior ML engineers.
Deep Cross Network is better in terms of feature interaction in recommendation systems.
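For reference, the heart of DCN is the cross layer, roughly x_{l+1} = x0 * (W @ x_l + b) + x_l; a DCN-v2-style sketch in PyTorch, with made-up dimensions:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One DCN-v2-style cross layer: x_{l+1} = x0 * (W @ x_l + b) + x_l,
    which builds explicit bounded-degree feature interactions."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl  # elementwise interaction with the raw input

# Stacking two cross layers over a (made-up) 16-dim input feature vector.
x0 = torch.randn(4, 16)
x = x0
for layer in [CrossLayer(16), CrossLayer(16)]:
    x = layer(x0, x)
print(x.shape)  # torch.Size([4, 16])
```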