Such great tips, especially the "it's OK to make mistakes" and "don't shoe-horn a problem into a framework" parts. I quickly realized after a few interviews that making mistakes is inevitable; what makes or breaks an interview is whether you can recognize your mistakes and seek the right kind & amount of help from the interviewer so you can get back on track. Super stressful but also a fun and realistic process. I'm of two minds about frameworks, though. On the one hand, I find it hard to think thoroughly about what matters for a specific industry (e.g., social media, finance, multisided markets, etc.) without using a framework of some sort, but on the other, I agree 100% that thinking from first principles is key for data scientists. Curious how you go about striking a balance between being thorough vs. being specific to the problem.🤔
I do agree with many of these tips. Mostly because test cases are arbitrary and plain stupid: most of them want you to build an entire data science process, including app and data architecture, which is itself stupid. If a company gives you a test case asking for data infrastructure and wants you to be familiar with everything, just skip that company.
I have a Data Engineer Case Study interview on Friday, so this is still helpful. I just need to reframe it in terms of Data Infrastructure, Database Design, etc., based on whatever scenario I'm given. Any other thoughts are much appreciated.
It should be something like: given a scenario where I'm asked to design an ETL data pipeline, I'd first consider whether to process data on-premise or in the cloud. I would mention that I want to take full advantage of serverless computing so I don't have to deal with hardware overhead, and to find a more fault-tolerant, scalable data storage solution. So I'd integrate AWS services such as EMR as the data processing framework and DynamoDB as the storage solution. I'd write a Python script to run parallel processing, and use Terraform to automatically launch services based on the desired features. To operationalize, I'd work with Docker and Kubernetes...okay, I think I need to study more on operationalizing data systems.
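For what it's worth, the "Python script to run parallel processing" step could be sketched roughly like this. This is just an illustrative toy (the `transform` function, record shape, and worker count are all made-up assumptions, not anything from the video): fan raw records out across worker processes before loading them into a store like DynamoDB.

```python
# Minimal sketch: parallelize a per-record transform step of an ETL job.
# In a real pipeline, each worker would also write its result to the
# storage layer (e.g., DynamoDB) instead of returning it.
from multiprocessing import Pool

def transform(record):
    """Placeholder transform: normalize one raw record (illustrative only)."""
    return {"id": record["id"], "value": record["value"] * 2}

def run_pipeline(records, workers=4):
    """Apply the transform to all records in parallel across processes."""
    with Pool(processes=workers) as pool:
        return pool.map(transform, records)

if __name__ == "__main__":
    raw = [{"id": i, "value": i} for i in range(8)]
    print(run_pipeline(raw))
```

In an actual interview answer, I'd expect the framing (why parallelize, how work is partitioned, what happens on worker failure) to matter more than the exact library choice.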
If you prefer to read rather than watch this video, check out the blog post I wrote on this, and other, topics here ➡ www.emmading.com/resources
Emma, I passed Atlassian's first-round product case interview! I used your framework and the interviewer said he was impressed!
That's awesome! So happy for you, keep it up!
@emma_ding Thank you!
Thank you! This came in perfect time. I have an onsite tomorrow :)
how did it go
Superb Emma!
Your videos are really good! Thanks for the tips.
Thank you so much. Really nice video.
Good content and presentation, sadly only 7.5K views. You need to do some promo...
@emma_ding thanks for doing these awesome videos. They have been an incredible resource in my prep.