Data Engineer Mock Interview | SQL | PySpark | Project & Scenario based Interview Questions
- Published on Nov 2, 2024
- To enhance your career as a Cloud Data Engineer, check trendytech.in/... for curated courses developed by me.
I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
Want to Master SQL? Learn SQL the right way through the most sought-after course - the SQL Champions Program!
"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solving an unseen problem."
Here is how you can register for the Program -
Registration Link (Course Access from India) : rzp.io/l/SQLINR
Registration Link (Course Access from outside India) : rzp.io/l/SQLUSD
30 INTERVIEWS IN 30 DAYS- BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under the Data Engineers Club, aimed at aiding the community's growth and development.
Our highly experienced guest interviewer, Ankur Bhattacharya ( / ankur-bhattacharya-100... ), shares invaluable insights and practical advice drawn from his extensive experience, catering to aspiring data engineers and seasoned professionals alike.
Our talented guest interviewee, Praroop Sacheti ( / praroopsacheti ), has a remarkable approach to answering the interview questions in a well-articulated manner.
Links to the free SQL & Python series developed by me are given below -
SQL Playlist - • SQL tutorial for every...
Python Playlist - • Complete Python By Sum...
Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!
Social Media Links :
LinkedIn - / bigdatabysumit
Twitter - / bigdatasumit
Instagram - / bigdatabysumit
Student Testimonials - trendytech.in/...
Discussed Questions : Timestamp
1:30 Introduction
3:29 When you are processing data with a Databricks PySpark job, what is the sink for your pipeline?
4:58 Are you incorporating fact and dimension tables, or any schema, in your project's database design?
5:50 What amount of data are you dealing with in your day-to-day pipeline?
6:33 What are the different types of triggers in ADF?
7:45 What is incremental load? How can you implement it through ADF?
10:03 Difference between a Data Lake and a Data Warehouse?
11:41 What is columnar storage in a data warehouse?
13:38 What were some challenges encountered during your project, and how were they resolved? Describe the strategies implemented to optimize your pipeline.
16:18 Optimizations related to Databricks or PySpark?
20:41 What is a broadcast join? What exactly happens when we broadcast a table?
23:01 SQL coding question
35:46 PySpark coding question
Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
For incremental load, why do we go with MERGE or UPSERT? MERGE or UPSERT is used to implement SCD types. For incremental load, what we want is to copy newly arrived data into ADLS, for which we keep track of some reference key through which we can recognize the new data. For example, in an Orders fact table, say it is Order_ID, which keeps increasing whenever we get a new order.
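A minimal sketch of that watermark idea in PySpark, assuming a hypothetical source table source_db.orders, an ADLS raw path, and a last_loaded_id carried over from the previous run (all names here are illustrative, not from the video):
from pyspark.sql import functions as F

# highest Order_ID already copied to ADLS in the previous run (read from wherever the watermark is tracked)
last_loaded_id = 1050

# pick up only the rows that arrived after the last load
orders_df = spark.read.table("source_db.orders")
new_orders_df = orders_df.filter(F.col("Order_ID") > last_loaded_id)

# append just the delta to the raw zone and note the new watermark for the next run
new_orders_df.write.mode("append").parquet("abfss://raw@storageacct.dfs.core.windows.net/orders/")
new_watermark = new_orders_df.agg(F.max("Order_ID")).first()[0] or last_loaded_id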
Please attach a link (in view mode) to the list of questions asked in the mock interview in the description.
A better way to handle the location question scenario would be to create a hash map and use it to fetch the complete location. This hash map can be extended in the future too. You can broadcast this hash map to make it more optimised if you are dealing with TBs of data.
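A rough sketch of that hash-map approach in PySpark, assuming a hypothetical df with a loc code column already extracted from ref_id (the map contents and column names are illustrative):
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# small lookup map for location codes; new codes can simply be added here later
location_map = {'CHN': 'CHENNAI', 'AP': 'ANDHRA PRADESH', 'HYD': 'HYDERABAD', 'PUNE': 'PUNE'}

# broadcast the map so every executor keeps one read-only copy instead of shipping it per task
location_bc = spark.sparkContext.broadcast(location_map)

@F.udf(StringType())
def expand_location(code):
    # fall back to the original code if it is not in the map
    return location_bc.value.get(code, code)

df_with_location = df.withColumn('location', expand_location(F.col('loc')))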
Please provide interview feedback for a few minutes at the end; it would help even more with this.
Good initiative. This is quite helpful on how to answer the scenario based questions, with an example. Thank you sir, Ankur and Praroop! 🙌
So nice of you
Sir, please make videos on topics like "Someone working in Tech Support for the past 5 years and now moving to Data Engineering": what should they write in the experience section of their resume, and should they apply as a fresher or otherwise?
Sir, I also have the same question.
Yes, that would be very valuable. Most people are working in different roles, and those of us in support roles in the data field are interested in switching into data engineering.
surely will release a video on this soon
great content! very insightful questions and answers!
Glad you enjoyed it!
Great Initiative Sumit Sir !
thank you. A big thanks to people who are participating in this.
Please also make a video on what kinds of problems data engineers face in their day-to-day work.
noted, will bring a video on this soon
Great video for new data engineers like me.
Glad you enjoyed it
Hi Sir, thanks for this series, very insightful. Just a query: do the majority of interviews go till the coding part, or are most of them theory only? Or is it a mix of both?
Yes they do
thank you so much sumit sir its really helpful
Happy to share more such informative videos for the community!
Please make videos for freshers as well, because these days no one is looking for freshers for data engineering roles...
will make a video for sure
Hi Sir, request you to please upload more Data Engineer mock interview videos.
one video daily for next 30 days
sir please make complete video on sql and mock interviews too
Definitely, will be covered in the upcoming videos
Hi Folks, below is the solution to the PySpark problem written in >>SCALA
We need to control the flow with a config file for incremental data load, not with MERGE or UPSERT.
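Building on the earlier watermark sketch, the config-file version could simply persist that watermark in a small JSON control file; the file path and keys below are assumptions, not from the video:
import json
from pyspark.sql import functions as F

# read the last processed watermark from the control file before the load
with open('/dbfs/config/orders_load_cfg.json') as f:
    cfg = json.load(f)          # e.g. {"last_order_id": 1050}

new_df = spark.read.table('source_db.orders').filter(F.col('Order_ID') > cfg['last_order_id'])

# after landing the delta in ADLS, write the advanced watermark back for the next run
cfg['last_order_id'] = new_df.agg(F.max('Order_ID')).first()[0] or cfg['last_order_id']
with open('/dbfs/config/orders_load_cfg.json', 'w') as f:
    json.dump(cfg, f)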
Please make interview session
for fresher.
surely
Good Work Praroop ❤
Praroop has rocked it.
I want to give mock interview.
Can you make a video for the AWS cloud like the one for Azure?
surely
wahh
😅
Really?
Solution for the PySpark Problem
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# map the location code embedded in ref_id to the full location name
def location_f(loc):
    if loc == 'CHN':
        return 'CHENNAI'
    elif loc == 'AP':
        return 'ANDHRA PRADESH'
    elif loc == 'HYD':
        return 'HYDERABAD'
    else:
        return loc

re_location = F.udf(location_f, StringType())

# split ref_id on 'DIV-' or '_' so that index [1] is the location code and [2] is the numeric id
df1 = df.withColumn('ref_id1', F.split('ref_id', 'DIV-|_')).drop('ref_id')
df2 = df1.withColumn('ref_id', F.col('ref_id1')[2]).withColumn('location', re_location(F.col('ref_id1')[1]))
df3 = df2.select('name', 'ref_id', 'salary', 'location')
df3.show()
# alternative: derive the location directly from the REF-ID prefix using when/like
from pyspark.sql.functions import col, when

df_employee.withColumn("LOCATION",
    when(col("REF-ID").like("DIV-CHN%"), "CHN-CHENNAI")
    .when(col("REF-ID").like("DIV-HYD%"), "HYD-HYDERABAD")
    .when(col("REF-ID").like("DIV-AP%"), "AP-ANDHRA PRADESH")
    .when(col("REF-ID").like("DIV-PUNE%"), "PUNE-PUNE")).show()
# another variant: extract the code between "DIV-" and "_" from refid, then map it to the full name
from pyspark.sql.functions import col, split, when

df_new = df.select("name", "refid", "salary",
                   split(split(col("refid"), "-")[1], "_")[0].alias("loc"))
final_result_df = df_new.withColumn("location",
    when(col("loc") == "CHN", "CHENNAI")
    .when(col("loc") == "HYD", "HYDERABAD")
    .when(col("loc") == "AP", "ANDHRA PRADESH")
    .when(col("loc") == "PUN", "PUNE")).drop("loc")