Top 15 Spark Interview Questions in less than 15 minutes Part-2

  • Published Dec 20, 2024

Comments • 11

  • @killbiilpandey
    @killbiilpandey 3 months ago

    Sumit Mittal sir, your videos have given me huge knowledge. Thank you🤝🤝

  • @piyushjain5852
    @piyushjain5852 6 months ago +7

    How is number of stages = number of wide transformations + 1?

    • @sugunanindia
      @sugunanindia 6 months ago

      In Apache Spark, the number of stages in a job is determined by the wide transformations present in the execution plan. Here's a detailed explanation of why the number of stages is equal to the number of wide transformations plus one:
      ### Transformations in Spark
      #### Narrow Transformations
      Narrow transformations are operations where each input partition contributes to exactly one output partition. Examples include:
      - `map`
      - `filter`
      - `flatMap`
      These transformations do not require data shuffling and can be executed in a single stage.
      #### Wide Transformations
      Wide transformations are operations where each input partition can contribute to multiple output partitions. These transformations require data shuffling across the network. Examples include:
      - `reduceByKey`
      - `groupByKey`
      - `join`
      Wide transformations result in a stage boundary because data must be redistributed across the cluster.
      ### Understanding Stages
      #### Stages
      A stage in Spark is a set of tasks that can be executed in parallel on different partitions of a dataset without requiring any shuffling of data. A new stage is created each time a wide transformation is encountered because the data needs to be shuffled across the cluster.
      ### Calculation of Stages
      Given the nature of transformations, the rule "number of stages = number of wide transformations + 1" can be explained as follows:
      1. **Initial Stage**: The first stage begins with the initial set of narrow transformations until the first wide transformation is encountered.
      2. **Subsequent Stages**: Each wide transformation requires a shuffle, resulting in the end of the current stage and the beginning of a new stage.
      Thus, for `n` wide transformations, there are `n + 1` stages:
      - The initial stage.
      - One additional stage for each wide transformation.
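      The rule above can be sketched in plain Python. This is a toy model for illustration only, not a Spark API; the real planning is done by Spark's DAG scheduler, and the set of wide transformations here is just a sample:
      ```python
      # Toy model of the stage-counting rule: each wide transformation adds a
      # shuffle boundary, so stages = number of wide transformations + 1.
      # (Illustrative only; Spark's DAG scheduler does the real planning.)
      WIDE = {"reduceByKey", "groupByKey", "join", "distinct", "repartition"}

      def count_stages(transformations):
          """Count stages for a linear chain of transformation names."""
          wide = sum(1 for t in transformations if t in WIDE)
          return wide + 1

      print(count_stages(["map", "filter"]))                               # 1: all narrow
      print(count_stages(["map", "reduceByKey"]))                          # 2: one shuffle
      print(count_stages(["map", "reduceByKey", "filter", "groupByKey"]))  # 3: two shuffles
      ```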
      ### Example
      Consider the following Spark job:
      ```python
      from pyspark import SparkContext
      sc = SparkContext.getOrCreate()
      # Sample RDD
      rdd = sc.parallelize([(1, 2), (3, 4), (3, 6)])
      # Narrow transformation: map
      rdd1 = rdd.map(lambda x: (x[0], x[1] * 2))
      # Wide transformation: reduceByKey (requires shuffle)
      rdd2 = rdd1.reduceByKey(lambda x, y: x + y)
      # Another narrow transformation: filter
      rdd3 = rdd2.filter(lambda x: x[1] > 4)
      # Wide transformation: groupByKey (requires shuffle)
      rdd4 = rdd3.groupByKey()
      # Action: collect
      result = rdd4.collect()
      print(result)
      ```
      **Analysis of Stages**:
      1. **Stage 1**: includes `parallelize` and `map`; these are all narrow operations, so they run together.
      2. **Stage 2**: starts with `reduceByKey` (a wide transformation, which triggers a shuffle) and also includes the narrow `filter` that follows it.
      3. **Stage 3**: starts with `groupByKey` (another wide transformation, which triggers a second shuffle) and runs through the `collect` action.
      So there are two wide transformations (`reduceByKey` and `groupByKey`) and three stages (`number of wide transformations + 1`).
      ### Conclusion
      The number of stages in a Spark job is driven by the need to shuffle data between transformations. Each wide transformation introduces a new stage due to the shuffle it triggers, resulting in the formula: `number of stages = number of wide transformations + 1`. This understanding is crucial for optimizing and debugging Spark applications.
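      The grouping logic can likewise be sketched in plain Python. This is a hypothetical helper, not a Spark API; it only illustrates how each wide transformation closes the current stage and opens a new one:
      ```python
      # Toy grouping of a linear pipeline into stages: a wide transformation
      # ends the current stage and starts the next. (Illustrative only; not
      # how Spark's DAG scheduler is actually implemented.)
      WIDE = {"reduceByKey", "groupByKey"}

      def split_into_stages(ops):
          stages, current = [], []
          for op in ops:
              if op in WIDE:           # shuffle boundary: start a new stage
                  stages.append(current)
                  current = [op]
              else:
                  current.append(op)
          stages.append(current)
          return stages

      pipeline = ["parallelize", "map", "reduceByKey", "filter", "groupByKey"]
      print(split_into_stages(pipeline))
      # [['parallelize', 'map'], ['reduceByKey', 'filter'], ['groupByKey']]
      ```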

    • @basedsonu
      @basedsonu 5 months ago +8

      Bro said: if it's visible, then it's visible.

    • @SabyasachiManna-sw2cg
      @SabyasachiManna-sw2cg 5 months ago +2

      To answer your question: assume you have one wide transformation, `reduceByKey()`. It will create two stages, stage 0 and stage 1, with shuffling in between. I hope this helps.

    • @AmitBenake
      @AmitBenake 3 months ago +1

      A wide transformation shuffles data, and the extra stage is where the shuffled data is aggregated.

  • @maninderbhambra3796
    @maninderbhambra3796 27 days ago

    Sumit sir, please add a real-time use case. For data skewness, I feel the answer is not correct; salting is the right method.

  • @cajaykiran
    @cajaykiran months ago

    What is the logic behind the answer that 2 wide transformations result in 3 stages?

  • @vaibhavj12
    @vaibhavj12 7 months ago +1

    Helpful❤

  • @AnkitChaudhary-f2m
    @AnkitChaudhary-f2m 15 days ago

    Could you please stop or remove the background music?
    Your videos are very helpful, but the background music 🎶 is sometimes irritating.

  • @ReadingForLearning
    @ReadingForLearning 3 days ago

    Some questions are too easy and should not be asked in interviews.