How to Handle Data Skewness in Apache Spark Using the Key Salting Technique
- Published Feb 5, 2025
- Handling data skewness using the key salting technique. One of the biggest problems in parallel computational systems is data skewness. Data skewness in Spark happens when you join on a key that is not evenly distributed across the cluster, causing some partitions to be very large and preventing Spark from processing the data in parallel.
GitHub Link - github.com/gjeevanm/SparkDataSkewness
Content By - Jeevan Madhur [LinkedIn - linkedin.com/in/jeevan-madhur-225a3a86]
Editing By - Sivaraman Ravi [LinkedIn - linkedin.com/in/sivaraman-ravi-791838114]
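
To make the technique concrete, below is a minimal, self-contained Scala sketch; the DataFrames, column names, and salt count are illustrative assumptions, not taken from the video or the linked repo.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("SaltingSketch").getOrCreate()
import spark.implicits._

val numSalts = 3 // trade-off: a higher value spreads the hot key more, but replicates the small side more

// `large` is the skewed side (many rows share id = 1); `small` is the other side.
val large = Seq((1, "a"), (1, "b"), (1, "c"), (2, "d")).toDF("id", "v1")
val small = Seq((1, "x"), (2, "y")).toDF("id", "v2")

// 1. Append a random salt in [0, numSalts) to the skewed side's join key.
val largeSalted = large.withColumn("salted_id",
  concat($"id".cast("string"), lit("_"), floor(rand() * numSalts).cast("string")))

// 2. Replicate the small side once per salt value so every salted key can find its match.
val smallSalted = small
  .withColumn("salt", explode(array((0 until numSalts).map(lit): _*)))
  .withColumn("salted_id", concat($"id".cast("string"), lit("_"), $"salt".cast("string")))

// 3. Join on the salted key; the hot key is now spread over numSalts partitions.
largeSalted.join(smallSalted, "salted_id").show(false)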
Hi Sir... perfect, great explanation... thank you for your effort...
I have a doubt:
After joining, the salt should be removed from the key (un-salted) and only then should the group-by be applied, right...?
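
If the end goal is an aggregation, that intuition matches the usual two-stage pattern. A sketch reusing largeSalted and the "_<salt>" suffix convention from the snippet above; the column names are assumptions:

// Stage 1: pre-aggregate on the salted key, so the skew is spread across partitions.
val preAgg = largeSalted
  .groupBy($"salted_id")
  .agg(count(lit(1)).as("partial_cnt"))

// Stage 2: strip the "_<salt>" suffix to recover the original key, then merge the partials.
val finalAgg = preAgg
  .withColumn("id", regexp_extract($"salted_id", "^(.*)_\\d+$", 1))
  .groupBy($"id")
  .agg(sum($"partial_cnt").as("cnt"))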
Well, I must say, thanks a lot... I have been searching for this kind of explanation.
Excellent. Thank you
I would have appreciated it if you had run the salting code and shown us the Spark UI, for better clarity on what is happening internally within Spark.
This is a really great and crystal clear explanation... thanks a lot for sharing and spreading knowledge!
Excellent video... thanks for the explanation and for sharing the code.
Thanks, but if we have multiple columns as the key, how do we handle it?
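
One common approach (a sketch, not from the video) is to build a single composite key from the key columns and then salt that, applying the same treatment to both sides of the join:

// Combine the key columns with a separator that cannot occur in the data,
// then append the salt exactly as for a single-column key.
val multiSalted = large
  .withColumn("composite_key", concat_ws("|", $"id", $"v1"))
  .withColumn("salted_key",
    concat($"composite_key", lit("_"), floor(rand() * numSalts).cast("string")))
// The other side gets the same composite_key, exploded with every salt value, before joining on salted_key.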
Excellent Description
beautifully explained, thank you very much :)
Amazing video..!!
Good work; it would be better if you showed the output of the salted DataFrames and explained the UDF in more detail.
But won't the join output be incorrect? In the previous scenario it would have joined with all the matching ids, but with the new salting method it will join only on the newly salted key. That seems weird.
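
The explode step is what keeps the join correct: the non-skewed side is replicated once per salt value, so whichever salt a left-side row drew, its matching right-side row exists. Continuing the sketch above:

// For id = 1 and numSalts = 3, the replicated small side holds the salted keys
// 1_0, 1_1 and 1_2, so a left row salted as, say, 1_2 still finds its match.
// No matches are lost; the small side just grows by a factor of numSalts.
smallSalted.filter($"id" === 1).select($"salted_id").show(false)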
Amazing video... however, I don't know Scala. Can you please give an example of how to implement the salting technique with Spark SQL queries? That would be of great help.
Will update SQL query
@@jeevanmadhur3732 waiting for the query
@@ashwinc9867 did you get it?
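
For reference, here is one possible Spark SQL formulation of the same idea, run from Scala. This is a sketch under the assumptions of the earlier snippet, not the query promised above; SEQUENCE requires Spark 2.4+.

large.createOrReplaceTempView("large_t")
small.createOrReplaceTempView("small_t")

spark.sql("""
  SELECT l.*, s.v2
  FROM (
    -- salt the skewed side with a random suffix in 0..2
    SELECT *, CONCAT(id, '_', CAST(FLOOR(RAND() * 3) AS INT)) AS salted_id
    FROM large_t
  ) l
  JOIN (
    -- replicate the small side once per salt value
    SELECT *, CONCAT(id, '_', salt) AS salted_id
    FROM small_t
    LATERAL VIEW EXPLODE(SEQUENCE(0, 2)) salts AS salt
  ) s
  ON l.salted_id = s.salted_id
""").show(false)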
Amazing video.... How can we use the salting technique in PySpark for data skew?
Great Explanation, Thanks for sharing this.
I think there is an off-by-one error.
You are using (0 to 3), which yields (0, 1, 2, 3),
but the random number range will be (0, 1, 2).
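
The mismatch in miniature, assuming the video pairs these two expressions:

// (0 to 3) is inclusive: Vector(0, 1, 2, 3), i.e. four salt values on the exploded side.
val explodedSalts = 0 to 3
// floor(rand() * 3) only ever produces 0, 1 or 2 on the salted side.
val drawnSalt = floor(rand() * 3)
// Rows exploded with salt 3 can never match; use (0 until 3) or floor(rand() * 4)
// so both sides cover the same range.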
Hey great video, could you also link the associated resources you referred to while making this video?
amazing sir! thanks a lot
I have 2 questions:
First one: I think the visual presentation of table 2 after salting is wrong. Why don't you have z_2 and z_3 there? Also, why are you using capital letters sometimes? That's confusing.
Second question: I don't get the benefit of key salting in general. How is this different from broadcasting your second table? Because you explode it, you end up sending the whole table to every executor anyway? No one has been able to answer this question.
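
The short answer usually given to this: a broadcast join ships the entire small table to every executor and requires it to fit in memory, while salting keeps an ordinary shuffle join and only replicates the small side numSalts times, so it still works when neither table is small enough to broadcast. For comparison:

import org.apache.spark.sql.functions.broadcast

// Broadcast join: the whole small table is sent to every executor,
// avoiding the shuffle entirely; viable only while `small` fits in memory.
val bcastJoined = large.join(broadcast(small), "id")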
best
Can you please explain how to take the count after appending the random number?
Hi Aravind, if I understand your question correctly, you want to take the count of the first DataFrame, where we are appending a random number:
var df1 = leftTable
  .withColumn(leftCol, concat(
    leftTable.col(leftCol), lit("_"),
    floor(rand(123456) * 10).cast("string"))) // appends a random salt 0..9 to the key
We can simply do
df1.select(col("id")).count()
This should give the count of that column in the first DataFrame.
For more details, you can refer below git link
github.com/gjeevanm/SparkDataSkewness/blob/master/src/main/scala/com/gjeevan/DataSkew/RemoveDataSkew.scala
Hi, is something missing in the code? I used your code, but it's throwing an exception for the lines below:
// join after eliminating data skewness
df3.join(
  df4,
  df3.col("id") === df4.col("id")
)
  .show(100, false)
}
Hi,
Thanks for highlighting this. There was a small issue with the checked-in join code, which I have fixed now. Please pull the latest code and try it out.
@@jeevanmadhur3732 Thank you Jeevan. Your videos help us a lot :)