Spark Performance Tuning | Handling DATA Skewness | Interview Question

TechWithViresh

24,629 views

Published Dec 10, 2024

Comments • 34

@sahiba2227 3 years ago
At 7:24: repartition does a full shuffle and hence creates equal-sized partitions, i.e., it always guarantees equal-sized partitions.
@desparadoking8209 3 years ago
Very informative video 👍🙂
@ajaywade9418 1 year ago
From 11:30 in the video, we add a random key to the existing tower ID key.
For example, with tower ID 101 and salt key 67, 101 + 67 = 168, so the hash value of 168 would be the final value, right?
What if the partition column is a string data type?
@TechWithViresh 1 year ago
In the case of strings, we can add surrogate keys based on the string column values and then do the salting.
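The tower-ID question and the surrogate-key reply can be illustrated in plain Python. This is a hedged sketch, not code from the video: it assumes salts are drawn from a hypothetical range `NUM_SALTS`, and `surrogate_keys` is a hypothetical helper for the surrogate-key idea. It also shows a pitfall of arithmetic salting: 101 + 67 and 102 + 66 both give 168, so if the salted value replaces the key in a group-by or join, two different towers become indistinguishable. Appending the salt as a suffix keeps keys distinct and works for string keys as-is.

```python
import random

NUM_SALTS = 100  # hypothetical salt range: salts are drawn from [0, NUM_SALTS)

def salt_by_addition(key: int, salt: int) -> int:
    # Arithmetic salting as described in the comment (e.g. 101 + 67 = 168).
    return key + salt

def salt_by_concat(key, salt: int) -> str:
    # Suffix salting: works for int and string keys and never merges two keys.
    return f"{key}_{salt}"

def unsalt(salted: str) -> str:
    # Drop the trailing "_<salt>" to recover the original key.
    return salted.rsplit("_", 1)[0]

def surrogate_keys(values):
    # Hypothetical helper: give each distinct string a dense integer id,
    # in first-seen order, so string keys can be salted like numeric ones.
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids))
    return ids

rng = random.Random(42)
towers = ["A", "B", "A", "A", "C", "A"]
salted = [salt_by_concat(t, rng.randrange(NUM_SALTS)) for t in towers]
tower_ids = surrogate_keys(towers)
```

Whichever encoding is used, keeping the original key in its own column makes it easy to drop the salt after the skewed stage.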
@brothermalcolm 2 years ago
Great video, like the pace, like the presentation.
@TechWithViresh 2 years ago
Glad you liked it!
@vijeandran 4 years ago
Hi Viresh, thanks for the video. How can we apply the salting technique in PySpark?
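A common shape for salting a skewed aggregation is two stages: aggregate on (key, salt), then drop the salt and aggregate again on the key. In PySpark this would typically use a salt column such as `F.floor(F.rand() * N)` and two successive `groupBy(...).agg(...)` calls (with `F` being `pyspark.sql.functions`). The hedged sketch below models only that logic in plain Python so it runs without Spark; the data and `num_salts` value are hypothetical.

```python
import random
from collections import defaultdict

def salted_sum(records, num_salts, seed=0):
    """Two-stage sum of (key, value) pairs using a random salt."""
    rng = random.Random(seed)
    # Stage 1: attach a salt in [0, num_salts) and pre-aggregate per (key, salt).
    # In Spark each (key, salt) group can be processed by a different task,
    # so one hot key's work is spread over up to num_salts tasks.
    partial = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts)
        partial[(key, salt)] += value
    # Stage 2: drop the salt and combine the (much smaller) partial results.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final), len(partial)

def plain_sum(records):
    # Reference result without salting.
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)

# Hypothetical skewed input: tower 101 has far more records than the others.
records = [(101, 1)] * 1000 + [(102, 1)] * 10 + [(103, 1)] * 5
totals, num_partial_groups = salted_sum(records, num_salts=8)
```

The split works directly for sums, counts, minimums, and maximums; an average needs the sum and the count carried separately through stage 1.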
@Dsmehra379 4 years ago
Thank you so much for this video, but I am not able to find the second part. Can you please post the link to the part 2 video?
@udaynayak4788 2 years ago
Hi Viresh, can you please share the link for part 2?
@AdityaKommu1 4 years ago
Hello,
Your videos are very good.
Can you please do a video on incremental data load and full data load, with an example?
@vivekpuurkayastha1580 2 years ago
If the partition key is non-numeric, how do we perform salting? Your tower IDs were numeric, but what if instead of 1, 2, ... they were A, B, ...?
@ramkumarananthapalli7151 3 years ago
Thank you for making this video. Could you tell us on which column the mean, median, and mode are calculated?
@bentchow 1 year ago
They are calculated on the columns the data is shuffled by, such as the join columns or group-by columns. There is data skew when the distribution is not normal.
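One way to compute those statistics, sketched in plain Python under the assumption that they are taken over the per-key record counts of the shuffle column (the video's exact method is not shown on this page): a maximum far above the median points to a hot key. The `keys` data is hypothetical.

```python
import statistics
from collections import Counter

def key_count_stats(keys):
    """Summarise how many records each distinct key has."""
    counts = Counter(keys)
    per_key = list(counts.values())
    return {
        "mean": statistics.mean(per_key),
        "median": statistics.median(per_key),
        "mode": statistics.mode(per_key),
        "max": max(per_key),
        "hottest_key": counts.most_common(1)[0][0],
    }

# Hypothetical join/group-by column values: key 101 dominates.
keys = [101] * 900 + [102] * 10 + [103] * 10 + [104] * 10 + [105] * 20
stats = key_count_stats(keys)
# A max far above the median suggests a skewed key.
skew_ratio = stats["max"] / stats["median"]
```

Here the per-key counts are 900, 10, 10, 10, and 20, so the mean (190) is pulled up by one key while the median and mode stay at 10.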
@SpiritOfIndiaaa 4 years ago
Thank you so much, really good. What is the difference between isolation salting and salting? And what is the difference between isolation map join and map join?
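Neither term is defined on this page. A common reading is that plain salting adds a salt to every key, while isolated salting salts only the few keys detected as skewed, so the other keys keep their original form. An isolated map join is usually described in the same spirit: only the skewed keys go through a broadcast (map-side) join and the rest through a regular shuffle join. The plain-Python sketch below illustrates only the isolated-salting reading; `threshold` and `num_salts` are hypothetical parameters.

```python
import random
from collections import Counter

def isolated_salt(keys, threshold, num_salts, seed=0):
    """Salt only keys whose record count exceeds `threshold`."""
    rng = random.Random(seed)
    counts = Counter(keys)
    hot = {k for k, c in counts.items() if c > threshold}
    salted = []
    for k in keys:
        if k in hot:
            # Hot key: spread its records across num_salts sub-keys.
            salted.append(f"{k}_{rng.randrange(num_salts)}")
        else:
            # Cold key: leave it unchanged, so it adds no extra groups.
            salted.append(str(k))
    return salted, hot

# Hypothetical data: key "A" is hot, "B" and "C" are not.
keys = ["A"] * 500 + ["B"] * 20 + ["C"] * 15
salted, hot = isolated_salt(keys, threshold=100, num_salts=10)
```

In a join, this also means only the hot keys' rows on the other side need to be replicated per salt value.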
@Gecasomx 3 years ago
Thanks for the video. No part 2, though?
@ayanbizz 4 years ago
Nice explanation. A couple of questions:
1) Does repartitioning ensure the data distribution is not skewed (unlike coalesce)?
2) You said repartitioning uses the hash value to distribute the data. Are you talking about bucketing?
@TechWithViresh 4 years ago
Spark provides two partitioners: 1. the hash partitioner and 2. the range partitioner. The default is the hash partitioner.
@harshvardhansolanki1466 4 years ago
If you repartition on a column, you can get skewed data. If you repartition by a number of partitions, the distribution may be almost equal.
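The difference described in this thread can be simulated in plain Python. Spark's RDD `HashPartitioner` places a record by a non-negative modulo of the key's hash code, and DataFrame repartitioning by column uses a Murmur3-based hash; either way, every row with the same key lands in the same partition. The sketch below uses Python's `hash()` (which equals the value for small non-negative ints, as Java's `Integer.hashCode` does) and a simplified round-robin for `repartition(n)` without columns; Spark's real round-robin starts each input partition at a random offset. The data is hypothetical.

```python
from collections import Counter

def hash_partition(keys, num_partitions):
    """Mimic hash partitioning on integer keys: partition = hash(key) mod n."""
    # Python's % is non-negative for a positive divisor, like Spark's nonNegativeMod.
    return Counter(hash(k) % num_partitions for k in keys)

def round_robin_partition(num_records, num_partitions):
    """Mimic repartition(n) without columns: deal records out in turn."""
    return Counter(i % num_partitions for i in range(num_records))

# Hypothetical data: tower 101 holds 900 of 1,000 records.
keys = [101] * 900 + list(range(200, 300))
by_key = hash_partition(keys, 4)
by_count = round_robin_partition(len(keys), 4)
```

Partition 1 receives all 900 records of tower 101 plus 25 others (925 in total) while the other three get 25 each; the round-robin version gives 250 records to every partition.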
@rishigc 4 years ago
@TechWithViresh I simply love your videos. I have watched your other tutorial videos too; they are awesome. I am interested in knowing how to do an iterative broadcast join with the SQL API. Any help is highly appreciated. Can you please advise?
@thanoojbharateeyudu3786 4 years ago
We could lose our join key by salting the key with random numbers.
If we want to join on the same key, that is a problem.
Maybe the join key could be a different column than the salted one.
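A common answer to this concern is that the original key is kept and the salt becomes an extra join column: the large, skewed side gets one random salt per row, and the small side is replicated once per possible salt value (in Spark, for example, by exploding an array of salt values), so every row still finds its match on (key, salt). The hedged plain-Python sketch below checks that the salted join returns the same rows as the plain join; the data and `num_salts` are hypothetical.

```python
import random
from collections import defaultdict

def plain_join(left, right):
    """Inner join of (key, value) lists on key."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return sorted((k, lv, rv) for k, lv in left for rv in index[k])

def salted_join(left, right, num_salts, seed=0):
    """Inner join where the skewed (left) side gets a random salt."""
    rng = random.Random(seed)
    # Large, skewed side: each row gets ONE random salt.
    salted_left = [((k, rng.randrange(num_salts)), v) for k, v in left]
    # Small side: replicate each row once per possible salt value.
    index = defaultdict(list)
    for k, v in right:
        for s in range(num_salts):
            index[(k, s)].append(v)
    # Join on the composite (key, salt) and drop the salt from the output.
    return sorted(
        (key_salt[0], lv, rv)
        for key_salt, lv in salted_left
        for rv in index[key_salt]
    )

left = [(101, f"call{i}") for i in range(50)] + [(102, "callX")]
right = [(101, "TowerA"), (102, "TowerB")]
```

The trade-off is that the small side grows by a factor of `num_salts`, so the salt range is usually kept small.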
@aneksingh4496 4 years ago
Have you uploaded part 2 of this?
@TechWithViresh 4 years ago
Check out other videos in the playlist for performance optimization and executor tuning.
@harshvardhansolanki1466 4 years ago
Thank you so much for the video. I'd like some clarification, though.
In your example you used mapPartitions, meaning that for each partition of different keys, you updated the key with a salt. But the records still remained in their respective partitions. How will those records be shuffled across partitions for equal distribution?
@TechWithViresh 4 years ago
The partition will change with the change in the key, as it is essentially the hash code of key + salt now.
@harshvardhansolanki1466 4 years ago
@TechWithViresh I tried it, and I believe a new DF has to be created and repartitioned again for the records to be shuffled by the updated salted keys. Updating the key in a mapPartitions function won't trigger a shuffle by itself. That is the only thing that makes sense.
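The point in this exchange, that rewriting keys inside mapPartitions does not by itself move any record, can be checked with a small plain-Python simulation. It is a hedged model, not Spark code: `rehash` stands in for a shuffle on the key, the salt is deterministic (position mod `num_salts`) so the result is reproducible, and the salted key is encoded as `key * num_salts + salt`, a hypothetical encoding under which distinct (key, salt) pairs never collide.

```python
def rehash(partitions, num_partitions):
    """Simulate a shuffle: move every (key, value) record to hash(key) % n."""
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[hash(key) % num_partitions].append((key, value))
    return out

def add_salt_in_place(partitions, num_salts):
    """Like a mapPartitions step: rewrite keys but keep each record where it is."""
    return [
        [(key * num_salts + i % num_salts, value) for i, (key, value) in enumerate(part)]
        for part in partitions
    ]

num_partitions = 4
# Skewed starting layout: all 800 records of key 101 hash to the same partition.
initial = rehash([[(101, v) for v in range(800)] + [(7, 0), (8, 0)]], num_partitions)
salted = add_salt_in_place(initial, num_salts=8)
reshuffled = rehash(salted, num_partitions)
sizes_before = [len(p) for p in salted]
sizes_after = [len(p) for p in reshuffled]
```

After salting, the layout is still [1, 800, 0, 1]; only the shuffle spreads it to [202, 200, 200, 200]. In Spark, that shuffle would come from the next wide operation on the salted key (for example `reduceByKey`, a join, or an explicit `repartition`), which is consistent with both replies above.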
@jaiharsad7121 4 years ago
Hi sir, please upload the Spark interview question videos that were available earlier. I'm not able to find them in your playlist.
@TechWithViresh 4 years ago
All the videos are uploaded; please check :)
@rishigc 4 years ago
Where is part 2?
@chilukapavan6344 5 years ago
Awesome video 🙏... can you please share the part 2 video link?
@TechWithViresh 4 years ago
Coming soon! Thanks :)
@pareshpal3533 3 years ago
@TechWithViresh When?
@Hk-eo5yr 5 years ago
Can you share the part 2 video?
@TechWithViresh 4 years ago
Coming soon! Thanks :)
@bhuneshwarsingh630 5 years ago
Please give a solid coding example with an explanation.
