Suppose you have two large DataFrames. How would you optimize that join? And why are you using an RDD here, when it is so much slower than a DataFrame?
small2 is not defined. Also, why is the shuffle cost of partitioning the two RDDs separately lower than the shuffle cost of joining them directly? They are basically doing the same thing: moving data with the same join key to the same executor.
A shuffle during the join, or a repartition before the join: you are saying the second one is better, right? What is the difference? You have not explained why it is better. Either way someone has to move the data (either the join shuffles, or we repartition), so please tell us why this approach wins.
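For anyone else stuck on this, here is a minimal sketch of the difference, with toy data and names of my own choosing (not from the video). The point is not that the repartition avoids the first shuffle; it is that the shuffle is paid once, and the persisted, co-partitioned RDDs can then be joined (and re-joined) without any further shuffle:

```scala
import org.apache.spark.HashPartitioner

// Toy pair RDDs keyed on the join column (stand-ins for the video's data).
val bigRdd   = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
val smallRdd = sc.parallelize(Seq(1 -> "x", 2 -> "y"))

val part = new HashPartitioner(4)

// Pay the shuffle once, then keep the partitioned layout in memory.
val bigP   = bigRdd.partitionBy(part).persist()
val smallP = smallRdd.partitionBy(part).persist()

// Both sides now share the same partitioner, so the join itself is
// a narrow operation: no further shuffle, here or in later joins on
// the same key.
val joined = bigP.join(smallP)
joined.collect().foreach(println)
```

So for a single one-off join the two approaches really do cost about the same; the win only shows up when the partitioned RDDs are reused.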
Your example is not up to the mark. What you describe in the lecture is hard to follow; it feels like the video was made only for the sake of making a video. I did not get your point about how the join happens and what goes on underneath. Please explain it in a much more understandable manner.
That's what I was looking for. It's a great help, Viresh.
I'm really happy with your deep dive into Spark. Thank you.
Thanks Viresh. RDDs are used here; how would you do the same with a Dataset/DataFrame? And where did "small2" come from?
What's the benefit of persisting the 2 RDDs?
Any suggestions for dataframes?
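A sketch of the DataFrame-side equivalent, with toy data and column names of my own (the video does not show this): repartition both sides on the join key with the same partition count and persist them; the planner should then be able to skip the extra exchange before the join.

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._

// Toy DataFrames standing in for the real tables.
val bigDf   = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "v1")
val smallDf = Seq((1, "x"), (2, "y")).toDF("id", "v2")

// Hash-partition both sides on the join key, same partition count.
val bigP   = bigDf.repartition(8, col("id")).persist()
val smallP = smallDf.repartition(8, col("id")).persist()

// With matching partitioning on both children, the sort-merge join
// should not need another shuffle; check the plan to confirm.
val joined = bigP.join(smallP, "id")
joined.explain()
```

For tables that live on disk and are joined across many jobs, writing them with bucketBy on the join key achieves the same effect persistently.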
I feel the title is misleading: repartitioning the two RDDs involves a shuffle as well.
I have been asked in an interview: if I have two tables with the same volume of data, but one has 10 columns and the other has 3, how do I optimize that join?
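One common answer to that interview question, sketched with made-up column names: prune the wide table down to just the columns the join and downstream steps need, so the shuffle moves far less data per row. Catalyst usually pushes this projection down automatically for DataFrames, but being explicit costs nothing, and with RDDs there is no optimizer to do it for you.

```scala
import spark.implicits._

// Made-up stand-ins: imagine wideDf with 10 columns, narrowDf with 3.
val wideDf   = Seq((1, "a", 10.0), (2, "b", 20.0)).toDF("id", "name", "amount")
val narrowDf = Seq((1, "NY"), (2, "SF")).toDF("id", "city")

// Keep only the columns that are actually needed before the join.
val wideSlim = wideDf.select("id", "amount")

val joined = wideSlim.join(narrowDf, "id")
joined.show()
```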
Really nice, thanks bro. In line 14, should it be "small.partitioner.get" instead of "small2.partitioner.get"? And why is shuffle.partitions set to only 2?
Otherwise the remaining 198 partitions would be empty.
@TechWithViresh do you mean "otherwise" or "in other words"? Do you want to keep 198 partitions empty?
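My reading of that setting, for anyone else confused: nobody wants 198 empty partitions. With the tiny demo dataset, the default of 200 shuffle partitions would leave 198 tasks with no data at all, so setting it to 2 just avoids scheduling empty tasks:

```scala
// Default is 200; on a two-key demo dataset most tasks would be empty.
spark.conf.set("spark.sql.shuffle.partitions", "2")
```

Note this config only governs DataFrame/SQL shuffles; the RDD path takes its partition count from the partitioner you pass in.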
I have been following the series and it is pretty good, but this video is not at all clear. You should make another one on the same question.
What's the book/picture in the video?
Even with repartitioning we have to move data to different partitions, causing a shuffle, isn't it?
Why are you converting the DataFrame to an RDD? That is very bad practice in terms of performance.
The concept is really worth testing. The code is incomplete in places; I took time to fill the gaps. The last line, display(): will it work in plain Scala Spark? 🙄 This code will run fine on Azure Databricks, though.
The partitioner is coming back as None for largeRDD at line 14, so .get fails.
Where did small2 come from? There are typos; can you please update it?
Getting an error at line 14:
error: value partitioner is not a member of org.apache.spark.sql.DataFrame
Kindly suggest.
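That compile error and the None mentioned above likely share a root cause: partitioner is a member of RDD, not DataFrame, and even on an RDD it is None unless the RDD was explicitly partitioned. A defensive sketch with my own variable names:

```scala
import spark.implicits._

val someDf = Seq((1, "a"), (2, "b")).toDF("id", "v")

// `partitioner` lives on RDD, not DataFrame: drop to the RDD first.
val maybePart = someDf.rdd.partitioner   // Option[Partitioner], often None

// Never call .get blindly on the Option; fall back to the count.
val numParts = maybePart match {
  case Some(p) => p.numPartitions
  case None    => someDf.rdd.getNumPartitions
}
```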
Very nice content on this channel, thanks for that. Question: can range partitioning work here?
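My understanding, sketched with toy data of my own: yes, a RangePartitioner also sends equal keys to the same partition, so two RDDs sharing the same RangePartitioner instance can be equi-joined without a further shuffle, just as with HashPartitioner. Hash is still the usual choice for plain equi-joins; range partitioning mainly pays off when you also want sorted output or range queries.

```scala
import org.apache.spark.RangePartitioner

val bigRdd   = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
val smallRdd = sc.parallelize(Seq(1 -> "x", 3 -> "z"))

// Range bounds are sampled from the RDD the partitioner is built on;
// both sides must use the SAME instance to stay co-partitioned.
val part = new RangePartitioner(2, bigRdd)

val bigP   = bigRdd.partitionBy(part).persist()
val smallP = smallRdd.partitionBy(part).persist()

val joined = bigP.join(smallP)   // narrow join, as with HashPartitioner
joined.collect().foreach(println)
```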
How to connect with you?
TechWithViresh@gmail.com
The video recommendations at the end are blocking the content.