Spark Join Without Shuffle | Spark Interview Question

  • Published on 12 Dec 2024

Comments • 29

  • @shivrajsingh5559 3 years ago +2

    That's what I was looking for. It's a great help, Viresh.

  • @mrkrish501 4 years ago +1

    I'm really happy with your deep dive into Spark. Thank you.

  • @MohitKumar-st3ms 4 years ago +3

    Let's say you have two large DataFrames: how would you optimize the join? And why are you using RDDs, which are very slow compared to DataFrames?

  • @gemini_537 3 years ago +2

    small2 is not defined. Also, why is the shuffle cost of partitioning the 2 RDDs separately lower than the shuffle cost of joining them directly? Both do basically the same thing: move records with the same join key to the same executor.

  • @SpiritOfIndiaaa 4 years ago +1

    Thanks Viresh. RDDs are used here; how do you do the same with Dataset/DataFrame? And where did you get "small2" from?

  • @gemini_537 3 years ago +1

    What's the benefit of persisting the 2 RDDs?

  • @adamantnams 4 years ago +1

    Any suggestions for dataframes?

  • @gemini_537 3 years ago +2

    I feel the title is misleading: repartitioning the 2 RDDs involves a shuffle.

  • @naveenkumar-tb1de 4 years ago +1

    I have been asked: if I have 2 tables with the same volume of data, but one has 10 columns and the other has 3 columns, how do I optimize the join?

  • @SpiritOfIndiaaa 4 years ago +2

    Really nice, thanks bro. In line 14, should it be "small.partition.get" instead of "small2.partition.get"? And why is shuffle.partitions set to only 2?

    • @TechWithViresh 4 years ago

      Otherwise the remaining 198 partitions would be empty.

    • @SpiritOfIndiaaa 4 years ago

      @TechWithViresh Is it "otherwise" or "in other words"? Do you want to keep 198 partitions empty?
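
The arithmetic behind the "198 empty partitions" reply can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: it assumes the example data has just two distinct join keys (the thread does not say so), and it uses Python's built-in `hash` as a stand-in for Spark's partitioning hash. The 200 is Spark's default value for `spark.sql.shuffle.partitions`; the helper name `count_empty_partitions` is hypothetical.

```python
DEFAULT_SHUFFLE_PARTITIONS = 200  # Spark's default for spark.sql.shuffle.partitions


def count_empty_partitions(keys, num_partitions):
    """Count partitions that receive no key under hash(key) % num_partitions."""
    used = {hash(key) % num_partitions for key in keys}
    return num_partitions - len(used)


# ASSUMPTION: only two distinct join keys, as the "198" in the reply suggests.
keys = [0, 1] * 50

empty_with_default = count_empty_partitions(keys, DEFAULT_SHUFFLE_PARTITIONS)
empty_with_two = count_empty_partitions(keys, 2)

print(empty_with_default, empty_with_two)
```

With two distinct keys, at most two partitions can ever hold data, so most of the 200 default partitions would become empty tasks; a real hash function could even send both keys to the same partition.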

  • @monku1821 3 years ago +1

    I have been following the series and it's pretty good, but this video is not at all clear. You should make another one on the same question.

  • @gemini_537 3 years ago

    What's the book/picture in the video?

  • @rishigc 4 years ago

    Even with repartitioning, we have to move data to different partitions, which causes a shuffle, doesn't it?

  • @Trip-Train 1 year ago

    Why are you converting the DataFrame to an RDD? It is very bad practice in terms of performance.

  • @keyaar3393 3 years ago

    Shuffle during the join, or repartition before the join: you are saying the second one is better, right? What's the difference? You have not mentioned why it is better. Something has to take care of the repartitioning: either the join will shuffle or we have to repartition, and that's fine. Please let us know why this approach is better.
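
The question above can be illustrated with a small pure-Python simulation (no Spark needed). It is a sketch under assumptions, not the code from the video: the data, the partition count, and the helper names `hash_partition` and `local_join` are all hypothetical. It shows that once both sides are partitioned with the same function and the same number of partitions, matching keys already sit in the partition with the same index, so the join itself needs no further data movement. The partitioning step is still a shuffle; the saving would come only from doing it once, keeping the result (for example by persisting it), and reusing it across several joins.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def hash_partition(records, num_partitions):
    """Group (key, value) records into partitions by hash(key) % num_partitions."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions][key].append(value)
    return partitions


def local_join(left_parts, right_parts):
    """Join partition i of the left side with partition i of the right side only."""
    joined = []
    for left, right in zip(left_parts, right_parts):
        for key, left_values in left.items():
            for left_value in left_values:
                for right_value in right.get(key, []):
                    joined.append((key, left_value, right_value))
    return joined


# Hypothetical data: 20 "orders" spread over 10 keys, and 10 "customers".
large = [(i % 10, f"order-{i}") for i in range(20)]
small = [(k, f"customer-{k}") for k in range(10)]

# One-time partitioning of each side (this is the step that moves data).
large_parts = hash_partition(large, NUM_PARTITIONS)
small_parts = hash_partition(small, NUM_PARTITIONS)

# The join only ever looks at partitions with the same index.
result = local_join(large_parts, small_parts)

# Reference result from a plain nested-loop join over the unpartitioned data.
expected = [(k, lv, rv) for k, lv in large for sk, rv in small if k == sk]
print(sorted(result) == sorted(expected))
```

If the two sides were partitioned with different partition counts or different hash functions, the same-index lookup in `local_join` would miss matches, which is why the two sides need to share one partitioner.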

  • @Mryajivramuk 3 years ago

    The concept is really worth testing.
    The code is incomplete in places.
    I took time to fill the gaps.
    The last line is display(): will it work in Scala Spark? 🙄

    • @TechWithViresh 3 years ago

      This code will run fine on Azure Databricks.

  • @shankargs7685 4 years ago +1

    partition.get is returning None for largeRDD on line 14.

  • @IndianCoupleinUKBLR 4 years ago

    Where did small2 come from? There are typos. Can you please update it?

  • @rohinirithe1522 4 years ago

    Getting an error for line 14:
    error: value partitioner is not a member of org.apache.spark.sql.DataFrame
    Kindly suggest a fix.

  • @saurabhgarud6690 4 years ago +1

    Very nice content on this channel, thanks for that. Q: Can range partitioning work here?

  • @dipanjansaha6824 4 years ago +1

    How to connect with you?

  • @sagarrawal7740 10 months ago

    Video recommendations at the end are blocking the content...

  • @dheerendrakumarjain6672 3 years ago

    Your example is not up to the mark, and what you describe in the lecture is not understandable; it seems you did this only for the sake of creating a video. I did not get your point about how the join happens and what happens. Please describe it in a much more understandable manner.