Working with Skewed Data: The Iterative Broadcast - Rob Keevil & Fokko Driesprong

  • Published on 1 Feb 2025

Comments • 10

  • @raviiit6415 1 year ago +2

    Great talk, both of you.

  • @LuisFelipe-qe2pj 3 years ago +1

    Very nice presentation!! 👏👏👏

  • @rishigc 4 years ago +2

    @22:13 - Where can I find an example of the implementation with the SQL API?

  • @bikashpatra119 4 years ago +1

    Can you please provide the link to the benchmark on GitHub?

    • @JimRohn-u8c 8 months ago +1

      Go to 23:25 in the video; he shows the GitHub URL there.

  • @vishakhrameshan9932 6 years ago +2

    Hi, I am facing a skewed-data issue in my Spark application. I have two tables of the same size (the same number of rows, but a different number of columns), and I am checking which rows of table A are not in table B. This Spark SQL query takes a very long time.
    I have allocated 100 executors in the production environment. I also tried writing both tables to files, to avoid processing such a large dataset in memory, and then reading them back to run the SQL operation.
    My application contains many Spark SQL operations, and this query comes somewhere in the middle of the whole job. When I run the application, it runs up to this query and then takes more than 6 hours to process 2M records.
    How can I get faster results with repartitioning or an iterative broadcast? Please help.

    • @arpangrwl
      @arpangrwl 5 years ago

      Hi Vishakh, did you find a solution to the problem you mentioned?

    • @shankarravi749
      @shankarravi749 5 years ago

      @arpangrwl May I know the solution? What needs to be done?

    • @JoHeN1990
      @JoHeN1990 5 years ago

      Try bucketing the table before writing it. The write might take longer, but the joins will be faster.

    • @TechWithViresh
      @TechWithViresh 4 years ago +1

      Check this: th-cam.com/video/HIlfO1pGo0w/w-d-xo.html
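
For readers of the thread above who ask what an iterative broadcast looks like in practice, here is a rough, pure-Python sketch of the idea only. It is not Spark code and not the speakers' benchmark code (that lives at the GitHub URL shown in the video). The function names, the row shapes, and the round-robin split are all assumptions made for illustration. The smaller table is cut into chunks that would each be small enough to broadcast, the large table is left-joined against one chunk per pass, and a match found in an earlier pass is kept.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]                      # (join key, payload) in the large table
Joined = Tuple[str, int, Optional[str]]    # (join key, payload, looked-up value)


def split_into_passes(small: Dict[str, str], num_passes: int) -> List[Dict[str, str]]:
    """Split the smaller table into chunks (hypothetical round-robin split)."""
    passes: List[Dict[str, str]] = [{} for _ in range(num_passes)]
    for i, (key, value) in enumerate(sorted(small.items())):
        passes[i % num_passes][key] = value
    return passes


def iterative_broadcast_join(
    large: List[Row], small: Dict[str, str], num_passes: int
) -> List[Joined]:
    """Left-join `large` against `small`, one chunk of `small` per pass."""
    # Start with every row unmatched (None), as in a left outer join.
    result: List[Joined] = [(key, payload, None) for key, payload in large]
    for chunk in split_into_passes(small, num_passes):
        # One pass: look each row up in the current chunk only, and keep an
        # earlier match if there is one (the "coalesce" step).
        result = [
            (key, payload, matched if matched is not None else chunk.get(key))
            for key, payload, matched in result
        ]
    return result


# The key "a" is over-represented (skewed) in the large table.
large_rows = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("z", 5)]
small_table = {"a": "A", "b": "B", "c": "C"}
joined = iterative_broadcast_join(large_rows, small_table, num_passes=2)
```

The rows of the large table are never regrouped by key, so a heavily repeated key such as "a" does not pile up in one place; the cost is one scan of the large table per pass.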