Spark Performance Tuning | Performance Optimization | Interview Question

  • Published on 12 Dec 2024

Comments • 10

  • @Bhawnasays
    @Bhawnasays 4 years ago +6

    I personally liked your videos. Can you share your LinkedIn?

  • @deepakgupta-hk9ig
    @deepakgupta-hk9ig 2 years ago +1

    Hi, we now have Tungsten, which uses encoders for serialization. Should we still use Kryo for serialization, or will Tungsten take care of it?

  • @vamshi878
    @vamshi878 4 years ago +1

    Hi, I have one doubt: do these performance tuning tips apply only when we use RDDs?

    • @TechWithViresh
      @TechWithViresh 4 years ago +2

      Under the hood everything is an RDD, be it a Dataset or a DataFrame.

    • @onbootstrap
      @onbootstrap 4 years ago

      @TechWithViresh I don't think DataFrames and Datasets are powered by RDDs under the hood. Can you please share a citation for the above claim? Thanks.

    • @rahulpandit9082
      @rahulpandit9082 4 years ago +4

      @onbootstrap The RDD is the building block of Spark. Whichever abstraction we use, DataFrame or Dataset, the final computation is internally done on RDDs.

  • @nikhilvalsaraj4860
    @nikhilvalsaraj4860 4 years ago +1

    very useful info

  • @shyamsundar.r7373
    @shyamsundar.r7373 4 years ago

    I have one common doubt. Spark is a cluster computing framework, so a Spark job is split up, sent across the various nodes in the cluster, and processed in parallel to give us an output. My doubt is: when the job is split and sent to the nodes, is the program code sent along with the data to be processed? Please clarify.

    • @TechWithViresh
      @TechWithViresh 4 years ago

      So, distributed systems work on the architectural theme of sending code to the data, which is the backbone and the breakthrough concept for handling terabytes of data.
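
To make the "send code to the data" idea in the reply above concrete, here is a toy sketch in plain Python. It is not Spark and uses no Spark API: `Worker`, `run_job`, and `sum_of_squares` are hypothetical names invented for illustration, and everything runs in a single process. The only point is the shape of the interaction: each worker already holds its partition, the driver hands every worker the same small function, and only small partial results travel back.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Worker:
    """Hypothetical stand-in for an executor that holds one partition locally."""
    name: str
    partition: List[int]

    def run(self, task: Callable[[List[int]], int]) -> int:
        # The task (code) arrives here; the partition never leaves the worker.
        return task(self.partition)


def sum_of_squares(rows: List[int]) -> int:
    """The 'program code' that the driver hands to every worker."""
    return sum(x * x for x in rows)


def run_job(workers: List[Worker], task: Callable[[List[int]], int]) -> int:
    # Driver side: send the same small function to each worker,
    # then combine the small partial results it gets back.
    partials = [w.run(task) for w in workers]
    return sum(partials)


# Three workers, each owning a different slice of the data.
workers = [
    Worker("node-1", [1, 2, 3]),
    Worker("node-2", [4, 5]),
    Worker("node-3", [6]),
]
total = run_job(workers, sum_of_squares)
print(total)
```

In a real cluster the function would be serialized and shipped over the network and the workers would run in parallel on separate machines; the sketch leaves both out and keeps only the direction of travel: code goes out, partial results come back, and the bulk data stays where it is.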

  • @pratikmokal7046
    @pratikmokal7046 3 years ago

    Thanks