6.8 Catalyst Optimizer | Spark Interview questions

  • Published on 5 Sep 2024
  • As part of our Spark interview question series, we want to help you prepare for your Spark interviews. We will discuss various Spark topics such as lineage, reduceByKey vs groupByKey, YARN client mode vs YARN cluster mode, etc.
    In this video we cover what the Spark Catalyst optimizer is. The Catalyst optimizer is very important for DataFrames and Datasets.
    Please subscribe to our channel.
    Here is a link to other Spark interview questions
    • 2.5 Transformations Vs...
    Here is a link to other Hadoop interview questions
    • 1.1 Why Spark is Faste...

Comments • 59

  • @vishwanathh9848
    @vishwanathh9848 4 years ago +1

    You are precisely choosing the topics and very very accurately explaining them. Please keep it up.

    • @DataSavvy
      @DataSavvy  3 years ago

      Thanks Vishwanath

  • @pandurangbhadange25
    @pandurangbhadange25 6 months ago

    1. Parsing - build an abstract syntax tree from the query.
    2. Analysis - the Catalyst analyzer performs semantic analysis on the tree: it resolves references, checks and infers data types, and produces a resolved logical plan.
    3. Logical optimisation - rewrite the plan into a more efficient form using rules such as predicate pushdown and constant folding.
    4. Physical planning - translate the optimized logical plan into one or more candidate physical plans built from Spark's physical operators. (Stages and tasks are created later, by the scheduler, when the plan runs.)
    5. Physical optimisation - refine the plan further by considering factors like data partitioning and join strategy, and by choosing the most efficient physical operators.
    6. Code generation - generate Java bytecode for the optimized physical plan.
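
The logical optimisation step in the list above can be pictured as rules that rewrite a tree. The following is only a toy sketch in plain Python, not Spark's actual implementation: the classes `Lit`, `Col`, `BinOp`, `Scan`, `Project`, `Filter` and the functions `fold_constants` and `optimize` are hypothetical names invented for illustration. It assumes projections contain only plain column names, so moving a filter beneath a projection is always safe here.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Toy expression tree (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+", "*" or ">"
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, BinOp]


# Toy logical plan nodes (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Scan:
    table: str
    pushed_filter: Optional[Expr] = None  # filter handed to the data source


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: "Plan"


@dataclass(frozen=True)
class Filter:
    condition: Expr
    child: "Plan"


def fold_constants(e):
    """Constant folding: replace literal-only arithmetic with its value."""
    if isinstance(e, BinOp):
        left, right = fold_constants(e.left), fold_constants(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            if e.op == "+":
                return Lit(left.value + right.value)
            if e.op == "*":
                return Lit(left.value * right.value)
        return BinOp(e.op, left, right)
    return e


def optimize(plan):
    """Predicate pushdown: move filters as close to the scan as possible."""
    if isinstance(plan, Filter):
        cond = fold_constants(plan.condition)
        child = optimize(plan.child)
        if isinstance(child, Project):
            # Filter below the projection so rows are dropped earlier.
            return Project(child.columns, optimize(Filter(cond, child.child)))
        if isinstance(child, Scan) and child.pushed_filter is None:
            # Let the scan itself apply the filter.
            return Scan(child.table, cond)
        return Filter(cond, child)
    if isinstance(plan, Project):
        return Project(plan.columns, optimize(plan.child))
    return plan


# Roughly: SELECT name, age FROM people, then WHERE age > 10 + 8
plan = Filter(
    BinOp(">", Col("age"), BinOp("+", Lit(10), Lit(8))),
    Project(("name", "age"), Scan("people")),
)
print(optimize(plan))
```

In real Spark, the plans before and after optimisation can be inspected with `df.explain(True)`, which prints the parsed, analyzed, and optimized logical plans along with the physical plan.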

  • @MrManish389
    @MrManish389 4 years ago +3

    Your video is very nice and specific.
    Apart from this, sir, I want to add something: we get multiple physical plans only when cost-based optimization is enabled.

    • @DataSavvy
      @DataSavvy  4 years ago +1

      You are right... Thanks for adding this information

  • @prabhakaransubramaniyan6538
    @prabhakaransubramaniyan6538 6 years ago

    I have come across many videos on the Catalyst optimizer. I found this one to be the best and well explained :)

  • @jubinsharma8441
    @jubinsharma8441 3 years ago

    Please try keeping the volume higher; your videos are very educational, elaborate, and helpful. Please try improving the sound as well. Sometimes it is very difficult to understand and I close the video.

  • @souravsinha5330
    @souravsinha5330 1 year ago +1

    Clearly explained, thanks

    • @DataSavvy
      @DataSavvy  1 year ago

      Glad it helped

  • @31bikashdash
    @31bikashdash 1 year ago

    great

  • @himanshusekharpaul476
    @himanshusekharpaul476 6 years ago +2

    Please add scenario-based questions on Spark (Core, SQL, Streaming). Also add questions on Scala.

    • @DataSavvy
      @DataSavvy  6 years ago +1

      Sure Himanshu... Do you have any examples of scenario-based questions? I will create a video for that

    • @himanshusekharpaul476
      @himanshusekharpaul476 6 years ago +1

      I don't have a complete list of scenarios, but one can be created. For example:
      Let's say you have a file of 8 GB. How can you copy it into each executor's memory?
      What is the meaning of the add jar parameter in spark-submit?
      What does each parameter in spark-submit do internally?
      How can you customize that parameter list? Etc.

  • @ankitamahadik2756
    @ankitamahadik2756 4 years ago +1

    Very nicely explained 👍

    • @DataSavvy
      @DataSavvy  4 years ago

      Thanks Ankita... I am happy that you liked it... Please share your suggestions, if any, to improve the content on this channel.

    • @DataSavvy
      @DataSavvy  4 years ago

      Please subscribe to the channel. It motivates me to create more useful content for everyone. Thanks :)

  • @sureshu5671
    @sureshu5671 3 years ago

    very good explanation

  • @arundhingra4536
    @arundhingra4536 5 years ago

    Good explanation of the optimizer

  • @shwetabalkawade3322
    @shwetabalkawade3322 2 years ago

    Great and helpful video, but the voice is low

  • @koushikdas6840
    @koushikdas6840 4 years ago +1

    Great contents :)

    • @DataSavvy
      @DataSavvy  4 years ago

      Thanks Kaushik

  • @MrVivekc
    @MrVivekc 4 years ago +1

    Also, please make a video on Spark RDD vs DataFrame vs Spark SQL performance: which one outperforms the others, and in which cases.

    • @DataSavvy
      @DataSavvy  4 years ago

      This video will be helpful th-cam.com/video/ZirbI1355B8/w-d-xo.html

  • @paroolsingh597
    @paroolsingh597 3 years ago

    Hi sir, great video! Could you please let us know what whole-stage code generation is? Is it the RDD code that is generated after picking the most optimized plan?
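
One way to picture whole-stage code generation: instead of each operator (filter, project, and so on) pulling rows through its own iterator, the operators of a stage are fused into a single generated function that is compiled and run. Spark generates Java source for this; the sketch below only imitates the idea in plain Python by building a source string and compiling it with `exec`. The function names `generate_fused_stage` and `interpreted_stage` are hypothetical and exist only for this illustration.

```python
def generate_fused_stage(filter_expr, project_expr):
    """Build source for one loop that filters and projects, then compile it."""
    source = (
        "def fused_stage(rows):\n"
        "    out = []\n"
        "    for row in rows:\n"
        f"        if {filter_expr}:\n"
        f"            out.append({project_expr})\n"
        "    return out\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["fused_stage"], source


def interpreted_stage(rows, predicate, projection):
    """Operator-at-a-time baseline: filter and project are separate steps."""
    filtered = (r for r in rows if predicate(r))
    return [projection(r) for r in filtered]


rows = [{"name": "asha", "age": 17}, {"name": "ravi", "age": 30}]

fused, generated_source = generate_fused_stage(
    "row['age'] > 18", "row['name'].upper()"
)
print(generated_source)  # the single fused function that was compiled
print(fused(rows))
print(interpreted_stage(rows, lambda r: r["age"] > 18, lambda r: r["name"].upper()))
```

Both versions return the same rows; the fused one avoids passing each row through a chain of per-operator calls, which is the motivation usually given for this technique.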

  • @telugutravellerraj
    @telugutravellerraj 5 years ago +1

    Good info. Can you publish a video showing the difference between DataFrame and Dataset with an example?

    • @DataSavvy
      @DataSavvy  5 years ago

      th-cam.com/video/ZirbI1355B8/w-d-xo.html

  • @vinodmani3900
    @vinodmani3900 5 years ago

    Thanks for the detailed explanation. However, I am slightly confused now after watching the previous video. There you mentioned that the pattern is logical plan, DAG, execution plan. Could you please connect that to this detailed context? Is the DAG part of the Catalyst optimizer?

  • @ankursrivastava2112
    @ankursrivastava2112 6 years ago

    Thanks for your Spark explanation. Can you please make a video on serialization and deserialization? Thank you

  • @heenasaxena6118
    @heenasaxena6118 3 years ago

    Your voice is very low, in all the videos. Please make louder videos. The content is awesome...

    • @DataSavvy
      @DataSavvy  3 years ago +1

      Thanks Heena... I was very new when I made all these videos. It was a microphone issue. Unfortunately YouTube does not give an option to edit already uploaded videos. I have improved this in the latest videos...

    • @heenasaxena6118
      @heenasaxena6118 3 years ago

      @@DataSavvy I watch all your videos. The major problem I face in interviews is explaining the project flow from end to end. Can you please make a video that teaches how to explain a project to interviewers?

  • @SantoshSingh-ki8bx
    @SantoshSingh-ki8bx 6 years ago +2

    I would suggest that before publishing, you please check whether it is audible

    • @DataSavvy
      @DataSavvy  6 years ago +3

      Thanks Santosh for the suggestion... I have been doing that... However, as soon as I upload a video to YouTube, YouTube decreases the voice quality after processing the video... In new videos I have used a new microphone and changed the video format... There is some improvement in voice quality... Apologies for the inconvenience

    • @DataSavvy
      @DataSavvy  6 years ago +1

      Low audio is mainly an issue on mobiles; I tested on a laptop and it sounds fine... if that helps

  • @DesireIsIrrelevant
    @DesireIsIrrelevant 5 years ago +1

    Regardless of which join I use in my code, does the optimizer convert it (in terms of the physical plan) into the most efficient join, like the map-side/broadcast/hash join you mentioned?

    • @DataSavvy
      @DataSavvy  4 years ago

      yes... it does... wherever optimization is possible
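
As a rough illustration of how such a choice can be made, here is a simplified sketch in plain Python. It assumes the planner only looks at estimated table sizes and only chooses between a broadcast hash join and a sort-merge join; Spark's real planner also takes hints, join types, and other strategies into account. The function `choose_join_strategy` is a hypothetical name for this example; the default threshold mirrors Spark's `spark.sql.autoBroadcastJoinThreshold` setting (10 MB by default, with -1 disabling automatic broadcast).

```python
# Mirrors the default of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def choose_join_strategy(left_bytes, right_bytes,
                         threshold=AUTO_BROADCAST_THRESHOLD_BYTES):
    """Pick a join strategy from estimated input sizes (toy model)."""
    smaller = min(left_bytes, right_bytes)
    # Broadcast the smaller side if it fits under the threshold.
    # A negative threshold can never be satisfied, so it disables broadcast.
    if 0 <= smaller <= threshold:
        side = "left" if left_bytes <= right_bytes else "right"
        return f"broadcast hash join (broadcast {side})"
    return "sort-merge join"


MB = 1024 * 1024
GB = 1024 * MB

print(choose_join_strategy(5 * MB, 50 * GB))   # small dimension table
print(choose_join_strategy(2 * GB, 3 * GB))    # two large tables
print(choose_join_strategy(1, 2, threshold=-1))  # broadcast disabled
```

To see which join Spark actually picked for a query, `df.explain()` prints the physical plan, where the join operator is named.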

  • @mtamitsharma
    @mtamitsharma 5 years ago

    Hi Sir,
    Thanks for sharing valuable Spark interview questions with us.
    Could you please tell us the difference between Tungsten and the Catalyst optimizer?
    Can we create more than one Spark context for an application? I am confused about the allowMultipleContext property when creating a Spark context.
    Kindly share any information with us on this.
    Thanks

    • @rajeshwarreddyracha4655
      @rajeshwarreddyracha4655 4 years ago +2

      Multiple Spark contexts can be created by setting spark.driver.allowMultipleContexts to true.
      Multiple Spark contexts in a single JVM are not recommended, since a crash in one Spark context will affect the others.
      The Spark context keeps the same context ID, but each new Spark session gets a different session ID, and all Spark sessions share the same context ID.

  • @snehaljadhav9905
    @snehaljadhav9905 3 years ago

    Is there any video on Spark optimization techniques?
    I did not find one, so please help me with this.
    Thanks in advance.

    • @DataSavvy
      @DataSavvy  3 years ago

      Are you looking for any specific technique?

  • @rajareddy47444
    @rajareddy47444 6 years ago

    Hi. Can you explain case classes?

  • @user-fm4kz1ur1h
    @user-fm4kz1ur1h 1 year ago

    Volume is very low, sir

  • @ashutoshranghar4113
    @ashutoshranghar4113 6 years ago +1

    REAL TIME SCENARIOSSS PLS

  • @dilsha795
    @dilsha795 5 years ago +1

    Could you please suggest a good spark tutorial?

    • @DataSavvy
      @DataSavvy  5 years ago +1

      Bro, I thought my channel had good tutorials :) Can you suggest what is missing here?

    • @dilsha795
      @dilsha795 5 years ago +2

      @@DataSavvy Your channel is excellent from an interview point of view. I couldn't find a proper tutorial that explains things from the basic level

    • @DataSavvy
      @DataSavvy  5 years ago

      @@dilsha795 Got it... :) I will start creating videos from a tutorial point of view as well

  • @MrVivekc
    @MrVivekc 4 years ago

    Is the logical plan the lineage and the physical plan the DAG? Please confirm.

    • @DataSavvy
      @DataSavvy  4 years ago

      Not Really...

  • @rakeshdey1702
    @rakeshdey1702 4 years ago

    Did the cost-based optimizer and rule-based optimizer eliminate Catalyst from Spark 2?

    • @DataSavvy
      @DataSavvy  4 years ago

      That's news to me... Let me check and get back to you

    • @rakeshdey1702
      @rakeshdey1702 4 years ago

      @@DataSavvy Actually, CBO is used to select the most optimized execution plan, so the Catalyst optimizer still handles everything from the logical plan to the execution plan. Before the plan is converted to RDDs, CBO selects the most optimized execution plan. Let me know if you conclude the same. CBO came into the picture in Spark 2.2, I think

  • @chaitanyag.8415
    @chaitanyag.8415 3 years ago

    please increase the audio.

  • @Step2learn
    @Step2learn 3 years ago

    Please speak louder

  • @ashwenkumar
    @ashwenkumar 5 years ago

    From your next video, kindly speak louder

  • @bhavanatanwar1108
    @bhavanatanwar1108 4 years ago

    please improve audio..

    • @DataSavvy
      @DataSavvy  4 years ago +1

      Hi Bhavana... I have improved this in the new videos... Apologies for the inconvenience

  • @thanooj
    @thanooj 3 years ago

    Would you speak a little louder, please.

  • @kishoregarimella7987
    @kishoregarimella7987 3 years ago

    Pathetic sound quality