Spark Optimization | Broadcast Variable with Demo | Session - 1 | LearntoSpark

  • Published on 12 Dec 2024

Comments • 27

  • @aniabouadi15 · 1 year ago

    Thank you for your video, very useful

  • @shilpasthavarmath5262 · 3 years ago +1

    Hi.. Nice one.. Please make a video on scala class

  • @SpiritOfIndiaaa · 2 years ago

    Thanks a lot, nice. How do you access broadcast variables inside UDFs?

  • @ranganath4795 · 3 years ago

    Sometimes while creating a new cluster in Databricks it takes a long time, and even after waiting a while the cluster is still not created.
    I tried terminating/deleting the cluster and creating a new one; same issue.

  • @bhanubrahmadesam4508 · 4 years ago

    Bro, can broadcast only be used with a UDF? I tried the below and it's not working; could you please have a look?
    df.withColumn('City_Name', broad.value[State_Code]).show(5)
    # NameError: name 'State_Code' is not defined
    df.withColumn('City_Name', funcreg('State_Code')).show(5)
    # (this works just fine)

  • @praneethbhat4703 · 3 years ago

    Your videos are very good and help me a lot

  • @yaniv54 · 4 years ago

    Very well explained.

  • @preethamp1826 · 4 years ago

    What is the difference between destroy and unpersist? Do both remove the data from cache memory?

    • @AzarudeenShahul · 4 years ago +1

      I hope you watched the video till the end. unpersist removes the data from the executors' cache, whereas destroy also removes the data from the driver itself.

  • @MrManishelectra · 4 years ago +1

    Informative video... can you please create one video on accumulator as well🙂

    • @AzarudeenShahul · 4 years ago +2

      Thanks, I made a video on accumulators. Hope it will be useful :)

    • @MrManishelectra · 4 years ago

      @@AzarudeenShahul Thanks 😊

  • @allinonetutorials26 · 3 years ago

    Why is broadcast useful in this scenario? I mean, we could add the state name directly in the input file.

  • @satyamverma4726 · 2 years ago

    I want to update the value of the broadcast variable after each iteration in the loop. Is it possible?

  • @dileepkumar-nd1fo · 2 years ago

    Broadcast variables never get copied to executors' memory up front. What if my broadcast data is 1 GB and I have 10 executors; will that 1 GB get copied to all 10 executors? That would replicate 1 GB into 10 GB, which is not the right approach.
    In fact, broadcast data is copied to executor memory only when it is required, and not all at once; Spark uses the TorrentBroadcast algorithm internally.

    • @sanskarsuman9340 · 2 years ago

      Instead of declaring and broadcasting a variable, if we use a CASE WHEN condition on that df to populate the full name, how different will the two approaches be?
      e.g. withColumn("state_name", case when state = 'NY' then "New York")

  • @bhanubrahmadesam4508 · 4 years ago

    How do you populate a default value when there is no match?

  • @dippusingh3204 · 4 years ago

    Hi Azarudeen... can you please share a similar video on broadcast join using IntelliJ sbt? Your videos are really helpful.

    • @dippusingh3204 · 4 years ago +1

      Got my answer... below is the code snippet. One question though: is it possible to broadcast a full list of values (with multiple columns) from a file, or not?
      val input_df = spark.read.option("header", "true").option("delimiter", "|").option("inferSchema", "true").csv("input/uspopulation.csv")
      val states = Map("NY" -> "New York", "CA" -> "California", "FL" -> "Florida", "IL" -> "Illinois", "AZ" -> "Arizona", "TX" -> "Texas", "CO" -> "Colorado")
      val statesbc = sc.broadcast(states)
      val statesbcfunc = (x: String) => statesbc.value.get(x)
      val statesbcudf = udf(statesbcfunc)
      input_df.withColumn("state", statesbcudf(input_df("State_Code"))).show(false)

    • @AzarudeenShahul · 4 years ago

      Yes, you can broadcast from a file: read the file as a DataFrame and broadcast the DataFrame.

  • @maheshk1678 · 4 years ago

    Bro, could you put up one video on reading an HBase table into a structured format?

    • @AzarudeenShahul · 4 years ago +2

      Sure bro, I am setting up HBase locally; once done, I will do one :) Thanks for your support

    • @maheshk1678 · 4 years ago

      @@AzarudeenShahul Thank you bro. Also, please share a video on Kafka-Spark streaming and dynamically handling nested JSON in a JSON file or Kafka topic.