05. Databricks | Pyspark: Cluster Deployment

  • Published on 21 Oct 2024

Comments • 34

  • @ashutoshjadhav6922 2 years ago +6

    Sir, your videos are really helpful; nobody else has given such information in simple language in their videos. Keep up the good work of spreading knowledge ❤

  • @anirudhnegi4523 a year ago +3

    Great content. Smooth explanation. Keep up the good work!

  • @phanisrikrishna a year ago +3

    Hi Raja, that's a great lecture series on PySpark. Is there any possibility of getting the slides for reference?

  • @gulsahtanay2341 7 months ago +1

    Great content, thank you!

  • @YashSharma-ou7rh a month ago

    Sir, your videos are really good and easy to understand, in very simple language. I'm stuck because my cluster is not running. It reports an Azure Quota Exceeded Exception.
    I'd be grateful if you could help solve this.

  • @sravankumar1767 2 years ago +2

    Superb Raja

  • @kamalbhallachd 3 years ago +1

    Helpful and knowledgeable session

  • @alex45688 a year ago +1

    I don't have the cluster option you have shown.
    In Community Edition, the cluster page only says that the compute will automatically terminate after an idle period of one or two hours.

    • @rajasdataengineering7585 a year ago

      Databricks Community Edition has many limitations, and this is one of them.

    • @alex45688 a year ago

      @rajasdataengineering7585 OK. Can you share the sample data? It would be helpful for practice.

  • @Kohli079 3 months ago

    Hi sir, I don't understand the use of spot instances.

  • @rajanib9057 a year ago

    What is the difference between standard and single node? What would be the driver/worker configuration for a standard cluster?

    • @krisharjunakinjarapu3071 4 months ago +1

      A standard cluster has both a driver and workers. A single node cluster has only a driver, and that driver alone does the execution.
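
The difference described in this reply can be sketched as two cluster specifications in the shape accepted by the Databricks Clusters API. This is a minimal sketch, not the video author's configuration: the function names are hypothetical, and the `spark_version` and `node_type_id` values are example placeholders that must be replaced with values available in your workspace. The single node settings follow the documented single node profile (zero workers, local Spark master).

```python
# Example placeholder values (assumptions); use ones available in your workspace.
EXAMPLE_SPARK_VERSION = "13.3.x-scala2.12"
EXAMPLE_NODE_TYPE = "Standard_DS3_v2"


def standard_cluster_spec(name: str, num_workers: int) -> dict:
    """Standard (multi node) cluster: one driver plus num_workers workers."""
    if num_workers < 1:
        raise ValueError("a standard cluster needs at least one worker")
    return {
        "cluster_name": name,
        "spark_version": EXAMPLE_SPARK_VERSION,
        "node_type_id": EXAMPLE_NODE_TYPE,  # worker node type
        "driver_node_type_id": EXAMPLE_NODE_TYPE,  # driver node type
        "num_workers": num_workers,
    }


def single_node_cluster_spec(name: str) -> dict:
    """Single node cluster: no workers, the driver runs Spark locally."""
    return {
        "cluster_name": name,
        "spark_version": EXAMPLE_SPARK_VERSION,
        "node_type_id": EXAMPLE_NODE_TYPE,
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }
```

In the standard spec the driver and worker node types can differ; in the single node spec all execution happens on the one driver machine.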

  • @bossofdata 2 years ago +1

    I am confused about the "terminate after _ minutes" option.
    What happens after the specified time period?
    Will the cluster terminate permanently,
    or will it restart when the next query is executed?

    • @rajasdataengineering7585 2 years ago

      If this option is enabled, the cluster shuts down after x minutes of idle time and stays down. So when we need to execute the next query, we have to start the cluster again manually.
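
The behaviour described in this reply can be illustrated with a toy model. This is only a sketch of the rule, not Databricks code: the function name is hypothetical, while `autotermination_minutes` is the field name used in cluster specifications, where a value of 0 disables automatic termination.

```python
def cluster_state_after_idle(idle_minutes: int, autotermination_minutes: int) -> str:
    """Toy model: state of an idle cluster under the auto-termination setting."""
    if autotermination_minutes == 0:
        # Auto-termination disabled: the cluster keeps running (and billing).
        return "RUNNING"
    if idle_minutes >= autotermination_minutes:
        # Idle threshold reached: the cluster is terminated and stays
        # terminated until someone starts it again.
        return "TERMINATED"
    return "RUNNING"
```

For example, with the option set to 30 minutes, a cluster that has been idle for 45 minutes is modelled as `TERMINATED`, and the next query needs a manual start first.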

  • @SatishKumar-nr8ju 5 months ago +1

    Hi Raja, can you please cover the remaining transformations as well?

  • @VISHVABATTULA-p7l 2 months ago +1

    Thank you, sir

  • @saivamsiy7312 2 months ago +1

    Raja the great 👍

  • @NikhilGosavi-go7be a month ago +2

    done

  • @arupnaskar3818 2 years ago +1

    Hi Raja, you are superb... 🌸
    Thank you so much for your excellent work 💐💐
    Raja, I am stuck in a situation. I have an OLTP Postgres server and a data lake as OLAP. Currently, all the customer services run on my OLTP server, so we have indexes on it,
    which cause huge latency on inserts. Could you please suggest how to add a staging server that gets change feeds from my main OLTP server?
    Please help 🙏 Thank you :)

    • @rajasdataengineering7585 2 years ago

      Thanks, Arup.
      If the indexes are the problem while inserting, you can consider partition switch/exchange, which is available in most databases. You can also consider disabling the indexes during bulk loading and enabling them again right after the load.

    • @arupnaskar3818 2 years ago +1

      @rajasdataengineering7585 Thank you so much for replying, Raja 🍁
      Actually, the data arrives as a stream, so bulk loading and re-applying the indexes won't be a good option, but I will do a POC on partition switch. Thank you once again.

  • @sureshkoduru8810 2 years ago

    Hi Raja, nice video. I faced this interview question: what is the cluster size in your project, and how do you configure the cluster size?

    • @harishgangisetty7482 7 months ago

      Hi, are you working in data engineering?

  • @sankarazad7574 a year ago +1

    Hi sir, is there any Python script to create the cluster automatically?

    • @rajasdataengineering7585 a year ago

      Hi, yes, it is possible through the cluster APIs provided by Databricks.

    • @sankarazad7574 a year ago

      @rajasdataengineering7585 Can you please provide a link for reference here?
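
As a rough illustration of the cluster API route mentioned in this thread, the sketch below prepares a create-cluster request using only the Python standard library. It is a sketch under assumptions, not the video author's script: the helper name, workspace URL, token, and cluster values are hypothetical placeholders, and the `/api/2.1/clusters/create` path should be checked against the API reference for your workspace. The request is only built here, not sent.

```python
import json
import urllib.request


def build_create_cluster_request(
    host: str, token: str, spec: dict
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates a cluster."""
    url = host.rstrip("/") + "/api/2.1/clusters/create"
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # personal access token
            "Content-Type": "application/json",
        },
    )


# Hypothetical example values; replace them with ones valid for your workspace.
example_spec = {
    "cluster_name": "demo-cluster",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "autotermination_minutes": 30,
}
request = build_create_cluster_request(
    "https://example-workspace.azuredatabricks.net", "<your-token>", example_spec
)
```

Passing `request` to `urllib.request.urlopen` would submit it; on success the JSON response is expected to contain the new `cluster_id`. Keeping the build step separate from the send step makes the payload easy to inspect before any cluster is actually created.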