Pyspark Tutorials 3 | pandas vs pyspark || what is rdd in spark || Features of RDD

  • Published on 26 Dec 2024

Comments • 27

  • @satyanarayana9630 4 months ago

    Great and clear expression. The one and only best playlist for PySpark on YouTube.

  • @fahdelalaoui3228 2 years ago +2

    That's what I call quality content. Very logically presented and instructed.

  • @deepaktamhane8373 3 years ago +3

    Great, sir... happy to have the concepts cleared up.

    • @RanjanSharma 3 years ago

      Keep watching, thanks bro. Keep sharing and exploring :)

  • @neerajjain2138 3 years ago +6

    Very neat and clear explanation. Thank you so much! **SUBSCRIBED**
    One more thing: how can someone dislike anyone's efforts to produce such helpful content? Please respect the hard work.

    • @RanjanSharma 3 years ago

      Thanks, so nice of you :) Keep sharing and exploring, bro :)

  • @sukhishdhawan 3 years ago +2

    Excellent explanation, strong hold on concepts.

    • @RanjanSharma 3 years ago

      Glad you liked it! Thank you :)

  • @HamdiBejjar 2 years ago

    Excellent Content, Thank you Ranjan.. Subscribed :D

  • @dhanyadave6146 2 years ago +1

    Hi Ranjan, thank you for the great series and excellent explanations. I have two questions:
    1) In the video at 5:05, you mention that PySpark requires a cluster to be created. However, we can create Spark sessions locally as well, if I am not mistaken. When we run Spark locally, could you please explain how PySpark would outperform pandas? I am confused about this concept. You can process data using multiple cores locally, but your RAM size will not change, right?
    2) In the previous video you mentioned that the Apache Spark computing engine is much faster than Hadoop MapReduce because Hadoop MapReduce reads data from the hard disk during data processing steps, whereas Apache Spark loads the data into the node's RAM. Would there be a situation where this can be a problem? For example, if our dataset is 4 TB and we have 4 nodes in our cluster and we assign 1 TB to each node, how will an individual node load 1 TB of data into RAM? Would we have to create more nested clusters in this case?

    • @universal4334 2 years ago

      I have the same doubt. How would Spark store terabytes of data in RAM?
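
On the RAM question in this thread: Spark does not need the whole dataset in memory at once. It splits the data into partitions, works on a limited number of them at a time, and can spill to disk when memory runs short. The following is a minimal pure-Python sketch of that partition-at-a-time idea, assuming a simple sum as the job. It is an illustration only, not Spark's implementation, and `read_partitions`, `partial_sum`, and `total_sum` are hypothetical names.

```python
from typing import Iterable, Iterator, List


def read_partitions(total_rows: int, rows_per_partition: int) -> Iterator[List[int]]:
    """Yield one partition at a time (hypothetical stand-in for reading file splits).

    Only the partition currently being yielded is materialized in memory.
    """
    for start in range(0, total_rows, rows_per_partition):
        stop = min(start + rows_per_partition, total_rows)
        yield list(range(start, stop))


def partial_sum(partition: List[int]) -> int:
    """Per-partition work, analogous to a single task on one partition."""
    return sum(partition)


def total_sum(partitions: Iterable[List[int]]) -> int:
    """Combine the small per-partition results into the final answer."""
    return sum(partial_sum(p) for p in partitions)


# One million rows processed in ten partitions of 100,000 rows each.
result = total_sum(read_partitions(total_rows=1_000_000, rows_per_partition=100_000))
print(result)
```

Peak memory in this sketch is bounded by `rows_per_partition` rather than `total_rows`, which is why a node can work through far more data than fits in its RAM, at the cost of extra disk reads.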

  • @mohamedamineazizi3360 3 years ago +1

    great explanation

    • @RanjanSharma 3 years ago

      Glad you think so! Buddy, keep exploring and sharing with your friends :)

  • @sridharm8550 2 years ago

    Nice explanation

  • @JeFFiNat0R 3 years ago

    Great, thank you for this explanation.

    • @RanjanSharma 3 years ago

      Thanks :) Keep Exploring :)

    • @JeFFiNat0R 3 years ago

      @RanjanSharma I just got a job offer for a data engineer role working with Databricks Spark. Your video definitely helped me in the interview. Thank you again.

    • @RanjanSharma 3 years ago +1

      @JeFFiNat0R Glad I could help you 😊

  • @guitarkahero4885 3 years ago +2

    Content-wise, great videos. The way of explaining can be improved.

    • @RanjanSharma 3 years ago

      Glad you think so! Thanks :) Keep exploring :)

  • @naveenchandra7388 3 years ago

    @9:19 RDD in-memory computation? pandas works in memory too, doesn't it? Do RDDs also work in memory? Maybe I lost the point somewhere. Can you please explain this small difference?

  • @TK-vt3ep 4 years ago +2

    You are too fast in explaining things. Could you please slow down a bit? BTW, good work.

    • @RanjanSharma 4 years ago +1

      Thanks for your visit. Keep exploring :)
      In my later videos, I have decreased the pace.

  • @loganboyd 4 years ago

    Why are you still using RDDs and not the Spark SQL Dataframe API?

    • @RanjanSharma 4 years ago +1

      This video was just an explanation of RDDs. In the next video, I will be explaining the SQL DataFrame.

  • @AkashShahapure 1 year ago

    Audio is low compared to the previous 2 videos.

  • @kritikalai8204 3 years ago

    **gj**