Apache Spark - 04 - Architecture - Part 2

  • Published on 25 Dec 2024
  • Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - Fill out the Google form for course inquiries.
    forms.gle/Nxk8...
    -------------------------------------------------------------------
    Data Engineering is one of the highest-paid jobs today.
    It is going to remain among the top IT skills for the foreseeable future.
    Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
    I have a well-crafted success path for you.
    I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
    We created a course that takes you deep into core data engineering technology and helps you master it.
    If you are a working professional who is:
    1. Aspiring to become a data engineer.
    2. Planning to change your career to data engineering.
    3. Looking to grow your data engineering career.
    4. Preparing for the Databricks Spark Certification.
    5. Preparing to crack Spark data engineering interviews.
    ScholarNest is offering a one-stop integrated Learning Path.
    The course is open for registration.
    The course delivers an example-driven approach and project-based learning.
    You will be practicing the skills using MCQ, Coding Exercises, and Capstone Projects.
    The course comes with the following integrated services.
    1. Technical support and Doubt Clarification
    2. Live Project Discussion
    3. Resume Building
    4. Interview Preparation
    5. Mock Interviews
    Course Duration: 6 Months
    Course Prerequisite: Programming and SQL Knowledge
    Target Audience: Working Professionals
    Batch start: Registration Started
    Fill out the below form for more details and course inquiries.
    forms.gle/Nxk8...
    --------------------------------------------------------------------------
    Learn more at www.scholarnes...
    Best place to learn Data engineering, Bigdata, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - Self-paced, Instructor-led, Certification courses, and practice tests.
    ========================================================
    SPARK COURSES
    -----------------------------
    www.scholarnes...
    www.scholarnes...
    www.scholarnes...
    www.scholarnes...
    www.scholarnes...
    KAFKA COURSES
    --------------------------------
    www.scholarnes...
    www.scholarnes...
    www.scholarnes...
    AWS CLOUD
    ------------------------
    www.scholarnes...
    www.scholarnes...
    PYTHON
    ------------------
    www.scholarnes...
    ========================================
    We are also available on the Udemy Platform
    Check out the below link for our Courses on Udemy
    www.learningjo...
    =======================================
    You can also find us on Oreilly Learning
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    www.oreilly.co...
    =========================================
    Follow us on Social Media
    / scholarnest
    / scholarnesttechnologies
    / scholarnest
    / scholarnest
    github.com/Sch...
    github.com/lea...
    ========================================

Comments • 112

  • @ScholarNest
    @ScholarNest  3 years ago

    Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
    www.learningjournal.guru/courses/

    • @mjeewani123
      @mjeewani123 2 years ago

      I really like your content, very easy to understand. THANK YOU. Have you covered anywhere how RDDs help with fault tolerance and recovery?

  • @iitgupta2010
    @iitgupta2010 5 years ago +18

    This is the best video... what an explanation, sir, mind-blowing. I feel bad that bad teachers get too much attention while people like you don't get enough...
    You are brilliant.

  • @philippederome2434
    @philippederome2434 5 years ago +11

    I love the curtains opening up special effect!

  • @davidpeng8431
    @davidpeng8431 1 year ago

    Your video is one of the best for Spark; it doesn't spend too much time on high-level theory but is down to earth and very practical.

  • @manasamathi7157
    @manasamathi7157 3 years ago

    Even beginners who have zero knowledge about Spark can understand the flow. Great explanation 😊

  • @sethiabhithemaverick
    @sethiabhithemaverick 6 years ago +4

    This is the easiest and clearest explanation of the complete Spark architecture one can ever get.

  • @muhammadarslanbhatti2139
    @muhammadarslanbhatti2139 4 years ago

    Hands down the best explanation you'll find on youtube

  • @SachinChavan13
    @SachinChavan13 5 years ago +1

    Wow ! Very crisp and to-the-point explanation. Really helpful. Thank you Prashant!

  • @physicsmadness
    @physicsmadness 3 years ago

    extremely lucid and to the point...congrats !

  • @sthirumalai
    @sthirumalai 5 years ago

    I started learning Spark by enrolling in a self-learning course on Udemy, but this is by far the best video I have ever watched; it explains the core concepts of Spark clearly and precisely. I appreciate your efforts.

  • @sanjaysoni5017
    @sanjaysoni5017 4 years ago

    Awesome video with correct explanation.

  • @franciscovinueza5320
    @franciscovinueza5320 5 years ago

    Images, Colors, examples and clear explanations! This video has everything! Keep up the good work! Thank you Sir.

  • @maheshpoddar4065
    @maheshpoddar4065 4 years ago

    Exceptional way of explaining and making concepts crystal clear. I am enjoying it the way I used to enjoy your earlier videos on Hadoop.

  • @anilpatil3056
    @anilpatil3056 6 years ago +1

    Highly recommended for every Spark newbie. BTW thanks a lot..

  • @gefeizhu3953
    @gefeizhu3953 5 years ago +1

    Fantastic video, I have subscribed to your channel!

  • @erpoojadak
    @erpoojadak 6 years ago +1

    The best tutorial I have ever seen... simply awesome.

  • @Nasirmah
    @Nasirmah 5 years ago

    It should be a => (a(1), 1) to get the second field if you want the result to be as shown at 9:30. The first field of the array is an empty string, but you can still reduce by key since it will all be the empty string at the end; that would not hold if not all files contained the /etc root path. I find it useful to run collect() at each step, like kvRDD.collect(), to see what is happening. Thank you very much for the best Spark tutorial; I let the ads run to help out.
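
    A minimal spark-shell sketch of the idea above, assuming the input file lists one filesystem path per line (the path and variable names are illustrative, not the exact ones from the video):

      val lines  = sc.textFile("/path/to/etc-listing.txt")   // hypothetical input file
      val fields = lines.map(line => line.split("/"))        // "/etc/hosts" -> Array("", "etc", "hosts")
      val kvRDD  = fields.map(a => (a(1), 1))                // a(1) is the second field; a(0) is the empty string
      val counts = kvRDD.reduceByKey((x, y) => x + y)        // sum the 1s per key
      kvRDD.collect().foreach(println)                       // inspecting each step works well on small data
      counts.collect().foreach(println)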

  • @philippederome2434
    @philippederome2434 5 years ago +2

    I like the animal logos for the 3 APIs: turtle for RDD (slowest), cat for SQL/Dataset (medium), rabbit for DataFrame (fastest), but see Brian M. Clapper's recent video on the Frameless API (fast, compile-safe, and more functional, i.e. it can compose actions).

  • @abhishekt450
    @abhishekt450 4 years ago

    Just brilliant 👌.. point to point..

  • @nishantgupta8562
    @nishantgupta8562 6 years ago

    Best video by far.. What a teacher you are.

  • @bhavaniv1721
    @bhavaniv1721 5 years ago

    Please post more videos. I am following all your videos; they are something different from the others... explained in an easily understandable way.

    • @pratiksingh9480
      @pratiksingh9480 3 years ago

      You mean Data Savvy :P Yeah, Prashant is really good at explaining.

  • @shivam.shakya
    @shivam.shakya 2 years ago

    Great video

  • @vijaykumar-wq9db
    @vijaykumar-wq9db 4 years ago

    Thank you sir...super video

  • @khelifakemouche4070
    @khelifakemouche4070 6 years ago

    Great tutorial and Excellent teaching

  • @sachinpatil54321
    @sachinpatil54321 5 years ago

    Outstanding..

  • @deepanshunagpal6440
    @deepanshunagpal6440 3 years ago

    nicely explained.

  • @Vihaan_Nigam16
    @Vihaan_Nigam16 6 years ago

    Excellent way of teaching ...Thank you

  • @pramodswain6043
    @pramodswain6043 6 years ago

    It is really appreciated; I have never seen an explanation like this, so thanks a lot, sir, for sharing such extraordinary skills...

  • @kannanarumugam9257
    @kannanarumugam9257 7 years ago

    Thank you very much! Nicely explained Spark architecture; there is no better way than this. Keep up the good work!

  • @AzharHussain2u
    @AzharHussain2u 4 years ago

    just awesome

  • @sandeepkumarvadde
    @sandeepkumarvadde 5 years ago

    This is an extraordinary explanation of the Spark architecture.
    Sir, please pick a few examples to implement in cluster mode too.

  • @sachinhaldankars
    @sachinhaldankars 6 years ago

    Simply Awesome explanation...

  • @damodharable
    @damodharable 6 years ago +2

    excellent teaching skills,thanks a lot :)

  • @shubhampatil2391
    @shubhampatil2391 2 years ago

    Thank you for the great content!! Just one request though: please add a highlighter to your pointer. It is kind of hard to track its movement, and I often have to rewind to check what you actually clicked on.

  • @akashhudge5735
    @akashhudge5735 4 years ago

    nice explanation

  • @malapatiprasanna
    @malapatiprasanna 5 years ago

    Thanks a lot, sir, for your outstanding efforts in making us brilliant. Could you please add some more Spark videos on shared variables, detailed transformations, and actions? Really, I am doubly satisfied with your explanations; going forward, we want to see more from you on Spark.

  • @tejaswianagani8756
    @tejaswianagani8756 6 years ago

    Very, very good explanation, sir. I am very thankful to you.

  • @asksmruti
    @asksmruti 6 years ago

    Your tutorials are simply awesome.. :-) Super Like

  • @TheVikash620
    @TheVikash620 6 years ago

    Great explanation sir. Waiting for new concepts to be covered in future videos.

  • @vishalteli7343
    @vishalteli7343 5 years ago

    Simply Best!

  • @RahulEdvin
    @RahulEdvin 5 years ago

    excellent explanation !

  • @avijitmukherjee678
    @avijitmukherjee678 4 years ago

    Thanks so much, Sir

  • @sanjaykumarmahapatra
    @sanjaykumarmahapatra 7 years ago

    Nice way of explanation. Thank you so much for your effort on making so nice tutorials. I am becoming a fan of you man! keep it up (Y)

  • @althafmohammed5285
    @althafmohammed5285 6 years ago

    It's really amazing; it's really at a real-time project level.

  • @KoushikPaulliveandletlive
    @KoushikPaulliveandletlive 5 years ago

    Just too good. You need a great deal of knowledge to be able to explain complex things so easily.

  • @csharma8732
    @csharma8732 6 years ago

    Very nice video sir. Thank you.

  • @nationviews6760
    @nationviews6760 7 years ago

    Thank you so much, Sir, for providing such a nice practical explanation.

  • @sagarsinghrajpoot3832
    @sagarsinghrajpoot3832 6 years ago

    Great video 🤓🤓sir

  • @damodargoud6263
    @damodargoud6263 5 years ago

    thanks for sharing your knowledge.

  • @paritoshahuja5058
    @paritoshahuja5058 5 years ago

    Really amazing explanation thank you

  • @송찬호-r8j
    @송찬호-r8j 5 years ago

    very very amazing . thank you

  • @raunakgpt
    @raunakgpt 4 years ago

    Very good video. Thanks, sir. But I didn't find anything titled Apache Spark - 05 in the playlist. Do we have some more videos on the architecture?

  • @repsycled1605
    @repsycled1605 6 years ago

    One of the best video series for learning. Do you also provide classroom training?

  • @biswajitsarkar5538
    @biswajitsarkar5538 6 years ago

    Great explanation !!

  • @premrajkumar6910
    @premrajkumar6910 7 years ago

    Nice video with a very clear explanation. But we will have to wait very long for a new session. Please try to upload faster; otherwise, it will take a year to learn Spark.

  • @helloworld4u
    @helloworld4u 4 years ago

    Thank you

  • @pc0riginal870
    @pc0riginal870 5 years ago

    Thank you so much from the bottom of my heart. May God make you happy.

  • @robind999
    @robind999 5 years ago

    Very good one. Any Airflow demo?

    • @ScholarNest
      @ScholarNest  5 years ago

      That's still incubating... I do not use open-source projects until they graduate and become production-ready.

    • @robind999
      @robind999 5 years ago

      @@ScholarNest Thank you so much for this info.

  • @pratiksingh9480
    @pratiksingh9480 3 years ago

    Hi Prashant Sir,
    First things first:
    I am planning to take up the course. Your explanations, visuals, etc. are awesome; kudos for that. The only thing that concerns me is that I have a lot of questions when I study anything, some silly as well.
    Is there any channel (Slack/Discord etc. for enrolled students) where doubts are cleared? Some AMA-kind sessions etc., because going through the material while having unresolved doubts leaves a learner in almost the same state. I will share the same message with you over LinkedIn as well; I am not sure how frequently you look at YouTube comments.

  • @sbylk99
    @sbylk99 5 years ago

    Best tutorial thank you!

  • @annaynomouse2821
    @annaynomouse2821 4 years ago

    How do you create the animation shown from 4:50 to 4:55? Which software? I like how you bring clarity visually.

    • @ScholarNest
      @ScholarNest  4 years ago

      PowerPoint :-) Office 365

  • @a143r
    @a143r 6 years ago

    Excellent, sir....!

  • @rakeshsahoo16
    @rakeshsahoo16 5 years ago

    Why did proc and opt come in one partition??

  • @143badri
    @143badri 5 years ago

    What is the default number of partitions if we are not defining it?

  • @kidslearningscience
    @kidslearningscience 5 years ago

    A supplementary video with Amazon EMR please.

  • @ramkumarananthapalli7151
    @ramkumarananthapalli7151 3 years ago

    Hi,
    Thanks a lot for these videos; they are quite helpful. In this video you mentioned that an RDD is immutable, but you have overridden the same RDD, right, by changing the number of partitions? Also, we can load a different text file into the same named variable (RDD). Could you explain how it is immutable in this case?
    Thanks in advance for your help.

  • @nikhil199029
    @nikhil199029 6 years ago

    Is reduceByKey a Spark/Scala-specific function?

  • @AIMLBites
    @AIMLBites 5 years ago

    Thanks for the wonderful explanation in this video. Can you please tell whether this is the general scenario for every Spark job, i.e. that map and reduceByKey operations always run in two different stages, or whether there are cases where they can run in a single stage as well? Any examples or leads would be appreciated!

    • @ScholarNest
      @ScholarNest  5 years ago

      Think about it. Map and Reduce? You are already talking about two stages.
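
      A small word-count-style sketch of why that is (the path and names are illustrative, not the exact job from the video): narrow transformations such as flatMap and map are pipelined into one stage, while reduceByKey needs all records with the same key on the same partition, so it introduces a shuffle boundary and hence a second stage.

        val lines  = sc.textFile("input.txt")                         // hypothetical path
        val pairs  = lines.flatMap(_.split("\\s+")).map(w => (w, 1))  // narrow: stays in stage 0
        val counts = pairs.reduceByKey(_ + _)                         // wide: shuffle boundary, stage 1
        counts.collect()                                              // the action that triggers both stages
        println(counts.toDebugString)                                 // lineage shows the ShuffledRDD boundary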

  • @4ukcs2004
    @4ukcs2004 6 years ago

    Sincerely looking for a Spark Streaming with Kafka tutorial, sir... when are you publishing it? You are the best.

  • @gopinathGopiRebel
    @gopinathGopiRebel 6 years ago

    Sir, I have a doubt: what do the number of executor cores and the processing of partitions depend on?

  • @nidhidewan5173
    @nidhidewan5173 7 years ago

    waiting for more videos :)

    • @ScholarNest
      @ScholarNest  7 years ago

      +Nidhi Dewan, coming soon.... I am slightly busy coding a website for Learning Journal. Just another week away from the release.

    • @tushibhaque863
      @tushibhaque863 7 years ago

      I thank you from the depths of my heart for your hard work....

  • @sunilgaikwad3254
    @sunilgaikwad3254 6 years ago

    Hello Sir, loved this tutorial... thanks a lot.
    I have one doubt; consider the following scenario:
    Input data size (reading from HDFS): 20 GB
    Number of executors: 2
    Executor memory: 8 GB
    RDD partition factor: 2
    and we run a Spark job in client mode.
    So in this case:
    1. How will the total 20 GB of data get processed through the Spark job?
    2. How many stages and tasks will get created?
    3. How will the total 20 GB of data be partitioned?

    • @ScholarNest
      @ScholarNest  6 years ago

      1. Do you need 20 GB of memory to process 20 GB of data? No. More memory can improve performance, but you can still process it with less memory.
      2. The number of stages depends on your logic, and the number of tasks per stage depends on the number of partitions.
      3. You asked for two partitions, so it will shuffle and make it two in that stage. The next stage depends on other factors.
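
      A rough spark-shell sketch of point 3, assuming HDFS with default 128 MB blocks and an explicit repartition (the path is hypothetical): the initial partition count follows the input splits, and asking for two partitions is what forces the shuffle mentioned above.

        val raw = sc.textFile("hdfs:///data/input20gb")   // hypothetical 20 GB input
        println(raw.getNumPartitions)                      // roughly 160 with 128 MB blocks
        val two = raw.repartition(2)                       // shuffles the data into 2 partitions
        println(two.getNumPartitions)                      // 2
        // Each task works on one partition at a time, so the executors never need
        // to hold the whole 20 GB in memory at once.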

  • @NareshJadapalli236
    @NareshJadapalli236 5 years ago

    I am confused about one step.
    When we say an RDD distributes data across nodes,
    and we create 5 partitions from the RDD, does it mean the RDD has loaded all the data and then does the partitioning?
    Will it load data from the different nodes to the driver node, keep it in memory, and distribute it across?
    If yes, it is not following the data-locality paradigm, and that data movement is very costly. (I am sure Spark follows data locality.)
    What am I missing?

  • @tajirapb
    @tajirapb 5 years ago

    With Spark 2.3.2, the number of elements within each partition is not being displayed by the code that you have shown.

  • @Modern_revolution
    @Modern_revolution 5 years ago

    Super happy

  • @DavidZYW
    @DavidZYW 6 years ago

    Thanks, I have a question: are the shuffle and sort executed on multiple nodes?

    • @ScholarNest
      @ScholarNest  6 years ago

      Yes, every node that owns a partition must participate in shuffle & sort.

  • @amirboutaghou274
    @amirboutaghou274 5 years ago

    Hello, first of all I want to thank you for this superb tutorial. I have one question following your example: imagine we have 10 partitions and 2 executors, and let's suppose in this example we don't have a transformation that is going to cause a shuffle. How many tasks run in parallel? Is it 5?
    Thanks in advance for your answer.

    • @ScholarNest
      @ScholarNest  5 years ago +1

      Why do you think it's going to be 5?
      Because 10 partitions / 2 executors?
      The number of executors has nothing to do with how many tasks are created. Once the tasks are created, they will run on only two executors because you have only two executors.
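
      A quick spark-shell illustration of that point (the numbers are illustrative): the task count per stage comes from the partition count, and the executors only decide how many of those tasks run concurrently.

        val rdd = sc.parallelize(1 to 1000, 10)   // 10 partitions => 10 tasks per stage
        println(rdd.getNumPartitions)             // 10
        rdd.map(_ * 2).count()                    // one stage with 10 tasks; with 2 executors of
                                                  // 2 cores each, at most 4 of them run at a time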

    • @amirboutaghou274
      @amirboutaghou274 5 years ago

      @@ScholarNest First of all, thanks for your quick reply. So I understand that in my example the number of tasks created per stage depends only on the number of partitions; the number of executors has nothing to do with it. Am I correct, please? So in my example I will still have 10 tasks because I have 10 partitions?

  • @AwaraGhumakkad
    @AwaraGhumakkad 4 years ago

    Sir, I have executed the textFile() command with 5 partitions in cluster mode (5 workers), but every time I could see that the job was being executed on only 1 of the workers.
    I mean, in every run only 1 worker was executing all the partitions.
    Is there any extra configuration required here?
    I am using spark-shell.

    • @AwaraGhumakkad
      @AwaraGhumakkad 4 years ago

      Please ignore this, I got my answer.
      Thanks anyway.

  • @surajpillai2117
    @surajpillai2117 5 years ago

    Hello... I had a question. For the intermediate RDDs which are generated, would the partitioned data under them also be distributed to the executors, or would the redistribution only happen on an action? Please help! :)

    • @ScholarNest
      @ScholarNest  5 years ago

      Everything is lazy so nothing happens until an action is executed.
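
      A minimal sketch of that laziness in spark-shell (the file path is hypothetical):

        val rdd1 = sc.textFile("data.txt")        // nothing is read yet
        val rdd2 = rdd1.map(_.toUpperCase)        // transformation: only the lineage grows
        val rdd3 = rdd2.filter(_.nonEmpty)        // still just a plan
        val n    = rdd3.count()                   // action: tasks are now scheduled on executors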

  • @premrajkumar6910
    @premrajkumar6910 7 years ago

    Also, if possible, please explain the code using the Java APIs too. I am doing development using the Java API, but some methods are not supported even though they are mentioned in the API documentation, and they throw run-time errors. When we develop using the Java API or the Python API, does it get converted to the Scala language internally?

  • @NikhilKekan
    @NikhilKekan 6 years ago

    Hello, great tutorial.
    Can you please elaborate more on the reduceByKey((x,y) => x+y) that you have used to count the number of pairs with the same key?
    I am a bit confused about how x+y will give us the total count.

    • @xmankamal
      @xmankamal 5 years ago +2

      Here reduceByKey is aggregating the values for each key down to a single value.
      Suppose you have a (key, value) list: List((hello, 1), (world, 1), (hello, 1), (hello, 2)).
      reduceByKey will perform the operation on values with the same key, and x, y denote only the values from the (key, value) pairs (you cannot perform the operation on the key here).
      For the key hello:
      rdd.reduceByKey((x, y) => x + y) is equivalent to (1, 1 => 1 + 1) => List((hello, 2), (world, 1), (hello, 2))
      There are still pairs belonging to the hello key here, so the operation will be performed again:
      rdd.reduceByKey((x, y) => x + y) is equivalent to (2, 2 => 2 + 2) => List((hello, 4), (world, 1))
      Now the list has only one hello pair, so no further reduction is possible.
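
      The same walk-through can be checked directly in spark-shell with a small hand-made RDD (a sketch, not the exact data from the video):

        val pairs  = sc.parallelize(Seq(("hello", 1), ("world", 1), ("hello", 1), ("hello", 2)))
        val totals = pairs.reduceByKey((x, y) => x + y)   // values of each key are merged pairwise
        totals.collect().foreach(println)                  // prints (hello,4) and (world,1); order may vary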

  • @ravikirantuduru1061
    @ravikirantuduru1061 6 years ago

    Sir, I have one doubt: is the number of partitions equal to the number of executors?

  • @abhishekbarnwal5867
    @abhishekbarnwal5867 5 years ago

    I am using Spark 2.2.0, but the code shown by you in the video doesn't print any output in the shell.
    val myrdd = sc.textFile("UserData.txt",4)
    myrdd.foreachPartition(x => println("No. of elements in partition: " + x.count(y=>true)))
    Please share working code.
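
    The usual explanation is that println inside foreachPartition runs on the executors, so on a cluster the output lands in the executor logs rather than the spark-shell console. A sketch that brings the per-partition counts back to the driver instead (same file name assumed as in the comment above):

      val myrdd = sc.textFile("UserData.txt", 4)
      val sizes = myrdd.mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
      sizes.collect().foreach { case (idx, n) => println(s"Partition $idx has $n elements") }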

  • @ravikirantuduru1061
    @ravikirantuduru1061 6 years ago

    I have one doubt: is the number of partitions equal to the number of executors?

    • @csharma8732
      @csharma8732 6 years ago

      No, an executor runs tasks in it.

  • @KnowWorldsFact
    @KnowWorldsFact 5 years ago

    Thanks, Sir. Can you please give me the link for part 3? I couldn't find it.

    • @ScholarNest
      @ScholarNest  5 years ago +1

      Check the playlist

    • @KnowWorldsFact
      @KnowWorldsFact 5 years ago

      Thanks, will check. You have explained all the videos in very simple language. :)

  • @jaineshmodi
    @jaineshmodi 7 years ago

    Sir, I am doing development using Spring Kafka; could you please help me with a consumer question? How do I poll at regular intervals, e.g. every 5 minutes, and how do I specify the number of records to be read in every poll?
    I saw that a batch listener can be used to specify the number of records to read, but I did not find a polling interval option.
    Thanks.

    • @ScholarNest
      @ScholarNest  7 years ago

      +Jainesh Modi, what is Spring Kafka?

    • @jaineshmodi
      @jaineshmodi 7 years ago

      Learning Journal, sir, I meant to say Kafka with Spring Boot.

    • @ScholarNest
      @ScholarNest  7 years ago

      +Jainesh Modi, have you seen my Kafka videos? I have discussed the consumer APIs in detail. I am not sure what you mean by batch listener?

    • @jaineshmodi
      @jaineshmodi 7 years ago

      Yes sir, I have gone through your videos.
      My requirement is: as a consumer, I want to put a delay between polls and also want to control the number of records being read in every poll.

    • @ScholarNest
      @ScholarNest  7 years ago

      +Jainesh Modi, to be honest, I haven't used Spring Kafka; I just saw the documentation. Looks interesting. I will plan some time to evaluate it and send you details if I find an answer to your problem.

  • @elvinanoronha6032
    @elvinanoronha6032 5 years ago

    Awesome explanation !!!!!