Advanced Apache Spark Training - Sameer Farooqui (Databricks)

  • Published on 13 Dec 2024

Comments • 198

  • @MrTulufan 6 years ago +367

    1:30 Agenda
    5:14 History of Spark
    27:40 RDD fundamentals
    1:20:23 Spark Runtime architecture and resource managers
    2:49:24 Memory and Persistence
    3:15:30 Serialization
    3:19:50 Staging
    3:42:00 Shuffle
    3:55:00 Broadcast and accumulators
    4:31:25 PySpark
    4:49:00 Next Gen Shuffle
    5:32:00 Spark Streaming

  • @skipperkongen 9 years ago +94

    Probably the best Spark video on the Internet right now.

    • @avsbharadwaj8190 4 years ago +7

      Even to this day

    • @zanfet 9 months ago +1

      It still is

    • @barmalini 4 months ago +1

      Nothing has changed as of 2024

  • @adrishpal8713 2 years ago +2

    Just want to share: I came across this video back in 2016, when Spark was mostly a buzzword. I did not understand most of it back then and did not watch it. Now I'm watching it again in 2022. It's a true gem.

    • @PatelMahendra 2 years ago

      Is this video still relevant? I am new to Spark and came across this video. Should I watch it?

    • @adrishpal8713 2 years ago +2

      Definitely. It will help you understand the core fundamentals of Spark and many other things. Some of the points might be irrelevant now, but that is not a deal breaker.

    • @blueplasticvideos 1 year ago +1

      Aww, my goal with it was to on-board completely new folk to Spark. Sorry if it was confusing the first time you watched it.

    • @vabz_parab 3 months ago

      I'm watching in 2024

  • @javaidmir9831 2 years ago

    This is one of the best free videos ever made available to the YouTube community.

    • @blueplasticvideos 1 year ago

      Well, it can't compete with 3Blue1Brown's educational videos. Those are on another level.

  • @pats8589 4 years ago +69

    I wish they made a sequel in 2020

    • @Sachin-xd9oj 4 years ago

      Samee

    • @geraldinecaszo9563 3 years ago +1

      Can we get the entire deck with all the technical slides?

    • @barmalini 4 months ago +1

      or in 2024

  • @sohailpatel7549 1 day ago

    He said to join if you wanna build the next gen of big data. Watching a 9-year-old video, I exclaimed: YOU'VE MADE IT!

  • @arunbm123 9 years ago +15

    This is the best tutorial I have seen. I admire you, Sameer, for your patience while you answered all the questions...

  • @christianlira1259 5 years ago +5

    Sameer, thank you for putting together a professional video that finally explains Spark at the pro level. Much appreciated.

  • @arada123 7 years ago +2

    The best presenter ever. An expert in Spark as well.

  • @SandeshMendan 6 years ago +2

    Ultimate video ever seen on Spark internals!

  • @dharmendrabhojwani 8 years ago +14

    such a sincere presentation.

  • @TusharKale9 9 years ago +4

    Great work Sameer,
    So far the best detailed Spark presentation I have seen online.
    Appreciate a bunch.
    Thank you,
    Tushar Kale

  • @singalong9962 5 years ago +3

    Excellent presentation of core spark, among the best I've ever watched, despite the older version it covers. Presenter's knowledge is very deep and he delivers it very clearly. Excellent job!!

  • @maxdemoulin5245 9 years ago +3

    Note that it is now possible for a worker to spawn multiple executors for the same application, in standalone mode. See PR github.com/apache/spark/pull/731

  • @jahartyagi9695 7 years ago +2

    Best Spark tutorial I have ever come across.... Thanks Sameer Farooqui....

  • @surendratiwari7980 8 years ago +9

    As a new Spark learner I can't ask for more :) This is a real developer talk, and it helps in designing and modelling any initial Spark project. Thanks a ton Sameer!!!

    • @harjeetkumar4632 6 years ago +2

      Here are more Spark videos, if you are interested. Spark Interview Questions: th-cam.com/play/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std.html

    • @passions9730 2 years ago

      @harjeetkumar4632 Hi bro, I am a newbie to Spark and want to learn. Can you please share the path? Thank you 😊

  • @yjwoo1131 7 years ago +1

    The best tutorials for spark, really.

  • @aleksandrivanov4345 9 years ago +10

    Joining others, it's a must watch video

  • @craigholley3287 9 years ago +3

    Sameer, you have done us all a great service here, appreciate having this posted....very deep coverage of the core architecture, helpful from any number of aspects. Look forward to seeing more in the future as the platform evolves.

  • @jubinsoni4694 5 years ago +1

    Thank you, Sameer. I learned a lot about Spark from watching your videos. I will be waiting for your next 5-hour hands-on video at the next Summit.

  • @rajeshsurpur 6 years ago +1

    Really, you are a fantastic presenter, Sameer! Keep posting more videos.

  • @αλήθεια-σ4κ 7 years ago +1

    Session starts at 5:20

  • @정문식-s6y 7 years ago

    about cluster mode (standalone mode) @1:39:12

  • @com2ram 4 years ago

    Thanks Sameer!! This is the best video on Spark internals I have come across.

  • @bpriorb 9 years ago +16

    Wow, fantastic presentation Sameer! The topics you cover about Spark Core are awesomely explained. Great work!

  • @tomkoshy 9 years ago +1

    Excellent presentation! It really walks through all aspects in detail. thanks

  • @arunsundar3739 5 years ago +1

    complex concepts explained nicely in diagrams, easy to grasp when Sameer explains :)

  • @lackshubalasubramaniam7311 6 years ago

    Excellent video. Great starting point for Databricks/Spark

  • @kiraninam 4 years ago

    What an introduction and overview. Great session.

  • @aiSuresh 5 years ago

    Excellent content on Spark Architecture

  • @NishaKumari-op2ek 4 years ago +1

    One of the best detailed Spark sessions. Thank you.
    Where can I find the slides?

  • @HolmesPatrick 9 years ago

    One of the best presentations on Spark

  • @정문식-s6y 7 years ago

    about cluster mode (local mode) @1:29:44

  • @mdfurqan3487 8 years ago +1

    very good presentation

  • @uma_mataji 8 years ago +1

    Very good presentation ..Thank you .

  • @petrnovak9271 7 years ago

    The best video. Any chance of getting an updated one with the latest changes, like support for multiple executors? Is anything else out of date for Spark 2.x?

  • @seenu0104 2 years ago +2

    Hi, this is one of the best presentations about Spark. One question: Spark has evolved a lot since then. Are these concepts still relevant today? Is any content in this video changed or obsolete? Can anyone tell me, please?

    • @blueplasticvideos 1 year ago

      Thanks! I'm surprised to see that this video is still being watched since it's 8 years old 😳 I would say that like 75% of it is still accurate. Even if it's not accurate, watch it for the fancy graphics and jokes man.

  • @SohelKhan-tr6jr 9 years ago

    Excellent presentation Sameer. Thank you.

  • @surajmon123 8 years ago +2

    Sameer, being a beginner, I found this talk very useful, and towards the end of it I am confident enough to talk to people about Spark. BTW, I loved the standalone flamingo logo you have chosen.

    • @blueplasticvideos 8 years ago +3

      Hah! I was able to somehow sneak that in. When making the slides, I was looking for an icon that could visually remind the students of Standalone mode... so I searched Google Images for "standalone" and found that flamingo standing alone on one leg...

    • @madhavkondapalli785 4 years ago

      @blueplasticvideos How can I download the slides?
      The link is not working.

  • @DeepakRajak05 5 years ago +1

    Who disliked this video? This is the spark bible. Thanks Sameer

  • @kaleemahmadkhan5764 4 years ago

    Best video on spark

  • @vsandeep06 5 years ago +1

    Sameer, you are awesome... very good presentation. Thanks bro.

  • @debmalyapanday 3 years ago

    A Masterpiece, thanks Sameer & Databricks

  • @hafizca 9 years ago +1

    The best ever on spark!!

  • @NB-xc6qq 8 years ago +1

    Awesome Sameer. Thank you.

  • @nirmalagra 7 years ago

    Awesome explanation. Thanks a lot Sameer.

  • @sarnathk1946 5 years ago +2

    Thank you so much! I had a lot of my fundamental doubts cleared (as an engineer who likes to know what goes on underneath).

  • @krishna079 8 years ago

    Excellent Session Sameer !

  • @AbhaySingh-ir4fy 7 years ago +1

    Very nice video. Best online tutorial for Spark. Sameer has superb presentation skills. Thanks :)

  • @rpiitkgpian 6 years ago +1

    It would be great if you could share a link to the labs.

  • @watchmanling 5 years ago +2

    can anyone share slides

  • @Yash94888 8 years ago +2

    Just amazing..

  • @jizztastic 7 years ago +6

    i think 2:03:07 is the only time he smiles
    awesome video though

    • @philippederome2434 6 years ago

      He was concentrating hard, and the audience was stressful.

  • @virenderdeswal 8 years ago

    you are doing a great job Bro.....your sessions are very useful...please keep posting

  • @smagadi124 8 years ago +2

    seriously good

  • @provashdowari2926 7 years ago

    Good tutorial for gaining in-depth knowledge of Spark core. It also helps with production setup.

  • @rahulgulati890 9 years ago +1

    Hi Sameer,
    You mentioned that in sort-based shuffle the map side keeps one file handle open. So in the above example, does that mean one file would be 1200 MB (1.2 GB), since the total size of the RDD partition is 3.6 GB and there are 3 files for each map task, thereby making 3.6 GB?
    Thanks
    Rahul

  • @sribaddela 9 years ago

    Great presentation!

    • @rishiagr 9 years ago

      Great lecture, Sameer. Can we have access to DevOps labs 101 and 102 too?

  • @vinodpatil3497 8 years ago

    Very good Presentation. Thank you!!

  • @koumospecial 5 years ago

    1:01:00 Databricks Demo starts

  • @sarrae100 7 years ago

    Nice detailed explanation

  • @viren8577 4 years ago

    best spark talk ever !!

  • @dionwang 9 years ago +1

    Great video!

  • @rajturani4721 4 years ago

    Where can I get the latest Spark Summit 2019 videos?

  • @aidenzhang5959 2 years ago

    Thank you this is very helpful!

  • @BlackHermit 4 years ago

    Thank you so much Mr. Farooqui!

  • @avsbharadwaj8190 4 years ago +1

    Today Kubernetes has become the go-to cluster manager for Spark cluster computing. Correct me if I am wrong.

  • @sasmitapanigrahi3520 9 years ago +2

    just loved it,,

  • @arastuece04 8 years ago

    Hi Sameer,
    Can I get access to those labs to play with? Maybe just the DevOps notebooks.

  • @sureshsindhwani6317 7 years ago

    Great stuff Sameer!!

  • @Luchox5006 4 months ago

    Is there a way to add subtitles?

  • @meditating010 9 years ago +3

    best stuff ever.

  • @JavaHomeCloud 9 years ago

    Do you provide online training for hadoop?

  • @soravgulati100 7 years ago

    In YARN client or cluster mode, does one executor per application per node hold true, as in Spark Standalone?

  • @tadastadux 5 years ago

    Could someone post a link to the slides, as they are no longer available? Thank you.

  • @muhammadyaseen8923 5 years ago

    Anyone looking for Spark on YARN mode, jump to 2:13:04

  • @user-co8oc1rm5w 3 years ago +1

    Nice session, thanks.
    Jokes apart, the water level in the glass was not going down though you drank multiple times... lol. Also, I did not find a smile on your face a single time... so serious, lol. Anyway, it was a great session, Sameer.

  • @girishjangannavar7827 5 years ago

    Thank you so much Sameer..

  • @rmyou 6 years ago

    Best Spark Material.

  • @정문식-s6y 7 years ago

    about cluster mode @1:21:06

  • @2007selvam 7 years ago

    It is an excellent session.

  • @kmukul 9 years ago +2

    Great video, can we download the slides?

    • @kevalan1042 9 years ago +1

      Mukul Kumar spark-summit.org/east/training/devops

    • @kmukul 9 years ago

      Thanks Sameer and Kev, I was able to get hold of the slides. I have another question: I have heard many times that when an RDD partition gets lost it can be recreated, but is the RDD partitioning logic always deterministic so as to allow its recreation? Can I parallelize or do something else that could cause non-deterministic partitioning?

  • @Nerky7654 9 years ago +1

    Hi Sameer, you are a good presenter, man. Not so sure I need any Sparks or Apache, but well done.

  • @mdfurqan3487 8 years ago

    I have a question: on what basis are the partitions in an RDD decided?

  • @pravinpathak7934 7 years ago

    Loved it. Thank you, Sameer, for such a nice, very informative presentation!

    • @blueplasticvideos 7 years ago

      Thanks, Pravin. Glad you found it helpful!

  • @thomasswann1800 9 years ago

    Great talk - got a lot out of this.

  • @TechnoSparkBigData 5 years ago +2

    Where can I get these notebooks?

  • @maouhoubmouchtaq1688 8 years ago

    very good presentation :)

  • @TalhaAsifRahim 8 years ago

    Fantastic

  • @hatrixyesa 8 years ago

    Are the code and data from this session available?

  • @1UniverseGames 2 years ago

    Does anyone have resources or source code for a deep-learning-based RLScheduler for single-node task scheduling?

  • @shakkur07 8 years ago

    Can anyone tell me how to use notebooks on my local Apache Spark instead of the command line (shell)?

  • @guoqiongsong 9 years ago +1

    Great talk, how can I have the slides?

    • @partha4utube86 8 years ago +2

      +guoqiong song Link to the slides: spark-summit.org/wp-content/uploads/2015/03/SparkSummitEast2015-AdvDevOps-StudentSlides.pdf

  • @hazdazzler 9 years ago

    great vid!

  • @uday264 9 years ago +5

    Really great explanation of Spark Core. I've followed your Hadoop tutorials as well; this one seems to be the best one (an improved one). Your voice is very clear, Sameer.

    • @syedtahaaziz240 7 years ago

      Can you share the link for his Hadoop tutorials?

    • @PMestry007 6 years ago

      th-cam.com/video/ziqx2hJY8Hg/w-d-xo.html

    • @harjeetkumar4632 6 years ago

      Here are more videos, if you are interested. Spark Interview Questions: th-cam.com/play/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std.html

  • @madhu.badiginchala 8 years ago

    Excellent job Sameer.... thank you!!!

  • @jumperankur 3 years ago

    Can you please elaborate on a scenario where shuffling of data is good?

  • @lols1503 7 years ago

    Wow, I jumped straight to the part that I was looking for, 4:47:30, which is benchmarking. What are the odds :D

  • @dharmendrabhojwani 8 years ago

    What is the hardware configuration of each worker node? How should we decide that?

    • @blueplasticvideos 8 years ago +1

      Typically in production Spark deployments I'm seeing machines with like 30-60 GB of RAM and maybe 2 TB SSDs. Each Executor JVM is typically ~30 GB and the Driver JVM is also around 30 GB. For the Worker JVM or Spark Master JVM (in Standalone mode) maybe 4 GB of RAM for each should be fine. You'll want to experiment with different hardware profiles for your specific workloads and use case though.

  • @MohamedFawzy0x00 7 years ago +3

    video isn't available ?!

    • @Pradeepkumar-is8vs 7 years ago

      Mohamed Fawzy On my mobile it shows "video not available", but on PC it is available!