1:30 Agenda
5:14 History of Spark
27:40 RDD fundamentals
1:20:23 Spark Runtime architecture and resource managers
2:49:24 Memory and Persistence
3:15:30 Serialization
3:19:50 Staging
3:42:00 Shuffle
3:55:00 Broadcast and accumulators
4:31:25 PySpark
4:49:00 Next Gen Shuffle
5:32:00 Spark Streaming
Thanks for the info!
very helpful breakdown of the video. thanks.
Thank you for time offsets
MrTulufan has
this is really helpful thanks
Probably the best Spark video on the Internet right now.
Even till date
It still is
Nothing changed till 2024
Just want to share. I came across this video back in 2016, when Spark was mostly a buzzword. I did not understand most of it back then and did not watch it. Now I'm watching it again in 2022. It's a true gem.
Is this video still relevant? I am new to Spark and came across this video. Should I watch it?
Definitely. It will help you understand the core fundamentals of Spark and many other things. Some of the points might be outdated now, but that is not a deal breaker.
Aww, my goal with it was to onboard completely new folks to Spark. Sorry if it was confusing the first time you watched it.
I'm watching in 2024
This is one of the best free videos available on YouTube.
Well, it can't compete with 3Blue1Brown's educational videos. Those are on another level.
I wish they made a sequel in 2020
Samee
Can we get the entire deck with all the technical slides?
or in 2024
He said join if you wanna build next gen of big data, watching a 9 year old video I exclaimed, YOU'VE MADE IT!
This is the best tutorial I have seen. I admire you, Sameer, for your patience while you answered all the questions...
Sameer, thank you for putting together a professional video that finally explains Spark at the pro level. Much appreciated.
the best presenter ever. Expert in spark as well.
Ultimate video ever seen on Spark internals!
such a sincere presentation.
Great work Sameer,
So far the best detailed Spark presentation I have seen online.
Appreciate a bunch.
Thank you,
Tushar Kale
Excellent presentation of core Spark, among the best I've ever watched, despite the older version it covers. The presenter's knowledge is very deep and he delivers it very clearly. Excellent job!!
Note that it is now possible for a worker to spawn multiple executors for the same application, in standalone mode. See PR github.com/apache/spark/pull/731
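For anyone sizing a standalone cluster around this, here is a minimal sketch of the arithmetic, assuming executors are sized per application with spark.executor.cores and spark.executor.memory and that a worker is limited only by its own cores and memory. The function name and the worker and executor sizes are hypothetical; the real standalone scheduler also takes other settings (such as spark.cores.max) and other running applications into account, so treat the result as an upper bound.

```python
def executors_per_worker(worker_cores, worker_mem_gb, executor_cores, executor_mem_gb):
    """Upper bound on executors one standalone worker could host for one app.

    Simplifying assumption: the worker is limited only by its cores and
    memory, and no other application is competing for them.
    """
    if executor_cores <= 0 or executor_mem_gb <= 0:
        raise ValueError("executor cores and memory must be positive")
    by_cores = worker_cores // executor_cores        # how many fit by CPU
    by_memory = worker_mem_gb // executor_mem_gb     # how many fit by RAM
    return min(by_cores, by_memory)


# Hypothetical worker: 16 cores, 64 GB. Hypothetical executor: 4 cores, 8 GB.
count = executors_per_worker(16, 64, 4, 8)
print(count)
```

In this example the cores are the limiting factor; with a smaller worker memory the memory bound would win instead.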
Best Spark tutorial I have ever come across.... Thanks Sameer Farooqui....
As a new Spark learner I can't ask for more :) This is a real developer talk and helps in designing and modelling any initial Spark project. Thanks a ton Sameer!!!
Here are more Spark videos, if you are interested Spark Interview Questions: th-cam.com/play/PL9sbKmQTkW05mXqnq1vrrT8pCsEa53std.html
@@harjeetkumar4632 hi bro, I am a newbie to Spark and want to learn. Can you please share the path? Thank you 😊
The best tutorials for spark, really.
Joining others, it's a must watch video
Sameer, you have done us all a great service here, appreciate having this posted....very deep coverage of the core architecture, helpful from any number of aspects. Look forward to seeing more in the future as the platform evolves.
Thank you Sameer. I learned a lot about Spark after watching your videos. I will be waiting for your next 5-hour hands-on video at the next Summit.
Really fantastic presentation, Sameer! Keep posting more videos.
Session starts at @5:20
about cluster mode (standalone mode) @1:39:12
Thanks Sameer!! This is the best video on Spark internals I have come across.
Wow, fantastic presentation Sameer! The topics you cover about Spark Core are awesomely explained. Great work!
Excellent presentation! It really walks through all aspects in detail. thanks
complex concepts explained nicely in diagrams, easy to grasp when Sameer explains :)
Excellent video. Great starting point for Databricks/Spark
What an introduction and overview. Great session.
Excellent content on Spark Architecture
One of the best detailed Spark sessions. Thank you.
where can I find the slides?
One of the best presentations on Spark.
about cluster mode (local mode) @1:29:44
very good presentation
Very good presentation ..Thank you .
The best video. Any chance of getting an updated one with the latest changes, like support for multiple executors? Is anything else out of date for Spark 2.x?
Hi, this is one of the best presentations about Spark. One question: Spark has evolved a lot since then. Are these concepts still relevant today? Is any content in this video changed or obsolete? Can anyone tell me, please?
Thanks! I'm surprised to see that this video is still being watched since it's 8 years old 😳 I would say that like 75% of it is still accurate. Even if it's not accurate, watch it for the fancy graphics and jokes man.
Excellent presentation Sameer. Thank you.
Sameer, being a beginner, I found this talk very useful, and towards the end of it I am confident enough to talk to people about Spark. BTW, I loved the standalone flamingo logo you chose.
Hah! I was able to somehow sneak that in. When making the slides, I was looking for an icon that could visually remind the students of Standalone mode... so I searched Google Images for "standalone" and found that flamingo standing alone on one leg...
@@blueplasticvideos how can I download the slides
The link is not working.
Who disliked this video? This is the spark bible. Thanks Sameer
haters gonna hate.
Best video on spark
Sameer, you are awesome... very good presentation. Thanks bro.
Of course.
A Masterpiece, thanks Sameer & Databricks
The best ever on spark!!
Awesome Sameer. Thank you.
Awesome explanation. Thanks a lot Sameer.
Thank you so much! I had a lot of my fundamental doubts cleared (as an engineer who likes to know what goes on underneath).
Excellent Session Sameer !
Very nice video. Best online tutorial for Spark. Sameer has superb presentation skills. Thanks :)
It would be great if you could share link to the labs.
can anyone share slides
Just amazing..
i think 2:03:07 is the only time he smiles
awesome video though
He was concentrating hard and the audience was stressed.
you are doing a great job Bro.....your sessions are very useful...please keep posting
seriously good
Good tutorial for gaining in-depth knowledge of Spark core. It also helps with production setup.
Hi Sameer,
You mentioned that in sort-based shuffle the map side keeps one file handle open. So in the above example, does that mean one file would be 1200 MB (1.2 GB), since the total size of the RDD partition is 3.6 GB and there are 3 files per map task, thereby making 3.6 GB?
Thanks
Rahul
Great presentation!
Great lecture, Sameer. Can we have access to DevOps labs 101 and 102 too?
Very good Presentation. Thank you!!
1:01:00 Databricks Demo starts
Nice detailed explanation
best spark talk ever !!
Great video!
Where can I get the latest Spark Summit 2019 videos?
Thank you this is very helpful!
Thank you so much Mr. Farooqui!
Today Kubernetes has become the go-to cluster manager for Spark cluster computing. Correct me if I am wrong.
just loved it,,
Hi Sameer,
Can I get access to those labs to play with ? Maybe just the devops notebooks
Great stuff Sameer!!
Is there a way to add subtitles?
best stuff ever.
Do you provide online training for hadoop?
In YARN client or cluster mode, does one executor per application per node hold true, as in Spark Standalone?
Could someone post a link to the slides, as they are no longer available? Thank you.
Anyone looking for Spark on YARN mode, jump to 2:13:04
Nice session, thanks.
Jokes apart, the water level in the glass was not going down even though you drank multiple times... lol. Also, not a single time did I find a smile on your face... so serious, lol. Anyway, it was a great session, Sameer.
Andy Kaufman never smiled either.
Thank you so much Sameer..
Best Spark Material.
about cluster mode @1:21:06
It is excellent session.
Great video, can we download the slides?
Mukul Kumar spark-summit.org/east/training/devops
Thanks Sameer and Kev, I was able to get hold of the slides. I have another question. I have heard many times that when an RDD partition gets lost it can be recreated, but is the RDD partitioning logic always deterministic, so as to allow its recreation? Can I parallelize or do something else that could cause non-deterministic partitioning?
Hi Sameer, you are a good presenter, man. Not so sure I need any Spark or Apache, but well done.
I have a question: on what basis are the partitions in an RDD decided?
Loved It,,Thankyou Sameer for Such a nice very very informative presentation!!
Thanks, Pravin. Glad you found it helpful!
Great talk - got a lot out of this.
Where can I get these notebooks?
Not where, but when. 2015.
@@blueplasticvideos i was asking about notebook links 😀
very good presentation :)
Fantastic
Is the code and data available from this session?
Does anyone have resources or source code for a deep-learning-based RLScheduler for single-node-level task scheduling?
Can anyone tell me how to use a notebook with my local Apache Spark instead of the command line (shell)?
Great talk, how can I get the slides?
+guoqiong song link to the slides: spark-summit.org/wp-content/uploads/2015/03/SparkSummitEast2015-AdvDevOps-StudentSlides.pdf
great vid!
Really great explanation of Spark Core. I've followed your Hadoop tutorials as well; this one seems to be the best (an improved one). Your voice is very clear, Sameer.
Can you share the link to his Hadoop tutorials?
th-cam.com/video/ziqx2hJY8Hg/w-d-xo.html
Excellent job Sameer.... thank you!!!
Can you please elaborate on a scenario where shuffling data is good?
before you play poker in a data center.
wow I jumped straight to the part that I was looking for 4:47:30, which is benchmarking, how are the odds :D
What is the hardware configuration of each worker node? How should we decide that?
Typically in production Spark deployments I'm seeing machines with like 30-60 GB of RAM and maybe 2 TB SSDs. Each Executor JVM is typically ~30 GB and the Driver JVM is also around 30 GB. For the Worker JVM or Spark Master JVM (in Standalone mode) maybe 4 GB of RAM for each should be fine. You'll want to experiment with different hardware profiles for your specific workloads and use case though.
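To make those numbers concrete, here is a minimal sketch that turns them into a spark-submit command line plus an environment setting for the standalone daemons. The `--master`, `--executor-memory` and `--driver-memory` flags and the `SPARK_DAEMON_MEMORY` variable (normally set in conf/spark-env.sh) are standard Spark options; the helper name, master URL and application file are hypothetical, and the sizes are just the starting points quoted above, not a recommendation for every workload.

```python
def build_spark_submit(master_url, app_path, executor_memory_gb=30, driver_memory_gb=30):
    """Return a spark-submit argument list for the given memory profile."""
    return [
        "spark-submit",
        "--master", master_url,
        "--executor-memory", f"{executor_memory_gb}g",  # heap per Executor JVM
        "--driver-memory", f"{driver_memory_gb}g",      # heap for the Driver JVM
        app_path,
    ]


# Memory for the standalone Master and Worker daemons themselves
# (not the executors); 4 GB follows the figure suggested above.
daemon_env = {"SPARK_DAEMON_MEMORY": "4g"}

# Hypothetical master URL and application file, for illustration only.
cmd = build_spark_submit("spark://master-host:7077", "my_app.py")
print(" ".join(cmd))
print(daemon_env)
```

Keeping the profile in one function makes it easy to try several memory sizes against the same job, which is the experimentation suggested above.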
Video isn't available?!
Mohamed Fawzy, on my mobile it shows "video not available", but on PC it is available!