Intro to Amazon EMR - Big Data Tutorial using Spark
- Published on 26 Jun 2024
- Edit*
Make sure you encrypt your Spark script when you upload it to S3 (timestamp: 13:42)
There's a small typo on line 41 of the code; it should be "add_argument"
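For reference, here is a minimal sketch of how an argparse-based Spark script might be laid out, showing the correct `add_argument` spelling. This is not the exact script from the video: the flag names `--data_source` and `--output_uri`, the app name, and the CSV-to-Parquet logic are all assumptions for illustration.

```python
import argparse


def build_parser():
    """Build the command-line parser for the Spark script."""
    parser = argparse.ArgumentParser(description="Example Spark job for EMR")
    # The method is spelled add_argument (the typo fix noted above).
    # Both flag names are hypothetical placeholders.
    parser.add_argument("--data_source", required=True,
                        help="S3 URI of the input data")
    parser.add_argument("--output_uri", required=True,
                        help="S3 URI where results are written")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported here so the parser can be checked without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("emr-tutorial-sketch").getOrCreate()
    # Assumed transformation: read a CSV with a header row, write it as Parquet.
    df = spark.read.option("header", "true").csv(args.data_source)
    df.write.mode("overwrite").parquet(args.output_uri)
    spark.stop()
```

On a cluster, a script like this would end with a call to `main()` and be launched through `spark-submit` with the two flags supplied.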
Intro
Today we're going to talk about a popular tool in data engineering. Amazon EMR is an industry-leading big data platform. It's a mature service, launched back in 2009, that draws heavily on the Apache Hadoop project. EMR is used for processing terabytes of data and training machine learning models. In this tutorial, we'll dive deep into EMR's architecture, walk through a live demo of triggering jobs using Steps, and demonstrate how to use Spark to extract data from Amazon S3. Hope you enjoy this one!
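As a rough illustration of triggering a job with a Step, the sketch below builds a step definition and submits it through a boto3 EMR client. It is only one possible approach, not necessarily any of the three shown in the video. The helper names `build_spark_step` and `submit_step`, the step name, and the choice of `CONTINUE` on failure are assumptions; `add_job_flow_steps` and `command-runner.jar` are the standard boto3 call and EMR runner.

```python
def build_spark_step(script_uri, script_args, name="Spark step"):
    """Return an EMR step definition that runs spark-submit on a script in S3."""
    return {
        "Name": name,
        # Assumed policy: keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_uri, *script_args],
        },
    }


def submit_step(emr_client, cluster_id, step):
    """Add one step to a running cluster and return the new step ID."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Example usage (requires boto3 and AWS credentials; IDs and URIs are placeholders):
#   import boto3
#   emr = boto3.client("emr")
#   step = build_spark_step("s3://my-bucket/main.py",
#                           ["--data_source", "s3://my-bucket/input"])
#   submit_step(emr, "j-XXXXXXXXXXXXX", step)
```

Passing the client in as a parameter keeps the step definition easy to inspect before anything is sent to AWS.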
Timestamps ⏰
0:00 Intro
1:16 Overview of Amazon EMR
5:10 Create filesystem, VPC, and configure EMR cluster
9:04 Writing our Spark script
13:42 3 ways to Trigger Steps in EMR
18:32 SSH into Resource Manager in YARN
19:50 Enable EMR managed auto-scaling
20:57 Summary
Notes from video 📝
bittersweet-mall-f00.notion.s...
Who am I? 🙋🏻‍♂️
I'm Jay. I love making videos about travel, self-help, and tech. I currently work in New York City as a data engineer, but I grew up in Malaysia and lived in the UK when I was 19. Back then, I had no idea what life was about, moving to so many places and navigating a career in tech. Today, I've learned a lot and want to share my perspective through filmmaking.
Socials 📱
/ jayzern
/ jayzern
Sub Count: 4,539
Absolutely enjoyed watching the entire video. I felt this video is gonna be great start to understand EMR. Thanks for making it jay
I hope you create more videos about AWS services. Loved the way you explain things, perfect for beginners.
We need more videos Jaaay 🙏🏻💪🏻 You're awesome dude!
So sad your channel doesn't have more tutorials like this :( thank you so much!
This is an outstanding tutorial. Thank you for making this!
Very clear! Thank you for sharing this excellent tutorial!
great tutorial! can’t wait to see more
awesome explanation, simple , subtle and to the point!
Great work mate, very crisp!
Thanks man!! Love ur content
gnarly stuff man! great content.
Hey, thank you so much!!.. you really explain very well!
Very well explained, kudos
Go ahead bro....CONGRATS TUTO
Great Article ! Thanks for sharing..
Thanks for sharing man 👌
Great!! Thank u so much!
nice job, great tutorial
This is so goood :). Please keep making these kind of videos! Hello from Seattle
Thanks Isaac from Seattle! Appreciate your support
Your video is very interesting!
Hope you release many new videos :)
Great job
impressive and informative video, good job, go on doing tutorials plss :) Would be very interesting to see a video about spark and snowflake on your channel!
Thanks .. Good stuff
thanks for the video
More tutorials 🙏
this is crazy ❤❤❤ wish i had seen this earlier ! is this how the whole amazon product in a actual work flow look like? and also could you maybe make another showing azure system? pleaaase
Great tutorials! thanks for the headup! do you have a git repo or more notion notes? would like some guidance
You killed it. Loved it! Extremely useful
Thank you man! Hope to create more
Could you share more about project for data engineer beginners? I have start to learn to be a DE recently and I hope to know more about some personal project that help me to enhance my skills. Thank you so much for your sharing and waiting for your next video :> Have a good day
great video! can you make also for AWS Glue? Thank you!
when writing the spark script, does it ever change or the skeleton layout remains the same? i truly appreciate this and i cannot wait for more
Fantastic tutorial indeed! I did as instructed and I got two fails in deploying the 'add step' part of the EMR Cluster stage, any insights would be appreciated.
Please post more videos
More videos on Streaming, Airflow and Spark
is this free to use or do i need to have a licensed software in order to use? this is quite interesting.
I am wondering if I only needed to do PySpark, is EMR the best tool or is it overkill and Glue serverless would be good enough with a lot less to manage and fewer configurations to worry about. Is it possible to enable better performance with all the options in EMR?
And thank you for this video - I’m studying for AWS certification and it was helpful to see your demonstration
From where you learn that coding part 😢
Don't stop
Isn't using an EMR notebook one of the ways to trigger an EMR job?
Yes it is! Wanted to keep things simple in the video so didn't include it
I hav done masters of science in biotech, 38 yers of age, want to switch to data science...how shud i do it??? Plz reply.....
Do projects and add them in your resume. Try upwork and do some projects as freelancers. Keep applying
Don’t
@Ved3sten y , plz reply...
@syedmehdi5125 bc most companies want senior data analysts or graduate students when it comes to data science. You’ll waste more money chasing a data science job than you’ll make
Yeah, good on EMR and AWS, but an absolute rookie in videography and equipment. Use manual focus since you are stationary; your autofocus keeps hunting for something. Also change the lighting setup.
Fair point 👍 will work on lighting and camera setup more next time
nice try but its not working
Let me know how I can help
I can add a step for the spark application @jayzern
Check if
1. the Spark script is encrypted when you upload it inside S3
2. any typos (line 41 should be "add_argument")
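For the first check, one way to upload the script with encryption is sketched below. It assumes "encrypted" means S3 server-side encryption with S3-managed keys (SSE-S3, `AES256`), which may differ from the setting chosen in the video. The helper name `upload_encrypted` is hypothetical; `upload_file` with `ExtraArgs` is the standard boto3 S3 client call.

```python
def upload_encrypted(s3_client, local_path, bucket, key):
    """Upload a local file to S3 with server-side encryption and return its URI."""
    s3_client.upload_file(
        local_path,
        bucket,
        key,
        # Assumption: SSE-S3; a KMS-based setup would use "aws:kms" instead.
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
    return f"s3://{bucket}/{key}"


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   upload_encrypted(boto3.client("s3"), "main.py", "my-bucket", "scripts/main.py")
```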
I had tried, but it's not working for me @jayzern
Send me a DM on instagram @jayzern or linkedin, happy to pair up
Love the ways how you demonstrate! so clear and easy to understand! Thanks for sharing @jayzern
thanks so much