Hi Folks - The much requested update to this video with the new AWS Console UI for AWS Glue is now available on the channel with a new GitHub repo containing everything you need to follow along. th-cam.com/video/ZvJSaioPYyo/w-d-xo.html.
You were preparing the video while the entire Western world was yelling "Happy New Year"! Great commitment and an awesome result!
I never knew I could gain this much knowledge in 41 minutes and be ready with AWS Glue, as we are in a transformation project. It really helped, thank you so much!
If you had watched at 1.25x playback speed, it would have taken you only 33 minutes.
Amazing content by far!! Please continue with Glue. I've never seen such a high-quality tutorial :D
Thanks for watching
This is the most detailed AWS Glue video I have ever seen. Keep up the great work, Johnny.
You speak my favorite accent; that's such a plus for me. Thank you so much for the quality content! :)
Awesome. I've never met anyone so young with such solid data science and AWS skills.
Thanks Sung!
For me the most amazing thing is that he was working in the morning of January 1st. My respect! 😅
I don't like subscribing to people, but honestly I wouldn't be able to learn AWS without you. The best and most unique content.
This is one of the best tutorials I've seen on YouTube. Very well explained, very useful. Thank you for uploading this!
Thanks for the comment and supporting the channel.
As someone transitioning from a data scientist to DE role, I found this extraordinarily helpful! Subscribed! Thank you!
I am interested in understanding why you decided to make the transition.
Awesome! Thank you!
I'm not sure whether anyone has said this before, but I wanted to say: you are a rock star.
Johnny Chivers you rock. Really great intro to Glue.
Simple, and it explains everything we need. Thank you, man! I'm from Sri Lanka.
Excellent Glue tutorial. One of the best I have come across; it teaches at a composed pace and is easy to pick up for anybody.
Thanks for watching!!!
your commitment shows in the quality!
Excellent tutorial. I took three other courses before this one, and only with this one did I completely understand the tool.
Fantastic video, going to binge-watch your videos for the next few months!
Thanks for watching.
You are awesome. Watching in 2024... the ETL steps need minor updating, but I was still able to follow! Keep up the great work!
Superb course, covering all features with examples
Thanks for comment.
One of THE BEST videos on AWS Glue. Thank you Johnny :)
Troubleshooting tip for especially complex environments: If the user has access to ALL S3 locations in the table, then Glue will assemble the table for Athena to query. If even one S3 location can't be accessed by the user, then the table won't show up in Athena. Hope this helps complex schema lovers.
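To illustrate the tip above, here is a minimal sketch of the kind of IAM policy the querying user would need, covering every S3 location backing the table. The bucket name is hypothetical; in practice you would list one pair of ARNs per bucket the table's partitions live in.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-table-bucket",
        "arn:aws:s3:::my-table-bucket/*"
      ]
    }
  ]
}
```

If even one partition's location is missing from the policy, the behaviour described above applies and the table won't be queryable in Athena.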
Good tip.
So excited to find this channel. Looking forward to watching all your videos!
This tutorial is pure gold. Thanks
I just can't believe it's free here on TH-cam! Great content, very well explained. Thanks!
Thanks for watching the video and supporting the channel!
Johnny, you ROCK, man!!! 🔥👏🔥👏🔥👏🔥 Thank you so much and please don't stop spreading the wisdom of the GURU!!!! 🙏🙏🙏👍👍👏👏🚀🌟
Such an amazing tutorial! Thanks for getting this together. Looking forward to more content.
Such an amazing video. Gives end to end idea about Glue in just 45 mins. Keep it up.
41 minutes was sufficient for me to get hands-on with AWS Glue. Thanks, Johnny, for this awesome tutorial.
I spent many hours on Udemy on Glue jobs and the Glue Data Catalog, but after watching your video I must say: damn good stuff, sir!
Great intro to AWS Glue for someone who's never seen it before. Needed to know these basics as we're moving to Glue for our data transformation/warehousing and this video does just that! Thanks!
Thanks for watching Lyndsay!
One of the best tutorials I've ever seen - let alone on a topic that is not easy.
You are amazing and a natural teacher !!
right when i needed this, always amazing content
Thanks for watching Todd
At 28:00 (AWS Glue Jobs section) the current screens are much different from the video. I figure all the same information has to be entered, but the order and screen flow are totally different.
huge fan from brasil, thank you for your great content, it helped me a loooot
Glad the tutorial took me through the console instead of running code.
I am enjoying the video, thanks for this resource!
I wanted to note that it seems you blurred your S3 buckets early in the video, but at 19:22 and also 21:33 when configuring glue, you do not blur your other S3 buckets.
Also 24:49.
Thank you very much for you great explanation and hands-on! It was very useful for me.
This is one of the best AWS Glue tutorials that I've come across on YouTube. Totally worth a like and sub ❤️
Thanks for the sub!
Great tutorial in 40 mins you covered everything essential in Glue 👏
Excellent work Johnny. Keep continuing the work .
Thanks, will do!
It was very useful for someone starting with AWS from scratch! Thank you :)
This was a really nice video. Lot of learning within a very short time. Thanks
Wonderful and neatly described in such a short video
As a skilled Ops Engineer / Developer, I had been searching for information about what Glue is and how it works at a basic level. Thanks so much for your video; it let me understand the basics of what Glue is and what it can be used for. :)
Great intro on AWS Glue! Thanks!!
fantastic video, very helpful and useful. Looking forward to seeing more. keep up the great work and thank you so much.
A very nice and easy-to-learn video, especially for beginners. Thanks for posting this.
Thank you. This is an awesome introduction to AWS Glue for beginners.
I am using this in 2023; the interface has changed, but it still works. Thank you so much for the video.
Very good course; it gave me a good view of the AWS Glue service.
Great video for beginners. I hope to build some projects to keep learning.
This channel's a great resource for learning data engineering on AWS; it's been a big help. Keep up the good work, Johnny!
Good tutorial. I was able to follow and execute. Thanks Johnny!
Amazing Content and style of teaching, Thanks a lot for making these videos.
This was awesome explanation of glue, thanks.
Thank you very much, I learnt a lot from this tutorial.
Fantastic video: I have two questions
1. Can you create a partition key when importing data with a crawler?
2. The UI seems to have changed. The 'ETL job' has changed. Can you publish a refresher on that part?
Finally an aws tutorial that uses "wee" and talks in a NI accent, made me laugh that I finally found a fellow NI person :) followed every word :D
Towards the end of the course, I lost my way after we created the parquet file because my version is different from yours and I couldn't just get the visuals on my dashboard. Thank you so much and I will give this a try again some other time.
Thanks for nice video, it helped me understand glue and setting up the pipeline in aws.
Thank you so much. Very informative, Johnny!
really amazing and fun learning with Johnny :)
excellent video on glue
This was a great tutorial! I really learned a lot about AWS Glue and I plan to leverage this knowledge to be more effective in my job tasks. Thanks so much, Johnny!!
Impressive free content. Thank you!
Thanks for watching
Thoroughly enjoyed, Thanks.
Super helpful.... Thank you so much.
Thanks for watching!
Nicely done and explained. I have done my first AWS Glue job without any issues... thumbs up, bro 👍
Fantastic news Ammar!!
Very helpful for my studies
Thank you for this video, it is very helpful and gives a clear glimpse of AWS Glue. This is a very important video for me for interviews :)
Thank you and well done. Was able to do the course still in June of 2023, though the ETL chapter was a bit of a challenge as AWS has completely redone those screens.
Great to hear!
How did you get past the ETL job chapter? The interface is totally different.
You are definitely a hero
Thanks for watching
Thank you for your work!
Could you please help with a few little questions:
1) I saw that the "customer_csv" table is marked as partitioned, but "customer_parquet" is not (there is no "partitioned" mark beside it).
So how should I make those files partitioned? It seemed you did the same thing for CSV and parquet but got a different result.
Thank you in advance!
2) What would happen when you add another CSV file for a different day? How would the job work? I didn't get how the job determines that the previous day has already been transformed to parquet but the newest day has not. As I understand it, in your particular example from the video, the Glue job will transform all the files inside the customer_csv folder from CSV to parquet. But how do I make it more deterministic based on run date? For example, in general I want to transform only the previous load date's files. As I understand it, this should be done in the job's code.
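One way to approach question 2, sketched in plain Python under an assumed (hypothetical) loaddate=YYYY-MM-DD folder layout: derive the previous load date's S3 prefix from the run date, and have the Glue job read only that prefix (or pass the equivalent push-down predicate) instead of the whole customer_csv folder.

```python
from datetime import date, timedelta

def previous_loaddate_prefix(base_prefix: str, run_date: date) -> str:
    """Build the S3 prefix for the previous day's partition.

    Assumes files are laid out as <base>/loaddate=YYYY-MM-DD/...,
    a hypothetical layout; adjust to your actual partition keys.
    """
    prev = run_date - timedelta(days=1)
    return f"{base_prefix}/loaddate={prev.isoformat()}"

# A Glue job would compute this from a job argument at run time.
print(previous_loaddate_prefix("s3://my-bucket/customer_csv", date(2021, 1, 2)))
# -> s3://my-bucket/customer_csv/loaddate=2021-01-01
```

Glue job bookmarks are another way to avoid reprocessing files a previous run has already handled, but the explicit date-based prefix keeps the job deterministic for a given run date.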
Amazing teacher. Many thanks for this.
Thanks for watching.
happy new year
Great and comprehensive coverage of AWS Glue. I was hoping it did much more.
Makes you appreciate how much easier, faster, and cheaper it is to develop in Azure Data Factory, where a single pipeline can consume multiple different CSV files, parse the filename, figure out the metadata on load, and redirect the output based on lookup parameters, all graphical or generated via script.
Hi David, this video did not cover AWS Glue Studio, which provides a more comprehensive graphical interface that I think would be more comparable to Azure Data Factory.
Excellent content! Hope you keep it coming. I just subbed
Thanks for watching.
The CSV file is loaded from S3, but after creating the crawler, the CSV data is not shown in Athena. It shows only the CSV file headers, not the whole data.
thanks mate, u made my day.
Awesome tutorial.
Fantastic! Thank you for putting this together. It helped me a lot.
Very well done! Thanks for sharing this!!
Thank you very much, it was very helpful video to learn end to end AWS Glue. 🙂
awesome video!
Dear friend, thanks for this video. It's really great, helped a lot... Hugs from Brazil :)
That was fantastic, mate! Thank you very much for sharing your knowledge!
Your approach is very interesting. I have a question: why couldn't the Glue crawler that converts the CSV file to parquet format create the parquet table with the partition definition that came from the CSV file?
Hi Jorge. The crawler will only create partitions when there is a folder, so something like s3://table_name/partition_1/file.csv will result in a partition being created. However, crawlers can be a bit temperamental; they are some sort of ML algorithm and do go AWOL at times. For that reason, in real-life use cases I usually just create the tables manually through code: Terraform, CloudFormation, or even DDL via Athena.
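As a rough sketch of the DDL-via-Athena option mentioned above (the table name, columns, and bucket are made up for illustration, so adjust them to your schema), the partition key is declared explicitly rather than inferred by a crawler:

```python
# Hypothetical DDL you could run in the Athena console instead of
# relying on a crawler; printed here just to show the shape.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS customers_csv (
    customerid BIGINT,
    firstname  STRING,
    lastname   STRING
)
PARTITIONED BY (loaddate STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/customer_csv/'
TBLPROPERTIES ('skip.header.line.count' = '1')
"""
print(ddl.strip())
```

After dropping a new loaddate folder into S3, you would register it with `MSCK REPAIR TABLE customers_csv;` or an `ALTER TABLE ... ADD PARTITION` statement so Athena can see it.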
Jesus bless you, my friend. Your explanations are very clear. My desire is that you continue making the difference, creating and publishing awesome contents that help people. Congratulations! Good job!
Could you consider updating this course for 2024 and also adding a section for DataBrew? It would be helpful to see you walk through an example of moving a dataset from S3 bucket into DataBrew/Glue and the output being Athena or QuickSight ready
Good training course, thanks for all of it.
Thanks for watching.
Highly recommend!
Awesome content. Thank you
Thanks for watching.
Love this tutorial, thanks!
Thanks for watching!
Thanks for the video.
We need videos on real-world problems with real-world data, such as nested JSON. Can you guide us on that?
Niranjan, that's something I can totally do, and feel free to suggest ideas.
For now there is an AWS Glue transform called Relationalize that can de-nest for you - aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/
There is also a pandas function that does exactly the same thing, called json_normalize(), and it is available in Glue out of the box - but don't use pandas in Glue for bigger JSON loads.
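A quick sketch of the pandas route mentioned above, on a made-up nested record, showing how json_normalize() flattens nested fields into dotted column names:

```python
import pandas as pd

# Hypothetical nested records, similar in shape to what
# Glue's Relationalize transform would flatten.
records = [
    {"id": 1, "name": "Ada", "address": {"city": "Belfast", "postcode": "BT1"}},
    {"id": 2, "name": "Bob", "address": {"city": "Derry", "postcode": "BT48"}},
]

flat = pd.json_normalize(records)
print(list(flat.columns))
# -> ['id', 'name', 'address.city', 'address.postcode']
```

For deeper structures, json_normalize() also takes record_path and meta parameters to unroll nested lists, but as noted above, pandas loads everything on one node, so keep it to smaller JSON payloads inside Glue.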
Awesome content!! Appreciate the effort. Thank you
Great video, great effort. Thanks a lot for the detailed explanation.
When I ran the crawler the first time as per the video, it did not create a partition column. I then created a new crawler with the same details using the same S3 folders, and it created the partition. What might be the reason it failed to detect it the first time? Is there any key point to remember while building, or some mistake that lands us in such a situation?
Good video. Thanks for this!
Wonderful tutorial, great explanation, thank you! Unfortunately the AWS Glue job console has changed a lot and I could not finish the tutorial :(