Arguably one of the best videos on Glue crawler - thanks a lot!
This means a lot to me😍
Don’t forget to subscribe, this pushes me to create more such content 🚀
Best video i have seen on Glue Crawler. Thanks for your efforts.
I am glad you liked the video. Don’t forget to subscribe.✅
Very helpful for me. Thank you!
Great video Ajay . It's very clear and loud.
Best video about crawlers
This is extremely helpful. Thanks
Thanks Aditya
Very informative and useful..🙏
Thanks for the overview, this was really good.
Nice explanation!
very helpful video, thanks.
Don’t forget to subscribe ✅
Thank you so much. It's helpful!
Wow.. that was really helpful video! Thanks!!
Very nice explanation. Thanks !!
Excellent presentation !!!
Really good tutorial, keep up the good work Ajay!
Thanks Namaryop😊
Don’t forget to subscribe 🚀
Great video!
Ajay, I need some help please! I've been trying all day and I can't figure it out. I cataloged a parquet file that was saved after processing a job with Spark. I'm running another job to insert the data from the parquet into an RDS MySQL database, and I need the data to be inserted in the same order as the parquet to preserve the primary keys. I've tried several ways, but the data is always inserted in a random order in the database table. Can you tell me what I can do?
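For anyone hitting the same ordering issue: MySQL doesn't guarantee row order on insert, so a common workaround is to generate the primary key explicitly in Spark before the JDBC write instead of relying on insert order. A minimal PySpark sketch, with placeholder paths, ordering column, and connection details (not from the video):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("parquet-to-mysql").getOrCreate()

# Read the cataloged parquet output of the earlier Spark job (placeholder path)
df = spark.read.parquet("s3://my-bucket/output/")

# Create the primary key explicitly instead of relying on insert order.
# Order by whichever column defines the sequence you want (placeholder: "event_ts").
# Note: a window without partitionBy collapses the data to a single partition.
w = Window.orderBy(col("event_ts"))
df_with_pk = df.withColumn("id", row_number().over(w))

# Write to RDS MySQL over JDBC; connection details are placeholders and the
# MySQL JDBC driver must be available to Spark.
(df_with_pk.write
    .format("jdbc")
    .option("url", "jdbc:mysql://my-rds-endpoint:3306/mydb")
    .option("dbtable", "my_table")
    .option("user", "admin")
    .option("password", "secret")
    .mode("append")
    .save())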
Great tutorial
Glad you liked it✅
Thanks for the detailed information bro ❣️
Always welcome. Do check my latest videos. Pretty exciting videos coming soon.
Good and informative video for beginners. The pacing of the content is just right. Thank you very much!
Ajay, do you take online sessions on AWS Data Engineering? Pls let me know.
Hi Ganesh, sorry I don’t take any online classes.
Great video, perfect explanation, loved it! Keep it up, bro!
Glad you liked it!
Thanks!
Helpful video ..😀
Very informative video. Where is the next part of this video?
I am glad you found this informative. 🙌🏻
Please check the next video on my channel.
Don't forget to subscribe to the channel ✅✅
Well done
Nice tutorial, Ajay. I have one question. I have a requirement to copy about 4 million records from one DynamoDB table (2017 version) to another table (2019 version), and I don't want downtime. Can you please suggest whether Glue will help in this use case? If yes, what things do I have to consider?
Did you check the DMS service?
It also gives you the option of Change Data Capture.
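If you do evaluate Glue for this, here is a rough sketch of a Glue job copying one DynamoDB table to another with Glue's DynamoDB connector. Table names and throughput percentages are placeholders, and this copies a snapshot only, so changes made during the copy would still need CDC (e.g. via DMS):

import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the source (2017) table; throttle reads so live traffic is not starved
source = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "orders-2017",   # placeholder table name
        "dynamodb.throughput.read.percent": "0.5",
    },
)

# Write into the target (2019) table
glue_context.write_dynamic_frame_from_options(
    frame=source,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "orders-2019",  # placeholder table name
        "dynamodb.throughput.write.percent": "0.5",
    },
)

job.commit()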
Hi Ajay - thanks for sharing this. Could you please share all the links related to Glue (end-to-end hands-on flow)?
Sir, how will we perform ETL operations? I think PySpark is something we use, but do we have any other option besides Python?
Awesome video!!
I have a query:
I want to push S3 data (CSV) to Redshift tables.
Can I somehow use the table schema created by the crawler to create the table in Redshift?
In every tutorial the instructor first creates a table in Redshift by hand, then uses a crawler to create the schema in Glue, then pushes the data to Redshift... so what is the use of creating the schema with the crawler?
Hey Kishlaya,
You'll have to try this out. Check whether the Glue Data Catalog can be used directly in Redshift.
I do know that Redshift Spectrum can directly use the schema created by crawlers.
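For reference, one way to reuse the crawler's schema is to point a Redshift Spectrum external schema at the Glue Data Catalog, so the crawler-created tables become queryable without re-creating them in Redshift. A sketch using the boto3 Redshift Data API; the cluster, database, IAM role, and table names below are placeholders:

import boto3

client = boto3.client("redshift-data")

# Point an external schema at the Glue Data Catalog database the crawler populated.
# All identifiers below are placeholders.
sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_demo
FROM DATA CATALOG
DATABASE 'glue_crawler_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-spectrum-role'
"""

client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)

# After that, the crawler-created table can be queried directly, e.g.:
#   SELECT * FROM spectrum_demo.my_csv_table LIMIT 10;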
Hello Ajay, I saw your LinkedIn post about the AWS Data Analytics certification.
Please explain the detailed learning path you took to pass, and please make a video on it; it would be very helpful to anyone looking to pursue that exam.
I will upload a video on Sunday. Stay tuned for that. Subscribe ✅✅
@@AjayWadhara okay Ajay
@@AjayWadhara Any chance the video is coming soon?
Coming in next 3 hours
@@AjayWadhara got it
Hi Ajay, is there a way to automate the data catalog import into Redshift Spectrum from AWS Glue?
I have uploaded a CSV into an S3 bucket. The crawler is creating the Data Catalog in Glue, but when I try to view the content of the CSV file in Athena using a query, it shows blank; the columns are present but without the values.
Yes, same with me. If we crawl a single CSV file we can see the data, but when we crawl multiple files from the same folder it shows blank. Please help me out if you find a solution.
Can you do a video on adding a table from an existing schema?
Hi,
Nice explanation of AWS Glue Crawlers, which was very helpful... Thanks for that.
I have some queries about the Glue crawler and Athena.
First try: In my S3 bucket I put two different files, one the stock table and the other the employee table, and ran the Glue crawler. Two different tables were generated but with empty data. Is that correct?
Second try: In my S3 bucket I put the same two files and ran the crawler with an exclude pattern of employee.csv. After that a single table was generated, but the data was merged from both tables. Is that correct?
Or have I done something wrong? Please let me know.
Hi Saurabh,
You have to segregate the data into two different folders.
If the query is not returning data, check whether the schema in the Glue Catalog matches the files.
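To illustrate the folder layout: give each table its own S3 prefix so the crawler infers one schema per folder instead of merging the files. A small boto3 sketch with placeholder bucket and file names:

import boto3

s3 = boto3.client("s3")
bucket = "my-crawler-demo-bucket"  # placeholder bucket name

# Each table gets its own folder, so the crawler creates one table per prefix:
#   s3://my-crawler-demo-bucket/employee/employee.csv
#   s3://my-crawler-demo-bucket/stock/stock.csv
s3.upload_file("employee.csv", bucket, "employee/employee.csv")
s3.upload_file("stock.csv", bucket, "stock/stock.csv")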
Nice explanation of AWS Glue Crawlers, which was very helpful... Thanks for that.
If a column in the middle gets deleted in the newest file and the crawler modifies the schema, then in the earlier files the deleted column is still present but the data gets shifted (I can see the data is disturbed). So is there any configuration in the crawler to validate the column names across all the files available in the S3 location?
Thanks for adding that, Jineshwar. I did not explain that in this video. ✅✅
@@AjayWadhara Is there any configuration to achieve such scenarios?
Hi Ajay,
Is there any way to automate this through CI/CD? For example, I want to upload a bunch of crawler files, trigger them automatically, and then store the inferred schema in the local file system.
Thanks in advance.
Hi Surya,
You can use CI/CD to automate things. Also consider a scheduled Lambda function. You can upload your files and then either trigger or schedule the processing.
Hope this helps!!
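A rough sketch of the Lambda side, with placeholder crawler, database, and table names: start the crawler with boto3, then read the inferred schema back from the Glue Catalog so a CI/CD step can persist it wherever needed.

import json
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Kick off the crawler, e.g. when a new file lands in S3 or on a schedule
    glue.start_crawler(Name="my-crawler")  # placeholder crawler name

    # Crawlers run asynchronously, so in practice you would wait for the run to
    # finish (poll get_crawler) or split this into a second function/step.
    table = glue.get_table(DatabaseName="my_db", Name="my_table")  # placeholders
    columns = table["Table"]["StorageDescriptor"]["Columns"]

    # Return (or persist) the inferred schema; a pipeline step could write it to a file
    return {"schema": json.dumps(columns)}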
Hello Ajay, your videos were very helpful. Can we get similar videos for AWS Lambda? Is it possible for you to put all your videos related to AWS (S3, Athena, Glue, Kinesis, Lambda, EMR) on Udemy so that we can buy them from you? Please share your thoughts on this.
I am starting a Lambda series soon. The first video is coming today... Stay tuned!
Don't forget to share and subscribe ✅
I am working on a Udemy course, but it will take some time.
@@AjayWadhara I have already subscribed and shared with friends as well.
@@AjayWadhara Please include some working sessions in the Udemy course for practice so that all our friends can buy your course.
How would you load a Postgres partitioned table into a data lake?
How do you create a grok classifier for fixed-width files?
How about crawling S3 in another account? Can you show that too?
Can you do more advanced examples, please?
Hi Ajay, I uploaded a CSV file to an S3 bucket and created a crawler. I see the table in the database, but when trying to preview the data I don't see any data in the table. Could you please let me know why I don't see the data?
Are you querying through Athena console?
@@AjayWadhara yes
Check my Athena video, you will definitely get help from that
Do crawlers work on images? I tried, but I didn't get any data in the Data Catalog.
No, there is a list that you can find on the AWS docs website. Common types such as CSV, TSV, databases, logs, JSON, and Parquet are supported.
You can write a custom classifier also, but that would not cover images.
Try using AWS Rekognition.
@@AjayWadhara Okay, thanks. So even a custom classifier won't support images, right?
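For images, a minimal AWS Rekognition sketch for files already in S3 (the bucket and key below are placeholders):

import boto3

rekognition = boto3.client("rekognition")

# Detect labels for an image stored in S3; bucket and key are placeholders
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-image-bucket", "Name": "photos/cat.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)

for label in response["Labels"]:
    print(label["Name"], label["Confidence"])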
Everything was good except that horror sound in the background..
😂 I will improve my background music choice
Why is everyone teaching in English? Can anyone tell me a Hindi channel for this?