Power Query to the rescue! Lmao great work AWS ;)
How can we configure the charset for the input file? It always comes out garbled :(
Exactly what I needed to know. Thanks!
You should add all the pandas functions. I've just tested DataBrew and I can't convert nanoseconds into the DateTime format. In pandas it's so simple; with DataBrew I can't... It's not saving me any time.
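For reference, the pandas conversion mentioned here is roughly a one-liner. A minimal sketch, assuming the nanoseconds are integer Unix epoch values; the sample data and the column names `event_ns` and `event_time` are made up for illustration:

```python
import pandas as pd

# Hypothetical sample: Unix epoch timestamps expressed in nanoseconds
df = pd.DataFrame(
    {"event_ns": [1_600_000_000_000_000_000, 1_600_000_000_500_000_000]}
)

# unit="ns" tells pandas the integers count nanoseconds since 1970-01-01 UTC
df["event_time"] = pd.to_datetime(df["event_ns"], unit="ns")

print(df)
```

If the values are in another unit, `unit` also accepts `"us"`, `"ms"` or `"s"`.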
Amazing tool.
What would be great is to be able to export the steps as Python code.
Check out Dropbase. It lets you export data processing steps as Python code
@jimmyechan I know there are tools that do it. I was just giving AWS some suggestions for improvement.
What about unstructured data?
What kinds of unstructured data are you thinking about?
@SurbhiDangi XML or JSON from scraping websites, etc. It appears every dataset has to be a table in Glue.
@edwinthatsnotmyname3670 Agreed, the tool supports JSON (+ nested JSON) today! Also, there is a direct S3 connector. You can also upload a file from your local disk, if you'd like. Take a look at the 3rd screenshot here: aws.amazon.com/blogs/aws/announcing-aws-glue-databrew-a-visual-data-preparation-tool-that-helps-you-clean-and-normalize-data-faster/
Just in case this helps: 1. Create a Glue crawler to run on your unstructured (e.g. JSON) data (if the structure is complex, like highly nested documents, you can use Grok patterns or define it manually in Athena (for example: create table... column struct