How to build AWS Glue ETL with Python shell | Data pipeline | Read data from S3 and load Redshift

  • Published Sep 12, 2024

Comments • 19

  • @BiInsightsInc
    @BiInsightsInc  2 years ago +1

    Link to AWS Playlist: th-cam.com/video/cFO2-gs56d8/w-d-xo.html

  • @GiovanniDeCillis
    @GiovanniDeCillis 2 years ago +1

    This was extremely helpful! I really like that you are able to compress such valuable information into just 8 mins! I think it would be really useful to see how to build an ETL pipeline in an IaC framework. Haven't seen many on the web! Thanks!

  • @calvinbutler5517
    @calvinbutler5517 11 months ago

    You're a hero for the well-explained content and then answering everyone's comments. :)

    • @BiInsightsInc
      @BiInsightsInc  11 months ago

      Thanks for the motivation!

  • @satishmajji481
    @satishmajji481 2 years ago

    Subscribed!!! Thank you so much for the great content!! Can you please make dedicated videos on how to use AWS Glue, Triggers, Lambda functions, and Athena for an ETL pipeline?

  • @kofio7581
    @kofio7581 1 year ago

    Thanks, great video! Other examples I have seen used a crawler to write the schema of the Redshift table to the Data Catalog before loading with a Glue job.
    If I just wanted to do this using only a visual Glue job, without a crawler, is it possible?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +1

      I am not exactly sure what you are trying to ask. The crawler crawls the data and infers a schema from it. Do you mean you want to infer the schema without the crawler?
      Here is a good read from AWS on how a crawler infers the schema; a minimal boto3 sketch follows below.
      repost.aws/knowledge-center/glue-crawler-detect-schema
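
      For reference, a rough boto3 sketch of that flow. The crawler, database, and table names here are hypothetical placeholders, not from the video:

      import boto3

      glue = boto3.client('glue', region_name='us-east-1')

      # kick off an existing crawler pointed at your S3 path
      glue.start_crawler(Name='my-s3-crawler')

      # once it finishes, the inferred schema lives in the Data Catalog
      table = glue.get_table(DatabaseName='my_catalog_db', Name='my_s3_table')
      for col in table['Table']['StorageDescriptor']['Columns']:
          print(col['Name'], col['Type'])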

  • @ArniFuentes
    @ArniFuentes 13 days ago

    Thanks!!! But I have a question: if my data comes from an API, is S3 not necessary?

    • @BiInsightsInc
      @BiInsightsInc  12 days ago +1

      It depends on your data size. If you are dealing with a small dataset, you can call the API from Lambda and ingest it into your destination service directly. Otherwise, you need to stage it in S3. A rough sketch of the staging pattern is below.
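
      A minimal sketch of that Lambda pattern, assuming a hypothetical JSON API endpoint and bucket name:

      import json
      import urllib.request
      import boto3

      s3 = boto3.client('s3')

      def lambda_handler(event, context):
          # hypothetical endpoint - swap in your API
          with urllib.request.urlopen('https://api.example.com/orders') as resp:
              payload = resp.read()

          # stage the raw response in S3 for a Glue job to pick up
          s3.put_object(
              Bucket='my-etl-landing-bucket',  # hypothetical bucket
              Key='raw/orders.json',
              Body=payload,
          )
          return {'statusCode': 200, 'body': json.dumps('staged to s3')}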

    • @ArniFuentes
      @ArniFuentes 12 days ago

      @@BiInsightsInc So if the data size is not that big, do you recommend doing the ETL with Lambda? And what do you think about AWS Step Functions? Thanks again for the content.

  • @joegenshlea6827
    @joegenshlea6827 1 year ago

    Hi - thanks for such concise content! I noticed that you deployed to S3 without debugging locally. Suppose I wanted to test the ETL script before deploying it? Is there a way to execute the etl.py script on the local host using the AWS CLI?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +2

      Yes, you can set up a local development environment to test your work prior to deploying it to AWS. I haven't covered it, but here is an article to get you started, and a quick sketch follows below.
      medium.com/@bezdelev/how-to-test-a-python-aws-lambda-function-locally-with-pycharm-run-configurations-6de8efc4b206
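
      As a quick sketch: a Glue Python shell script is plain Python, so if etl.py wraps its logic in a function (a hypothetical layout, not necessarily how the video structures it), you can exercise it from your machine before uploading to S3:

      # local_test.py - minimal local smoke test.
      # Assumes (hypothetically) that etl.py exposes run_etl(bucket, key)
      # and that your `aws configure` credentials can reach a dev bucket,
      # so nothing here touches production.
      from etl import run_etl

      if __name__ == '__main__':
          run_etl(bucket='my-dev-bucket', key='raw/sample.csv')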

    • @joegenshlea6827
      @joegenshlea6827 1 year ago

      @@BiInsightsInc Thank you. In your view, what is the best practice: using a terminal, the Lambda console, or the procedure in the link you posted? I'm a big-time noob.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +1

      @@joegenshlea6827 I would go with the CLI; I have used the AWS CLI to test code locally. Alongside the AWS CLI, AWS provides the SAM CLI to test AWS Lambda functions locally, and the Lambda developer guide also advocates this approach.

    • @joegenshlea6827
      @joegenshlea6827 1 year ago

      @@BiInsightsInc Thank you again! I think I understand. Maybe a video idea for you! I like your videos because you skip all the superfluous nonsense and get right to the meat and potatoes. Good work.

  • @koyalmudi007
    @koyalmudi007 2 years ago

    Hi, how can we read the credentials from connections or secrets in an AWS Glue Python shell job? It's not working for me.

    • @BiInsightsInc
      @BiInsightsInc  2 years ago

      Hi KK, you can get the Glue embedded connection details in Python with boto3. Hope this helps.

      import boto3

      glue = boto3.client('glue', region_name='us-east-1')

      # get the connection
      response = glue.get_connection(
          Name='name-of-embedded-connection',
          HidePassword=False
      )

      # get specific connection properties
      username = response['Connection']['ConnectionProperties']['USERNAME']
      password = response['Connection']['ConnectionProperties']['PASSWORD']
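
      And for the secrets half of the question, a minimal Secrets Manager sketch (the secret name and its JSON shape are hypothetical):

      import json
      import boto3

      secrets = boto3.client('secretsmanager', region_name='us-east-1')

      # fetch and parse a JSON secret, e.g. {"username": "...", "password": "..."}
      value = secrets.get_secret_value(SecretId='redshift/etl-credentials')
      creds = json.loads(value['SecretString'])
      print(creds['username'])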

  • @PawanKumar-gl4yw
    @PawanKumar-gl4yw 1 year ago

    Hi, can we transfer 1 TB of data from S3 to Redshift using Glue, or Lambda + Glue?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +1

      Hi Pawan, you can transfer large datasets with AWS Glue. It is a distributed platform that uses Spark behind the scenes to process big data. Lambda is for small to medium-sized datasets.
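
      For the S3-to-Redshift load itself, a minimal sketch using the Redshift Data API to issue a COPY, Redshift's parallel bulk loader. The cluster, table, bucket, and IAM role names are hypothetical:

      import boto3

      rsd = boto3.client('redshift-data', region_name='us-east-1')

      # COPY loads from S3 in parallel across the cluster, which is
      # what makes terabyte-scale loads practical
      copy_sql = """
          COPY public.orders
          FROM 's3://my-etl-landing-bucket/raw/'
          IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
          FORMAT AS CSV IGNOREHEADER 1;
      """

      rsd.execute_statement(
          ClusterIdentifier='my-redshift-cluster',
          Database='dev',
          DbUser='awsuser',
          Sql=copy_sql,
      )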