AWS Glue Job Import Libraries Explained (And Why We Need Them)

  • Published on Aug 25, 2024
  • This video explains the 6 import statements in a boilerplate AWS Glue script to help data engineers understand what they do and why we need them.
    #aws #awsglue #pyspark
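
For reference, here is a minimal sketch of the six imports as they usually appear at the top of a Glue Spark script, with comments on what each one provides. The `try`/`except` guard, the `GLUE_RUNTIME` flag and the `run_job` wrapper are illustrative additions, not part of the generated boilerplate; they only let the file load on a machine where the Glue libraries are not installed.

```python
import sys  # 1. access to the command-line arguments passed to the job

try:
    from awsglue.transforms import *                  # 2. Glue's built-in transform classes
    from awsglue.utils import getResolvedOptions      # 3. parses named job arguments
    from pyspark.context import SparkContext          # 4. entry point to Spark
    from awsglue.context import GlueContext           # 5. wraps SparkContext, adds dynamic frames
    from awsglue.job import Job                       # 6. job lifecycle: init and commit
    GLUE_RUNTIME = True
except ImportError:
    # Outside a Glue Spark environment these packages are usually missing.
    GLUE_RUNTIME = False


def run_job(argv):
    """Typical setup that follows the imports in a Glue Spark script."""
    args = getResolvedOptions(argv, ["JOB_NAME"])
    sc = SparkContext()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)
    # ... read, transform and write data here ...
    job.commit()


if __name__ == "__main__" and GLUE_RUNTIME:
    run_job(sys.argv)
```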

Comments • 30

  • @mohammedgt8102 2 years ago +9

    Perfect and straight to the point. I got in 5 min what I couldn't get in an hour.

    • @DataEngUncomplicated 2 years ago +1

      Thanks Mohammad, that's the style of video I go for on my channel. I try to make my videos as short and concise as possible.

  • @BeABetterDev 2 years ago +1

    Short and sweet. Thanks.

  • @sukulmahadik0303 1 year ago

    Cool explanation. I had never paid attention to these boilerplate statements.

  • @danielchicaiza7698 6 months ago +1

    Liked, subscribed and commented!
    Thank you very much for your help!
    Greetings from Colombia!

  • @mickyman753 4 months ago

    Just found your channel. Can we have a complete playlist, some kind of course, or a one-shot video or series? You explain things in depth, and I find your videos better than the other tutorials on YouTube.

    • @DataEngUncomplicated 4 months ago

      Thanks! Check out my playlists; I have one for each AWS service I have made videos about. It sounds like that's what you are looking for.

  • @nikhilgupta110 2 years ago +2

    Loved this video. Just a question: isn't import * a bad coding practice?
    If you have already made a video on the practical use of those 24 classes, please share the link; if not, I'd request that you make one. "Took the one less traveled by, And that has made all the difference".

    • @DataEngUncomplicated 2 years ago +2

      Hi Nikhil! Thanks for the comment and feedback! Honestly, I wasn't sure whether people would find this video interesting or not... These are the boilerplate statements that AWS Glue provides when you create a script from scratch. I guess you can remove or modify some of the statements if you don't need them or want to keep the script more focused.
      I don't have any videos on the 24 classes yet, but I'm happy to hear that you see value in videos on them... I will add it to my video backlog.
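
The concern about `import *` can be illustrated with a small standard-library sketch. It uses `os.path` as a stand-in because `awsglue` is only available inside a Glue Spark environment; the helper name `imported_names` is made up for this example. In a Glue script, the explicit form would be something like `from awsglue.transforms import ApplyMapping` in place of the star import.

```python
def imported_names(statement):
    """Run an import statement in a fresh namespace and list the names it binds."""
    namespace = {}
    exec(statement, namespace)
    # Skip dunder entries such as __builtins__ that exec adds on its own.
    return sorted(name for name in namespace if not name.startswith("__"))


explicit = imported_names("from os.path import basename, join")
star = imported_names("from os.path import *")

print(explicit)  # only the two names that were asked for
print(len(star), "names bound by the star import")
```

The star import binds every public name of the module, so a reader cannot tell from the script alone where a given name comes from, and a later import can silently shadow an earlier one.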

  • @abdullahkheruwala9910 6 months ago

    I have files in an S3 bucket in gz format. Each gz file consists of JSON records (each line is one record in JSON format). How can I read such a file using a Glue dynamic frame?

    • @DataEngUncomplicated 6 months ago

      If you run a Data Catalog crawler on this folder, it should add the dataset to the Glue catalog. You can then read from it and write to it with a dynamic frame in AWS Glue. Check out my other videos, where I walk through how to do this with other formats.
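
A rough sketch of the two reading paths, under a few assumptions: the function names, database, table and S3 path are placeholders, the `glue_context` argument is an existing `GlueContext`, and the gzip files carry a standard `.gz` extension so that Glue can decompress them without extra options. The first function follows the crawler approach from the reply above; the second reads straight from S3 and is an alternative not mentioned in the thread.

```python
def read_cataloged_table(glue_context, database, table_name):
    """Read a table that a crawler has already added to the Glue Data Catalog."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )


def read_json_lines_from_s3(glue_context, s3_path):
    """Alternative without a crawler: read JSON files directly from an S3 prefix."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_path], "recurse": True},
        format="json",
    )
```

Inside a Glue Spark job this could be called as `read_cataloged_table(glueContext, "my_db", "my_table")`, where both names are hypothetical.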

  • @Scott-s7f 1 month ago

    Nice video! What's the point of using jobs in notebooks, since bookmarks aren't supported there? Is there another benefit?

    • @DataEngUncomplicated 1 month ago

      Thanks, the notebook was just a way for me to talk through the content. I would say the benefit of using a notebook is a better development experience, since you get feedback after every function you run instead of having to trigger the entire job.

    • @Scott-s7f 1 month ago

      @DataEngUncomplicated Oh, thanks, but I meant: what is the use of the Job import and of doing job init and commit in a notebook, since bookmarks aren't supported?

  • @sanchitgarg5275 1 year ago

    Nice video! I am struggling to find a way to set the script location path in the Jupyter notebook. I can see there is no magic command for it, and AWS does not allow manual changes under the "Job details" tab. Can you help me if there is a way?

  • @AbhishekChauhan-kv7ds 6 months ago

    I'm new to AWS and I'm working on a project, but I can't get it to work. I'm getting Unresolved reference 'awsglue'.
    Can you help me with this?

  • @saksheegoel2654 1 year ago

    Can we not create functions (def fn()) in streaming Glue jobs?

    • @DataEngUncomplicated 1 year ago

      Hi Sakshee, I haven't worked with streaming jobs yet, but I don't see why we wouldn't be able to create functions in streaming Glue jobs.

  • @AmritAgarwal07 1 year ago

    Can we update the data in a database using Glue jobs?

    • @DataEngUncomplicated 1 year ago

      I think you are asking whether we can update data in a database with AWS Glue? Yes, absolutely. It's one of the main use cases.

  • @MuhammadImran-lr5tn 1 year ago

    Hello sir, I am getting "no module named awsglue.context" when I write the above imports in an AWS Glue Python shell job. Can you please help? Thank you.

    • @DataEngUncomplicated 1 year ago

      Hi Muhammad, the Python shell doesn't come with PySpark. You need to create a job that uses the Spark script type instead of Python shell.

    • @MuhammadImran-lr5tn 1 year ago

      @DataEngUncomplicated Thank you for your reply. Can you please explain, step by step, what I should do in order to use the awsglue.context library in an AWS Glue Python shell job?

    • @DataEngUncomplicated 1 year ago

      What exactly are you trying to do in your script? If you need to use Spark, then you shouldn't be configuring a Python shell script. Select the PySpark script option instead.

    • @MuhammadImran-lr5tn 1 year ago

      @DataEngUncomplicated Thank you so much for your quick reply. Thanks to your guidance, I now understand what I was doing wrong. The only point I'd like clarified: is the awsglue library something that is used in a PySpark context, and so related to PySpark rather than to a plain Python shell? Am I right?

    • @DataEngUncomplicated 1 year ago

      @MuhammadImran-lr5tn You're welcome! Yes, that's my understanding. You don't need that library for creating a Python shell job.
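
One way to see which situation a script is in is to check whether the two top-level packages behind the boilerplate imports can be found at all. This is only a diagnostic sketch using the standard library; the function name `glue_spark_modules_available` is made up here, and the check says nothing about how a given job type is configured.

```python
from importlib.util import find_spec


def glue_spark_modules_available():
    """Report whether the packages behind the boilerplate imports can be located."""
    return {name: find_spec(name) is not None for name in ("awsglue", "pyspark")}


status = glue_spark_modules_available()
for name, found in status.items():
    print(f"{name}: {'importable' if found else 'not found'}")
```

If either package is reported as not found, `from awsglue.context import GlueContext` will fail in that environment with the kind of error described in this thread.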

  • @Fight3211 1 year ago

    Hi, I have a question about the interaction between creating a "normal" Spark session and Glue. I needed to import a JAR and I got it working with
    spark = SparkSession.builder\
    .appName("my-app") \
    .config('spark.jars.packages', 'graphframes:graphframes:0.8.2-spark3.2-s_2.12')\
    .getOrCreate()
    I commented out
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    So the two things I'm missing out on are dynamic frames and saving job state. How do I modify the original arguments so that I can bring GlueContext back in? Thank you
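
One possible direction, offered as an untested sketch rather than a confirmed answer: build the configured `SparkSession` first, then wrap its underlying `SparkContext` in a `GlueContext`. The function name `build_glue_context` and its parameters are hypothetical, and whether `spark.jars.packages` is honored depends on the Glue environment, which this thread does not settle. The imports sit inside the function so that the definition loads on machines without Glue or PySpark.

```python
def build_glue_context(app_name, packages):
    """Create a configured SparkSession, then wrap its SparkContext in a GlueContext."""
    # Lazy imports: both packages are only expected inside a Glue Spark environment.
    from pyspark.sql import SparkSession
    from awsglue.context import GlueContext

    spark = (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars.packages", packages)
        .getOrCreate()
    )
    # GlueContext takes an existing SparkContext, so the session's context is reused.
    glue_context = GlueContext(spark.sparkContext)
    return glue_context, glue_context.spark_session
```

With a `GlueContext` back in hand, dynamic frames are available again, and a `Job` could be created from it and initialized as in the boilerplate script.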