How to Use AWS Glue with Snowflake | PySpark-Snowflake Connectivity

  • Published on 20 Aug 2024
  • Step 1:
    -----------
    Download the dependent jars (Spark Snowflake Connector and Snowflake JDBC Driver),
    upload them to an S3 location, then create the Glue role and the Glue job.
    Dependent Jars --drive.google.c...
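    A minimal sketch of how the Glue job from Step 1 could be defined in code rather than in the console. It assumes boto3 is available and that the two jars are already in S3; the job name, role ARN, S3 paths, and the Glue version default are hypothetical placeholders, not values from the video. The jars are passed through the `--extra-jars` job argument as a comma-separated list.

```python
def build_glue_job_definition(job_name, role_arn, script_s3_path, jar_s3_paths,
                              glue_version="3.0"):
    """Return keyword arguments for the Glue CreateJob API.

    glue_version is an assumption; pick the one whose Spark/Scala version
    matches the connector jar you downloaded.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",               # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated S3 paths of the connector and driver jars
            "--extra-jars": ",".join(jar_s3_paths),
        },
    }


def create_glue_job(definition):
    """Create the job in AWS (needs credentials and the Glue role to exist)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**definition)


# Hypothetical usage:
# definition = build_glue_job_definition(
#     "GlueJobDemo",
#     "arn:aws:iam::123456789012:role/MyGlueRole",
#     "s3://my-bucket/scripts/glue_job.py",
#     ["s3://my-bucket/jars/spark-snowflake.jar", "s3://my-bucket/jars/snowflake-jdbc.jar"],
# )
# create_glue_job(definition)
```

    Keeping the definition in a plain function makes it easy to review the jar list before anything is created in AWS.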
    Case 1 (Read Complete Table):
    -----------------------------------------------
    from pyspark.sql import SparkSession
    spark = SparkSession \
        .builder \
        .appName("GlueJobDemo") \
        .getOrCreate()
    def main():
        # Data source class provided by the Spark Snowflake Connector jar
        SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
        snowflake_database = "********"
        snowflake_schema = "********"
        source_table_name = "********"
        snowflake_options = {
            "sfUrl": "********",
            "sfUser": "********",
            "sfPassword": "********",
            "sfDatabase": snowflake_database,
            "sfSchema": snowflake_schema,
            "sfWarehouse": "COMPUTE_WH"
        }
        # Read the complete table using its fully qualified name
        df = spark.read \
            .format(SNOWFLAKE_SOURCE_NAME) \
            .options(**snowflake_options) \
            .option("dbtable", snowflake_database + "." + snowflake_schema + "." + source_table_name) \
            .load()
        # Total salary per department
        df1 = df.groupBy("department").sum("salary")
        # Write the aggregate back to Snowflake, replacing the target table
        df1.write.format(SNOWFLAKE_SOURCE_NAME) \
            .options(**snowflake_options) \
            .option("dbtable", "********").mode("overwrite") \
            .save()
    main()
    Case 2 (Read the Result Set of a Query):
    ---------------------------------------------------------------
    from pyspark.sql import SparkSession
    spark = SparkSession \
        .builder \
        .appName("Oracle_to_snowflake_via_S3") \
        .getOrCreate()
    def main():
        # Data source class provided by the Spark Snowflake Connector jar
        SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
        snowflake_database = "********"
        snowflake_schema = "********"
        source_table_name = "********"
        snowflake_options = {
            "sfUrl": "********",
            "sfUser": "********",
            "sfPassword": "********",
            "sfDatabase": snowflake_database,
            "sfSchema": snowflake_schema,
            "sfWarehouse": "COMPUTE_WH"
        }
        # Read only the result set of a SQL query instead of a whole table
        df = spark.read \
            .format(SNOWFLAKE_SOURCE_NAME) \
            .options(**snowflake_options) \
            .option("query", "********") \
            .load()
        # Write one CSV file with a header row; the save mode is set with .mode(), not .option()
        df.coalesce(1).write.mode("overwrite").option("header", "true").csv("********")
    main()
    Check this playlist for more AWS projects in the Big Data domain:
    • Demystifying Data Engi...

Comments • 58

  • @nayanroy13
    @nayanroy13 2 years ago +4

    Exactly what I was looking for. Crisp, clear and to the point!

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Thank you, IamDocxy 😊 Happy Learning :-)

    • @manishvishvkarma8030
      @manishvishvkarma8030 2 years ago +1

      @KnowledgeAmplifier1 Hi sir, can you please create a video on a Glue job that reads data from S3 and loads it into a Snowflake table?

  • @sreejithsurendran6632
    @sreejithsurendran6632 2 years ago +3

    Thanks a lot, bro. Lots of use cases for Snowflake and AWS learners.

  • @kunnunhs1
    @kunnunhs1 2 years ago +1

    Great, very good explanation. One video, full clarity.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Thank you for the encouraging comment, Desi Bhasa Main 😊 Happy Learning ✌

  • @yadi4diamond
    @yadi4diamond 8 months ago +1

    You are simply awesome, Thank you for the knowledge share!!

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 8 months ago

      Thank you for your kind words, Yadi! Happy Learning

  • @keshavamugulursrinivasiyen5502
    @keshavamugulursrinivasiyen5502 2 years ago +2

    Very well presented and nice job

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Thank You Keshava Mugulur Srinivas Iyengar! Happy Learning :-)

  • @praveenyadam2617
    @praveenyadam2617 2 years ago +1

    You are a wonder and this is what I was looking for...thanks much

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Glad to know the video was helpful to you praveen yadam! Happy Learning :-)

  • @user-hk4sv1wx6z
    @user-hk4sv1wx6z 1 year ago

    Sir, thank you for this video. It helped me a lot and your explanation is awesome. Please keep doing this; we will definitely support you, sir.

  • @user-hk4sv1wx6z
    @user-hk4sv1wx6z 1 year ago +1

    crystal clear explanation thank you bro

  • @MahendraSingh-sw1th
    @MahendraSingh-sw1th 2 years ago +1

    That was awesome ! Precise !

  • @yamunau.yamuna5189
    @yamunau.yamuna5189 1 year ago +1

    Thanks a lot Bro your video is awesome

  • @user-ft5ow9mb5z
    @user-ft5ow9mb5z 1 year ago +1

    Nice video. Please share the same for EMR without Airflow.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 1 year ago

      Hello Ali Mir faisal, you can refer to this video: th-cam.com/video/oJ6TvZu6DqQ/w-d-xo.html Happy Learning

  • @puremjlee
    @puremjlee 2 years ago +2

    At 4:10 the video says the Glue job is executed by Lambda, but there was no Lambda setup in the video. Do we need to use Lambda to call the Glue job?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Hello MJ Lee, I was explaining that we can trigger the Glue job from Lambda when a certain event occurs, if required. If you want to run a Glue job from a Lambda trigger, you can check this video:
      th-cam.com/video/1tIM1jBmwD4/w-d-xo.html
      Hope this will be helpful! Happy Learning :-)
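      A minimal sketch of the Lambda-to-Glue trigger described in the reply above, assuming an S3 "object created" event and boto3's Glue client. The job name and the `--source_bucket` / `--source_key` argument names are hypothetical; this is not the code from the linked video.

```python
GLUE_JOB_NAME = "GlueJobDemo"  # hypothetical job name


def lambda_handler(event, context, glue_client=None):
    """Start the Glue job, passing along the S3 object that triggered the event."""
    if glue_client is None:
        import boto3  # available by default in the AWS Lambda Python runtime

        glue_client = boto3.client("glue")

    arguments = {}
    records = event.get("Records", [])
    if records:
        s3_info = records[0]["s3"]
        # Glue job arguments must be prefixed with "--"
        arguments["--source_bucket"] = s3_info["bucket"]["name"]
        arguments["--source_key"] = s3_info["object"]["key"]

    response = glue_client.start_job_run(JobName=GLUE_JOB_NAME, Arguments=arguments)
    return {"JobRunId": response["JobRunId"]}
```

      The optional `glue_client` parameter exists only so the handler can be exercised locally with a stub; Lambda itself calls the handler with `event` and `context`.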

  • @amitprasad6982
    @amitprasad6982 8 months ago +1

    Sirji, in the initial architecture you said Glue will read data from S3, apply some transformation, and write it to Snowflake, but later in the video you pulled data from Snowflake and wrote it back to Snowflake and S3.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 8 months ago +1

      Hello Amit Prasad, at 4:15-4:37 I mentioned that this video focuses on the integration between AWS Glue (or PySpark) and Snowflake, because the S3-to-Lambda and Lambda-to-Glue part is already covered in a separate video. Since the primary focus here is Glue and Snowflake, I explained the possible scenarios around it: pulling data from Snowflake and writing it back to Snowflake, and pulling data from Snowflake and writing it to S3. If you want to explore S3 to Lambda and then Lambda to Glue, you can refer to this video: th-cam.com/video/1tIM1jBmwD4/w-d-xo.htmlsi=dYoD7GHeG3hhWAei Hope this answers your doubt. If you have any other doubt, please feel free to comment and I will try to help as much as possible.

  • @MrRajat769
    @MrRajat769 1 year ago

    Please make a video explaining what you are doing in Snowflake.

  • @vaibhavverma1340
    @vaibhavverma1340 1 year ago

    Hello bhaiya, I followed each step but I am still getting this error: "py4j.protocol.py4jjavaerror: an error occurred while calling o90.load snowflake". Please help me out.

  • @adithyabulusu8812
    @adithyabulusu8812 2 years ago +1

    Thanks a lot, bro. Can you also please share a video on loading data from S3 to Snowflake using Lambda and Glue?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Hello Adithya, here I have explained AWS Glue and Snowflake integration, and in the video below I have explained S3, Lambda, and Glue integration. You can combine the two and customize them as per your requirements:
      th-cam.com/video/1tIM1jBmwD4/w-d-xo.html
      Happy Learning :-)

  • @vikinist
    @vikinist 2 years ago +1

    Can you share the link to the video on the S3 and Lambda trigger?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago +1

      Hello Vikram, if you want to trigger an AWS Glue job whenever a file lands in S3 (S3 to Lambda and then Lambda to the AWS Glue job), you can refer to this video:
      th-cam.com/video/1tIM1jBmwD4/w-d-xo.html
      Hope this will be helpful! Happy Learning :-)

  • @codewithsharath5988
    @codewithsharath5988 2 years ago

    Awesome. Can you make a video on how to connect to Redshift using PySpark in a similar way?

  • @madhubhardwaj4512
    @madhubhardwaj4512 1 year ago

    How can we find the compatible version for the jar files with the current spark version? Please reply.

  • @sumeetsawant3398
    @sumeetsawant3398 11 months ago

    Hi, how do I do this for EMR on EKS? How do I add the jar files in that case?

  • @ravikreddy7470
    @ravikreddy7470 1 year ago

    Can you post a video on an S3 -> Glue -> RS pipeline (not using PySpark)?

  • @AY1986R
    @AY1986R 1 year ago

    Thank you very much for this video.
    Could you please do an example with Oracle and Python?

  • @krishnasanagavarapu4858
    @krishnasanagavarapu4858 1 year ago

    Can we create a reverse integration, i.e. fetch huge data (80 million rows) from Snowflake to S3 without using a stage? We have only read-only access to Snowflake.

  • @yogeshbharadwaj6200
    @yogeshbharadwaj6200 2 years ago

    Thanks a lot, brother. Very helpful, with a very good, easy, and clear explanation. If I need to join 2 tables, can I specify the table names as comma-separated values in "source_table_name" and perform the join in ".option("query","********")"? Please suggest. Thanks.
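    On the join question above: in the Case 2 snippet the `query` option carries a complete SQL statement, so one possible approach is to put the join in that SQL and leave `source_table_name` out of it. The sketch below only builds such a query string; the table and column names (`employee`, `department`, `dept_id`) are hypothetical, and it assumes the connected user may run the SELECT.

```python
def build_join_query(database, schema, left_table, right_table, join_column, select_list):
    """Build a two-table join using fully qualified table names.

    select_list is written by the caller so that duplicate column names
    can be avoided (for example "l.name, r.department_name").
    """
    left = f"{database}.{schema}.{left_table}"
    right = f"{database}.{schema}.{right_table}"
    return (
        f"SELECT {select_list} "
        f"FROM {left} l "
        f"JOIN {right} r ON l.{join_column} = r.{join_column}"
    )


# Hypothetical usage inside main() from Case 2:
# query = build_join_query("MY_DB", "MY_SCHEMA", "employee", "department",
#                          "dept_id", "l.name, r.department_name")
# df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
#     .options(**snowflake_options) \
#     .option("query", query) \
#     .load()
```

    Pushing the join into the query lets Snowflake do the join work before the result reaches Spark.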

  • @ketank344
    @ketank344 2 years ago

    Hello, I am getting a connection refused error. Any idea what could be the reason?

  • @swarnadeepchowdhury563
    @swarnadeepchowdhury563 1 year ago +1

    Is AWS Glue mandatory for running Spark jobs on Snowflake?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 1 year ago

      Hello Swarnadeep Chowdhury, no, it's not mandatory. You can use other services where Spark can run, such as EMR. Here is a reference video: th-cam.com/video/oJ6TvZu6DqQ/w-d-xo.html
      Happy Learning

  • @laterlname7865
    @laterlname7865 2 years ago +1

    Is it mandatory to have Spark to connect to Snowflake? Can’t we directly access data in Snowflake tables using SQL in AWS Glue’s python program? The reason I am asking this question is Spark is a big data analytics tool and not every application is meant for data analytics. Most business applications are Insert, Update, Select, Delete type SQL based programs. So can I embed these SQLs in AWS Glue’s Python scripts without using Spark in the code?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      Hello Later Lname, you asked a very good question. I created this separate video to answer it:
      th-cam.com/video/OJM2IkcIW_o/w-d-xo.html
      Hope this will be helpful! Happy Learning :-)
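      One way to run plain SQL against Snowflake without Spark, sketched under assumptions: it uses the `snowflake-connector-python` package, which exposes a standard Python DB-API connection, and it is not necessarily the approach shown in the linked video. The helper works with any DB-API connection; the connection parameters in the usage comment are placeholders.

```python
def run_statements(connection, statements):
    """Execute SQL statements in order and return the rows of the last one."""
    cursor = connection.cursor()
    try:
        rows = []
        for sql in statements:
            cursor.execute(sql)
            # Statements that return no result set leave description as None
            rows = cursor.fetchall() if cursor.description else []
        connection.commit()
        return rows
    finally:
        cursor.close()


# Hypothetical usage in a Python (non-Spark) job, assuming the
# snowflake-connector-python package is installed:
# import snowflake.connector
# conn = snowflake.connector.connect(
#     user="********", password="********", account="********",
#     warehouse="COMPUTE_WH", database="********", schema="********",
# )
# rows = run_statements(conn, ["SELECT CURRENT_VERSION()"])
# conn.close()
```

      Because the helper depends only on the DB-API interface, it can be tried locally against another DB-API driver before pointing it at Snowflake.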

  • @kittu1010
    @kittu1010 2 years ago

    Hello Sir,
    I am trying to perform many Spark operations (not just group by) once I read the table. I used the same jars but I am getting the following error: "An error occurred while calling o94.load. scala/Product$class". Do you know which jar will solve this issue? Thanks in advance.

  • @bishnupriyamukherjee4746
    @bishnupriyamukherjee4746 2 years ago +2

    👌👌👌👌👌👌👌👌

  • @pachappagarimohanvamsi4641
    @pachappagarimohanvamsi4641 1 year ago +1

    Hello, this approach does not seem very useful. If I am right, we are reading the Snowflake table, processing it in Spark, and storing the data in Snowflake again. For that we can use Snowflake itself; AWS Glue is an extra cost 😅

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 1 year ago +1

      Hello PACHAPPAGARI MOHAN VAMSI, yes, you are right that these transformations can be done using the compute power of Snowflake alone. This video fundamentally explains how to integrate Snowflake with Spark on the AWS Glue platform, and to explain that I took a dummy transformation. The concept can be used for other workloads that are not possible with Snowflake alone. For example, if the data is available in MySQL RDS (source), we can use Spark to read the data from MySQL and then write it to Snowflake (destination). If we want to use AWS Glue as the execution environment in that case, the concepts in this video can be useful.

    • @pachappagarimohanvamsi4641
      @pachappagarimohanvamsi4641 1 year ago +1

      @KnowledgeAmplifier1 👍
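      The MySQL RDS to Snowflake scenario mentioned in this thread could look roughly like the sketch below. It assumes the MySQL Connector/J jar is added to the Glue job alongside the Snowflake jars, that `spark` is the active SparkSession, and that `snowflake_options` is the same dictionary used in the description's snippets; the host, table names, and the overwrite mode are hypothetical choices.

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Options for Spark's built-in JDBC source pointing at a MySQL table."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped with MySQL Connector/J 8.x (assumed version)
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def copy_mysql_table_to_snowflake(spark, jdbc_options, snowflake_options, target_table):
    """Read one MySQL table through JDBC and write it to a Snowflake table."""
    df = spark.read.format("jdbc").options(**jdbc_options).load()
    df.write.format("net.snowflake.spark.snowflake") \
        .options(**snowflake_options) \
        .option("dbtable", target_table) \
        .mode("overwrite") \
        .save()


# Hypothetical usage inside the Glue script:
# jdbc_opts = mysql_jdbc_options("my-rds-host", 3306, "sales", "orders", "********", "********")
# copy_mysql_table_to_snowflake(spark, jdbc_opts, snowflake_options, "ORDERS")
```

      Any Spark transformation can be applied to `df` between the read and the write, which is where Glue adds value over Snowflake-only processing.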

  • @vikinist
    @vikinist 2 years ago +1

    One doubt, can you please answer: when should we go for Snowpipe and when for Glue?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago +1

      Hello Vikram, Snowpipe is used for real-time data ingestion from a data lake into Snowflake, using services such as SQS or SNS. AWS Glue can be used for any batch processing purpose; for batch ingestion or for transforming your data, you can use AWS Glue or EMR.

    • @vikinist
      @vikinist 2 years ago +1

      @KnowledgeAmplifier1 Thanks for the quick reply

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 2 years ago

      @vikinist No problem. Happy Learning

  • @anhdo7704
    @anhdo7704 1 year ago

    May I ask what exactly the username for Snowflake is here? I don't know where to find the username.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 1 year ago

      Hello Anh Do, the username is what you use to log in to the Snowflake web console. You might have set it up during sign-up, or your admin team can confirm it. If you are using OAuth, there will usually be a dedicated user for connecting from Python, PySpark, etc.; the admin team in your project can confirm that as well.

  • @krishnashukla3638
    @krishnashukla3638 2 years ago

    Hi friend, how can I read data from RDS and ingest it into Snowflake using Glue? Do you have any example for that? It would be really helpful for me. Thanks.

    • @rajeevranjan5913
      @rajeevranjan5913 1 year ago

      Hi,
      I have exactly the same requirement. Could you please help with the process if you have achieved it?

  • @krishnasanagavarapu4858
    @krishnasanagavarapu4858 2 years ago +2