How to get the requirements file for Python version 3.10?
@prakashmudliyar4834, you can refer to this link to get the requirements file for different Python versions -- github.com/snowflakedb/snowflake-connector-python/tree/main/tested_requirements
Can you do it for Dynatrace?
It's very helpful. Is there any video which covers Glue ETL to Snowflake?
Hello Ranadeep, sorry for the late reply. You can refer to this video if you want to work with Spark & Snowflake in AWS Glue:
th-cam.com/video/7c6kcRKDxgQ/w-d-xo.html
And for connecting Snowflake with Python Shell Jobs in AWS Glue, you can refer to this --
th-cam.com/video/OJM2IkcIW_o/w-d-xo.html
Hope this will be helpful :-)
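In case it helps, here is a rough sketch of how a Glue Spark job can read a Snowflake table with the Spark-Snowflake connector. This is only an outline, not the exact code from the video -- it assumes the Snowflake Spark connector and JDBC driver JARs are attached to the Glue job, and the account/user/password values are placeholders (ideally you would pull them from Secrets Manager):

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Connection options for the Spark-Snowflake connector (placeholder values)
sf_options = {
    "sfURL": "<your_account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "DEMO",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "COMPUTE_WH",
}

# Read a table through the connector and take a quick look at it
df = (spark.read
      .format("net.snowflake.spark.snowflake")
      .options(**sf_options)
      .option("dbtable", "HEALTHCARE_CSV")
      .load())
df.show(10)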
Do you have references, or have you worked on such an activity?
Any tutorial or code snippet for -
Reading data through a REST API, flattening the JSON and then loading it into Snowflake.
I will be using AWS Lambda and AWS Secrets Manager to store the key and password.
This side of AWS is new to me. Any help will really help me.
Hello abhishek kumar gupta, yes, I have uploaded some videos on the scenario which you explained ... there are 2 videos which you can go through, and then you can build the pipeline --
Step 1:
------------
Reading data through a REST API -- for this you will be using AWS Lambda and AWS Secrets Manager to store the key and password. Reference video (I used a weather API to pull weather data and stored the API key in Secrets Manager, so from this video you will have a clear idea of the REST API call, Secrets Manager integration with AWS Lambda etc.; a small sketch is also below) -- th-cam.com/video/xa2D4Hgjd9g/w-d-xo.html&feature=shares
Step 2:
------------
Flattening the JSON and then loading into Snowflake -- you can do this in Lambda code and write into a table using the Snowflake connector ... but we can leverage the power of Snowflake's schema-on-read and use Snowflake for the flattening as well, you can check this -- th-cam.com/video/ON-PU_buvFU/w-d-xo.html&feature=shares
Hope this will be helpful.. Happy Learning!
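Just to make Step 1 a bit more concrete, a minimal sketch of the Lambda side could look like the below -- the secret name, API URL and landing bucket are hypothetical placeholders, not the exact ones used in the video:

import json
import urllib.request

import boto3

def lambda_handler(event, context):
    # Pull the API key stored in Secrets Manager (secret name is a placeholder)
    secret = boto3.client("secretsmanager").get_secret_value(SecretId="weather_api_key")
    api_key = json.loads(secret["SecretString"])["api_key"]

    # Call the REST API (URL and parameters are placeholders for whichever API you use)
    url = "https://api.example.com/weather?q=London&appid=" + api_key
    with urllib.request.urlopen(url) as resp:
        payload = resp.read()

    # Land the raw JSON in S3 as-is; Snowflake can flatten it later using schema-on-read
    boto3.client("s3").put_object(
        Bucket="my-raw-landing-bucket",
        Key="weather/raw_weather.json",
        Body=payload,
    )
    return {"statusCode": 200}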
Tried a lot but continuously getting this error --> [ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /var/task/cryptography/hazmat/bindings/_rust.abi3.so) Traceback (most recent call last):
Hey I am getting this same exact error. Were you able to figure it out?
@@gtrace1910 No, I guess we have to use Snowpipe now... this video is outdated
@@prakashmudliyar4834 the other option is the REST API
good one
Thank You Keshava Mugulur Srinivas Iyengar! Happy Learning :-)
Will this work for billions of rows (20 GB)? How to handle the Lambda timeout (15 min) and the S3 5 GB limitation?
Hello Shubhi, actually a good question ...
Regarding the Lambda timeout (15 min) --
see, the copy into command is a powerful one, it can load a good volume too... but there is Lambda's time constraint; for that, you can try the below permutations and combinations --
copy into command in parallel fashion -- interworks.com/blog/2020/03/04/zero-to-snowflake-multi-threaded-bulk-loading-with-python/
Second option -- AWS Glue; all you need to do is create a trigger from Lambda to run Glue (a small sketch of the Lambda-to-Glue trigger is below). How to connect Glue and Snowflake you can get from here -- th-cam.com/video/7c6kcRKDxgQ/w-d-xo.html
Third option -- if you want to process very big data based on this kind of Lambda trigger, you can go with a transient EMR cluster -- th-cam.com/video/ETO_FFhzNic/w-d-xo.html
And for the S3 GB limitation --
you can apply partitioning and split your data into multiple S3 files and try 😊
Hope this will be helpful!
Happy Learning :-)
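For the second option, the Lambda itself stays tiny -- it only starts the Glue job and passes the file details, so the 15-minute limit is never a problem. A minimal sketch (the Glue job name and argument keys here are hypothetical):

import boto3

def lambda_handler(event, context):
    # Pick up which file landed in S3 from the trigger event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Hand the heavy loading over to a Glue job instead of doing it inside Lambda
    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName="snowflake_load_job",
        Arguments={"--s3_bucket": bucket, "--s3_key": key},
    )
    return {"JobRunId": response["JobRunId"]}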
@@KnowledgeAmplifier1 thank you so much for responding so fast. Will try this definitely and let you know
It was very helpful. Thanks a lot.
Glad to hear that it was helpful Rupam Pathak! Happy Learning :-)
@@KnowledgeAmplifier1 hi sir, how are you passing 1.csv and 2.csv from the code? Like, we need to mention them in the event, right? But you didn't show that part. Can you please share that?
Hi team, I'm getting this error: {
"errorMessage": "'Records'",
"errorType": "KeyError",
"stackTrace": [
" File \"/var/task/lambda_function.py\", line 33, in lambda_handler
for record in event['Records']:
"
]
}
can you fix it?
Hi Team,
What is the purpose of the below line? Please advise.
s3_file_key = event['Records'][0]['s3']['object']['key'];
Hello Gershom NC, that code basically tells you due to which file the event was created or Lambda got triggered ... currently in this pipeline, that is not required .. you can remove that
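For anyone else wondering, this is roughly the shape of the S3 event that line is digging into (bucket and file names below are just example values):

# Trimmed-down example of the event Lambda receives from an S3 trigger
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-landing-bucket"},
                "object": {"key": "healthcare/1.csv"},
            }
        }
    ]
}

s3_file_key = sample_event['Records'][0]['s3']['object']['key']
print(s3_file_key)  # -> healthcare/1.csv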
Hi
I am unable to implement auto-ingestion through the Lambda function.
Below are the errors.
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named '_cffi_backend'
Traceback (most recent call last):
[ERROR] Runtime.ImportModuleError: Unable to import module 'lambda_function': No module named 'lambda_function'
Traceback (most recent call last):
I followed the steps as shown above. My Python version is 3.10.
Below are the steps I followed:
Step1:
import snowflake.connector as sf

def run_query(conn, query):
    cursor = conn.cursor()
    cursor.execute(query)
    cursor.close()

def lambda_handler(event, context):
    s3_file_key = event['Records'][0]['s3']['object']['key']
    user = "aditya4u"
    password = ""
    account = ""
    database = "DEMO"
    warehouse = "COMPUTE_WH"
    schema = "PUBLIC"
    role = "ACCOUNTADMIN"
    conn = sf.connect(user=user, password=password, account=account)
    statement_1 = 'use warehouse ' + warehouse
    statement3 = "use database " + database
    statement4 = "use role " + role
    run_query(conn, statement_1)
    run_query(conn, statement3)
    run_query(conn, statement4)
    sql_query = "copy into demo.PUBLIC.HEALTHCARE_CSV from @demo.PUBLIC.snow_simple FILE_FORMAT=(FORMAT_NAME=my_csv_format)"
    run_query(conn, sql_query)
Step2:
I saved the file as "lambda_function.py"
Step3: I copied the file into the Deployment_zip folder and zipped the files.
Step4: I uploaded the zip file to Lambda from the S3 bucket.
It is only compatible with Python version 3.8; make sure you install the correct version of the Snowflake connector for your Python version. Also, a better option would be to create a layer with those dependencies and keep the code in the Code tab on AWS Lambda.
what is the other guy in the background doing?
Hello eedris, different Python modules are specific to particular Python versions, so if you use the same versions of the Snowflake and other dependent packages mentioned in the video in an environment with a different Python version, the code will not work. That might be the reason for your error.