How to load reference data to database with Python ETL Pipeline | Excel to Postgres

Comments •

  • @BiInsightsInc
    @BiInsightsInc  2 years ago +5

    Videos in this series:
    Build ETL pipeline: th-cam.com/video/dfouoh9QdUw/w-d-xo.html&t
    Automate ETL Pipeline: th-cam.com/video/eZfD6x9FJ4EE/w-d-xo.htmlTL
    Incremental Data Load (Source Change Detection): th-cam.com/video/32ErvH_m_no/w-d-xo.html&t
    ETL Incremental Data Load (Destination Change Comparison): th-cam.com/video/a_T8xRaCO60/w-d-xo.html

  • @khaledtellopalacios3072
    @khaledtellopalacios3072 4 months ago

    Great

  • @ОлегПустовалов-л6ы
    @ОлегПустовалов-л6ы 2 years ago +1

    Can I get a link to the project? Your Git link is broken.

    • @BiInsightsInc
      @BiInsightsInc  2 years ago +1

      Thanks for the notification. The Git link is updated. Here is the direct link to the source code:
      github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/etl_load_reference_data.py

    • @ОлегПустовалов-л6ы
      @ОлегПустовалов-л6ы 2 years ago

      @@BiInsightsInc thx bro

  • @wahyuferryansyah9667
    @wahyuferryansyah9667 1 year ago +2

    Hello sir, I'm interested in this video, but can you show how to transform the data? And maybe take it all the way to a data mart, please.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      Hi Wahyu, if you are looking for a transformation example with Python then check out this video: th-cam.com/video/eZfD6x9FJ4E/w-d-xo.html&t
      A data mart is a concept: a subset of your data warehouse aimed at a particular subject area or department. In this scenario, data is staged and you can create a subsequent table/view to expose it for reporting purposes.
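
      The staging-then-view idea above can be sketched as follows. This is a hypothetical example, not the video's code: SQLite stands in for Postgres, and the table/view names and sample data are made up.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# SQLite stands in for Postgres here; swap the URL for your own connection.
engine = create_engine("sqlite:///warehouse.db")

# Stage the raw data (in the video this would come from an Excel file).
staged = pd.DataFrame({"region": ["East", "West", "East"], "sales": [100, 200, 150]})
staged.to_sql("stg_sales", engine, if_exists="replace", index=False)

# A simple "data mart" layer: a view summarizing the staged table for reporting.
with engine.begin() as conn:
    conn.execute(text("DROP VIEW IF EXISTS mart_sales_by_region"))
    conn.execute(text(
        "CREATE VIEW mart_sales_by_region AS "
        "SELECT region, SUM(sales) AS total_sales FROM stg_sales GROUP BY region"
    ))

print(pd.read_sql("SELECT * FROM mart_sales_by_region ORDER BY region", engine))
```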

    • @wahyuferryansyah9667
      @wahyuferryansyah9667 1 year ago

      @@BiInsightsInc thank you sir

  • @konnli1
    @konnli1 2 years ago

    I am getting "name 'load' is not defined" when I run the extract function. What could be the problem?

    • @BiInsightsInc
      @BiInsightsInc  2 years ago

      Hi Kon, it seems you're missing the "load" function in your code. You can grab the load function from the complete code on GitHub. Also, try moving the "load" function before the "extract" function. Here is the link:
      github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/etl_load_reference_data.py
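
      The NameError usually means "load" was never defined (or its notebook cell was never run) before "extract" called it. Below is a minimal sketch of the fix, with hypothetical names and an inline frame in place of the Excel read so it stays self-contained; SQLite stands in for Postgres.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///ref.db")  # stand-in for the Postgres engine

# Define load() first, so it already exists when extract() calls it.
# In a notebook, make sure this cell is executed before the extract cell.
def load(df, table_name):
    df.to_sql(table_name, engine, if_exists="replace", index=False)

def extract():
    # The video reads an Excel file here; an inline frame keeps the sketch runnable.
    df = pd.DataFrame({"code": ["US", "CA"], "country": ["United States", "Canada"]})
    load(df, "country_reference")
    return df

extract()
print(pd.read_sql("SELECT * FROM country_reference", engine))
```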

    • @konnli1
      @konnli1 2 years ago

      @@BiInsightsInc It worked later on. I was working in an .ipynb, but I just put it in .py form and it's fine.

  • @zamanganji1262
    @zamanganji1262 1 year ago

    If we need to execute this in Dagster, how can we do that?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      You can take this code and convert the functions to "op" and/or "asset" with the help of Dagster decorators. I have covered how to convert a Python script to "op" in this video here:
      th-cam.com/video/t8QADtYdWEI/w-d-xo.html&t
      Video on assets is here:
      th-cam.com/video/f1TbVGdhmYg/w-d-xo.html&t

    • @zamanganji1262
      @zamanganji1262 1 year ago

      @@BiInsightsInc Thanks a lot, my hero

  • @satishmajji481
    @satishmajji481 2 years ago +2

    PySpark is widely used over Pandas for large datasets, right? Please create an end-to-end ETL pipeline for a complex JSON file with PySpark. Also, it would be more helpful if you could include real-time transformation in your videos.

    • @BiInsightsInc
      @BiInsightsInc  2 years ago

      Thanks for the suggestion. I will try and cover PySpark next. In the meantime, check out the AWS Glue videos. Glue is a distributed system that runs on a Spark cluster and is designed to process big data.

    • @satishmajji481
      @satishmajji481 2 years ago

      @@BiInsightsInc Sure, thank you so much for considering my request. Could you please do a dedicated series on AWS Glue ETL and Athena to create an ETL pipeline and automate it?

  • @juicetin942
    @juicetin942 1 year ago

    What about data rollback in case of an error?

    • @BiInsightsInc
      @BiInsightsInc  1 year ago

      You can use a with clause and engine.begin() to handle transactions. The block will automatically roll back if an error occurs; otherwise it will commit. That's a good idea; I will cover it in a future video.
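
      A quick sketch of that pattern, using an in-memory SQLite database as a stand-in for Postgres (the table and error are made up for illustration):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory stand-in for the Postgres engine

# engine.begin() opens a transaction that commits when the block exits cleanly.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)"))
    conn.execute(text("INSERT INTO accounts VALUES (1, 100)"))

# If the block raises, everything it did is rolled back automatically.
try:
    with engine.begin() as conn:
        conn.execute(text("UPDATE accounts SET balance = 0 WHERE id = 1"))
        raise RuntimeError("something went wrong mid-load")
except RuntimeError:
    pass

with engine.connect() as conn:
    balance = conn.execute(text("SELECT balance FROM accounts WHERE id = 1")).scalar()
print(balance)  # still 100: the failed transaction was rolled back
```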

    • @juicetin942
      @juicetin942 1 year ago

      @@BiInsightsInc I want to do an upsert query; it works with an execute statement, but I am not able to roll back with execute.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +1

      @@juicetin942 You should use a transaction to execute your upsert statement. Postgres implicitly takes care of the commit/rollback in case of an error when you are inside a transaction. Below is straight from their docs:
      "Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all."
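
      One way to run an upsert inside such a transaction is with engine.begin() and ON CONFLICT ... DO UPDATE. This sketch uses SQLite (which shares that syntax with Postgres) and made-up table names; on failure the whole batch rolls back, on success it commits.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # SQLite stands in for Postgres here

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE ref (code TEXT PRIMARY KEY, name TEXT)"))
    conn.execute(text("INSERT INTO ref VALUES ('US', 'United States')"))

# Upsert inside a single transaction: all rows commit together or not at all.
rows = [{"code": "US", "name": "USA"}, {"code": "CA", "name": "Canada"}]
with engine.begin() as conn:
    for row in rows:
        conn.execute(
            text("INSERT INTO ref VALUES (:code, :name) "
                 "ON CONFLICT(code) DO UPDATE SET name = excluded.name"),
            row,
        )

with engine.connect() as conn:
    result = dict(conn.execute(text("SELECT code, name FROM ref")).fetchall())
print(result)  # existing 'US' row updated, new 'CA' row inserted
```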

    • @juicetin942
      @juicetin942 1 year ago

      @@BiInsightsInc thanks

  • @vpnath75
    @vpnath75 2 years ago +2

    Nice content, but please select a better color scheme for PyCharm 🙂

    • @BiInsightsInc
      @BiInsightsInc  2 years ago

      Thanks. I have been using the default for PyCharm. I will explore other settings going forward.

    • @rickycamilo4488
      @rickycamilo4488 1 year ago

      lmaooooo

  • @hipphipphurra77
    @hipphipphurra77 1 year ago

    Overall nice.
    You do not need SQLAlchemy explicitly, nor do you need create_engine:
    pandas to_sql accepts a connection string instead of the create_engine result.

    • @BiInsightsInc
      @BiInsightsInc  1 year ago +1

      Thanks on both accounts. I will try it with a connection string; it will help make the code concise.
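
      A sketch of the commenter's suggestion: pass the database URL string straight to pandas, which builds the engine internally (SQLAlchemy still has to be installed, you just never import it). The SQLite URL and table name here are made up for illustration; the commented Postgres URL shows the equivalent shape.

```python
import pandas as pd

# e.g. "postgresql://user:password@localhost:5432/mydb" for Postgres
url = "sqlite:///reference.db"

df = pd.DataFrame({"code": ["US", "CA"], "country": ["United States", "Canada"]})

# No create_engine call: the URL string is handed directly to pandas.
df.to_sql("country_reference", url, if_exists="replace", index=False)

print(pd.read_sql("SELECT * FROM country_reference", url))
```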