Hey Jay, thank you for the video. I'd be happy to see you do more ELT pipelines and focus on your thought process (I can watch longer formats, 1-2 hours) - why you do things that way, why it's important and whatnot; and you can throw in explainers for anything else you do and the reason behind it. I think senior DEs and others with experience do things a bit automatically, and it takes time for newbies to pick up on those skills. So, your thought process for doing things, instead of just doing the things, is priceless for anyone watching, including me. Appreciate your video, dude :)
Thank you! Will try to create more useful content
Completely agree 😊
honestly never knew about dbt and glad to learn it here thank you
When ETL came about, the Cloud did not exist; I was writing shell scripts and SQL almost 30 years ago to do ETL. Useful video, thanks!
This code along session (starting from scratch with environment setup, codebase structure ...) is soooooooo helpful. Hope to see more examples like this. Keep up the work my man
I watched the video "How you start learning Data Engineering..." and was wondering if you could do a live coding session that steps through all those aspects (from SQL and the command line... to Kafka...) in one project? I think it would help a lot...
Glad to hear it's helpful! 👍
It's great to hear feedback on what type of live coding videos you find insightful. Will keep Kafka and the command line in mind
Concise and to the point. It was very helpful. Thanks, please show more end to end complex projects like this
Awesome video! I already recommended this to my entire team. Please make more like this, they are extremely helpful.
Idea for next video: dbt for Snowflake (again) but with Data Vault 2.0 modeling. I would love to see the logic behind creating dim and fact tables, how you define the stg files for creating the hubs/satellites/links.
Oof yea I did consider doing a Data Vault model where we showcase how hubs, satellites and links work but didn't think ppl would be interested. Thanks for raising 👍
This video has the exact answer to my questions as I'm diving into data modeling for analytics. I guarantee that everyone doing this for the first time will find this video super helpful.
Would be cool to see dbt with Cosmos for smoother operation 👌
EDIT: I was literally just getting into the Deployment part of the video, and there you introduce using Cosmos for Airflow. Kudos!!
I have been struggling with dbt and airflow for a long time. For some reason I could not connect the dots. Having some mixture of knowledge - I landed on this tutorial and it just glued all my scattered dots well. Thanks Jayzern!!! Really appreciate the efforts :)
Thanks for sharing this dbt tutorial! It’s definitely super hot rn and useful to learn. 🎉
wait till you learn about sqlmesh
Very good session, it helped me get a much more concrete idea of what those tools look like and how they work together.
I'm new to Snowflake, dbt and Airflow;
this is an awesome tutorial, got to learn a lot.
Thank you jayzern
AN ABSOLUTE GOLDMINE OF INFORMATION THAT NO UDEMY OR YOUTUBE TUTOR HAS PROVIDED YET!
Extremely useful content, I especially liked the live googling and debugging parts.
Thank you for the support! Hope other people find it useful too.
thank you so much for this tutorial. hope you have more videos in the future
Thanks man!
Great video and explanation. we need more videos from you.
This video is like a gold mine for building a portfolio, especially for someone starting out as a Data Engineer like me!... Many thanks and kudos to you!.. Love from India
Hey, how did you use Snowflake? Did you buy it? Because it shows me that it is paid software.
@@adityakulkarni3798 I am wondering the same thing
Thank you very much. This is very nice and concise tutorial, exactly what I need.
Thanks @jayzern. This tutorial is awesome. I will be recommending it to folks who struggle with connecting dbt with any database engine.
Thank you, thank you THANK YOU! This was so helpful, easy to follow and made perfect sense.
Great video.
I would love to see complex ETL pipelines.
Please post more videos, your videos are awesome and very instructive
Thank you so much, it is 100% worth it and useful... Expecting some more detailed videos... like prod deployment through Git, and Git integration with Airflow.
This is such an amazing video @jayzern! The project taken was not overly complex but also not barebones, and it covered a lot of important stuff! Thanks for being thoughtful and including the code-along link (otherwise some of the formatting issues would have bugged many newbies)!
I think you should keep creating more videos as you are a good teacher. The only suggestion I have is to maybe include a bit more explanation, which would help beginners even more! Kudos!
Can you please post more videos like this? Really appreciate it. Helps me understand the Dbt/Snowflake/Airflow a lot
Yes sir am working on future videos right now!
Hi! Really enjoyed your tutorial. I would like to see a tutorial on how to create a data CI/CD pipeline, starting from pulling the latest branch, running data tests on staging, and deploying changes to production after the tests pass, since not a lot of YouTubers explain this.
This is actually a brilliant idea, thanks for the rec!
So supportive, and I'm completing the project.
Need more content like this!!! Really amazing video. Just one suggestion: before diving into the coding part, it would be better if you could provide a real-world scenario and reference it while writing your code. Thanks
Appreciate the feedback man 🙏 will try to incorporate more real-world context before and during the live coding part, that's a great idea
@@jayzern thanks a lot, waiting for some more tutorials😃
It's beautiful! Thx man!
Thanks very much for posting this! Definitely earned another subscriber/viewer
I haven't a lot from this tutorial...
Thank you
Amazingly explained 👌
Thank you for the video jayzern. When I push code to Git, should I push only the dbt code, or do I need to push all of the dbt-dag code?
Thanks bro for your efforts ❤
WOW!! Thank you so much for this wonderful video, please keep making dbt + Airflow videos.
I have one doubt: I can see that one task in Airflow, stg_tpch_orders, has run + test in your DAG, but it is not showing up in mine.
Have you added any tests on stg_tpch_orders, but maybe missed showing it in the video?
Hmm, it's hard to tell without looking at your code, but there is a generic test for stg_tpch_orders that looks at the relationship between fct_orders and stg_tpch_orders. Check your generic_tests.yml file to confirm
Thanks for the support man!
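For anyone checking their own project, a relationships test like the one described above normally lives in a schema YAML file next to the models. A minimal sketch, with an assumed file path and column name (not copied from the video):

    # models/marts/generic_tests.yml  (hypothetical path)
    version: 2

    models:
      - name: fct_orders
        columns:
          - name: order_key
            tests:
              # passes only when every order_key in fct_orders
              # also exists in stg_tpch_orders
              - relationships:
                  to: ref('stg_tpch_orders')
                  field: order_key

With Cosmos's default test behavior, models that have tests attached get a run + test pair in the Airflow graph, which is likely what the extra node is.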
Excellent tutorial!!!
Thank you so much for this, I've been trying to learn how to do this and you helped me solve this
Do you have trainings!!
Thanks man! Yea I'm working on live trainings too so stay tuned 🙌
Great tutorial, I've learned a lot, thanks!
love this! thanks for sharing this tutorial, very useful
Thank you, love your work
Dude this is so good :)
WOW! That is an amazing tutorial, thanks a lot.
Great video! What text editor are you using?
So to my understanding, singular tests really check that nothing is returned by the query being tested.
If the test passes, the query being tested returned nothing - great, your data is fine.
If it fails, you should run that query to see exactly which rows came back.
Confusing at first, but it makes sense now.
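A minimal sketch of such a singular test, assuming a hypothetical file name and using the discount column discussed elsewhere in the thread:

    -- tests/fct_orders_discount.sql
    -- Singular test: dbt reports a failure if this query returns any rows.
    -- Discounts are stored as negative amounts, so any positive value is a bad row.
    select
        *
    from {{ ref('fct_orders') }}
    where item_discount_amount > 0

Run it with dbt test; a passing test is simply a query with an empty result set.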
well done! great tutorial!
Please make complete videos on a dbt with Snowflake migration project with real-time scenarios, bro. Thank you ❤ Nicely explained.
Thank you man! Will take that into consideration
Hi Jay. Question: Once you have created the Fact table, how does this process work if I run it again? Is it going to append new records and update the existing ones? Or is it going to drop and create the Fact table over again?
Great video Jay
Hi Mr. Prasad garu, are you a data engineer too?
@@RohithPatelKanchukatla Hi there. I am a Data Scientist
Thanks Jayzern! If I can be of some help for your next video, let me know!
excellent video, thank you
Jay! Thanks for the video and content very cool to see. Curious why Airflow over something like FiveTran besides the ability to self host? Any gotchas?
FiveTran is not really an orchestration tool - it's really meant for the "Extract Load" part only. It's great because of Unix philosophy, i.e. "do one thing, do one thing well only", whereas Airflow is more of a generalist, task-based orchestrator. Another thing is FiveTran is super expensive, unless you're working on something enterprise-y
Thanks Jay! Could you also upload into the Notion document the code for the dbt_dag.py file for the Airflow deployment? That's still missing 🙏🏻
Totally forgot about that, thanks for the reminder!
No worries, I realized you used it from the Cosmos GitHub repo, so I managed to find it there and finally was able to wire up everything and deploy it. 🤓 Thanks Jay. It's a super helpful tutorial. @@jayzern
26:00 item_discount_amount is supposed to be negative because the macro defined it as such. I also checked the data on snowflake and they're all negative amounts. Did I miss something?
Thanks Jayzern
I cannot run my dbt project. I’m still a beginner but I do not understand why this happens, considering that my macros directory is empty except for a .gitkeep file:
Compilation Error
dbt found two macros named "materialization_table_default" in the project "dbt".
To fix this error, rename or remove one of the following macros:
- macros/materializations/models/table/table.sql
- macros/materializations/models/table.sql
This is great! At what point would you need to dockerize the files though? Sorry, new to data engineering. Thank you!
You can Dockerize it at the beginning, or once you have a baseline model working. I've seen cases where Data engineers start with Docker, or Dockerize it halfway! I personally prefer the latter
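If you do Dockerize the dbt project itself (the Airflow side is already containerized by Astro), a minimal sketch could look like the following; the base image, adapter version and paths are assumptions rather than anything from the video:

    FROM python:3.11-slim

    # Install dbt with the Snowflake adapter (pin to the version you actually use)
    RUN pip install --no-cache-dir dbt-snowflake==1.7.0

    # Copy the dbt project into the image
    WORKDIR /usr/app/data_pipeline
    COPY data_pipeline/ .

    # profiles.yml is expected to be mounted or provided via env vars at runtime
    ENTRYPOINT ["dbt"]
    CMD ["run"]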
Just wondering, in a real-world scenario, where is all the raw data stored? In AWS S3?
Hey, thanks for the project tutorial. I was wondering what the best way to deploy Airflow in a cloud environment is... I see a lot of EC2 or EKS (Kubernetes), but maybe I could work with ECS + Fargate? Which deployment method would you recommend for a production scenario (beyond studies, thinking about a daily job)? Thank you mate
Airflow + EKS is probably the most common in the industry because of cost reasons and vertical scaling. You could use ECS + Fargate too, but fargate is really expensive!
I don't have any recs atm, but will try to create more examples on production DAGs next time. Check out th-cam.com/video/Xe8wYYC2gWQ/w-d-xo.html in the meantime!
I'm struggling with the step that loads the dbt data_pipeline; it does not show up in the Airflow DAG. What could I be doing wrong? Can you help?
Error solved!!!!
for anyone facing this error:
Runtime Error
Database error while listing schemas in database "dbt_db"
Database Error
250001: Could not connect to Snowflake backend after 2 attempt(s).Aborting
Try the second method of setting the account name for your project inside the profiles.yml file:
account_locator-account_name
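Concretely, the account field in ~/.dbt/profiles.yml should be the hyphen-separated identifier rather than the full account URL. A rough sketch with placeholder values (not real credentials):

    data_pipeline:
      target: dev
      outputs:
        dev:
          type: snowflake
          # hyphen-separated identifier, e.g. "<org_or_locator>-<account_name>",
          # not "https://xy12345.region.snowflakecomputing.com"
          account: "abc12345-my_account"
          user: my_user
          password: my_password
          role: dbt_role
          database: dbt_db
          warehouse: dbt_wh
          schema: dbt_schema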
Thank you !
Thank you!
Hey! How did you go about updating the account name(or resolving the error)? I can't find the profile.yml file.
thank you very much
Dude, where did you even mention the dbt_project.yml file? In part 2 of the video you jump directly to VS Code.
What are the details??
100% worth it
Hi Jay, thanks for the video. I'm having an issue connecting to Snowflake backend at the stage you first perform 'dbt run' @ 14:50 .
This is the error I get:
15:17:54 Encountered an error:
Runtime Error
Database error while listing schemas in database "dbt_db"
Database Error
250001: Could not connect to Snowflake backend after 2 attempt(s).Aborting
I've checked the profiles.yml file and all details are correct. Please help!
Facing the same issue! Can anyone please help? I've restarted and tried everything possible to figure it out but failed.
@MalvinSiew
I solved one of the two errors I was facing. I did not have Git installed in my system. You can simply ask AI for prompts to guide you through the installation process.
Had the same problem. When passing the account value with 'dbt init' I wasn't able to connect using the account URL value, only with the second option, which was the hyphen-separated value.
did u solve it? I have the same problem. what is the solution?
@@oreschz could you solve it?
Hi, I am trying your project and got stuck here, can you help?
21:32:24 Unable to do partial parsing because saved manifest not found. Starting full parse.
21:32:25 Encountered an error:
Compilation Error
Model 'model.DATA_PIPELINE.stg_tpch_orders' (models/staging/stg_tpch_orders.sql) depends on a source named 'tpch.orders' which was not found
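That error usually means there is no source definition matching source('tpch', 'orders'). A minimal sketch of such a definition; the file name, database and schema are assumptions based on Snowflake's bundled TPC-H sample data, not copied from the video:

    # models/staging/tpch_sources.yml
    version: 2

    sources:
      - name: tpch
        database: snowflake_sample_data
        schema: tpch_sf1
        tables:
          - name: orders
          - name: lineitem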
Nice explanation
I'm struggling with airflow connection to snowflake, can you make another video to elaborate it more?
For sure, I didn't explain the airflow integration with snowflake as much as I wanted to
Hello.. thanks for the tutorial.
I know Airflow runs the tasks/DAGs; however, I cannot follow one thing: how is the order of the action items at 35:36 determined within dbt (I believe it is determined on the dbt side), since we have only one DAG running in this example? I'd appreciate it if anyone replies.
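For anyone with the same question: the ordering comes from dbt's dependency graph, which Cosmos translates into Airflow task dependencies, so the DAG file itself doesn't list any ordering. Every ref() or source() call adds an edge to that graph. An illustrative snippet with assumed model and column names:

    -- models/marts/fct_orders.sql
    -- Because this model refs stg_tpch_orders, dbt (and therefore Cosmos/Airflow)
    -- will always build and test stg_tpch_orders before building this model.
    select
        o.order_key,
        o.total_price
    from {{ ref('stg_tpch_orders') }} as o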
Is data engineering dead with advent of AI ? What is the future of data engineering careers in your opinion ?
Hi, I would like to know about the singular test: we want to check for negative values in the test, so why do we use a positive condition?
Couldn't run int_order_items.sql because it returns a strange error. It says: "The selection criterion 'int_order_items.sql' does not match any enabled nodes". And if I run "dbt run" it says: "unexpected '.' in line 1" at 20:22
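The first error usually comes from the selector itself: dbt selects models by name (without the .sql extension) or by file path. A quick sketch, with an assumed path for where the model lives:

    # select by model name
    dbt run --select int_order_items

    # or select by path
    dbt run --select models/marts/int_order_items.sql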
At 32:21, how did you copy the dbt folders to airflow project?
Make a video related to star schema and dimensional modeling
hi ! I'm having trouble connecting to snowflake. can someone please help me resolve it . I just started learning dbt and snowflake .
Runtime Error
Database error while listing schemas in database "dbt_db"
Database Error
250001: Could not connect to Snowflake backend after 2 attempt(s).Aborting
Hey, I have a small request:
can you please make a video on how to use PySpark efficiently on a low-spec system with a huge amount of data?
Low compute Spark + high volumes of data is challenging but will take note. Thx for the suggestion
I materialized marts as tables but int_order_items, int_order_items_summary and fct_orders are created as views instead of tables. How do I convert these views to tables?
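Views instead of tables usually means the materialization setting isn't being applied to those models. It can be set in two places; the folder and project names below are assumptions, and a config() inside a model file overrides dbt_project.yml:

    # dbt_project.yml (excerpt)
    models:
      data_pipeline:
        marts:
          +materialized: table

    -- or at the top of the model file, e.g. models/marts/int_order_items.sql
    {{ config(materialized='table') }}

After changing the config, run dbt run again and dbt will replace the existing views with tables.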
Tell me one thing: is data engineering a good job profile for freshers?
I need a longer video. Please make one.
nice
Hi guys, kindly help me out: are Snowflake and dbt alone enough, or do I have to learn Hadoop, Spark, etc.? I have been working as a data analyst for the last year and am planning to switch to DE.
any prerequisites for this
Hi Jay, good one. Am trying the same way but getting the error below: " 1 of 1 ERROR creating view model dbt_schema.stg_tpch_line_items................. [ERROR in 0.04s]
06:17:33
06:17:33 Finished running 1 view model in 2.02s.
06:17:33
06:17:33 Completed with 1 error and 0 warnings:
06:17:33
06:17:33 Compilation Error in model stg_tpch_line_items (models\staging\stg_tpch_line_items.sql)
06:17:33 'dict object' has no attribute 'type_string'
06:17:33
06:17:33 > in macro generate_surrogate_key (macros\sql\generate_surrogate_key.sql)
06:17:33 > called by macro default__generate_surrogate_key (macros\sql\generate_surrogate_key.sql)
06:17:33 > called by model stg_tpch_line_items (models\staging\stg_tpch_line_items.sql)"
Try checking whether your dbt_utils version is correct. There seems to be a compile-time error when calling generate_surrogate_key. The code is available on the Notion page.
I got the same error. How did you solve it?
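For reference, dbt_utils is pinned in packages.yml and installed with dbt deps; the 'type_string' error is commonly a version mismatch between dbt-core and dbt_utils. A sketch, where the version number is an assumption and should be whatever is compatible with your dbt version:

    # packages.yml
    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1

After editing the file, run dbt deps again before dbt run.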
Overall great, the airflow orchestration felt a bit clunky especially given that the source code had to be kept in the same directory.
Thx for the feedback 👍 ideally should wrap this in a container image, but for simplicity decided to keep it as code
@@jayzern Makes sense, any good resources on self hosting dbt core?
Can you tell me why we used Airflow, since dbt Cloud has a feature to schedule jobs?
If your company only uses dbt and no other tooling, dbt Cloud works too.
However, in the real world it's hard to control your cron schedules when you have many tools in your stack. An orchestrator's job is to focus on scheduling. Linux philosophy of "do one thing, do one thing well", TLDR
One question here: since the dbt jobs feature is available in dbt Cloud and it is very easy to create a job there, why do we need to use Airflow?
Yea that's great question! In theory dbt cloud can trigger jobs too, but in practice you'd want to decouple your orchestration tool away from your transformation tool for a myriad of reasons: ability to orchestrate other tools together with dbt, avoid vendor lock from dbt, many companies are comfortable with Airflow etc. It really depends on your tech stack
Hi @jayzern, thanks a lot for your video, really valuable content!
Hello, thanks for this tutorial. At the very beginning, when trying to run the "dbt deps" command I'm getting this error: "Encountered an error loading local configuration: dbt_cloud.yml credentials file for dbt Cloud not found. Download your credentials file from dbt Cloud to `C:\Users\a.schirina\.dbt`". I'm using the dbt command locally, and my profiles.yml in the .dbt folder is:

data_pipeline:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: jpb45436
      # User/password auth
      user: alices
      password: mypassword
      role: dbt_role
      database: dbt_db
      warehouse: dbt_wh
      schema: dbt_schema
      threads: 4
      client_session_keep_alive: False

Does anyone know the problem?
Do I need to pay for Astro if I want to use this for a prod env?
You should check out Meltano
I've heard great things about Meltano!
How could i get the project folder structure?
How did he start? Did he create a worksheet? I tried it but it did not work. The very first steps, what are they?
yes you need to write the queries in a worksheet
Good
(venv) PS C:\Users\hsrak\Desktop\DataManagemet2\new\data_pipeline\dbt-dag> brew install astro
brew : The term 'brew' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ brew install astro
+ ~~~~
+ CategoryInfo : ObjectNotFound: (brew:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Please help me
I am not sure why I cannot open the notes, can anyone help?
I double checked the link and it's working, try this
bittersweet-mall-f00.notion.site/Code-along-build-an-ELT-Pipeline-in-1-Hour-dbt-Snowflake-Airflow-cffab118a21b40b8acd3d595a4db7c15?pvs=74
Let me know what error you see
Hello did anyone else face this error at Airflow after @32:50
Broken DAG: [/usr/local/airflow/dags/dbt-dag.py]
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/cosmos/operators/base.py", line 361, in __init__
self.full_refresh = full_refresh
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 1198, in __setattr__
if key in self.__init_kwargs:
^^^^^^^^^^^^^^^^^^
AttributeError: 'DbtRunLocalOperator' object has no attribute '_BaseOperator__init_kwargs'. Did you mean: '_BaseOperator__instantiated'?
please send help
I am facing the exact same error. Please post a reply, if you were able to figure out the fix. I'll do the same if I find a solution.
Ok, so I think I was able to find the thread related to this issue. It's still open as of 8/18/2024 11pm PT:
github.com/astronomer/astronomer-cosmos/issues/1161
90 minutes? That's a long time
1.5 hours?
How do you know your username? jayzern? I went back to my profile but it did not work. Where can I find my user name?
In Snowflake you should see your user name in the bottom left corner. It'll be the top bolded value
Good vid, but move your facecam out of the terminal