Airflow with DBT tutorial - The best way!

  • Published on Nov 25, 2024

Comments • 111

  • @derikRoby
    @derikRoby 1 year ago +12

    To anyone following the video now:
    The DbtDeps operator has been deprecated. Deps are installed automatically if they are listed in the packages.yml file inside your dbt project. Follow the official docs.
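
    (For reference, a minimal packages.yml at the root of the dbt project looks roughly like the sketch below; dbt_utils and the version pin are only illustrative, not something taken from the video:)

    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1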

    • @jonasl3683
      @jonasl3683 1 year ago +1

      Does that mean I have to put gcc and python3 inside packages.yml, or can I just delete the packages.txt file in the Astro folder?

    • @ФархадЗамалетдинов-е5м
      @ФархадЗамалетдинов-е5м 1 year ago +2

      Is any tutorial available? Please provide a link or further explanation.

    • @johnnote7
      @johnnote7 1 year ago

      Works with astronomer-cosmos[dbt.all]==0.6.0

  • @okonvictor8711
    @okonvictor8711 8 months ago

    Cool videos. For dbt Cloud you can define the job and then use a POST request to trigger it via Airflow. You can also set dependencies between jobs.
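
    (A sketch of the dbt Cloud approach described above: it POSTs to the dbt Cloud v2 "trigger job run" endpoint from an Airflow task. The account ID, job ID, and token handling are placeholders, and there is also an official DbtCloudRunJobOperator in the apache-airflow-providers-dbt-cloud package that wraps the same API:)

    import requests
    from datetime import datetime
    from airflow.decorators import dag, task

    ACCOUNT_ID = 12345        # placeholder dbt Cloud account id
    JOB_ID = 67890            # placeholder dbt Cloud job id
    DBT_CLOUD_TOKEN = "..."   # better: read from an Airflow connection or secrets backend

    @dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
    def trigger_dbt_cloud_job():
        @task
        def run_job() -> dict:
            # Trigger the job and return the API response (contains the run id)
            resp = requests.post(
                f"https://cloud.getdbt.com/api/v2/accounts/{ACCOUNT_ID}/jobs/{JOB_ID}/run/",
                headers={"Authorization": f"Token {DBT_CLOUD_TOKEN}"},
                json={"cause": "Triggered from Airflow"},
            )
            resp.raise_for_status()
            return resp.json()

        run_job()

    trigger_dbt_cloud_job()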

  • @datalearningsihan
    @datalearningsihan 1 year ago +3

    I was actually learning from the best of the best on Udemy. I had no idea. I am enjoying your teaching as well.

    • @MarcLamberti
      @MarcLamberti 1 year ago +1

      You’re the best 🫶

    • @zahabkhan6832
      @zahabkhan6832 6 months ago

      @@MarcLamberti Where can I find the code files you mentioned you would put in the description?

  • @rattaponinsawangwong5482
    @rattaponinsawangwong5482 1 year ago +5

    Great content!! I tried to follow along and it works fine, about 95% of it. Just one additional setting in case you face a problem with the module named "pytz" (I got a "module named pytz not found" error while trying to run the DAG): add pytz to the requirements.txt file and it will work perfectly.

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

    • @rattaponinsawangwong5482
      @rattaponinsawangwong5482 1 year ago

      No, I didn't. But I worked through this tutorial 8 months ago, so maybe it was updated with something I never tried.
      Based on the error message, I think it's about the naming of some parameters. You might cross-check that they match the tutorial.

    • @khrs2077
      @khrs2077 5 months ago

      Hi, I got this error too. I added pytz==2022 but it doesn't work for me.

  • @emilsp2028
    @emilsp2028 1 year ago +5

    "ModuleNotFoundError: No module named 'cosmos.providers'" when trying to import the DAG. Which package should I install, and in which configuration file should I put it (packages, requirements, dbt-requirements, or the Dockerfile)?

    • @dffffffawsefdsfgvsef
      @dffffffawsefdsfgvsef 1 year ago +1

      I am getting the same error. Any solution found for this?

    • @renatomoratti5947
      @renatomoratti5947 1 year ago

      Same here, did you find the solution already?

    • @johnnote7
      @johnnote7 1 year ago +3

      In requirements.txt, change it to astronomer-cosmos[dbt.all]==0.6.0

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @jonasl3683
    @jonasl3683 1 year ago +6

    Hey Marc, thanks for the great tutorial!!! :)
    But I can't really get it to work. I get the error message "ModuleNotFoundError: No module named 'cosmos.providers'" when trying to import the DAG. Which package should I install, and in which configuration file should I put it (packages, requirements, dbt-requirements, or the Dockerfile)? I am kind of confused why there are two requirements files...
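
    (A note on the two files, since it comes up in several threads here: in an Astro CLI project, packages.txt lists OS-level packages installed with apt during the image build, while requirements.txt lists Python packages installed with pip. So astronomer-cosmos and pytz go in requirements.txt, and gcc / python3 go in packages.txt. A rough sketch matching the fixes mentioned in this comment section, with the pins only as examples:)

      packages.txt (OS packages installed with apt):
      gcc
      python3

      requirements.txt (Python packages installed with pip):
      astronomer-cosmos[dbt.all]==0.6.0
      pytz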

    • @dffffffawsefdsfgvsef
      @dffffffawsefdsfgvsef 1 year ago

      I am getting the same error. Any solution found for this?

    • @carolinabtt
      @carolinabtt 1 year ago

      I got the same error. If anyone knows how to fix it, let us know! Thanks

    • @renatomoratti5947
      @renatomoratti5947 1 year ago

      Same error here, did you find a solution?

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@renatomoratti5947 Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@dffffffawsefdsfgvsef Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @nonojinomo
    @nonojinomo 1 year ago +6

    Great video! Would you have any example of how to run only a specific model, or any other command, instead of the whole project?
    Couldn't find it in the docs!
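
    (One way to do this with the current Cosmos API, for reference: RenderConfig accepts select / exclude with dbt-style selectors, so only part of the project is rendered into tasks. A sketch assuming a recent Cosmos 1.x release and the Postgres connection from this tutorial; the DAG id, path, and selector are placeholders:)

    from datetime import datetime
    from airflow import DAG
    from cosmos import DbtTaskGroup, ProjectConfig, ProfileConfig, RenderConfig
    from cosmos.profiles import PostgresUserPasswordProfileMapping

    profile_config = ProfileConfig(
        profile_name="demo_dbt",
        target_name="dev",
        profile_mapping=PostgresUserPasswordProfileMapping(
            conn_id="postgres",
            profile_args={"schema": "public"},
        ),
    )

    with DAG("dbt_staging_only", start_date=datetime(2023, 1, 1), schedule=None, catchup=False):
        DbtTaskGroup(
            group_id="staging_models",
            project_config=ProjectConfig("/usr/local/airflow/dbt/my_project"),
            profile_config=profile_config,
            # select/exclude take dbt-style selectors, e.g. "tag:daily" or "path:models/staging"
            render_config=RenderConfig(select=["path:models/staging"]),
        )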

  • @GitHubertP
    @GitHubertP 1 year ago

    Great video! There are some changes I had to make to get this example working, but in the end it helped me a lot, thank you :)

    • @MarcLamberti
      @MarcLamberti 1 year ago

      Thank you! Could you tell me which ones so I can pin that in a comment?

    • @GitHubertP
      @GitHubertP 1 year ago +1

      In your Notion there is a definition for the jaffle_shop DAG that throws errors in its current state during import (I took the code from the Notion page provided in the description):
      TypeError: DAG.__init__() got an unexpected keyword argument 'dbt_executable_path' #1
      TypeError: DAG.__init__() got an unexpected keyword argument 'conn_id' #2
      TypeError: DbtToAirflowConverter.__init__() missing 1 required positional argument: 'profile_config' #3
      TypeError: DbtToAirflowConverter.__init__() missing 1 required positional argument: 'project_config' #4
      So instead of passing conn_id and dbt_executable_path when creating the DbtDag, it should be done this way, for example:

      from airflow.datasets import Dataset
      from datetime import datetime
      from cosmos import DbtDag, ProjectConfig, ProfileConfig, ExecutionConfig
      from cosmos.profiles import PostgresUserPasswordProfileMapping

      profile_config = ProfileConfig(
          profile_name="demo_dbt",
          target_name="dev",
          profile_mapping=PostgresUserPasswordProfileMapping(
              conn_id="postgres",
              profile_args={"schema": "public"},
          ),
      )
      config = ProjectConfig("/usr/local/airflow/dbt/my_project")
      exec_config = ExecutionConfig(dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt")

      dbt_model = DbtDag(
          dag_id="dbt_model",
          start_date=datetime(2023, 1, 1),
          schedule=[Dataset("SEED://seed_dataset")],
          profile_config=profile_config,
          project_config=config,
          execution_config=exec_config,  # default execution mode is ExecutionMode.LOCAL
      )

      I define ProjectConfig, ProfileConfig, and ExecutionConfig separately and then pass all the necessary config to DbtDag. I did the same thing in the seeds part, but there is no problem with values passed straight into DbtRunOperationOperator and DbtSeedOperator, so that part of the tutorial does not need changing right now :)

    • @GitHubertP
      @GitHubertP 1 year ago

      I have different names for the profile, dataset, etc., but the logic is the same as on the Notion site.

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@GitHubertP Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@MarcLamberti Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @sabaokangan
    @sabaokangan 1 year ago

    Thank you so much for sharing this with us on YouTube

  • @martrom0
    @martrom0 11 months ago

    Excellent !! Thanks a lot!

  • @InfinitesimallyInfinite
    @InfinitesimallyInfinite 1 year ago

    This is awesome! Thanks for sharing! Subscribed. 👍🏼

  • @agnitchatterjee
    @agnitchatterjee 1 year ago

    Great read. Has anyone installed the cosmos package without the Astro CLI and gotten the dbt DAGs working?

  • @SoumilShah
    @SoumilShah 9 months ago

    Does not work

  • @ignaciovinuales8235
    @ignaciovinuales8235 1 year ago +1

    Thank you for the video! I have Airflow in production on a Kubernetes cluster (deployed using the official Helm charts). Is there any straightforward way of integrating Cosmos with git-sync?

  • @chayakiraneng
    @chayakiraneng 1 year ago +1

    Thanks for this walkthrough. It's very helpful.
    When I use the BashOperator, I can specify the threads and run multiple models in parallel.
    When I use the cosmos package and DbtTaskGroup, there doesn't seem to be any such config to run models in parallel. This increases our run times. Am I missing some config to run in parallel?

  • @ĐạtTrầnQuốc-p9i
    @ĐạtTrầnQuốc-p9i 1 month ago

    Hi @Marc, how can I run only some specific models, tests, etc.?

  • @Cantblendthis
    @Cantblendthis 1 month ago

    This is exactly what I need as I'm starting my Airflow journey. dbt and Dagster are already running on my Windows machine, but I'd like to learn Airflow (in WSL2) as well. Question though: normally dbt needs an activated Python venv to run, compile, etc., but the venv isn't part of a repo push. So aren't you missing the entire venv that actually runs the dbt models?
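
    (On the venv question: in this setup the venv is not pushed to the repo; it is built inside the Airflow image, and the DAGs point at it via ExecutionConfig(dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt"), as shown in the DbtDag example earlier in the comments. A Dockerfile sketch of that pattern; the runtime tag and the dbt adapter are placeholders, and it assumes the image's working directory is /usr/local/airflow so the venv ends up at the path above:)

    FROM quay.io/astronomer/astro-runtime:9.1.0

    # Build a dedicated virtualenv for dbt inside the image so it ships with every deploy
    RUN python -m venv dbt_venv && \
        dbt_venv/bin/pip install --no-cache-dir dbt-postgres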

  • @samsoneromonsei9368
    @samsoneromonsei9368 2 months ago

    Does the cosmos package only integrate with the Astro Airflow distribution? I deploy my Airflow containers with a YAML file.

  • @gopikiran2950
    @gopikiran2950 1 year ago

    Please make a video on Dataform and Airflow

  • @karthikrajashekaran
    @karthikrajashekaran 1 year ago

    When will this dbt-core integration be supported as a standard in Airflow?

  • @SudipAdhikari-q9u
    @SudipAdhikari-q9u 1 year ago

    What version of astronomer-cosmos were you using while creating this tutorial? The module is actively developing and changing, so I can't follow it thoroughly.

  • @palaner
    @palaner 1 year ago

    Hi! Could someone answer the following question:
    Is it possible to use full refresh with the cosmos package?

  • @KirillP-b1v
    @KirillP-b1v 1 year ago

    When is support for ClickHouse expected?

  • @palaner
    @palaner 1 year ago +1

    Cool integration. But can someone please explain whether it's possible to generate dbt docs somehow using this approach?

    • @MarcLamberti
      @MarcLamberti 1 year ago +1

      Here astronomer.github.io/astronomer-cosmos/configuration/generating-docs.html 🫶

  • @detemegandy
    @detemegandy 11 months ago

    Amazing! Now make one for cloud instead of local? :D

  • @dffffffawsefdsfgvsef
    @dffffffawsefdsfgvsef 1 year ago

    To anyone: can we do this setup using the AWS managed Airflow service, where we don't have access to the command line? Any ideas? Please share your thoughts.

  • @andress121
    @andress121 2 months ago

    At 16:53, why did you need to re-run Import Seeds if those tables are already in the DB? Thank you!

  • @h.ibrahimannadnc3673
    @h.ibrahimannadnc3673 6 months ago +1

    Hi, I got an error in the Airflow UI like this. Any ideas about this error?
    ModuleNotFoundError: No module named 'cosmos.providers'

    • @thomasbooij6239
      @thomasbooij6239 4 months ago

      I got the same, did you find a solution?

    • @HuyPham-id7us
      @HuyPham-id7us 1 month ago

      I got the same error, did you find a solution?

  • @ornachshon1
    @ornachshon1 9 months ago

    If I want to add Cosmos to my existing Airflow, is it possible? How?

  • @heshamh96
    @heshamh96 1 year ago

    This is mind-blowing, man... But amazing as it is... we still need to execute dbt commands... one by one 😅... But again, great video.

    • @MarcLamberti
      @MarcLamberti 1 year ago

      No you won't 🥹 Cosmos translates your dbt project into a DAG with tasks corresponding to your models, tests, etc. It's a much better integration than running *indeed* one command at a time with the BashOperator. Thank you for your kind words 🙏

    • @maximilianopadula5470
      @maximilianopadula5470 1 year ago

      @@MarcLamberti Thanks a lot for the video. My question is: can you run tasks on different schedules? I.e., I'd like my stg models to run every 5 minutes but my intermediate models every day. I couldn't find an answer in the cosmos documentation. Many thanks

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@maximilianopadula5470 Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @StephenRayner
    @StephenRayner 4 months ago

    What about dbt + Meltano + Airflow + Cosmos?

  • @scharif
    @scharif 2 months ago

    Can someone please explain how the GitHub workflow is integrated here?
    Is Airflow linked to the master branch in GitHub?
    Also, how does CI/CD work with this? For example, I push the project to a branch and I want to know whether anything will break in prod.

  • @andress121
    @andress121 2 months ago

    At 12:26, why do you have to drop the seeds before running the dbt seed command? In dbt, that command would simply overwrite the existing seeds, so why do we need to drop them first? Thank you for clarifying.

  • @datalearningsihan
    @datalearningsihan 6 months ago

    I have an ETL process in place in ADF. In our team, we wanted to implement the table and view transformations with dbt Core. We were wondering if we could orchestrate dbt with Azure, and if so, how? One approach I could think of was to use the Azure managed Airflow instance, but will it allow us to install astronomer-cosmos? I have never implemented dbt this way before, so I need to know whether this would be the right approach or whether there is anything else you would suggest.

    • @sridharstreakssri
      @sridharstreakssri 5 months ago

      I guess dbt doesn't have something specific for Azure. But if you have access to Fabric you could take a look at it, as it offers a complete analytics platform. If you're looking to make SQL dynamic the way dbt does with Jinja templating, then I don't know.

  • @Omzodijacky
    @Omzodijacky 1 year ago

    Great video, very informative!
    One question: does Cosmos allow us to run a specific model in dbt, or a specific tag in the dbt model?

  • @jwc7663
    @jwc7663 1 year ago

    Great videos. Thanks. One question: what if I have to use the KubernetesExecutor? In that case, `dbt deps` should precede every dbt task (because each container task in a pod will lose the very first dbt deps context). How can I handle this?

  • @RajeshKumar-re8tj
    @RajeshKumar-re8tj 1 year ago +1

    Will this be added to your Airflow course on Udemy?

  • @luisjuarez-lg9xp
    @luisjuarez-lg9xp 1 year ago +1

    Can this work with Airflow in AWS MWAA?

    • @maximilianopadula5470
      @maximilianopadula5470 1 year ago

      Interested in this too. I imagine it can? Mostly curious about the CI/CD part, which I guess will be a cosmos build to S3.

  • @MrMal0w
    @MrMal0w 1 year ago

    Nice and well-explained video! Do you have plans to do a dbt + Dagster integration video? It could be interesting :)

    • @MarcLamberti
      @MarcLamberti 1 year ago +1

      I haven't tried Dagster yet, but why not 🤓

  • @Pegasus1311
    @Pegasus1311 1 year ago

    So it's like legianires? Air flow and database...

  • @elteixeiras
    @elteixeiras 1 year ago

    Thank you very much, Marc, for your generous initiative. One small point: the Udemy link in the video description returns an error.

    • @MarcLamberti
      @MarcLamberti 1 year ago

      Hi Luciano,
      Where? In the email I sent?

    • @farisazhan6428
      @farisazhan6428 1 year ago +1

      @@MarcLamberti In this video's description, the Udemy link doesn't work.

    • @elteixeiras
      @elteixeiras 1 year ago

      @@MarcLamberti Where it says BECOME A PRO.

    • @MarcLamberti
      @MarcLamberti 1 year ago

      Fixed! Thank you guys ❤️

    • @KirillP-b1v
      @KirillP-b1v 1 year ago

      @@MarcLamberti When is support for ClickHouse expected?

  • @imosolar
    @imosolar 1 year ago

    Good

  • @ideal176
    @ideal176 1 year ago

    Is this a full replacement for dbt Cloud?

    • @MarcLamberti
      @MarcLamberti 1 year ago +2

      Nope, but it helps to integrate dbt Core in Airflow :)

  • @WalterHoekstra-e6x
    @WalterHoekstra-e6x 1 year ago

    Hey Marc, it seems the API for this package has changed quite a bit recently, and I'm having a really hard time figuring out the execution modes given the lack of a proper example that uses the most current version of Cosmos. Is there any chance you could do a deep dive on how to configure the latest version of Cosmos with the Docker / K8s executors?

    • @MarcLamberti
      @MarcLamberti 1 year ago

      Yes! I will make an updated video. Which execution modes are you referring to?

    • @WalterHoekstra-e6x
      @WalterHoekstra-e6x 1 year ago

      @@MarcLamberti Thanks for getting back to me. I'm specifically referring to ExecutionMode.DOCKER and ExecutionMode.KUBERNETES. My company generally prefers keeping its Airflow instances as clean as possible and running everything on K8s where possible.

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      @@MarcLamberti Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @as_sulthoni
    @as_sulthoni 1 year ago

    I did an integration like this before (but I built my own dbt loader), and it ran into memory errors in Airflow because too many dbt model runs were executing concurrently. What do you suggest to tweak it?
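
    (If Airflow-side concurrency is the problem, one lever is to cap parallel tasks at the DAG level; with Cosmos every model becomes its own task, so this directly limits how many dbt runs execute at once. A sketch reusing the DbtDag setup shown earlier in the comments; the limit of 4 is only an example, and max_active_tasks is a standard Airflow DAG argument that DbtDag should pass through to the underlying DAG:)

    from datetime import datetime
    from cosmos import DbtDag, ProjectConfig, ProfileConfig
    from cosmos.profiles import PostgresUserPasswordProfileMapping

    dbt_dag = DbtDag(
        dag_id="dbt_model_throttled",
        start_date=datetime(2023, 1, 1),
        schedule=None,
        project_config=ProjectConfig("/usr/local/airflow/dbt/my_project"),
        profile_config=ProfileConfig(
            profile_name="demo_dbt",
            target_name="dev",
            profile_mapping=PostgresUserPasswordProfileMapping(
                conn_id="postgres",
                profile_args={"schema": "public"},
            ),
        ),
        max_active_tasks=4,  # at most 4 dbt tasks (models/tests/seeds) run concurrently
    )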

    • @MarcLamberti
      @MarcLamberti 1 year ago +1

      Is that an issue you have with Cosmos, or is it with your own dbt loader?

    • @as_sulthoni
      @as_sulthoni 1 year ago

      @@MarcLamberti I use my own dbt loader, so technically my Airflow (Cloud Composer) crashed because RAM and CPU usage spiked. Ideally I could increase RAM and CPU, but unfortunately that was not possible due to cost limitations on my side. So my current solution is to deploy a standalone dbt to an on-prem server (Google CE); the integration looks like a Cloud Run integration.

  • @solidnyysnek
    @solidnyysnek 1 year ago

    magic

  • @capVermillon
    @capVermillon 1 year ago

    Hi, the tutorial looks good but it doesn't work anymore. Can you please share the versions you are using in it? Thanks a lot!

    • @amansharma-gj7eu
      @amansharma-gj7eu 1 year ago

      Did you get the error below during execution of the jaffle_shop DAG?
      improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @samsonleul7667
    @samsonleul7667 6 months ago

    Cosmos has very poor documentation. I don't recommend it to anyone.

    • @MarcLamberti
      @MarcLamberti 6 months ago

      Anything you were looking for specifically?

    • @samsonleul7667
      @samsonleul7667 6 months ago

      @@MarcLamberti These imports:
      from cosmos.providers.dbt.core.operators import (
          DbtDepsOperator,
          DbtRunOperationOperator,
          DbtSeedOperator,
      )
      do not work on the latest version of cosmos, and I couldn't find their alternatives.
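
      (For reference, in Cosmos 1.x these standalone operators moved to cosmos.operators.local, and DbtDepsOperator was removed since deps are handled from packages.yml, as noted at the top of the thread. A rough sketch of the equivalent imports, reusing the Postgres connection and paths from this tutorial; the macro name is a placeholder and parameter names should be checked against the Cosmos release you install:)

      from datetime import datetime
      from airflow import DAG
      from cosmos import ProfileConfig
      from cosmos.operators.local import DbtRunOperationLocalOperator, DbtSeedLocalOperator
      from cosmos.profiles import PostgresUserPasswordProfileMapping

      profile_config = ProfileConfig(
          profile_name="demo_dbt",
          target_name="dev",
          profile_mapping=PostgresUserPasswordProfileMapping(
              conn_id="postgres",
              profile_args={"schema": "public"},
          ),
      )

      with DAG("import_seeds", start_date=datetime(2023, 1, 1), schedule=None, catchup=False):
          drop_seeds = DbtRunOperationLocalOperator(
              task_id="drop_seeds",
              macro_name="drop_seed_tables",  # placeholder: use the macro from your own project
              project_dir="/usr/local/airflow/dbt/my_project",
              profile_config=profile_config,
          )
          seed = DbtSeedLocalOperator(
              task_id="seed",
              project_dir="/usr/local/airflow/dbt/my_project",
              profile_config=profile_config,
          )
          drop_seeds >> seed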

  • @nicholasbonn
    @nicholasbonn 1 year ago

    Unfortunately outdated & useless

    • @MarcLamberti
      @MarcLamberti 1 year ago

      How useless? Doesn't it work anymore?

    • @CarbonsHDTuts
      @CarbonsHDTuts 11 months ago

      @@MarcLamberti Any updates?

  • @kartheekgummaluri7430
    @kartheekgummaluri7430 8 months ago

    I'm getting this error:
    Broken DAG: [/usr/local/airflow/dags/import-seeds.py]
    Traceback (most recent call last):
      File "", line 241, in _call_with_frames_removed
      File "/usr/local/airflow/dags/import-seeds.py", line 7, in
        from cosmos.providers.dbt.core.operators import (
    ModuleNotFoundError: No module named 'cosmos.providers'

    • @HuyPham-id7us
      @HuyPham-id7us 1 month ago

      I got the same problem. Have you solved the error yet?

  • @SoumilShah
    @SoumilShah 9 months ago

    Broken DAG: [/usr/local/airflow/dags/import-seeds.py]
    Traceback (most recent call last):
      File "", line 241, in _call_with_frames_removed
      File "/usr/local/airflow/dags/import-seeds.py", line 6, in
        from cosmos.providers.dbt.core.operators import (
    ModuleNotFoundError: No module named 'cosmos.providers'

    • @kartheekgummaluri7430
      @kartheekgummaluri7430 8 months ago

      I'm also getting the same error:
      Broken DAG: [/usr/local/airflow/dags/import-seeds.py]
      Traceback (most recent call last):
        File "", line 241, in _call_with_frames_removed
        File "/usr/local/airflow/dags/import-seeds.py", line 7, in
          from cosmos.providers.dbt.core.operators import (
      ModuleNotFoundError: No module named 'cosmos.providers'

    • @kartheekgummaluri7430
      @kartheekgummaluri7430 8 months ago

      @marclamberti please help

  • @AnWempe
    @AnWempe 2 months ago

    9626 Bartholome Junction

  • @amansharma-gj7eu
    @amansharma-gj7eu 1 year ago

    Did you get the error below during execution of the jaffle_shop DAG?
    improper relation name (too many dotted names): public.***.public.customers__dbt_backup

  • @quangthangnguyen199
    @quangthangnguyen199 4 months ago

    Broken DAG: [/usr/local/airflow/dags/import-seeds.py]
    Traceback (most recent call last):
      File "", line 228, in _call_with_frames_removed
      File "/usr/local/airflow/dags/import-seeds.py", line 6, in
        from cosmos.providers.dbt.core.operators import (
    ModuleNotFoundError: No module named 'cosmos.providers'

  • @qingsun6566
    @qingsun6566 2 months ago

    Broken DAG: [/usr/local/airflow/dags/import-seeds.py]
    Traceback (most recent call last):
      File "", line 488, in _call_with_frames_removed
      File "/usr/local/airflow/dags/import-seeds.py", line 6, in
        from cosmos.providers.dbt.core.operators import (
    ModuleNotFoundError: No module named 'cosmos.providers'