Backfill your DAGs in Apache Airflow: Everything you need to know

  • Published on Oct 14, 2024

Comments • 20

  • @andriifadieiev9757 3 months ago

    Amazing episode, thank you for sharing!

    • @MarcLamberti 3 months ago

      Glad you liked it :)

  • @KasperBirkelund 2 months ago

    Thanks man. Good videos!
    I would like to suggest that you create a video guide for setting up Airflow on an EC2 instance with all the bells and whistles needed for a robust, scalable production environment, together with an example DAG. What do you think?

    • @MarcLamberti 2 months ago

      I think it's a great idea. However, I would never use Airflow on an EC2 instance to set up a scalable, robust production environment. There are much better alternatives IMHO.

    • @KasperBirkelund 2 months ago

      @MarcLamberti OK cool, whatever you recommend, I am ready to learn :D

    • @KasperBirkelund 2 months ago

      Maybe also something about how a local environment using the Astro CLI is useful for testing when you also have a more advanced production setup in the cloud. I'm not sure I really understand that workflow.

  • @mason1189 2 months ago

    Microsoft has removed the 365 connectors. Do you have any alternatives to the incoming webhook for triggering alerts? 😢

  • @LoganMerazzi 3 months ago

    Hi Marc, thanks for sharing! I have a question: do those commands also work on Cloud Composer, on Google Cloud?

    • @MarcLamberti 3 months ago +1

      That's a good question. Honestly, I've never used Cloud Composer, so I can't tell. But if you have access to the CLI, then yes, you can do it.

  • @vargas4762 3 months ago +1

    Do you think DAGs should be designed to run in backfill mode?
    I ask because if a DAG makes queries based on the current date (for example, using a SQL function or datetime), the backfill will not work as expected, at least with those functions. But we can use the execution_date instead.

    • @kotvitskypeter3369 3 months ago +3

      And that's the reason you don't backfill DAGs that are not idempotent. If you want your DAGs to be idempotent, use the data_interval_end macro instead of current_date.

    • @MarcLamberti 3 months ago +2

      Exactly! That's another crucial topic I didn't cover in this video but I do in my Udemy course.
      Your pipelines must be idempotent or you won't be able to backfill them.
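
To illustrate the idempotency point made in this thread, here is a minimal sketch in plain Python. It assumes the run's data interval is passed in explicitly; in Airflow these values would typically come from the task context or the `data_interval_start` and `data_interval_end` template variables. The function name, the `events` table, and the `created_at` column are hypothetical and not from the video.

```python
from datetime import datetime, timezone


def build_daily_query(data_interval_start: datetime, data_interval_end: datetime) -> str:
    """Return a query bounded by the run's data interval, not by the wall clock."""
    # Hypothetical table and column names, used only for illustration.
    return (
        "SELECT * FROM events "
        f"WHERE created_at >= '{data_interval_start.isoformat()}' "
        f"AND created_at < '{data_interval_end.isoformat()}'"
    )


# The query depends only on the interval, so a backfilled run for 2024-01-01
# selects the same rows no matter when it actually executes.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, tzinfo=timezone.utc)
query = build_daily_query(start, end)
print(query)
```

A version that called `datetime.now()` or a SQL current-date function inside the query would return different rows on every rerun, which is what breaks backfills.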

  • @TejaswaJain-p5n 3 months ago

    I have Airflow running locally in Docker. When I log into the worker container and run the backfill for a certain DAG, the process starts running for all the other failed DAGs. Am I missing something? I am using the same command you showed in the demo.

    • @MarcLamberti 3 months ago

      Do you mean failed DAGs or failed DAG runs?

    • @TejaswaJain-p5n 3 months ago

      @MarcLamberti Failed DAGs. The DAG I was trying to backfill had no failed DAG runs.

    • @MarcLamberti 3 months ago

      @TejaswaJain-p5n If you pass the DAG id to the backfill command, only that DAG will be backfilled.

    • @TejaswaJain-p5n 2 months ago

      @MarcLamberti Figured it out: my schedule was blank. Once I added one, things started running.
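
For reference, the single-DAG backfill discussed in this thread could be assembled as follows. This is a sketch that assumes the Airflow 2.x CLI, where `airflow dags backfill` takes the DAG id as a positional argument; the helper name, the DAG id `my_dag`, and the dates are hypothetical, and the exact command shown in the video may differ.

```python
import shlex
from typing import List


def backfill_command(dag_id: str, start_date: str, end_date: str) -> List[str]:
    """Build the argument list for an Airflow 2.x backfill of a single DAG."""
    return [
        "airflow", "dags", "backfill",
        "--start-date", start_date,
        "--end-date", end_date,
        dag_id,  # positional argument: only this DAG is backfilled
    ]


# Hypothetical DAG id and date range, for illustration only.
cmd = backfill_command("my_dag", "2024-01-01", "2024-01-07")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the Airflow CLI is available.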

  • @marellasrikanth-us 3 months ago

    But in production, how do we access the command line to run Airflow commands? I wish Airflow provided the CLI through the UI to make it easy.

    • @MarcLamberti 3 months ago +1

      Even in prod you should be able to access the CLI. Or the platform admin can.

    • @marellasrikanth-us 3 months ago

      @MarcLamberti Thanks for your response. In our organization, the platform team manages the infrastructure, and data engineers won't have access to the containers where Airflow is running, so they can't use the CLI. Since data engineers will have full access to the Airflow UI, I wish we had an alternative way to perform backfills through the UI.