Astronomer
Powering Finance With Advanced Data Solutions at Ramp with Ryan Delgado
Data is the backbone of every modern business, but unlocking its full potential requires the right tools and strategies. In this episode, Ryan Delgado, Director of Engineering at Ramp, joins us to explore how innovative data platforms can transform business operations and fuel growth. He shares insights on integrating Apache Airflow, optimizing data workflows and leveraging analytics to enhance customer experiences.
Key Takeaways:
(01:52) Data is the lifeblood of Ramp, touching every vertical in the company.
(03:18) Ramp’s data platform team enables high-velocity scaling through tailored tools.
(05:27) Airflow powers Ramp’s enterprise data warehouse integrations for advanced analytics.
(07:55) Centralizing data in Snowflake simplifies storage and analytics pipelines.
(12:08) Machine learning models at Ramp integrate seamlessly with Airflow for operational excellence.
(14:11) Leveraging Airflow datasets eliminates inefficiencies in DAG dependencies.
(17:22) Platforms evolve from solving narrow business problems to scaling organizationally.
(18:55) ClickHouse enhances Ramp’s OLAP capabilities with 100x performance improvements.
(19:47) Ramp’s OLAP platform improves performance by reducing joins and leveraging ClickHouse.
(21:46) Ryan envisions a lighter-weight, more Python-native future for Airflow.
Resources Mentioned:
Ryan Delgado -
www.linkedin.com/in/ryan-delgado-69544568/
Ramp -
www.linkedin.com/company/ramp/
Apache Airflow -
airflow.apache.org/
Snowflake -
www.snowflake.com/
ClickHouse -
clickhouse.com/
dbt -
www.getdbt.com/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Views: 32

Videos

Quickstart ETL with Airflow (Step 9 of 9)
77 views · 14 days ago
Sign up for a free Astro trial! To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos).
Quickstart ETL with Airflow (Step 8 of 9)
53 views · 14 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?).
CLI commands:
psql -p 5434 -U postgres -d airflow_db (this will vary based on your Postgres setup)
SELECT * FROM weather_data.sunset_table;
See this GitHub repository for the full code (github...
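If you prefer to check the results from a task instead of the psql shell, here is a minimal sketch (an editor's illustration, not code from the video) using the Postgres provider; the connection ID matches the one created in Step 5 of this series:

    from airflow.providers.postgres.hooks.postgres import PostgresHook

    def read_sunset_table():
        # Reuse the "my_postgres_conn" connection configured in Step 5
        hook = PostgresHook(postgres_conn_id="my_postgres_conn")
        rows = hook.get_records("SELECT * FROM weather_data.sunset_table;")
        for row in rows:
            print(row)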
Quickstart ETL with Airflow (Step 7 of 9)
49 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid7_code.py).
Exploring the Power of Airflow 3 at Astronomer with Amogh Desai
158 views · 21 days ago
What does it take to go from fixing a broken link to becoming a committer for one of the world’s leading open-source projects? Amogh Desai, Senior Software Engineer at Astronomer, takes us through his journey with Apache Airflow. From small contributions to building meaningful connections in the open-source community, Amogh’s story provides actionable insights for anyone on the cusp of their op...
Quickstart ETL with Airflow (Step 6 of 9)
73 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full DAG code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid6_code.py) and SQL statement (github.com/astronomer/quickstart-e...
Quickstart ETL with Airflow (Step 5 of 9)
56 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?).
Connection UI fields:
Connection ID: my_postgres_conn
Connection Type: Postgres
Host: your Postgres host
Database: your Postgres database
Login: your Postgres login
Password: your Postgres pas...
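The same connection can also be defined without the UI (an editor's note, not shown in the video) by exporting it in Airflow's connection URI format; host, port, credentials and database below are placeholders:

    AIRFLOW_CONN_MY_POSTGRES_CONN='postgres://login:password@host:5432/database'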
Quickstart ETL with Airflow (Step 4 of 9)
60 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid4_code.py).
Quickstart ETL with Airflow (Step 3 of 9)
67 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid3_code.py).
Quickstart ETL with Airflow (Step 2 of 9)
82 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?). See this GitHub repository for the full code (github.com/astronomer/quickstart-etl-with-airflow-videos/blob/main/include/vid2_code.py).
Quickstart ETL with Airflow (Step 1 of 9)
202 views · 21 days ago
To learn more about best practices for ETL with Airflow, get the Apache Airflow® Best Practices for ETL and ELT Pipelines eBook (www.astronomer.io/ebooks/apache-airflow-best-practices-etl-elt-pipelines/?).
Commands and code from the video:
docker ps
brew install astro
astro dev init
astro dev start
Log in at localhost:8080 with admin for the username and password.
Using Airflow To Power Machine Learning Pipelines at Optimove with Vasyl Vasyuta
89 views · 1 month ago
Data orchestration and machine learning are shaping how organizations handle massive datasets and drive customer-focused strategies. Tools like Apache Airflow are central to this transformation. In this episode, Vasyl Vasyuta, R&D Team Leader at Optimove, joins us to discuss how his team leverages Airflow to optimize data processing, orchestrate machine learning models and create personalized c...
Maximizing Business Impact Through Data at GlossGenius with Katie Bauer
96 views · 1 month ago
Bridging the gap between data teams and business priorities is essential for maximizing impact and building value-driven workflows. Katie Bauer, Senior Director of Data at GlossGenius, joins us to share her principles for creating effective, aligned data teams. In this episode, Katie draws from her experience at GlossGenius, Reddit and Twitter to highlight the common pitfalls data teams face an...
Optimizing Large-Scale Deployments at LinkedIn with Rahul Gade
89 views · 1 month ago
Scaling deployments for a billion users demands innovation, precision and resilience. In this episode, we dive into how LinkedIn optimizes its continuous deployment process using Apache Airflow. Rahul Gade, Staff Software Engineer at LinkedIn, shares his insights on building scalable systems and democratizing deployments for over 10,000 engineers. Rahul discusses the challenges of managing larg...
How Uber Manages 1 Million Daily Tasks Using Airflow, with Shobhit Shah and Sumit Maheshwari
112 views · 1 month ago
When data orchestration reaches Uber’s scale, innovation becomes a necessity, not a luxury. In this episode, we discuss the innovations behind Uber’s unique Airflow setup. With our guests Shobhit Shah and Sumit Maheshwari, both Staff Software Engineers at Uber, we explore how their team manages one of the largest data workflow systems in the world. Shobhit and Sumit walk us through the evolutio...
Building Resilient Data Systems for Modern Enterprises at Astrafy with Andrea Bombino
112 views · 2 months ago
Introduction to Data Products
119 views · 2 months ago
How to use SLAs for Data Pipelines
87 views · 2 months ago
Actionable Pipeline Insights with Astro Observe
115 views · 2 months ago
Inside Airflow 3: Redefining Data Engineering with Vikram Koka
186 views · 2 months ago
Building a Data-Driven HR Platform at 15Five with Guy Dassa
74 views · 2 months ago
The Intersection of AI and Data Management at Dosu with Devin Stein
111 views · 3 months ago
AI-Powered Vehicle Automation at Ford Motor Company with Serjesh Sharma
150 views · 4 months ago
From Task Failures to Operational Excellence at GumGum with Brendan Frick
141 views · 4 months ago
Building Modern Data Apps: Choosing the Right Foundation and Tools
87 views · 4 months ago
From Sensors to Datasets: Enhancing Airflow at Astronomer with Maggie Stark and Marion Azoulai
146 views · 4 months ago
Mastering Data Orchestration with Airflow at M Science with Ben Tallman
128 views · 4 months ago
Welcome to The Data Flowcast
91 views · 4 months ago
Enhancing Business Metrics With Airflow at Artlist with Hannan Kravitz
102 views · 4 months ago
Cutting-Edge Data Engineering at Teya with Alexandre Magno Lima Martins
444 views · 5 months ago

Comments

  • @tablit.
    @tablit. 4 days ago

    Great webinar! Thanksss

  • @boldganbaatar7023
    @boldganbaatar7023 6 days ago

    How do I handle zombie tasks for a large file copy?

  • @walterppk1989
    @walterppk1989 21 days ago

    The title is misleading. This video is not about Airflow 3. It's about an individual contributor's journey to becoming an Airflow contributor. That's cool, but not what I came here for. Please do better in the future.

  • @likithb3726
    @likithb3726 23 days ago

    Ma'am, when I run the command astro dev start I get the following error: Error: error building, (re)creating or starting project containers: Error response from daemon: error while creating mount source path '/host_mnt/Users/Bingumalla Likith/Desktop/MLOPS/airflow-astro/dags': mkdir /host_mnt/Users/Bingumalla Likith/Desktop: operation not permitted. Can you help me out with it? I'm using a Mac.

  • @marceloribeiro2548
    @marceloribeiro2548 1 month ago

    How about the logs?

  • @dhruvtyagi6118
    @dhruvtyagi6118 1 month ago

    I am able to install astro but I get an access-denied error when using astro dev init or any astro command.

  • @Klifhunger
    @Klifhunger 2 months ago

    Insightful 🙏

  • @mranderson7306
    @mranderson7306 2 months ago

    @Astronomer, Hello! Could you please tell me how you open the .html documentation that is generated inside the Airflow Docker container through the web interface? When I navigate to the "data_docs_url" (file:///opt/airflow/gx/uncommitted/data_docs/local_site/index.html), I get a 404 error.

  • @shadabbigdel5017
    @shadabbigdel5017 3 months ago

    The issue with the KubernetesExecutor is that you cannot view the task logs in the Airflow UI because, with KubernetesExecutor, workers are terminated after their job finishes. This issue is not present with the Celery or CeleryKubernetesExecutor. I tried different solutions with Persistent Volumes (PV) and Persistent Volume Claims (PVC), but they didn’t work for me. At the end of the video, Marc also presented the issue, but no solution was provided. Does anyone here know how to resolve it?

    • @Astronomer
      @Astronomer 3 months ago

      Hey there, thanks for commenting. It is absolutely possible to get task logs in the Airflow UI when using K8s Executor. If you're working with OSS Airflow, you will need to either enable remote logging so Airflow grabs the logs before the pod spins down, or use a persistent volume to store them. With Astronomer, this is all handled automatically in our Astro Runtime. I'd recommend reading more in the docs here: airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html
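      A minimal sketch of the remote-logging option mentioned above (an editor's illustration; the bucket path and connection ID are placeholders), set through Airflow's [logging] config section as environment variables:

          AIRFLOW__LOGGING__REMOTE_LOGGING=True
          AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://my-log-bucket/airflow/logs
          AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=my_aws_conn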

  • @aditya234567
    @aditya234567 3 months ago

    Say I have a DAG run in progress and I dynamically update the DAG's tasks. Will it break the existing DAG run? Say that particular DAG run has 10 tasks and I update the DAG while it's executing the first task. Will the existing run keep the old tasks while a newly started DAG run picks up the updated tasks?

  • @PaulDeveaux
    @PaulDeveaux 4 months ago

    What I would love to see is how to do this when dbt and Airflow are in separate repositories. Given the different dependencies between Airflow and dbt, this seems like a common use case.

  • @ricardomagnomartins
    @ricardomagnomartins 4 months ago

    How cool, my son!!! Congratulations.

  • @ramdasvk0716
    @ramdasvk0716 4 months ago

    The best❤️‍🔥

  • @deborathaisrodriguesdelima5866
    @deborathaisrodriguesdelima5866 5 months ago

    Excellent 👏👏

  • @ledinhanhtan
    @ledinhanhtan 5 months ago

    Hi, will dag.test() work for complex tasks such as SparkSubmitOperator()? 🙏

  • @goodmanshawnhuang
    @goodmanshawnhuang 5 months ago

    Great job, thanks for sharing it.

  • @marcin2x4
    @marcin2x4 6 months ago

    Are the presented examples available in any code repo?

  • @shadabbigdel5017
    @shadabbigdel5017 6 months ago

    Thank you very much for the great presentation and hands-on session. We are going to use Airflow in EKS, and our development team needed a way to simulate their environment locally to test their DAGs during development and become familiar with Airflow on Kubernetes. Your guide was extremely helpful.

  • @bryanpolito8576
    @bryanpolito8576 6 months ago

    Thanks

  • @AnchalGupta-ek3wr
    @AnchalGupta-ek3wr 7 months ago

    After adding the Python file and the HTML file and restarting the web server, the plugin details are visible under the Admin > Plugins path, but the view is not populating in Cloud Composer. Is there anything else that needs to be done?

  • @AnchalGupta-ek3wr
    @AnchalGupta-ek3wr 7 months ago

    After adding the Python file and the HTML file, I restarted the web server and Postgres from Docker, but the view is not populating in my local Airflow. Is there anything else that needs to be done? I'm running Airflow from a Docker setup. My Airflow version is 1.10.15, pretty old, but I can't switch to a newer version right now.

  • @ap2394
    @ap2394 7 months ago

    Hi, is it possible to schedule a task using a dataset, or is that controlled at the DAG level? I mean, if I have 2 tasks in a downstream DAG, do I have the option to customize the schedule based on a task's upstream dataset?

  • @spikeydude114
    @spikeydude114 7 months ago

    Do you have LinkedIn?

  • @pgrvloik
    @pgrvloik 7 months ago

    Great!

  • @rkenne1391
    @rkenne1391 7 months ago

    Can you provide more context on the batch inference pipeline? Airflow is an orchestrator, so would you need a different framework to perform the batch inference itself?

  • @snehal4520
    @snehal4520 7 months ago

    Very informative, thank you!

  • @amirhosseinsharifinejad7752
    @amirhosseinsharifinejad7752 8 months ago

    Really helpful thank you😍

  • @PaulChung-rg6jv
    @PaulChung-rg6jv 8 months ago

    Tons of information. Any chance this can be thrown in a GitHub repo for those of us engineers who need more time to digest?

  • @munyaradzimagodo3983
    @munyaradzimagodo3983 8 months ago

    Thank you, well explained. I created an Express application to create DAGs programmatically, but the endpoints are not working.

  • @CarbonsHDTuts
    @CarbonsHDTuts 8 months ago

    This is really awesome. I love the entire video and always love content from you guys and girls, but could I please give some constructive feedback?

  • @mettuvamshidhar1389
    @mettuvamshidhar1389 8 months ago

    Is it possible to get the list of variables pushed through xcom_push in the first task (extraction, let's say)? And can we pull that list of variables with xcom_pull and have it as a dynamically generated group (instead of A, B, C)?

  • @bilalmsd07
    @bilalmsd07 8 months ago

    What if any of the subtasks fails? How can the error be raised while the remaining parallel tasks still run?

  • @yevgenym9204
    @yevgenym9204 9 months ago

    @Astronomer Please share a direct link to the CLI library you mention (for proper file structure): th-cam.com/video/zVzBVpbgw1A/w-d-xo.htmlsi=HiJa9Afi-53yLZOG&t=873

    • @Astronomer
      @Astronomer 9 months ago

      You can find documentation on the Astro CLI, including download instructions, here: docs.astronomer.io/astro/cli/overview

  • @rohitnath5545
    @rohitnath5545 9 months ago

    Do we have a video on how to run Airflow using Docker on cloud containers? Running locally is fine to learn and test, but the real work is seeing how it runs in the cloud. I'm a consultant, and for my clients an easier setup is the goal; with Airflow I don't see that.

    • @Astronomer
      @Astronomer 9 months ago

      Astronomer provides a managed service for running Airflow at scale and in the cloud. You can learn more at astronomer.io/try-astro

  • @marehmanmarehman9431
    @marehmanmarehman9431 9 months ago

    great work, keep it up.

  • @ryank8463
    @ryank8463 9 months ago

    Hi, this video is really beneficial. I have a question about the best practice for handling data transmission between tasks. I am building MLOps pipelines using Airflow. My model-training DAG contains data preprocessing -> model training, so there would be massive data transmission between these two tasks. I am using XCom to transmit data between them, but there's roughly a 2 GB limitation in XCom. What's the best practice to deal with this problem? Using S3 to send/pull data between tasks? Or should I simply combine these two tasks (data preprocessing -> model training)? Thank you.

    • @Astronomer
      @Astronomer 9 months ago

      Thank you! For passing larger amounts of data between tasks you have two main options: a custom XCom backend or writing to intermediary storage directly from within the tasks. In general we recommend a custom XCom backend as a best practice in these situations, because you can keep your DAG code the same, the change happens in how the data sent to and retrieved from XCom is processed. You can find a tutorial on how to set up a custom XCom backend here: docs.astronomer.io/learn/xcom-backend-tutorial. Merging the tasks is generally not recommended because it makes it harder to get observability and rerun individual actions.
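      As a rough illustration of that recommendation, here is a minimal sketch of a custom XCom backend that sends DataFrames to S3 (an editor's example in the spirit of the linked tutorial, assuming the Amazon provider is installed; the bucket name and module path are placeholders):

          import io
          import uuid

          import pandas as pd
          from airflow.models.xcom import BaseXCom
          from airflow.providers.amazon.aws.hooks.s3 import S3Hook

          class S3XComBackend(BaseXCom):
              PREFIX = "xcom_s3://"
              BUCKET = "my-xcom-bucket"  # placeholder bucket

              @staticmethod
              def serialize_value(value, **kwargs):
                  # Large DataFrames go to S3; only a string reference lands in the metadata db
                  if isinstance(value, pd.DataFrame):
                      key = f"xcom/{uuid.uuid4()}.csv"
                      S3Hook().load_string(
                          value.to_csv(index=False),
                          key=key,
                          bucket_name=S3XComBackend.BUCKET,
                          replace=True,
                      )
                      value = S3XComBackend.PREFIX + key
                  return BaseXCom.serialize_value(value, **kwargs)

              @staticmethod
              def deserialize_value(result):
                  value = BaseXCom.deserialize_value(result)
                  # Strings carrying the prefix are fetched back from S3 and rebuilt
                  if isinstance(value, str) and value.startswith(S3XComBackend.PREFIX):
                      key = value.replace(S3XComBackend.PREFIX, "", 1)
                      csv_data = S3Hook().read_key(key, bucket_name=S3XComBackend.BUCKET)
                      value = pd.read_csv(io.StringIO(csv_data))
                  return value

      Airflow would then be pointed at the class, e.g. AIRFLOW__CORE__XCOM_BACKEND=include.s3_xcom_backend.S3XComBackend if the file lives at include/s3_xcom_backend.py.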

    • @ryank8463
      @ryank8463 8 months ago

      @@Astronomer Hi, thanks for your valuable reply. I would also like to ask what level of granularity we should aim for when allocating tasks. The more tasks there are, the more pushing/pulling of data from external storage happens, and when the data is large, that brings some network overhead.

  • @christianfernandez5717
    @christianfernandez5717 9 months ago

    Great video. Would also be interested in a webinar regarding scaling the Airflow database since I'm having some difficulties of my own with that.

    • @Astronomer
      @Astronomer 9 months ago

      Noted, thanks for the suggestion! If it's helpful, you can check out our guide on the metadata db docs.astronomer.io/learn/airflow-database. Using a managed service like Astro is also one way many companies avoid scaling issues with Airflow.

  • @dan-takacs
    @dan-takacs 10 months ago

    Great video. I'm trying to make this work with the LivyOperator. Do you know if it can be expanded, or have partial arguments supplied to it?

    • @Astronomer
      @Astronomer 10 months ago

      It should work. Generally you can map over any type of operator, but note that some parameters can't be mapped over (e.g. BaseOperator params). More here: docs.astronomer.io/learn/dynamic-tasks
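      For reference, a minimal sketch of what that mapping could look like inside a DAG (an editor's illustration; the Spark file, connection ID and job arguments are placeholders):

          from airflow.providers.apache.livy.operators.livy import LivyOperator

          # Fixed arguments go in partial(); the mapped argument varies per task instance
          LivyOperator.partial(
              task_id="spark_job",
              file="local:///jobs/etl_job.py",  # placeholder Spark application
              livy_conn_id="livy_default",
          ).expand(
              args=[["2024-01-01"], ["2024-01-02"], ["2024-01-03"]],
          )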

  • @looklook6075
    @looklook6075 10 months ago

    32:29 Why is the 'Test' connection button disabled? So frustrating. Airflow makes it so hard to connect to anything; not intuitive at all. And your video just skipped how to enable 'Test' and asked me to contact my deployment admin. lol, I am the deployment admin. Can you show me how? I checked the website and the documentation is not helpful at all. I have been stuck for over a week on how to connect Airflow to an MSSQL Server.

    • @Astronomer
      @Astronomer 10 months ago

      The `test` connection button is disabled by default starting in Airflow 2.7 for security reasons. You can enable it by setting the test_connection core config to Enabled. docs.astronomer.io/learn/connections#test-a-connection. We also have some guidance on connecting to an MSSQL server, although the process can vary depending on your exact setup: docs.astronomer.io/learn/connections/ms-sqlserver
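      For example (an editor's note; this is the environment-variable form of that core setting):

          AIRFLOW__CORE__TEST_CONNECTION=Enabled

      With the Astro CLI, one place to set environment variables like this is a .env file in the project directory.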

    • @quintonflorence6492
      @quintonflorence6492 8 months ago

      @@Astronomer Hi, where can I find the core config to make this update? I'm currently using Astro CLI. I'm not seeing this setting in the two .yaml files in the project. Thank you.

  • @pichaibravo
    @pichaibravo 10 months ago

    Is it okay to return a DataFrame many times in Airflow?

    • @Astronomer
      @Astronomer 10 months ago

      It's generally fine to pass dataframes in between your Airflow tasks, as long as you make sure your infrastructure can support the size of your data. If you use XCom, it's a good idea to consider a custom XCom backend for managing dataframes as Airflow's metadata db isn't set up for this specifically.

  • @ziedsalhi4503
    @ziedsalhi4503 10 months ago

    Hi, I already have an existing Airflow project. How can I use the Astro CLI to run it?

  • @greatotool
    @greatotool 10 months ago

    Is the Git repository public?

    • @Astronomer
      @Astronomer 10 months ago

      Yes! You can find it here: github.com/astronomer/webinar-demos/tree/best-practices-prod

    • @greatotool
      @greatotool 10 months ago

      Thanks!! 🙂 @@Astronomer

  • @KirillP-b1v
    @KirillP-b1v 10 months ago

    Please share the repository.

    • @Astronomer
      @Astronomer 10 months ago

      The repo is here: github.com/astronomer/webinar-demos/tree/best-practices-prod

  • @mcpiatkowski
    @mcpiatkowski 10 months ago

    That is a great intro and overview of Airflow for beginners! I very much like the datasets concept and the ability to see data lineage. However, I haven't found a solution for how to make a triggered, dataset-aware pipeline execute with the parent DAG's execution date. Is it even possible at the moment?

    • @Astronomer
      @Astronomer 10 months ago

      Thanks! And that is a great question. It is not possible for the downstream Dataset-triggered DAG to have the same logical_date (the new parameter equivalent to the old execution_date) as the DAG that caused the update to the dataset, but it is possible to pull that date from the downstream DAG by accessing context["triggering_dataset_events"]:

          @task
          def print_triggering_dataset_events(**context):
              triggering_dataset_events = context["triggering_dataset_events"]
              for dataset, dataset_list in triggering_dataset_events.items():
                  print(dataset, dataset_list)
                  print(dataset_list[0].source_dag_run.logical_date)

          print_triggering_dataset_events()

      If you use the above in your downstream DAG you can get that logical_date/execution_date to use in your Airflow tasks. For more info and an example with Jinja templating see: airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html#fetching-information-from-a-triggering-dataset-event

    • @mcpiatkowski
      @mcpiatkowski 10 months ago

      @@Astronomer That is amazing! You are my hero for life! Thank you!

  • @veereshk6065
    @veereshk6065 11 months ago

    Hi, thank you for the detailed demo. I just started exploring dynamic task mapping, and I have a requirement where I need to get data from a metadata table and create a list of dictionaries:

        [
            {'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD'},
            {'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD'},
            {'colA': 'valueA', 'colB': 'valueB', 'colC': 'valueC', 'colD': 'valueD'},
        ]

    The above structure can be generated using fetch_metadata_task (a combination of BigQueryHook and PythonOperator). Now the question is: how do I generate the dynamic tasks using the above list of dictionaries? For each dictionary I want to perform a set of tasks, e.g. GCSToBigQueryOperator, BigQueryValueCheckOperator, BigQueryToBigQueryCopyOperator, etc. The sample DAG dependency looks like this (see the sketch after this comment):

        start_task >> fetch_metadata_task
        fetch_metadata_task >> [GCSToBigQueryOperator_table1 >> BigQueryValueCheckOperator_table1 >> BigQueryToBigQueryCopyOperator_table1 >> connecting_dummy_task]
        fetch_metadata_task >> [GCSToBigQueryOperator_table2 >> BigQueryValueCheckOperator_table2 >> BigQueryToBigQueryCopyOperator_table2 >> connecting_dummy_task]
        fetch_metadata_task >> [GCSToBigQueryOperator_table3 >> BigQueryValueCheckOperator_table3 >> BigQueryToBigQueryCopyOperator_table3 >> connecting_dummy_task]
        connecting_dummy_task >> BigQueryExecuteTask >> end_task
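    One way to express that fan-out with dynamic task mapping (an editor's sketch, not a reply from the channel; the operators are replaced with placeholder @task functions, and mapped task groups require Airflow 2.5+):

        import pendulum
        from airflow.decorators import dag, task, task_group

        @dag(start_date=pendulum.datetime(2024, 1, 1), schedule=None, catchup=False)
        def mapped_metadata_pipeline():

            @task
            def fetch_metadata():
                # Stand-in for the BigQueryHook + PythonOperator metadata lookup
                return [
                    {"colA": "valueA1", "colB": "valueB1"},
                    {"colA": "valueA2", "colB": "valueB2"},
                ]

            @task_group
            def per_table(meta: dict):
                @task
                def load(m: dict):
                    print(f"GCS -> BigQuery load for {m}")  # stand-in for GCSToBigQueryOperator
                    return m

                @task
                def value_check(m: dict):
                    print(f"value check for {m}")  # stand-in for BigQueryValueCheckOperator
                    return m

                @task
                def copy_table(m: dict):
                    print(f"BigQuery -> BigQuery copy for {m}")  # stand-in for BigQueryToBigQueryCopyOperator

                copy_table(value_check(load(meta)))

            # One task-group instance is created per metadata dictionary
            per_table.expand(meta=fetch_metadata())

        mapped_metadata_pipeline()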
