Very useful thanks
Thanks for watching!
If possible, could you share the GitHub repo link for the DAGs?
What would the GitHub integration look like for companies that want to keep their DAG code on GitHub?
There are a few different approaches! Check out this link, which goes through the various ways you can set up CI/CD so your DAG code lives on GitHub: docs.astronomer.io/astro/ci-cd-templates/github-actions
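As a rough illustration of one of those approaches, here's a minimal GitHub Actions workflow sketch that deploys DAGs to Astro on every push to `main`. It assumes Astronomer's `deploy-action` and an `ASTRO_API_TOKEN` repository secret; the deployment ID is a placeholder, and the exact action version and inputs may differ, so check the linked docs for the current template.

```yaml
# .github/workflows/deploy-dags.yml — sketch only; see the Astronomer
# CI/CD templates for the up-to-date version of this workflow.
name: Deploy DAGs to Astro
on:
  push:
    branches:
      - main
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout DAG repo
        uses: actions/checkout@v4
      - name: Deploy to Astro
        uses: astronomer/deploy-action@v0.4
        with:
          deployment-id: <your-deployment-id>   # placeholder
        env:
          ASTRO_API_TOKEN: ${{ secrets.ASTRO_API_TOKEN }}
```

With this in place, merging to `main` becomes your deploy step, so DAG code never has to be pushed to the deployment by hand.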
This is Marc Lamberti's webinar about sensors (the one mentioned at 11:30): th-cam.com/video/8J0h-Vlc_44/w-d-xo.html
Can Airflow be used to orchestrate a Spark Streaming YARN job that pulls data from Kafka and writes to HDFS? The idea is that if the Spark Streaming job gets stuck in the queue, Airflow can monitor it, alert on it, and restart it automatically.
Oh, it definitely can! Check out this link for the different options you have for managing Spark via Airflow; you'll probably want to use a Spark hook: registry.astronomer.io/providers/apache-airflow-providers-apache-spark/versions/4.1.5
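To sketch the monitor-and-restart idea: the Spark provider's `SparkSubmitOperator` can submit the job, and Airflow's built-in `retries` / `email_on_failure` settings give you the automatic restart and alerting. This is a minimal illustration assuming the `apache-airflow-providers-apache-spark` package is installed; the `spark_yarn` connection ID, application path, and schedule are placeholders you'd swap for your own.

```python
# Sketch: Airflow DAG that submits a Spark job to YARN, alerts on failure,
# and restarts it automatically via task retries.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="kafka_to_hdfs_spark",        # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule=None,                       # trigger manually or from a sensor
    catchup=False,
    default_args={
        "retries": 3,                        # auto-restart on failure
        "retry_delay": timedelta(minutes=5),
        "email_on_failure": True,            # alerting
    },
) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_streaming_job",
        conn_id="spark_yarn",                     # connection whose master points at YARN
        application="/path/to/kafka_to_hdfs.py",  # placeholder Spark app
    )
```

One caveat for long-running streaming jobs: the task stays "running" for as long as the Spark app does, so retries only kick in when the job actually exits with a failure, which fits the "restart it if it dies" use case described above.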