Thanks a lot for creating this video. For orchestration, ADF triggers are the only built-in option; with Airflow orchestration we can customize the scheduling part.
It was simple, but I'm new to Azure and spent a lot of time on it.
Thank you so much for this video
Really good video! Very clear explanation and example
Great content and step-by-step explanation!
Very comprehensive and covers the basics, thanks for the tutorial
Great! Thanks for the content
First, I want to thank you for the content. I've been searching a lot for this and you are the only guy who made it clear and concise.
I have some questions:
1) Why do I need to connect the Managed Airflow with Data Factory if the Managed Airflow environment was created in the Data Factory? That seems strange.
2) Let's say I don't want to manage any ADF pipeline using the Managed Airflow environment; I just created it to execute DAGs that carry out transformations... In this case, do I really need to connect it with Data Factory, or can I just run the DAGs in Airflow?
3) How does the pricing work? I read about it in the documentation but didn't understand it; would you be so kind as to explain it to me? Is it based on the number of hours the Managed Airflow is running?
Thank you mate, I appreciate your support!
1) Well, I am not sure if I understand the question correctly, but you do have to navigate from the Data Factory UI to the Airflow environment; this is how they built it.
2) Yes, you can execute DAGs that carry out transformations outside of Data Factory. For example, you can transform data that lives in an Azure SQL Database without involving Data Factory, but you need to set up the connection to the Azure SQL database (see the sketch after this reply).
3) The pricing is pretty standard (learn.microsoft.com/en-us/azure/data-factory/airflow-pricing): you are charged per hour based on the node size. Small node = $0.50 per hour * 24 hours = $12 per day.
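To illustrate point 2, here is a minimal sketch of a DAG that runs a transformation directly against an Azure SQL Database, with no Data Factory involved. It assumes the Microsoft SQL Server provider is installed and that a connection named "azure_sql_default" (a placeholder name) has been created in the Airflow UI; the table names are made up.

```python
# Minimal sketch: a transformation DAG that talks to Azure SQL Database
# directly, with no Data Factory involved. Assumes the
# apache-airflow-providers-microsoft-mssql provider is installed and that a
# connection named "azure_sql_default" (placeholder) exists in the Airflow UI.
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

with DAG(
    dag_id="azure_sql_transform",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually, or replace with a cron expression
    catchup=False,
):

    @task
    def transform():
        hook = MsSqlHook(mssql_conn_id="azure_sql_default")
        # Hypothetical tables: rebuild a daily aggregate from a raw table.
        hook.run([
            "TRUNCATE TABLE dbo.daily_sales_summary;",
            """
            INSERT INTO dbo.daily_sales_summary (sale_date, total_amount)
            SELECT CAST(sold_at AS date), SUM(amount)
            FROM dbo.sales
            GROUP BY CAST(sold_at AS date);
            """,
        ])

    transform()
```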
But why do we need Airflow when we can do the same work directly using ADF? Could you please explain the benefits of using it?
I'm getting the error "workflow orchestration manager is not allowed to Azure integration runtime". May I know what could be the reason here?
Good explanation. Can we schedule different pipelines from different Data Factory instances?
Example:
DataFactory1_Pipeline1 ->> DataFactory2_Pipeline2
I haven't tried it yet, so I can't provide a definite answer. I don't know if you can achieve it directly from the DAG, but you can certainly achieve it by triggering an ADF pipeline from the DAG which then triggers another ADF pipeline (in a different Data Factory) using a Web Activity.
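For what it's worth, the Azure provider ships an AzureDataFactoryRunPipelineOperator that takes the factory name per task, so in principle a single DAG could chain pipelines across two factories; I haven't verified this on Managed Airflow. Connection IDs, resource group, factory and pipeline names below are placeholders.

```python
# Hedged sketch (not verified on Managed Airflow): chaining pipelines from two
# different Data Factory instances in one DAG. Connection IDs, resource group,
# factory and pipeline names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.microsoft.azure.operators.data_factory import (
    AzureDataFactoryRunPipelineOperator,
)

with DAG(
    dag_id="cross_factory_chain",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    run_pipeline1 = AzureDataFactoryRunPipelineOperator(
        task_id="run_datafactory1_pipeline1",
        azure_data_factory_conn_id="adf1_conn",  # credentials scoped to factory 1
        resource_group_name="my-resource-group",
        factory_name="DataFactory1",
        pipeline_name="Pipeline1",
        wait_for_termination=True,  # wait until the first pipeline run finishes
    )

    run_pipeline2 = AzureDataFactoryRunPipelineOperator(
        task_id="run_datafactory2_pipeline2",
        azure_data_factory_conn_id="adf2_conn",  # separate connection for factory 2
        resource_group_name="my-resource-group",
        factory_name="DataFactory2",
        pipeline_name="Pipeline2",
    )

    run_pipeline1 >> run_pipeline2
```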
I have an ETL process in place in ADF. In our team, we wanted to implement the table and view transformations with dbt Core. We were wondering if we could orchestrate dbt with Azure. If so, how? One of the approaches I could think of was to use an Azure Managed Airflow instance. But will it allow us to install astronomer-cosmos? I have never implemented dbt this way before, so I needed to know if this would be the right approach or if there is anything else you would suggest.
Unfortunately I haven't tried this approach either, so I cannot tell. It seems astronomer-cosmos works well with Apache Airflow (github.com/astronomer/astronomer-cosmos), so in theory it should work with an Azure Managed Airflow instance too. That being said, I haven't tried it; better give it a try and see.
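If you do try it, the usual Cosmos pattern looks roughly like the sketch below, assuming astronomer-cosmos can be added through the environment's requirements. All paths, profile and DAG names are placeholders, and the exact Cosmos API may differ between versions, so check their docs.

```python
# Hedged sketch: wrapping a dbt Core project with astronomer-cosmos, assuming
# the package can be installed in the Managed Airflow environment. Paths,
# profile and DAG names are placeholders; the Cosmos API may vary by version.
from datetime import datetime

from cosmos import DbtDag, ProfileConfig, ProjectConfig

dbt_transformations = DbtDag(
    dag_id="dbt_transformations",
    project_config=ProjectConfig("/usr/local/airflow/dags/my_dbt_project"),
    profile_config=ProfileConfig(
        profile_name="my_dbt_project",
        target_name="dev",
        profiles_yml_filepath="/usr/local/airflow/dags/my_dbt_project/profiles.yml",
    ),
    schedule="@daily",  # let the scheduler run the dbt models daily
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```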
Good content and nicely explained. Could you please share how the Airflow job can be triggered automatically? In the video example you ran the Airflow job manually; how can we do it automatically?
You either trigger the job manually as I did, or you let the scheduler do it. When you define the DAG there is a parameter called schedule. You can specify the exact time using cron syntax if you like. See here for more details: hevodata.com/learn/trigger-airflow-dags/
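For example, a minimal sketch of a DAG that the scheduler triggers every day at 06:00 UTC using cron syntax (in older Airflow versions the parameter is called schedule_interval):

```python
# Minimal sketch: letting the scheduler trigger the DAG instead of running it
# manually. "0 6 * * *" is cron syntax for 06:00 UTC every day.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="scheduled_example",
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",  # cron: every day at 06:00
    catchup=False,         # don't backfill runs between start_date and now
):
    EmptyOperator(task_id="placeholder_task")
```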
Would it be simpler to use AD authentication, as it guarantees single sign-on?
I have found that Managed Airflow requires adding drivers for ODBC connections to be made, but the CLI cannot be accessed to install drivers.
Additionally, I cannot get visibility into the memory allocation or increase it. When working with 6 GB CSVs my DAG breaks :)