Complete Azure Data Factory CI/CD Process (DEV/UAT/PROD) with Azure Pipelines
- Published Oct 5, 2023
- This video goes over how to write code to package up and promote a DEV Azure Data Factory to a UAT Data Factory and PROD Data Factory.
Links:
-GitHub repo code: github.com/DataEngineeringWit...
-Data Factory automated publishing CI/CD documentation: learn.microsoft.com/en-us/azu...
-Npm Data Factory utilities package: www.npmjs.com/package/@micros...
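For reference, the automated-publish approach from the video boils down to build steps like the following sketch, based on the @microsoft/azure-data-factory-utilities package docs. The working directory, subscription ID, resource group, and factory name are placeholders you would replace with your own:

```yaml
# Sketch of build steps using @microsoft/azure-data-factory-utilities.
# Paths and resource names below are placeholders.
steps:
  - task: NodeTool@0
    inputs:
      versionSpec: '18.x'
    displayName: Install Node.js

  - task: Npm@1
    inputs:
      command: install
      workingDir: $(Build.Repository.LocalPath)/build  # folder with package.json
    displayName: Install npm packages

  # Validate all ADF resource JSON files in the repo
  - task: Npm@1
    inputs:
      command: custom
      workingDir: $(Build.Repository.LocalPath)/build
      customCommand: 'run build validate $(Build.Repository.LocalPath) /subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.DataFactory/factories/<dev-adf-name>'
    displayName: Validate ADF resources

  # Generate the ARM template into the ArmTemplate folder
  - task: Npm@1
    inputs:
      command: custom
      workingDir: $(Build.Repository.LocalPath)/build
      customCommand: 'run build export $(Build.Repository.LocalPath) /subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.DataFactory/factories/<dev-adf-name> "ArmTemplate"'
    displayName: Generate ARM template
```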
I really appreciate the layout and completeness of the .yaml files with the Azure Pipeline definitions. It gives an idea of how a real-life project could implement this (unlike the toy examples that are extremely popular in these kinds of tutorials).
absolutely stellar! Thank you for the clear instructions and examples. This really helped me understand!!
Great efforts and explanation!!!
Thanks a lot for sharing your knowledge
Great Content!
really great work
This is awesome, I would love one for Synapse as well as things are slightly different
Yoooo thanks
How would you get this to work with Azure Databricks as a linked service? The connection details aren't parameterised, so I can't pass in values such as the Databricks workspace URL.
Curious why there is no displayName in the .yml powershell sections?
How do I set pipeline arguments differently in dev, uat and prod?
can we use terraform script instead of using ARM templates?
If I use all the yaml files and other supporting files, do I still need to create the flow under Release under pipeline section of Azure Devops or do I need to still create Environment?
No, you don’t need a separate release pipeline flow. The yaml files do all the CI/CD required for ADF.
Did you create those yaml files manually or using ADF?
Using ADF, automatically. Then you just replace your connection/linked service/global parameters in the cicd/adf-cicd template parameter files for the uat and prod environments.
Can I use the Library Groups inplace of the KeyVault for subscriptions IDs or any variable which has the highest confidentiality?
Yes, you can use secret variables or library groups or whatever you’d like.
How to override global params when selecting "Include global parameters in ARM template"? Do I have to override parameters in AzureResourceManagerTemplateDeployment@3 somehow?
In my example I override them in the template-parameters.json files. For example, see the cicd/adf-cicd adf-prod-template-parameters.json and adf-uat-template-parameters.json files. I override the global parameters (default_properties_GL_STRING_value and default_properties_GL_NUMBER_value) in those files. Those are global parameters that are updated (different values) in each environment.
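For illustration, an entry in one of those environment parameter files might look like the following sketch. The factory name and values are placeholders; the global parameter names (default_properties_GL_STRING_value, default_properties_GL_NUMBER_value) are the ones mentioned above, as emitted into the generated ARM template:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": { "value": "adf-uat-instance" },
    "default_properties_GL_STRING_value": { "value": "uat-string-value" },
    "default_properties_GL_NUMBER_value": { "value": 200 }
  }
}
```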
You can also override parameters in the AzureResourceManagerTemplateDeployment@3 (if you didn't want to use template-parameter files) using the overrideParameters input. For reference: learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/azure-resource-manager-template-deployment-v3?view=azure-pipelines
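A minimal sketch of the inline-override alternative, assuming placeholder resource names and a published artifact containing the generated ARM template (the parameter names follow the generated template's naming convention):

```yaml
# Alternative to template-parameter files: override values inline.
# Service connection, subscription, resource group, and values are placeholders.
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: Resource Group
    azureResourceManagerConnection: $(serviceConnection)
    subscriptionId: $(subscriptionId)
    resourceGroupName: rg-adf-uat
    location: eastus
    csmFile: $(Pipeline.Workspace)/adf-artifact/ARMTemplateForFactory.json
    overrideParameters: >-
      -factoryName "adf-uat-instance"
      -default_properties_GL_STRING_value "uat-string-value"
```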
Please suggest: if I need to run 10 pipelines every day at 2 pm, what is the best approach? Should we go with a schedule trigger or something else?
Schedule trigger.
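A single ADF schedule trigger can kick off all 10 pipelines at 14:00 daily. As a sketch (pipeline names and start time are placeholders, and the exact JSON shape should be checked against your factory's generated trigger files):

```json
{
  "name": "DailyAt2pm",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2023-10-05T14:00:00Z",
        "timeZone": "UTC",
        "schedule": { "hours": [14], "minutes": [0] }
      }
    },
    "pipelines": [
      { "pipelineReference": { "referenceName": "Pipeline1", "type": "PipelineReference" } },
      { "pipelineReference": { "referenceName": "Pipeline2", "type": "PipelineReference" } }
    ]
  }
}
```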
Yes I am interested to see how we can use linked templates for deployment. Could you please share it.
+1
Thanks for the comment. I've seen a couple of comments for this. Currently working on a linked template deployment for Data Factory video :)
Please! Thanks
Hi, I created this CI/CD pipeline a while back and it has been running fine without a single bug since it was deployed.
Now the ARM template size is more than 4 MB and I'm getting an error. Would you please create a video for that?
Yes, you'll need to use linked templates. I've seen a couple of comments about this and currently working on a video for it :)
I have tried this but it doesn't create the Global Params in UAT/PROD. Any idea what I could be doing wrong?
Did you by chance forget to check the "Include global parameters in ARM template" option in the Manage/ARM Template section of your dev data factory instance?
This is one of the best videos on this topic! I have one question and very high hopes that I will get my answer here.
Why is the dev ADF in the picture when the build is triggered on the main branch? Previously, in the manual process, when the dev branch was merged into main, one would "switch" to the main branch (i.e., the collaboration branch) in ADF to click the publish button. This ensured the ARM templates were generated from the main branch. However, the npm package method shown in the video uses the main branch as a trigger but also references the dev ADF. This confused me.
The reason I would like to know this is because we have a slightly different setup. Our main branch goes into the prod ADF. We have a feature branch for collaboration, as I have more than one data engineer working on their own dev branches. Once their code merges into the feature branch, we intend to deploy it to our test environment. Only upon successful testing will it be merged into main, which then triggers the prod deployment. Please help!
Great question. The build (packaging up the ADF code) is done from the repo (the main branch in this example) where all of the ADF JSON files are. From what I've seen, the DEV ADF /subscriptions/.../adf name in the "Validate And Generate ADF ARM Template And Scripts" step only helps set the default values/info for the artifacts (ARM template, etc.).
For example, I've tested completely deleting the actual DEV ADF (with the ADF JSON files still in the repo main branch) and the pipeline still builds it. I've also tested using a random ADF DEV name that doesn't exist and it still builds it using the repo JSON files but will use the default value in the ARM Template and files as the ADF name that doesn't exist.
So whatever your collaboration branch is (the feature branch you mentioned for example), as long as you checkout that branch in your build pipeline, you should be fine as that's where the code is. Then in a different pipeline (or different stage in the same pipeline) you can checkout the main branch and deploy to PROD separately. Hope this helps.
@dataengineeringwithnick7532 Thank you so much for getting back. I'm going to try it shortly and will let you know.
@dataengineeringwithnick7532 I'm starting to implement it and am stuck at one place. Do we really need to use the adf-uat-template-parameters.json and prod files? One thing I liked about the previous manual process was that I didn't have to worry about all the linked service details of my ADF (we have many ADLS, Lakehouse, SQL, KV). It autogenerated everything, and I used to enter override values for the parameters in the ADO UI. Is that not possible here? How can I achieve that without referencing these files at all? Thank you again! Please keep up the great work!
Could you please explain how to implement parameter replace like link service sql connection string?
Look at the cicd/adf-cicd folder; the uat and prod template parameter files are where you'd replace those values.
@dataengineeringwithnick7532 Do I need to create variables in DevOps?
I don't have a UAT environment. Can I use this for dev and prod only?
Yes you can. You would just remove the Deploy to UAT code in the pipeline.
@dataengineeringwithnick7532 Thanks. Did you also add the variables in the pipeline? The DevSubscriptionID and production ID?
@mainuser98 Those are actually secret variables (so the values don’t get logged/shown in plain text).
To create secret variables, in Azure Pipelines click on your pipeline then click edit and then variables.
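Once created in the pipeline UI, a secret variable such as DevSubscriptionID is referenced with macro syntax; note that Azure Pipelines does not map secrets into script environments automatically, so they must be passed explicitly. A sketch (the script path and variable names are illustrative):

```yaml
# Secret pipeline variables are referenced with $(name) macro syntax;
# map them explicitly into script steps via the env block.
steps:
  - task: PowerShell@2
    inputs:
      targetType: inline
      script: |
        # The secret value is masked in pipeline logs
        ./deploy.ps1 -SubscriptionId $env:SUBSCRIPTION_ID
    env:
      SUBSCRIPTION_ID: $(DevSubscriptionID)  # secret variable from the pipeline UI
    displayName: Deploy with secret subscription ID
```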
Do you have a Discord channel?
Error: ##[warning]Can't find loc string for key: Info_GotAndMaskAuth in the pipeline
This is just a warning from the Npm@1 task and has recently been updated by the Microsoft team (just not released yet). See here: github.com/microsoft/azure-pipelines-tasks/issues/20120 and here: github.com/microsoft/azure-pipelines-tasks-common-packages/pull/345. Code update should be in the next release (github.com/microsoft/azure-pipelines-tasks-common-packages/releases). Either way, it should resolve itself and there's no code changes needed on the ADF pipeline.
How to configure the service connection for uat and prod?
In the cicd/adf-cicd folder there are two files, adf-uat-template-parameters.json and the prod one. Replace your connection strings for each environment there. See the repo link in the description to get to the files.
@dataengineeringwithnick7532 Should I also add the service principal to the data factory as a member? Or just the connection strings in the files?
@dataengineeringwithnick7532 Should I do something in ADF with the service principal, like add it as a member? Or just the connection strings in the files? I am referring to 23:15, because you didn't show in detail what we should do with the Azure Resource Manager connection.
@mainuser98 The Azure Resource Manager connection (via a service principal) would need the RBAC Contributor role (or enough permissions to deploy an ARM template to a resource group) at the resource group level. You don’t need to add the service principal info to any of the template parameter files.
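As a sketch, granting that service principal Contributor at the resource-group scope with the Azure CLI could look like this (the object ID, subscription ID, and resource group name are placeholders; this requires an authenticated az session with sufficient rights):

```shell
# Grant the service connection's service principal Contributor
# on the target resource group (all IDs below are placeholders).
az role assignment create \
  --assignee "<sp-object-id>" \
  --role "Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/rg-adf-uat"
```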
@dataengineeringwithnick7532 Can you send me documentation on how to configure this?