Databricks CI/CD: Azure DevOps Pipeline + DABs

  • Published Dec 1, 2024

Comments • 16

  • @gangadharneelam3107 · 2 months ago +1

    Hey Dustin,
    We're currently exploring DABs, and it feels like this was made just for us!😅
    Thanks a lot for sharing it!

  • @benjamingeyer8907 · 3 months ago

    Now do it in Terraform ;)
    Great video as always!

    • @DustinVannoy · 3 months ago +1

      🤣🤣 it may happen one day, but not today. I would probably need help from build5nines.com

  • @moncefansseti1907 · 1 month ago

    Hey Dustin, if we want to add more resources, like ADLS bronze, silver, and gold storage, do we need to add them to the environment variables?

  • @thusharr7787 · 3 months ago

    Thanks. One question: I have some metadata files in the project folder that I need to copy to a volume in Unity Catalog. Is that possible through this deploy process?

    • @DustinVannoy · 3 months ago

      Using the Databricks CLI, you can add a command that copies data up to a volume. Replace all the curly-brace { } parts with your own values:
      databricks fs cp --overwrite {local_path} dbfs:/Volumes/{catalog}/{schema}/{volume_name}/{filename}
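
      For example, a hypothetical invocation (the catalog, schema, volume, and file names below are placeholders, not values from the video):
      databricks fs cp --overwrite ./metadata/config.json dbfs:/Volumes/main/bronze/metadata_vol/config.json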

  • @albertwang1134 · 3 months ago

    I am learning DABs at this moment. So lucky that I found this video. Thank you, @DustinVannoy. Do you mind if I ask a couple of questions?

    • @DustinVannoy · 3 months ago

      Yes, ask away. I'll answer what I can.

    • @albertwang1134 · 3 months ago

      Thank you, @DustinVannoy. I wonder whether the following development process makes sense, and whether there is anything we could improve.
      Background:
      (1) We have two Azure Databricks workspaces: one for development and one for production.
      (2) I am the only Data Engineer on our team and we don't have a dedicated QA, so I am responsible for development and testing. Whoever consumes the data will do UAT.
      (3) We use Azure DevOps (repository and pipelines).
      Process:
      (1) Initialization
      (1.1) Create a new project using `databricks bundle init`
      (1.2) Push the new project to Azure DevOps
      (1.3) On the development DBR workspace, create a Git folder under `/Users/myname/` and link it to the Azure DevOps repository
      (2) Development
      (2.1) Create a feature branch on the DBR workspace
      (2.2) Do my development and hand testing
      (2.3) Create a unit test job and the scheduled daily job
      (2.4) Create a pull request from the feature branch to the main branch on the DBR workspace
      (3) CI
      (3.1) An Azure CI pipeline (build pipeline) is triggered after the pull request is created
      (3.2) The CI pipeline checks out the feature branch, then runs `databricks bundle deploy` and `databricks bundle run the_unit_test_job` against the development DBR workspace using a Service Principal
      (3.3) The test result shows on the pull request
      (4) CD
      (4.1) If everything looks good, the pull request is approved
      (4.2) Manually trigger an Azure CD pipeline (release pipeline), which checks out the main branch and runs `databricks bundle deploy` against the production DBR workspace using a Service Principal
      Explanation:
      (1) Because we are a small team and I am the only person working on this, we do not have a `release` branch, to keep the process simple
      (2) For the same reason, we also do not have a staging DBR workspace

    • @DustinVannoy · 3 months ago +1

      Overall the process is good. It's typical not to have a separate QA person. I try to use a YAML pipeline for the release step so the code looks pretty similar to what you use to automate deploys to dev. I also recommend having unit tests you can easily run as you build, which is why I try to use Databricks Connect to run a few specific unit tests at a time. But running workflows on all-purpose or serverless compute isn't too bad an option for quick testing as you develop.
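
      For reference, a minimal sketch of what that CI pipeline YAML might look like (the job key, target name, and variable names are placeholders; it assumes the service principal's credentials are stored as secret pipeline variables):

      trigger: none  # run via pull request branch policy instead

      pool:
        vmImage: ubuntu-latest

      steps:
        # Install the Databricks CLI using the official install script
        - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          displayName: Install Databricks CLI

        # Validate and deploy the bundle, then run the unit test job on the dev target
        - script: |
            databricks bundle validate -t dev
            databricks bundle deploy -t dev
            databricks bundle run -t dev the_unit_test_job
          displayName: Deploy bundle and run unit test job
          env:
            DATABRICKS_HOST: $(DATABRICKS_HOST)
            ARM_TENANT_ID: $(ARM_TENANT_ID)
            ARM_CLIENT_ID: $(ARM_CLIENT_ID)
            ARM_CLIENT_SECRET: $(ARM_CLIENT_SECRET)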

  • @albertwang1134 · 2 months ago

    Hi Dustin, have you tried to configure and deploy a single-node cluster using a Databricks Asset Bundle?

    • @DustinVannoy · 2 months ago

      Yes, it is possible. It looks something like this:
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: m6gd.xlarge
            num_workers: 0
            data_security_mode: SINGLE_USER
            spark_conf:
              spark.master: "local[*, 4]"
              spark.databricks.cluster.profile: singleNode
            custom_tags: {"ResourceClass": "SingleNode"}
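
      The combination of num_workers: 0, the singleNode cluster profile, the local[*] master setting, and the ResourceClass: SingleNode tag is what Databricks expects for a single-node cluster definition; dropping any one of them tends to fail cluster validation.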

    • @albertwang1134 · 2 months ago

      @DustinVannoy Thanks a lot! This cannot be found in the Databricks documentation.

  • @unilmittakola · 2 months ago

    Hey Dustin,
    We're currently implementing Databricks Asset Bundles using Azure DevOps to deploy workflows, with the bundles stored in GitHub. Can you please help me with the YAML script for it?

  • @fb-gu2er · 2 months ago

    Now do AWS 😂

    • @DustinVannoy · 2 months ago

      Meaning AWS account with GitHub Actions? If not, what combo of tools are you curious about for the deployment?