Databricks CI/CD: Intro to Databricks Asset Bundles (DABs)
- Published June 12, 2024
- Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks, Python code, Delta Live Tables pipelines, and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and set up your first GitHub Action using DABs.
Blog post with extra examples: dustinvannoy.com/2023/10/03/d...
* All thoughts and opinions are my own *
References:
Datakickstart DABs repo: github.com/datakickstart/data...
Data & AI Summit Presentation: www.databricks.com/dataaisumm...
Data & AI Summit Repo: github.com/databricks/databri...
More from Dustin:
Website: dustinvannoy.com
LinkedIn: / dustinvannoy
Github: github.com/datakickstart
CHAPTERS
00:00 Intro
02:10 Why use Asset Bundles?
05:45 Get started with Bundle Init
10:58 GitHub Action deploy and run
19:09 Outro
Hey Dustin,
Really appreciate the video on DABs. If possible, can you please make a video on using DABs for CI/CD with Azure DevOps?
Thanks !
Thank you for the session!
Thanks Dustin.
Hello Dustin, thank you for posting this video. This was very helpful! Pardon my ignorance, but I have a question about initializing the Databricks bundle. In the first step, when you initialize the Databricks bundle through the CLI, does it create the required files in the Databricks workspace folder? Additionally, do we push the files from the Databricks workspace to our git feature branch, so that we can clone it locally, make the changes in the configurations, and push it back to git for deployment?
Is it possible to add approvers in asset-bundle-based code promotion? Say one does not want the same dev to promote to prod, since prod could be maintained by other teams; or, if the dev has to do code promotion, it should go through an approval process. Also, is it possible to add code scanning using something like SonarQube?
How does this work within a team with multiple projects? How do I apply multiple projects in GitHub Actions? Am I creating a bundle folder per project? Or do I have a mono folder with everything Databricks in it?
You can have different subfolders in your repo, each with its own bundle YAML, or you could have one at the root level and import different resource YAML files. It should only deploy the assets that have changed, so I tend to suggest one bundle if everything can be deployed at the same time.
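As a minimal sketch of the root-level option (the bundle name and file paths are assumptions, not from the video): a single databricks.yml at the repo root that pulls each project's resource definitions in via include, with one dev and one prod target:

```yaml
# databricks.yml - hypothetical root-level bundle for a multi-project repo
bundle:
  name: datakickstart_demo  # assumed name

# Each project keeps its jobs/pipelines in its own resource file
include:
  - resources/*.yml

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production
```

With this layout, `databricks bundle deploy -t dev` deploys every included resource file from one place, while each team still edits only its own file under resources/.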
Great video!
What should be the best approach to switch between dev and prod in the code?
example:
df_test.write.format('delta').mode('overwrite').saveAsTable("dev_catalog.schema.table")
How can I parametrize this to automatically change to this:
df_test.write.format('delta').mode('overwrite').saveAsTable("prod_catalog.schema.table")
In the code, read the environment from a variable:

environment = os.environ["ENV"]

Then attach the env var at the cluster level in the DAB:

spark_env_vars:
  ENV: ${var.ENV}
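The approach above can be sketched as a small helper in the notebook code (qualified_table is a hypothetical name, and the <env>_catalog naming convention is an assumption based on the question's example):

```python
import os


def qualified_table(schema: str, table: str) -> str:
    """Build a three-part table name from the ENV cluster variable.

    Hypothetical helper: assumes the bundle sets spark_env_vars ENV to
    "dev" or "prod" per target, and catalogs follow the <env>_catalog
    naming used in the question.
    """
    env = os.environ.get("ENV", "dev")  # fall back to dev when unset
    return f"{env}_catalog.{schema}.{table}"


# The write from the question then becomes environment-agnostic:
# df_test.write.format("delta").mode("overwrite").saveAsTable(
#     qualified_table("schema", "table")
# )
```

Because ENV comes from the cluster config rather than the code, the same notebook runs unchanged in both targets; only the bundle variable differs.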
Hi Dustin, I want to take a dataframe with streaming logs that I'm reading from an Event Hub and send them to Log Analytics, but I'm not receiving any data in the Log Analytics workspace or Azure Monitor. What might be the problem? Do I need to create a custom table beforehand? DCR or MMA? I don't know why I'm not getting any data or what I'm doing wrong...
Is this still an issue? If so, is it related to using the spark-monitoring library? I have a quick mention of how to troubleshoot that towards the end of this new video: th-cam.com/video/CVzGWWSGWGg/w-d-xo.html