Databricks CI/CD: Intro to Databricks Asset Bundles (DABs)

  • Published Jun 12, 2024
  • Databricks Asset Bundles provide a way to use the command line to deploy and run a set of Databricks assets - like notebooks, Python code, Delta Live Tables pipelines, and workflows. This is useful both for running jobs that are being developed locally and for automating CI/CD processes that deploy and test code changes. In this video I explain why Databricks Asset Bundles are a good option for CI/CD and demo how to initialize a project and set up your first GitHub Action using DABs.
    Blog post with extra examples: dustinvannoy.com/2023/10/03/d...
    * All thoughts and opinions are my own *
    References:
    Datakickstart DABs repo: github.com/datakickstart/data...
    Data & AI Summit Presentation: www.databricks.com/dataaisumm...
    Data & AI Summit Repo: github.com/databricks/databri...
    More from Dustin:
    Website: dustinvannoy.com
    LinkedIn: / dustinvannoy
    Github: github.com/datakickstart
    CHAPTERS
    00:00 Intro
    02:10 Why use Asset Bundles?
    05:45 Get started with Bundle Init
    10:58 GitHub Action deploy and run
    19:09 Outro
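    The bundle init step demoed above generates a databricks.yml at the project root. A minimal sketch of one, with illustrative names (my_project, my_job, the workspace host, and the notebook path are assumptions, not from the video):

    ```yaml
    # Minimal Databricks Asset Bundle config (illustrative names).
    bundle:
      name: my_project

    targets:
      dev:
        mode: development
        default: true
        workspace:
          host: https://adb-1234567890123456.7.azuredatabricks.net
      prod:
        mode: production
        workspace:
          host: https://adb-1234567890123456.7.azuredatabricks.net

    resources:
      jobs:
        my_job:
          name: my_job
          tasks:
            - task_key: main
              notebook_task:
                notebook_path: ./src/notebook.ipynb
    ```

    With a file like this in place, `databricks bundle deploy -t dev` pushes the assets to the dev target and `databricks bundle run my_job -t dev` executes the job.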
  • Science & Technology

Comments • 12

  • @asuretril867
    @asuretril867 6 months ago +7

    Hey Dustin,
    Really appreciate the video on DABs. If possible, could you please make a video on using DABs for CI/CD with Azure DevOps?
    Thanks!

  • @rum81
    @rum81 months ago

    Thank you for the session!

  • @user-vq7er5ft6r
    @user-vq7er5ft6r 7 months ago

    Thanks Dustin.

  • @isenhiem
    @isenhiem months ago

    Hello Dustin, thank you for posting this video. This was very helpful! Pardon my ignorance, but I have a question about initializing the Databricks bundle. When you initialize the bundle through the CLI, does the first step create the required files in the Databricks workspace folder? Additionally, do we push the files from the Databricks workspace to our Git feature branch so that we can clone it locally, make the configuration changes, and push it back to Git for deployment?

  • @saurabh7337
    @saurabh7337 3 months ago

    Is it possible to add approvers to asset-bundle-based code promotion? Say one does not want the same dev to promote to prod, as prod could be maintained by other teams; or if the dev has to do the code promotion, it should go through an approval process. Also, is it possible to add code scanning using something like SonarQube?
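    Not answered in the video, but one common way to gate prod deploys when using GitHub Actions is an environment with required reviewers. A sketch, assuming an environment named "production" and a bundle target named "prod" (both illustrative):

    ```yaml
    # Sketch of a gated prod deploy job. "production" must be configured
    # under Settings > Environments with required reviewers; the job then
    # pauses until an approver signs off before the steps run.
    jobs:
      deploy_prod:
        environment: production
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: databricks/setup-cli@main
          - run: databricks bundle deploy -t prod
            env:
              DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    ```

    Code scanning would be a separate job (for example a SonarQube scan step) that the deploy job declares as a dependency via `needs:`.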

  • @user-lr3sm3xj8f
    @user-lr3sm3xj8f 7 months ago

    How does this work within a team with multiple projects? How do I handle multiple projects in GitHub Actions? Am I creating a bundle folder per project? Or do I have a mono folder with everything Databricks in it?

    • @DustinVannoy
      @DustinVannoy  5 months ago

      You can have different subfolders in your repo each with their own bundle yaml or you could have one at a root level and import different resource yaml files. It should only deploy the assets that have changed so I tend to suggest one bundle if everything can be deployed at the same time.
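      A sketch of the single-bundle layout described above, with a root databricks.yml importing per-project resource files (the file names and bundle name are illustrative):

      ```yaml
      # databricks.yml at the repo root; each project keeps its own
      # resource definitions in a separate YAML file under resources/.
      bundle:
        name: team_bundle

      include:
        - resources/project_a_job.yml
        - resources/project_b_pipeline.yml

      targets:
        dev:
          mode: development
          default: true
      ```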

  • @user-eg1hd7yy7k
    @user-eg1hd7yy7k 8 months ago

    Great video!
    What should be the best approach to switch between dev and prod inside the code?
    example:
    df_test.write.format('delta').mode('overwrite').saveAsTable("dev_catalog.schema.table")
    How can I parametrize this to automatically change to this:
    df_test.write.format('delta').mode('overwrite').saveAsTable("prod_catalog.schema.table")

    • @benjamingeyer8907
      @benjamingeyer8907 8 months ago +4

      environment = os.environ["ENV"]
      Attach the env at the cluster level in the DAB:
      spark_env_vars:
        ENV: ${var.ENV}
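      A minimal sketch of that pattern in Python (assumes the cluster's spark_env_vars set ENV to "dev" or "prod"; the catalog naming convention is taken from the question above, and the "dev" fallback is an assumption for local runs):

      ```python
      import os

      # Read the environment name injected through the cluster's spark_env_vars.
      # Falls back to "dev" when ENV is not set (assumption for local runs).
      env = os.environ.get("ENV", "dev")

      # Build the fully qualified table name from the environment-specific catalog.
      table_name = f"{env}_catalog.schema.table"

      # The write itself is unchanged; only the target table varies by environment:
      # df_test.write.format("delta").mode("overwrite").saveAsTable(table_name)
      ```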

  • @NaisDeis
    @NaisDeis 7 months ago

    Hi Dustin, I want to take a dataframe with streaming logs that I'm listening to from an Event Hub and send them to Log Analytics, but I'm not receiving any data in the Log Analytics workspace or Azure Monitor. What might be the problem? Do I need to create a custom table beforehand? DCR or MMA? I don't know why I'm not getting any data or what I'm doing wrong...

    • @DustinVannoy
      @DustinVannoy  5 months ago

      Is this still an issue? If so, is it related to using spark-monitoring library? I have a quick mention of how to troubleshoot that towards the end of this new video: th-cam.com/video/CVzGWWSGWGg/w-d-xo.html