Thanks for the video, is it possible to do this with PySpark commands? Where can I get more documentation about this topic?
Thanks. Technically, I use PySpark almost exclusively but leverage the SQL API from it. You should be able to use the PySpark API directly if you prefer not to call the SQL API. The Databricks docs should cover that.
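Something like this minimal sketch, for example, would be the DataFrame API equivalent of the SQL approach in the video (the server, database, secret scope, and table names below are just placeholders, not the ones from the video):

```python
# Minimal sketch of reading from and writing to Azure SQL with the PySpark
# DataFrame API instead of Spark SQL. All names/secrets are placeholders.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
user = dbutils.secrets.get("myscope", "sqluser")
password = dbutils.secrets.get("myscope", "sqlpassword")

# Read the source table into a DataFrame over JDBC.
customer_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SalesLT.Customer")
    .option("user", user)
    .option("password", password)
    .load()
)

# Write a DataFrame back to an Azure SQL table the same way.
(
    customer_df.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SalesLT.CustomerExtract")
    .option("user", user)
    .option("password", password)
    .mode("append")
    .save()
)
```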
Thanks Bryan for your videos!
Appreciate the appreciation and you're welcome. Glad they are helpful.
We can read and write Azure SQL tables and other tables in Databricks via JDBC. But how can we do updates or deletes on the same tables?
I had the same question. I use pyodbc for deletes and updates. I hope they will make it possible with just the Spark connector.
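Something roughly like this is what I mean; it's only a sketch, and the ODBC driver name, server, database, and secret names are placeholders (pyodbc also has to be installed on the cluster first):

```python
# Rough sketch of using pyodbc for UPDATE/DELETE, since the Spark JDBC writer
# only appends or overwrites. All connection details are placeholders.
import pyodbc  # e.g. install with: %pip install pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;"
    "UID=" + dbutils.secrets.get("myscope", "sqluser") + ";"
    "PWD=" + dbutils.secrets.get("myscope", "sqlpassword")
)
cursor = conn.cursor()

# Parameterized delete and update run directly against the Azure SQL table.
cursor.execute("DELETE FROM Sales.CustomerExtract WHERE CustomerID = ?", 12345)
cursor.execute(
    "UPDATE Sales.CustomerExtract SET EmailAddress = ? WHERE CustomerID = ?",
    "new@example.com", 67890,
)

conn.commit()
cursor.close()
conn.close()
```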
Do you mean the table schema or just table data?
Hi Bryan, when I execute the query at 8:31, I get a ParseException from spark.sql. Can you tell me why?
Not much to go on there. Sounds like something is wrong with your cluster. Are you using Databricks? Also, did you create the Azure SQL database, etc.?
@@BryanCafferky I am using Databricks, and connecting to Azure Synapse DW. What do you think could be wrong with the cluster?
@@RajeshPhanindra That's outside the scope of my video. Sounds like an issue with the Databricks interface to Synapse. Good luck!
@@BryanCafferky Thanks Bryan. Your videos are very useful.
Hi, excellent information. I need your help. I am working on a project in which I am calling 200+ notebooks in a hybrid (parallel and sequential) manner. Whenever a failure occurs at any point, I fix the issue and rerun all the notebooks from the start. Can you please help me build a control framework that will rerun the notebooks from the failure point only?
Hi Gaurav, this blog may help: medium.com/analytics-vidhya/building-a-notebook-based-etl-framework-with-spark-and-delta-lake-b0eee85a8527
Building a framework is not a small task and is something I sometimes do as a consultant.
Thank you for this video, Bryan! If possible, can you share with me how to connect an Azure SQL database with Databricks via Azure Active Directory?
Hi Tien, First, you need to make sure Active Directory Authentication has been configured in the Azure SQL Database.
docs.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-overview
I'm not sure Windows Authentication type access is supported by the JDBC driver for Spark. This documentation explains ways to use Azure Active Directory, but it looks like it requires setting up a service principal.
docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15
Alternatively, using SQL Server based credentials can be made secure by using Azure Key Vault. Here's my video on that.
th-cam.com/video/geCM32t_VWE/w-d-xo.html
Hope that helps!
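If you do go the service principal route, the read could look roughly like the sketch below. This assumes the Microsoft SQL JDBC driver on your cluster supports the ActiveDirectoryServicePrincipal authentication mode (newer driver versions do); the server, app registration, and secret names are placeholders:

```python
# Hedged sketch: Azure AD service principal authentication over JDBC.
# Assumes the mssql JDBC driver on the cluster supports
# authentication=ActiveDirectoryServicePrincipal. Names are placeholders.
df = (
    spark.read.format("jdbc")
    .option(
        "url",
        "jdbc:sqlserver://myserver.database.windows.net:1433;"
        "database=mydb;authentication=ActiveDirectoryServicePrincipal",
    )
    .option("dbtable", "SalesLT.Customer")
    .option("user", dbutils.secrets.get("myscope", "sp-client-id"))          # application (client) ID
    .option("password", dbutils.secrets.get("myscope", "sp-client-secret"))  # client secret
    .load()
)
```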
Bryan, I am a bit confused...
So you create a local Spark table (cmd8) that points to Azure SQL.
Then you create a local spark_customer/address_extract Spark table (cmd13) to persist the data, based on a select statement on the earlier created local Spark table?
Isn't it possible to use the SalesCustomer local Spark table directly to insert back into the Azure SQL Sales.CustomerExtract table? That spark_customerextract seems redundant, or is the SalesCustomer local table just a reference (view) to the Azure SQL table, so the data is not persisted in Spark there?
Hope this question makes sense :-).
Yeah. It seems a bit unusual at first. Local Spark tables are great but they only work on Spark. Many times you need to get data from a SQL database and you may need to store data back to a SQL database. That's where this comes in handy. It just makes it easier to use SQL database tables. Note: Spark is NOT a SQL database, it just lets you read data with the SQL language. Java Database Connectivity (JDBC) translates what you ask for in Spark SQL into the backend database's SQL.
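In Spark SQL terms the pattern is roughly the sketch below, run from PySpark; the table names, columns, and secret lookups are placeholders rather than the exact ones from the video:

```python
# Sketch of the pointer-table + local-extract pattern. Names are placeholders.
user = dbutils.secrets.get("myscope", "sqluser")
pwd = dbutils.secrets.get("myscope", "sqlpassword")

# 1. A Spark table that is only a reference to the Azure SQL table -- no data is copied yet.
spark.sql(f"""
  CREATE TABLE IF NOT EXISTS sales_customer
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb',
    dbtable 'SalesLT.Customer',
    user '{user}',
    password '{pwd}'
  )
""")

# 2. A local Spark table that actually persists the extracted rows (as parquet
#    in the workspace storage), so later queries don't keep hitting the source DB.
spark.sql("""
  CREATE TABLE IF NOT EXISTS spark_customer_extract
  AS SELECT CustomerID, FirstName, LastName, EmailAddress
  FROM sales_customer
""")
```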
@@BryanCafferky Right, you state that you persist records locally so as not to disturb the source DB. Is there a limit on the number of records you can persist? Or is that dependent on your cluster memory config?
@@Boedie92 No limit logically. It will be stored as parquet files on Blob storage. In my example, it uses the Databricks workspace's built-in Azure Blob storage, which may have its limits, in which case you could create an Azure Storage account to save the data to.
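For example, something like this could write the extract to your own storage account instead; the abfss path is hypothetical and assumes the storage account access is already configured for the cluster:

```python
# Sketch: persist the extract to an external Azure Storage account instead of
# the workspace's built-in storage. Path and account names are placeholders.
(
    spark.table("spark_customer_extract")
    .write.mode("overwrite")
    .parquet("abfss://extracts@mystorageaccount.dfs.core.windows.net/customer_extract")
)
```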
@@BryanCafferky thanks for the clarity and thanks for your videos, quality content!
@@Boedie92 YW. Glad it helps!
Very useful, thanks to that tutorial I built a much more complex project one year ago.
Now it seems Unity Catalog is coming. Do you think that, thanks to Unity Catalog, we'll also be able to view the Spark tables in the Databricks SQL IDE and manage access there as well?
Great! Azure SQL Database is a separate service from Databricks and can be accessed independently, so I don't see Databricks managing security of Azure SQL databases, but it can perhaps help govern Databricks' usage of them. Thanks!
@@BryanCafferky I just figured out that the CREATE TABLE USING JDBC statement is not supported with the Azure Databricks Unity Catalog feature. That means it's not possible to create a pointer to the database table. Instead, data extraction is necessary =(
@@arnoldsouza326 Thanks for the update on that. I wonder if they may add support later.
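For anyone hitting this under Unity Catalog, the extraction route would look roughly like this sketch: read the source table through JDBC into a DataFrame, then save it as a managed table (the catalog, schema, server, and secret names are placeholders):

```python
# Sketch of the extraction approach when CREATE TABLE ... USING JDBC
# isn't available. All names/secrets are placeholders.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
    .option("dbtable", "SalesLT.Customer")
    .option("user", dbutils.secrets.get("myscope", "sqluser"))
    .option("password", dbutils.secrets.get("myscope", "sqlpassword"))
    .load()
)

# Persist the extracted data as a Unity Catalog managed table.
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.customer_extract")
```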
What is the command to access a Delta table in PySpark?
Just use spark.sql(). That doesn't relate to this video though, does it?
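For example, a few common ways (the table and path names here are just placeholders):

```python
# Common ways to read a Delta table from PySpark. Names/paths are placeholders.
df1 = spark.table("my_database.my_delta_table")               # by table name
df2 = spark.read.format("delta").load("/mnt/delta/my_table")  # by storage path
df3 = spark.sql("SELECT * FROM my_database.my_delta_table")   # via the SQL API, as mentioned above
```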