If you need to load data from Azure SQL server to Azure Databricks you have several options.
Using JDBC
Using JDBC you can simple paste username/password, however this only works with sql authentication i.e. NOT with Azure Active Directory (AAD) Auth.
A guide is available here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases
However, I recommend using the Spark Connector as described below
Using Spark Connector for SQL Server
This method supports AAD authentication both with and without Service Principal. It is documented (poorly) here:
https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16
In both cases you will need to install this in your cluster:
com.microsoft.azure:spark-mssql-connector_2.12_3.0:1.0.0-alpha
Simply edit your cluster and under Libraries select Install new. Select maven and paste the string above and click Install. Wait for install to complete.
With Service Principal
A better guide for using it with a service principal (App registration) can be found here:
https://www.thedataswamp.com/blog/databricks-connect-to-azure-sql-with-service-principal
Example code
import adal
# Service principal version:
tenant = "a7ed0222-1883-488c-8bbb-6ee4f043da6d"
# Lacking a service principal:
service_principal_id = "REPLACE_WITH_YOUR_APP_ID"
service_principal_secret = "REPLACE_WITH_YOUR_SECRET"
servername = "hhan-sql"
jdbcHostname = f"{servername}.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "sql-test" # Replace with DatabaseName
url = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
table_name = "SalesLT.Product" # REPLACE with your table
table_name = "SalesLT.Product"
authority = f"https://login.microsoftonline.com/{tenant}"
resource_app_id_url = f"https://database.windows.net/" # Magic encantation!
context = adal.AuthenticationContext(authority)
token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]
jdbc_db = spark.read \
.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", url) \
.option("dbtable", table_name) \
.option("accessToken", access_token) \
.option("encrypt", "true") \
.option("hostNameInCertificate", "*.database.windows.net") \
.load()
Without Service Principal
If you do not have a service principal, but your personal AD user has access you can instead use device code flow to gain access:
https://medium.com/analytics-vidhya/interactive-azure-ad-authentication-with-python-64df3173a81c
The important points from the above guide is this:
import adal # You can also use msal which is never and fancier ;)
tenant = "REPLACE_WITH_TENANT" # Your tenant id REPLACE
authority = f"https://login.microsoftonline.com/{tenant}" # Authority (if single tenant)
context = adal.AuthenticationContext(authority)
servername = "REPLACE_WITH_SERVER_NAME"
jdbcHostname = f"{servername}.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "sql-test" # Replace with DatabaseName
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
table_name = "SalesLT.Product" # REPLACE with your table
client_id = "04b07795-8ddb-461a-bbee-02f9e1bf7b46" # Client ID for Azure Cloud Shell (you do not have to replace this)
resource_app_id_url = f"https://database.windows.net/" # Magic encantation!
code = context.acquire_user_code(resource_app_id_url, client_id)
print(code['message'])
token = context.acquire_token_with_device_code(resource_app_id_url, code, client_id)
#token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]
jdbc_db = spark.read \
.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", url) \
.option("dbtable", table_name) \
.option("accessToken", access_token) \
.option("encrypt", "true") \
.option("hostNameInCertificate", "*.database.windows.net") \
.load()
Note that this code snippet asks for a new device flow code every time. This should not be necessary. Only when your tokens expire.
Access_tokens typically expires after 70 minutes.