Connecting from Azure Databricks to Azure SQL with AAD auth (with and without service principal)

If you need to load data from Azure SQL server to Azure Databricks you have several options.

Using JDBC

Using JDBC you can simple paste username/password, however this only works with sql authentication i.e. NOT with Azure Active Directory (AAD) Auth.

A guide is available here: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases

However, I recommend using the Spark Connector as described below

Using Spark Connector for SQL Server

This method supports AAD authentication both with and without Service Principal. It is documented (poorly) here:

https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16

In both cases you will need to install this in your cluster:

com.microsoft.azure:spark-mssql-connector_2.12_3.0:1.0.0-alpha

Simply edit your cluster and under Libraries select Install new. Select maven and paste the string above and click Install. Wait for install to complete.

With Service Principal

A better guide for using it with a service principal (App registration) can be found here:

https://www.thedataswamp.com/blog/databricks-connect-to-azure-sql-with-service-principal

Example code

import adal
# Service principal version:
tenant = "a7ed0222-1883-488c-8bbb-6ee4f043da6d"

# Lacking a service principal:
service_principal_id = "REPLACE_WITH_YOUR_APP_ID"
service_principal_secret = "REPLACE_WITH_YOUR_SECRET"

servername = "hhan-sql"
jdbcHostname = f"{servername}.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "sql-test" # Replace with DatabaseName
url = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

table_name = "SalesLT.Product" # REPLACE with your table

table_name = "SalesLT.Product"

authority = f"https://login.microsoftonline.com/{tenant}"
resource_app_id_url = f"https://database.windows.net/" # Magic encantation!
context = adal.AuthenticationContext(authority)
token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]

jdbc_db = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("accessToken", access_token) \
        .option("encrypt", "true") \
        .option("hostNameInCertificate", "*.database.windows.net") \
        .load()

Without Service Principal

If you do not have a service principal, but your personal AD user has access you can instead use device code flow to gain access:

https://medium.com/analytics-vidhya/interactive-azure-ad-authentication-with-python-64df3173a81c

The important points from the above guide is this:

import adal # You can also use msal which is never and fancier ;)
tenant = "REPLACE_WITH_TENANT" # Your tenant id REPLACE
authority = f"https://login.microsoftonline.com/{tenant}" # Authority (if single tenant)
context = adal.AuthenticationContext(authority)

servername = "REPLACE_WITH_SERVER_NAME"
jdbcHostname = f"{servername}.database.windows.net"
jdbcPort = 1433
jdbcDatabase = "sql-test" # Replace with DatabaseName
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)

table_name = "SalesLT.Product" # REPLACE with your table
client_id = "04b07795-8ddb-461a-bbee-02f9e1bf7b46" # Client ID for Azure Cloud Shell (you do not have to replace this)
resource_app_id_url = f"https://database.windows.net/" # Magic encantation!

code = context.acquire_user_code(resource_app_id_url, client_id)
print(code['message'])
token = context.acquire_token_with_device_code(resource_app_id_url, code, client_id)
#token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]


jdbc_db = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("accessToken", access_token) \
        .option("encrypt", "true") \
        .option("hostNameInCertificate", "*.database.windows.net") \
        .load()

Note that this code snippet asks for a new device flow code every time. This should not be necessary. Only when your tokens expire.

Access_tokens typically expires after 70 minutes.