Databricks offers secure and robust JDBC connectivity to Spark 2.x cluster for Professional and Enterprise Tier Databricks customers. (For Spark 1.x connectivity, see the resources section at the end of this article.)
There are several advantages of using this method for connecting to Databricks Spark clusters:
- The connection is secured using SSL
- There is no need to determine and specify the master node of the cluster each time it is started
###Steps for creating a connection
Additional detail can be found in the Databricks documentation
Step 1: Determine the Spark cluster
After creating the cluster, use the Databricks UI to find the JDBC/ODBC settings under Advanced settings.
It will look something like this:
Make a note of the following:
- Server hostname
- HTTP path
Step 2: Fill out the connection settings
In Looker, navigate to Admin > Connections > New Database Connection
Fill out the fields in the connection page according to the instructions below and using the image below as an example:
Specify the name of the connection and how it will be referred to in LookML projects.
Select Apache Spark 2.0.
Specify the server hostname and port from the previous step.
Specify the default database that you would like modeled. Other databases can be accessed, but Looker treats this database as the default database.
A Databricks username login.
The Databricks password associated with that login.
Persistent Derived Tables
Check this and specify a Temp Database if you are going to use the Persistent Derived Tables feature in Looker.
This is important for accurately specifying the Databricks connection. There are three parameters (plus
;as a delimiter) that need to be specified. Concatenate them together on the Additional Params line.
####Step 3: Modify the following options as desired or leave them at their default values
By default, the max connections is set to 30. See this page for more information about the Max Connections field.
Connection Pool Timeout
Specify the connection pool timeout. By default, the timeout is set to 120 seconds. See this page for more information on the Connection Pool Timeout field.
Database Time Zone
Specify the timezone used by the dates in the database. Leave this field blank if you do not want timezone conversion. See this page for more information on the Database Time Zone field.
Query Time Zone
The desired timezone for Looker queries. Leave this field blank if you do not want timezone conversion. See this page for more information about the Query Time Zone field.
####Step 4: Test the connection settings
Use the Test These Settings button to make sure that Looker can connect through Databricks’s infrastructure. If successful, it will display a JDBC URL similar to the one shown in the JDBC settings for your cluster.
####Step 5: Save the connection settings
Add Connections will create the connection with this configuration. It will also run additional tests against the Spark cluster.
If successful, it will display something similar to this: