Creating a Connection to Amazon Athena

athena

(Todd Nemet) #1

As of version 4.6, Looker supports Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage. You are charged only for the queries that are run.

This article describes how to connect to an Amazon Athena instance.

  1. Before starting, make sure that you have:
  • A pair of Amazon AWS access keys

  • An S3 bucket
    The Amazon AWS access keys must have read-write access to this bucket.

  • Knowledge of where your Amazon Athena instance data is located. The Region Name can be found in the upper right hand portion of the Amazon console as shown below.

  1. In Looker, go to Admin —> Connections —> New Database Connection.
    Looker displays the form for specifying database connection details

  2. Fill out the database connection details:

  • Name
    Specify the name of the connection and how it will be referred to in LookML projects.

  • Dialect
    Select Amazon Athena.

  • Host:Port
    Specify the name of the host and port. As described in the Athena documentation on the JDBC URL format, the host should be a valid Amazon endpoint (like athena.eu-west-1.amazonaws.com), and the port should stay at 443. An up-to-date list of endpoints that support Athena can be found here.

  • Database
    Specify the default database that you would like modeled. Other databases can be accessed, but Looker treats this database as the default database.

  • Username
    Specify the AWS Access key ID.

  • Password
    Specify the AWS Secret access key.


    For example, the following settings set up an Athena connection using the US East data center:


  1. In the last field, Additional Parameters, you must configure the bucket that was used for staging by using the s3_staging_dir parameter:

        s3_staging_dir=s3://staging-bucket/

    where staging-bucket is the name of the staging bucket. See the Athena documentation on JDBC Driver Options for more information and a list of all available JDBC driver options.

  2. Modify the following options as desired or leave them at their default values:

  • Max connections
    By default, the max connections is set to 5 because this is the maximum number of concurrent queries to Amazon Athena per account. See the Athena service limits documentation for more details about the service limits. See this page for more information about the Max Connections field.

  • Connection Pool Timeout
    Specify the connection pool timeout. By default, the timeout is set to 120 seconds. See this page for more information on the Connection Pool Timeout field.

  • Database Time Zone
    Specify the timezone used by the dates in the database. Leave this field blank if you do not want timezone conversion. See this page for more information on the Database Time Zone field.

  • Query Time Zone
    The desired timezone for Looker queries. Leave this field blank if you do not want timezone conversion. See this page for more information about the Query Time Zone field.

Resources:

Debugging

Amazon provides log_level and log_path JDBC driver options for debugging connections. To use them, add &log_level=DEBUG&log_path=/tmp/athena_debug.log to the end of the Additional Params field and test the connection again.

If Looker is hosting the instance, then support or your analyst will need to retrieve this file to continue debugging.

Feature Support

Looker’s ability to provide some features depends on the database dialect’s ability to support that functionality. In the current Looker release, Amazon Athena supports the following Looker features:


AWS Athena support
Looker 4.6 Release Notes
(Justin Burnham) #2

Hi,

Are there any plans to support using IAM Roles to connect to Athena instead of having to pass a hardcoded access key and secret key? Our machines running Looker run on EC2 with a role that already gives them access to Athena and it would be nice and safer if Looker could just use that.

Thanks


(Morgan Imel) #3

Hi Justin,

It looks like you already got your answer from chat, but for the benefit of everyone else, this isn’t something we support right now but we’ve passed the feedback along!


(kenneth.vinson) #4

Just added the “Feature Support” section to this article.


(Openbridge) #5

Since Looker has native connections to AWS Athena you may be interested in how best to leverage Athena and Looker. Using a columnar storage format is ket to optimized performance and lowers per query costs. If you are looking to get started with Athena, this may be of interest. It has automated the setup and optimization of data pipelines for Looker users, including converting data to Apache Parquet format.

https://blog.openbridge.com/aws-athena-automated-60-second-setup-zero-administration-and-automatic-optimization-eba474e9897a)

If there is any interest, please let me know.


(brettg) #6

In looker version 5.20, the Athena driver has been upgraded to use the Athena JDBC API 4.2 (v2.0.2). The extra jdbc params setup in the looker connection will need to be updated to conform to the new connection URL construction:

s3_staging_dir=s3://staging-bucket&log_level=DEBUG&log_path=/tmp/athena_debug.log

Would become:

s3_staging_dir=s3://staging-bucket;LogLevel=DEBUG;LogPath=/tmp/athena_debug.log

See here for more details if needed to update custom JDBC configs or other specifics to your connection configuration.


(brettg) #7

Update to the comment above - The athena driver v2.0.5 is used in looker v5.22+. If you have the right policy attached to your athena user, you can add ;UseResultsetStreaming=1 to the end of your additional params in the connection setup to significantly improve the performance of large result set extraction (this parameter is set to 0 by default).