Automating Frequently Changing Schemas with AWS Lambda

(Bryan Weber) #1

Why would you need this article?

When your tables and table schemas are constantly in flux, but you still want your LookML models kept up to date with the changing schemas, this article is for you. The following are basic instructions for deploying an AWS Lambda function and setting up your Looker instance to poll for and initiate changes to your LookML model through the Lambda function.

How long will this take to implement?

With no modifications and access to the necessary components listed below in Requirements, setup takes less than 15 minutes. Then your LookML will always be kept in sync with your frequently changing tables. :boom:

Note:

This is intended to be a basic generation of LookML. The package is open source so you can modify it as needed, and instructions for doing so are in development. Modifications might include more advanced LookML generation logic or deployment on other serverless platforms such as Google Cloud Functions, Azure Functions, or IBM OpenWhisk.

This is in active development; please submit any issues or requests on the project’s GitHub.

How are we doing this?

  1. Looker will poll your information schema filtered on a table through its scheduler
  2. When Looker detects a change in schema, it will fire a webhook to the Lambda function
  3. The webhook will contain column names, types, and comments for the given table, as well as which LookML project to update and the filename to store the generated file under
  4. The Lambda function will generate the LookML based on the query results in the webhook payload
  5. The Lambda function will use GitHub’s API to push the file to the master branch of the repository for the project.
  6. The Lambda function will use the Looker Deploy Webhook for the project to pull changes to your Looker instance from remote.
  7. Your Looker Explore built off of the auto-generated files will be updated in production.
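The steps above can be sketched as a single Lambda handler. This is a minimal illustration, not the packaged function: the payload field names and the `generate_lookml` helper are assumptions based on the JSON - Simple webhook format described later in this article.

```python
import json

# Minimal sketch of the flow above; the field names in the webhook payload
# are assumptions, not the exact schema the packaged function uses.

def generate_lookml(view_name, columns):
    """Render a basic view file from information-schema rows (step 4)."""
    dims = "\n".join(
        f"  dimension: {c['name']} {{\n"
        f"    type: {c['type']}\n"
        f"    sql: ${{TABLE}}.{c['name']} ;;\n"
        f"  }}"
        for c in columns
    )
    return f"view: {view_name} {{\n{dims}\n}}\n"

def handler(event, context):
    # Steps 2-3: Looker's webhook arrives via API Gateway with query params
    params = event["queryStringParameters"]        # project_id, filename
    rows = json.loads(event["body"])["attachment"]["data"]
    columns = [{"name": r["column_name"], "type": r["looker_type"]}
               for r in rows]
    lookml = generate_lookml(params["filename"], columns)
    # Steps 5-6: push `lookml` to GitHub, then call the Looker deploy webhook
    return {"statusCode": 200, "body": "ok"}
```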

Process Flow

Requirements

  • An AWS account with the ability to deploy a Lambda application, create an API Gateway, and create a limited IAM role
  • A Looker instance with the ability to create a new model and retrieve instance information
  • A GitHub repository connected to your Looker instance
  • A GitHub Personal API Token that can push to master on the attached repository

Deploy the Lambda Application

Create the GenerateLookML Lambda function from the AWS Serverless Application Repository

Click Here to Create from the AWS Serverless Application Repository

Deploy the Lambda function through the AWS Console. This will create a CloudFormation Stack with the following:

  • A Lambda function
  • An environment variable in the Lambda function called project_config
  • An API endpoint allowing a POST method
  • An IAM role to access the necessary resources

You can review which resources were created and what permissions were granted on the CloudFormation page for the Stack.

After it’s deployed, open the GenerateLookML Lambda function in the console. We will need the API endpoint to put into Looker, but first we need to modify the environment variable.

Setting up the project_config

This section has details on the necessary values to update the project_config environment variable. Example of the deployed default is here and a full example is here.

In the console for the function, you can find the environment variable here:

Gather Looker Instance Information

Outgoing Webhook Token

Found in your Looker Admin Panel in Admin > General > Settings. This is a top-level key in the project_config.

If you have multiple Looker instances, you can add each one as its own key.
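Inside the Lambda function, that token can be used to verify that a request really came from your Looker instance. The sketch below is an assumption about how such a check might look — the header name and the config shape shown here are illustrative, so confirm them against your Looker instance’s outgoing webhook documentation:

```python
import hmac

# Illustrative token check; the "X-Looker-Webhook-Token" header name and the
# "webhook_token" key are assumptions, not confirmed names from the project.

def token_is_valid(headers, project_config, instance_key):
    sent = headers.get("X-Looker-Webhook-Token", "")
    expected = project_config.get(instance_key, {}).get("webhook_token", "")
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(sent, expected)
```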

base_url

Found in your Looker Admin Panel in Admin > General > Settings

Gather Looker Project Information

This function supports many projects on the same instance by providing an array of project metadata.

project_id

This is the name of the project as seen in the Develop dropdown.

repository

The GitHub repository attached to the project_id, in the format organization/repository. If you’re not sure what it is, you can find it in the project settings under Remote Origin URL, for example: https://customer.looker.com/projects/<project_id>/edit

(optional) Webhook Deploy Secret

You can have Looker validate the headers of incoming deploy webhooks by including a value for X-Looker-Deploy-Secret in the project metadata. If you have this enabled in your project settings (https://customer.looker.com/projects/<project_id>/edit), place its value in this key.

GitHub Personal API Token

The Lambda function communicates with GitHub’s API, and a Personal API Token is required to make modifications to the repository. Details on obtaining an API token are here. The user must have the ability to commit and push to master on the repository.
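For orientation, pushing a generated file with that token maps onto GitHub’s contents API (PUT /repos/{owner}/{repo}/contents/{path}). The helper below is a simplified sketch, not the project’s actual code — in particular, updating an existing file also requires fetching its current blob `sha` first, which is omitted here:

```python
import base64
import json
import urllib.request

API = "https://api.github.com"

def build_commit_request(repository, path, content, token, sha=None):
    """Build the PUT request for GitHub's contents API.

    `sha` is required by GitHub when updating an existing file; pass None
    when creating a new one. Simplified sketch, not the packaged function.
    """
    body = {
        "message": f"Auto-generate {path}",
        "content": base64.b64encode(content.encode()).decode(),
        "branch": "master",
    }
    if sha:
        body["sha"] = sha
    return urllib.request.Request(
        f"{API}/repos/{repository}/contents/{path}",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"token {token}"},
        method="PUT",
    )

# To actually send it:
# urllib.request.urlopen(build_commit_request(
#     "org/repo", "users_autogen.view.lkml", lookml_text, token))
```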

Update the Lambda Function’s Environment Variable

Using the information collected above, update the project_config environment variable. An example of a completed config is here.
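For a rough sense of the shape, here is a hedged sketch of what such a config might look like, built from the values gathered above. The key names and nesting below are assumptions for illustration — defer to the project’s own example configs, and never commit real tokens:

```python
import json

# Illustrative project_config shape, keyed by Looker instance; the key names
# here are assumptions, and the values are placeholders, not real secrets.
project_config = {
    "mycompany": {                       # one top-level key per Looker instance
        "base_url": "https://mycompany.looker.com",
        "webhook_token": "outgoing-webhook-token-from-admin-settings",
        "github_token": "github-personal-api-token",
        "projects": [                    # array of project metadata
            {
                "project_id": "thelook",
                "repository": "mycompany/thelook",
                "deploy_secret": "optional-x-looker-deploy-secret",
            }
        ],
    }
}

# The Lambda console stores the environment variable as a single JSON string:
print(json.dumps(project_config))
```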

Deploy the LookML model and Scheduled Webhook

Now that everything is set up on the AWS side, we can focus on the Looker side. First we need to create an Explore for querying the information schema, then we need to schedule Looker to run it periodically and fire off the webhook.

Deploy the model

Copy to a project

Place generate_lookml.model.lkml in its own project or any currently deployed project. Then deploy the new model to production.

Note: Different dialects have different datatypes, so please validate dimension: type_convert {...} (link to GitHub here). Make any changes necessary to map your database’s datatypes to Looker datatypes.
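As a sketch of what that conversion does, here is the kind of dialect-to-Looker mapping involved, expressed in Python. The table below is illustrative, not the project’s actual type_convert definition, so adjust it for your database’s dialect:

```python
# Illustrative mapping from common database types to Looker dimension types;
# not the project's actual type_convert logic — extend for your dialect.
LOOKER_TYPE = {
    "varchar": "string",
    "text": "string",
    "char": "string",
    "integer": "number",
    "bigint": "number",
    "numeric": "number",
    "double precision": "number",
    "boolean": "yesno",
    "date": "date",
    "timestamp": "time",
}

def type_convert(db_type):
    # Strip any length qualifier, e.g. varchar(255) -> varchar
    base = db_type.lower().split("(")[0].strip()
    return LOOKER_TYPE.get(base, "string")  # fall back to string when unknown
```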

Add a Model Configuration

You will need to tell Looker that this model is allowed to query a connection. To do that, you need to add the Model Configuration.

Create the Webhook

Explore a Table

Once you’ve deployed to production and added a model configuration, you can explore the dataset at: https://<your host>/explore/generate_lookml/schema_table_search?fields=schema_table_search.required*

Save and Schedule

Enter your schema and table into the Explore filters and validate the results. Then you can Save and Schedule the Explore from the gear icon in the top right. Give it a Title and put it in a Space.

Schedule the webhook

The Looker scheduler modal will open, and there are a few changes you will need to make.

  1. Change the delivery method to Webhook.

  2. In the Webhook URL field, use the API endpoint provided by AWS, found within the function.

  3. Add query parameters to your webhook URL; the Lambda function requires two: project_id and filename. Your Webhook URL will look like this:
    https://<your API slug>.amazonaws.com/Prod/schema_table_search?project_id=thelook&filename=users_autogen

  4. Change Format data as in the scheduler to JSON - Simple.

  5. Set up your trigger schedule, you can have Looker poll every five minutes if you wish.

  6. Open Advanced Options, set Send this schedule if to “there are results,” and check the box “and results changed since last run.” This will have Looker send the webhook only if the results of the SQL query have changed.

  7. Send a Test before saving. You can check Admin > Scheduled Jobs for success or failure messages. You can also navigate to the project and check for the new .view.lkml file in your LookML.
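If you want to exercise the endpoint outside of Looker’s Send Test button, you can simulate the scheduled webhook yourself. The snippet below is a hypothetical sketch — the payload shape mirrors the JSON - Simple format described above, and the field names are assumptions:

```python
import json
import urllib.request

# Hypothetical simulation of Looker's scheduled webhook POST; the payload
# field names are assumptions, not Looker's exact JSON - Simple schema.

def build_test_request(api_url, project_id, filename, rows):
    payload = {"attachment": {"data": rows}}
    return urllib.request.Request(
        f"{api_url}?project_id={project_id}&filename={filename}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it against your deployed endpoint:
# urllib.request.urlopen(build_test_request(
#     "https://<your API slug>.amazonaws.com/Prod/schema_table_search",
#     "thelook", "users_autogen", rows))
```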

Example Scheduler:

You can now duplicate this schedule in Looker’s UI for as many table/file combinations as you want updated. Change the schema/table name in the filters and the filename/project_id in the webhook URL for each new table/file pair.

Embellishing the Auto-Generated LookML

Now that you have a LookML file that’s being auto-generated, you will want to make use of it. You will most likely want to add dimensions and measures, or add properties to the auto-generated dimensions. In either case you will make use of Looker’s ability to extend view files. You can check out examples of extended views here.