Update (write-back) data on BigQuery from Looks?

actions
bigquery

(Krishna Potluri) #1

Is there a way to let users update data in the database right from Looks?
I read about actions here, but found the documentation very abstract.
Can someone please explain the process for this?

Thanks,
Krish.


(jesse.carah) #2

Hey Krishna,

I’ve been able to write data from Looker to BigQuery using both Data Actions and the Looker Action Hub. In either case, you’ll need to push data from Looker to some middleware that interprets the webhook from Looker and performs the operations needed to stream the data to BigQuery.

Luckily, Google has a great service called Google Cloud Functions that makes this really easy. Like AWS Lambda, Cloud Functions lets you deploy code that is executed in response to an event. With a data action, you can push JSON containing data from Looker, along with user-defined form parameters, to a Cloud Function endpoint. The Cloud Function then parses the JSON, extracts the relevant values, and calls the BigQuery SDK to stream the results to BigQuery.
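To make that concrete, the webhook body is JSON whose data and form_params keys carry the cell value and the form selections. Roughly (this is a hand-written sketch, not a verbatim payload; the real request also carries extra metadata):

# Rough sketch of the data action payload (shown as a Python dict for illustration).
# The keys under "data" and "form_params" are defined by the LookML action below;
# the actual payload Looker sends includes additional metadata not shown here.
example_payload = {
    "data": {"name": "Jesse"},            # the {{ value }} of the cell the action was run on
    "form_params": {"annotation": "Yes"}  # what the user chose in the action form
}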

Here’s a quick overview of how to use Cloud Functions to stream data from Looker to BigQuery. In this example, we’ll create a data action and cloud function that lets an end user persist an annotation to BigQuery:

Create the Data Action

In this example, we’re going to attach a data action to a field and allow end users to mark whether or not a name is a cool name.

 dimension: name {
    type: string
    sql: ${TABLE}.name ;;
    action: {
      label: "Cool Name?"
      url: ""

      param: {
        name: "name"
        value: "{{ value }}"
      }
      form_param: {
        name: "annotation"
        type: select
        label: "Cool name?"
        default: "No"
        description: "Do you think that this name is a cool name?"
        option: {
          name: "No"
        }
        option: {
          name: "Yes"
        }
      }
    }
  }

Note: We’re going to leave the url blank for now. Once we’ve spun up the cloud function we’ll paste the endpoint in.

Configure the Cloud Function

  1. Follow the first three steps here to select your GCP project and enable API access for Cloud Functions and BigQuery.
  2. Navigate to https://console.cloud.google.com/functions and ensure that you’ve selected the same project in which BigQuery resides.
  3. Click Create Function, and give it a name. Select a memory allocation (in my experience, you can select the minimum for this type of operation).
  4. Select HTTP as your trigger.
  5. Select your preferred runtime (for this example, I will use Python 3.7, but versions of Node.js are also supported).

Creating your Cloud Function

We’re now going to write a simple Python function that writes the user-selected annotation to BigQuery, and place it in main.py:

from google.cloud import bigquery
import datetime
import time

def annotation(request):
    r = request.get_json()  # fetch the data action JSON payload

    client = bigquery.Client()
    dataset_id = ''  # replace with the name of the BQ dataset
    table_id = ''    # replace with your table ID
    table_ref = client.dataset(dataset_id).table(table_id)
    table = client.get_table(table_ref)  # API request

    # values sent by Looker
    name = r['data']['name']
    annotation = r['form_params']['annotation']

    # system variables
    sys_time = int(time.time())

    row_to_insert = [
            (
             name,
             annotation,
             datetime.datetime.fromtimestamp(sys_time).strftime('%Y-%m-%d %H:%M:%S')
            )
        ]
    errors = client.insert_rows(table, row_to_insert)  # API request to stream the row to BigQuery
    return '{"looker": {"success": true, "refresh_query": true}}'  # success response that Looker expects

Additional things to configure:

  • The ‘Function to Execute’ is annotation
  • Make sure to include a line for google-cloud-bigquery==1.5.0 in requirements.txt
  • Click the ‘Trigger’ tab and copy the URL. Paste this into the action that you set up in the first step.
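
Before pointing the LookML action at the new URL, it can be worth hitting the endpoint directly and confirming a row shows up in BigQuery. A minimal smoke test (the URL is a placeholder for your own trigger URL; the body just mimics the shape Looker sends):

import requests

# Smoke-test sketch: POST a Looker-shaped payload to the deployed function.
url = 'https://REGION-PROJECT.cloudfunctions.net/annotation'  # your trigger URL
payload = {
    'data': {'name': 'Jesse'},
    'form_params': {'annotation': 'Yes'},
}
resp = requests.post(url, json=payload)
print(resp.status_code, resp.text)  # expect 200 and the Looker success JSON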

The End Result:

Caveats:

  • Right now, this function is open to the internet, so anyone with your Cloud Function URL can write data to your BigQuery instance. Consider adding some validation that the request is actually coming from Looker (a simple shared-secret check is sketched below).
  • For a more comprehensive and secure approach, consider adapting this using the Action Hub framework. I can provide more detail on this if you’re interested.
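
For the first caveat, one lightweight option (a sketch only, not a full security solution) is to append a shared secret to the action URL as a query parameter and have the function reject requests that don’t carry it:

import os

def annotation(request):
    # Reject requests that don't present the shared secret.
    # Assumes SHARED_SECRET is set as an environment variable on the function
    # and the same value is appended to the data action URL, e.g. ...?token=<secret>.
    expected = os.environ.get('SHARED_SECRET')
    if not expected or request.args.get('token') != expected:
        return ('Forbidden', 403)

    # ... proceed with the BigQuery insert as above ...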

Cheers!
Jesse


(Krishna Potluri) #3

Thank you so much Jesse!
Going to try this now!


(jesse.carah) #4

Yeah, let me know how it goes! Happy to discuss further.


(Dimitri Masin) #5

Is it possible to push all the data this way into BigQuery? This would be super useful to create custom segments of users for example.


(Krishna Potluri) #6

Hello Jesse!
I followed the exact process, but I am seeing an error on Cloud Functions:

NameError: name ‘bigquery’ is not defined

I included google-cloud-bigquery==1.5.0 in requirements.txt, but I am still seeing the same error.

EDIT:
Fixed this by adding “from google.cloud import bigquery”


(jesse.carah) #7

Hey Krish, good catch. It looks like I didn’t copy over the first line of the cloud function when I pasted it in. Did you get the action to work?


(jesse.carah) #8

Dimitri – yes, that is totally possible, but we’ll need to leverage the Action Hub framework to push all results to BigQuery. I’ll take that as a challenge and try to get you a working example sometime next week.


(Krishna Potluri) #9

Hey Jesse,
Yes, I finally made it work. But there is one issue (which I am trying to fix).
Data is being duplicated whenever I try to update. Any idea why that’s happening?

[Screenshot: Capture]
I tried updating rows with IDs 8 and 10, and you can see the data being duplicated.


(Dimitri Masin) #10

Amazing, thank you @jesse.carah! Looking forward to it.


(jesse.carah) #11

Hey Krish,

Check out this discussion about the append-only nature of BigQuery.

I think the best move here is to have your Cloud Function insert a timestamp when each record is created, and create a view that selects only the most recent record. One approach to doing that is explained here.
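
To sketch that pattern with the example table above (the project, dataset, table, and column names are assumptions; adjust them to your schema), you could create a view that keeps only the most recent annotation per name:

from google.cloud import bigquery

# Sketch: a view exposing only the latest annotation per name.
client = bigquery.Client()
view = bigquery.Table(client.dataset('your_dataset').table('annotations_latest'))
view.view_query = """
    SELECT name, annotation, created_at
    FROM (
      SELECT *,
             ROW_NUMBER() OVER (PARTITION BY name ORDER BY created_at DESC) AS rn
      FROM `your_project.your_dataset.annotations`
    )
    WHERE rn = 1
"""
client.create_table(view)  # API request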

Cheers!
Jesse


(Krishna Potluri) #12

Thank you, Jesse!
I now have a much better picture of this!


(jesse.carah) #13

Hey Dimitri,

I just wanted to update you that I’m making progress, but my goal of getting a working POC up and running this week was perhaps too bold.

In the meantime, could you elaborate on the specific use-case? What sort of data are you trying to push back to BigQuery?

Cheers,
Jesse


(Dimitri Masin) #14

@jesse.carah Whenever people construct a view that represents a segment, e.g. “all user_ids who did xyz and live in London”, I would like to enable them to save this list of user_ids as a segment so that it can be used in other dashboards/analyses. The only missing part is being able to save the output of an Explore into an existing BigQuery table. This way everyone would be able to create custom (arbitrarily complicated) segments. Hope that makes sense!?

I think this is a very common analytical pattern in general!


(jesse.carah) #15

@Dimitri_Masin – I got this working :slight_smile: . I’m going to make a post this afternoon describing how to get this up and running.


(jesse.carah) #16

@Dimitri_Masin check out the new post here. Let me know if you have any questions.