We have a certain use case in mind and I’d like to get feedback from the community.
Historically, reports were generated by executing queries and pasting the results into Excel, which was error-prone, time-consuming, etc.
We’ve now modeled much of the query business logic and can replicate the same datasets and visualizations in Looker, but we’re trying to understand our options for scheduling and packaging the results for delivery. Looker excels as an interactive tool, but the scheduling/exporting options seem limited, and it’s not always feasible to bring every data consumer into the tool as a user (board members, executives, external clients, etc.).
For a given deliverable that we are prototyping, we’ve organized all of the content into a single space: saved Looks (~10 now, but this will scale to as many as 50) and a single saved (user-defined) dashboard containing all the relevant visualizations from those Looks. I was hoping we could bulk-schedule all of the contents of the space to export (a CSV for each Look, a PDF for the dashboard) into a single S3 folder. If I’m being greedy, my ultimate desire is a monthly export process such that on May 1, all files (PDF and CSVs) from the associated content (dashboard and Looks) in a given space end up together in a single dated zip file, so we can organize and archive the historic files.
Conceptually, we’ve come up with two options, both of which involve some scripting:
Approach 1: Dump exports to staging area, then re-organize files
- (Manually) schedule exports of the looks and the dashboard to an S3 bucket [Deliverable123 / Recent].
- Schedule a task (cron, leveraging the AWS CLI, etc.) to run after Step 1, which moves all files from [Deliverable123 / Recent] >> [Deliverable123 / 2017-05], zips the contents of [Deliverable123 / 2017-05] into a single file, and clears [Deliverable123 / Recent].
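The Step 2 task above can be sketched in a few lines of Python, assuming the bucket (or a copy of it, e.g. via `aws s3 sync`) is visible as a local directory. The directory layout and "Deliverable123" naming are just the placeholders from the example:

```python
import shutil
import zipfile
from datetime import date
from pathlib import Path
from typing import Optional

def archive_recent(root: Path, today: Optional[date] = None) -> Path:
    """Move root/Recent into a dated folder, zip it, and clear Recent."""
    today = today or date.today()
    recent = root / "Recent"
    dated = root / today.strftime("%Y-%m")        # e.g. "2017-05"
    dated.mkdir(parents=True, exist_ok=True)

    # Move every export out of Recent, which clears it as we go.
    for f in recent.iterdir():
        shutil.move(str(f), dated / f.name)

    # Zip the dated folder into a single archive next to it.
    zip_path = root / f"{dated.name}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(dated.iterdir()):
            zf.write(f, arcname=f.name)
    return zip_path
```

A monthly cron entry would then just call `archive_recent(Path("Deliverable123"))` after the scheduled exports have landed.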
Besides having to write and maintain a script for Step 2, the biggest downside of this approach is that individual schedules are still managed manually in the UI: setting up anything new takes effort, schedules can be configured inconsistently, and changes like renaming a bucket become fairly tedious.
Approach 2: Manage schedules and destinations with Looker API
- Leverage the Looker API to manage schedules on a per-space basis (in pseudo-code, something like: for each Look in this space, set the S3 destination = [Deliverable123 / 2017-05], set format = CSV, set require changed results = “no”, etc.). We could run the script whenever we want to reconfigure the destination (once a month or on demand).
- Have a manual or scripted process to collect all of the individual files from a given S3 bucket. Unfortunately, S3 doesn’t offer a “Download All” option in its UI, so this may still require some scripting.
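The per-Look pseudo-code above could amount to building one scheduled-plan payload per Look and pushing each through the API’s scheduled-plan endpoint. The exact field names below (`scheduled_plan_destination`, `require_change`, `crontab`) are my reading of the Looker scheduled-plan docs, so treat the payload shape as an assumption to verify against the API reference:

```python
from datetime import date
from typing import Dict, List

def monthly_plan(look_id: int, bucket: str, deliverable: str,
                 run_date: date) -> Dict:
    """Build one scheduled-plan payload sending a Look's CSV to S3."""
    folder = run_date.strftime("%Y-%m")            # e.g. "2017-05"
    return {
        "name": f"{deliverable} look {look_id} ({folder})",
        "look_id": look_id,
        "crontab": "0 6 1 * *",           # 06:00 on the 1st of the month
        "require_change": False,          # i.e. require changed results = no
        "scheduled_plan_destination": [{
            "type": "s3",
            "format": "csv",
            "address": f"{bucket}/{deliverable}/{folder}",
        }],
    }

def plans_for_space(look_ids: List[int], bucket: str, deliverable: str,
                    run_date: date) -> List[Dict]:
    """'For each Look in this space' -- one payload per Look."""
    return [monthly_plan(i, bucket, deliverable, run_date)
            for i in look_ids]
```

Run monthly, this would rotate every Look in the space to the new dated destination in one pass, rather than editing each schedule in the UI.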
Approach 2 likely scales better, and once we’re familiar with the Looker API, it may let us re-generate a set of files after data replication issues and the like without touching individual schedules. But we don’t yet have familiarity with the API, so there will certainly be a learning curve, and potentially two scripts to manage instead of one.
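On the “Download All” gap: the AWS CLI’s `aws s3 sync s3://bucket/prefix ./local` already does this from the command line. If we script it ourselves instead, the only real logic is mapping object keys under a prefix to local paths; the per-object download (e.g. boto3’s `download_file`) is then one line per entry. A minimal sketch of that mapping:

```python
from pathlib import Path, PurePosixPath
from typing import Dict, Iterable

def local_paths(keys: Iterable[str], prefix: str,
                dest: Path) -> Dict[str, Path]:
    """Map S3 object keys under `prefix` to local paths beneath `dest`."""
    prefix = prefix.rstrip("/") + "/"
    out = {}
    for key in keys:
        # Skip keys outside the prefix and "folder" placeholder keys.
        if key.startswith(prefix) and not key.endswith("/"):
            rel = PurePosixPath(key).relative_to(prefix.rstrip("/"))
            out[key] = dest.joinpath(*rel.parts)
    return out
```

Each `(key, path)` pair would then be fed to an S3 client’s download call, preserving the folder structure locally.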
Before we go too far down either of these paths, I wanted to hear if anybody had attempted or accomplished something similar, either through one of the methods above, or via some other method I haven’t considered.
Something else worth mentioning: while we anticipate updates to Looker that will continue to bring LookML dashboards and user-defined dashboards closer together, we have a similar desire to schedule exports of some LookML dashboards.