Lookml-tools: better Looker code, user experience, and data governance

Weight Watchers are pleased to announce https://github.com/ww-tech/lookml-tools, a new toolkit that our customer intelligence engineering team is open-sourcing to help the Looker community — especially LookML developers — write cleaner, more consistent code, deliver a better end user experience, and enhance data quality and governance.

The library contains three tools:

  • Linter: a tool that checks LookML against a suite of coding and styling rules.

  • Grapher: a tool that generates a “network diagram” of the model-explore-view relationships and highlights any orphaned files.

  • Updater: a tool that takes a master source of metrics definitions, compares that with what is present (or not) in LookML, and, where necessary, injects or updates the correct definitions into the LookML. Finally, it creates a GitHub pull request. As such, the definitions flow from master to end user in a controlled and consistent manner without having to manually sync descriptions among multiple systems.

Let’s dig into each of these components.


The WW LookML style guide, which (excluding declutter) is evaluated and quantified using our lookml-tools linter.

Linter

To provide a consistent Looker user experience, we developed a LookML style guide. This provides rules for user-facing aspects such as naming conventions and how to ensure that users can navigate and explore the data effectively, as well as best practices under the hood. This is good for both end users and the engineering and analytics staff who develop the LookML models, views, and dashboards.

How do we evaluate LookML and flag any infractions? We developed a sweet and simple Python linter that runs these checks periodically and writes its findings (a report of which LookML dimensions, measures, or files passed or failed the rules) back into Looker itself, where they can be measured, dashboarded, and alerted on. While we implemented 11 rules from the style guide (the declutter rule is difficult to define and alert against), we also made sure that the framework is very easy to extend if we want to create new rules. In fact, most rules are just 3–5 lines of code that evaluate whether a chunk of LookML is relevant for this rule and whether it passed. As such, we hope that the community can contribute to a larger suite of rules that we can all use.
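To make the "relevant, then passed" idea concrete, here is a minimal sketch of what such a rule could look like. It is not the lookml-tools API: the Rule base class, the DescriptionRule example, the run_rules helper, and the dictionary shape used for fields are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class Rule(ABC):
    """Hypothetical base class: a rule answers two questions about a field."""

    @abstractmethod
    def applies(self, field: dict) -> bool:
        """Is this chunk of LookML relevant for the rule?"""

    @abstractmethod
    def passes(self, field: dict) -> bool:
        """Did the chunk satisfy the rule?"""


class DescriptionRule(Rule):
    """Illustrative rule: every visible field needs a non-empty description."""

    def applies(self, field):
        # Hidden fields are never shown to end users, so skip them.
        return field.get("hidden") != "yes"

    def passes(self, field):
        return bool(field.get("description", "").strip())


def run_rules(fields, rules):
    """Return one report row per (field, rule) pair where the rule applies."""
    report = []
    for field in fields:
        for rule in rules:
            if rule.applies(field):
                report.append({
                    "field": field["name"],
                    "rule": type(rule).__name__,
                    "passed": rule.passes(field),
                })
    return report


# Made-up fields, standing in for parsed LookML dimensions and measures.
fields = [
    {"name": "user_id", "type": "number", "hidden": "yes"},
    {"name": "total_sessions", "type": "sum", "description": "Count of sessions"},
    {"name": "referrer", "type": "string"},
]
report = run_rules(fields, [DescriptionRule()])
```

Each row of the report says which field was checked, by which rule, and whether it passed, which is the kind of table that can be loaded back into a BI tool for dashboards and alerts.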

Grapher

As developers write LookML code, they create different, interrelated files: models that define data sources, views that expose the dimensions and measures for some data source, and explores that group views into a logical unit, often around a data source. For instance, perhaps the model defines a Google Analytics data source, and one view focuses on session data while another covers referrals, and the explore file groups sessions and referral views into a single GA-focussed unit.

When you have multiple developers building out the LookML, especially if they are new to a data source or to LookML development, you might test out ideas and start to lose the forest for the trees: it becomes hard to keep track of the bigger picture and relationships among these files. This is where the grapher comes in. This tool parses the LookML files and produces a network diagram to show the relationships.

In the example below, we can see the relationships between the models (blue), the explores (green), and the views (purple) as well as code reuse. Moreover, you can spot a single orange dot at the top. This is an orphan file, a view that is not referenced by any explore and represents dead code that can be removed from the repository.


An example network diagram showing the relationships among the models (blue), the explores (green), and the views (purple). The single orange node at the top is an orphan, a view not referenced by any explore.
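The orphan check behind a diagram like this comes down to a set difference: collect every view that some explore references, then subtract that from the full list of views. The sketch below assumes the LookML has already been parsed into plain dictionaries; the find_orphan_views function and the sample names are hypothetical, not taken from lookml-tools.

```python
def find_orphan_views(models, views):
    """Build model->explore->view edges and list views no explore references.

    models: {model_name: {explore_name: [view names used by that explore]}}
    views:  iterable of every view name found in the repository
    """
    edges = []
    referenced = set()
    for model, explores in models.items():
        for explore, used_views in explores.items():
            edges.append((model, explore))
            for view in used_views:
                edges.append((explore, view))
                referenced.add(view)
    # Any view that never appeared on the right-hand side of an edge is an orphan.
    orphans = sorted(set(views) - referenced)
    return edges, orphans


# Made-up repository contents echoing the Google Analytics example above.
models = {"ga": {"ga_explore": ["sessions", "referrals"]}}
views = ["sessions", "referrals", "old_campaigns"]

edges, orphans = find_orphan_views(models, views)
```

The edge list is what a graph-drawing library would render, and the orphans list corresponds to the orange nodes: candidates for removal from the repository.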

Updater

Data is higher quality if it is defined. When Looker users mouse over a dimension or measure, a description of that term will pop up and help them understand what the metric means, but only if a description is included in the LookML. We can check for that with our linter. However, where should those definitions originate?

While developers could add those into the LookML manually, they will inevitably get out of sync with other systems. Data is higher quality if it is consistently defined across systems, and many companies have multiple business intelligence tools. Thus, we wanted to create a system that took definitions from a master source and injected or updated them in the LookML automatically, creating a pull request for the LookML repository admins to review. This is the updater component of lookml-tools. As such, it solves both the syncing and consistency problems, enhancing data governance.


The updater tool takes master reference definitions (in this example, from a data catalog), cross-references them with descriptions in the LookML, and updates or injects the correct definition into the LookML, creating a pull request for review.

In this case, we use a data catalog as our master source, and the updater queries from that master database to obtain the list. We maintain a mapping table from those master definitions to individual dimensions and measures, and then the updater parses the LookML repo. You don’t have to use a data catalog; you can just maintain a list of master definitions as a CSV file and lookml-tools can use that. The key is that you are freeing developers of a tedious but important task and delegating it to a machine: the code runs periodically in Docker.
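As a rough illustration of the compare step with a CSV master source, the sketch below loads definitions and works out which LookML descriptions are missing or out of date. The CSV column names, the load_master_definitions and plan_updates functions, and the field dictionaries are assumptions made for this example; the real updater's file format and internals may differ, and the pull request step is left out.

```python
import csv
import io

# Hypothetical master definitions; in practice this would be a file on disk.
MASTER_CSV = """field,description
total_sessions,Number of sessions started in the period
referrer,Site that sent the visitor
"""


def load_master_definitions(csv_text):
    """Map each field name to its master description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["field"]: row["description"] for row in reader}


def plan_updates(master, lookml_fields):
    """Return {field: (current, correct)} where LookML is missing or out of sync."""
    changes = {}
    for field in lookml_fields:
        name = field["name"]
        if name not in master:
            continue  # no master definition, so leave the LookML untouched
        current = field.get("description")
        if current != master[name]:
            changes[name] = (current, master[name])
    return changes


# Made-up fields, standing in for dimensions and measures parsed from LookML.
lookml_fields = [
    {"name": "total_sessions", "description": "Sessions"},
    {"name": "referrer"},
    {"name": "user_id", "description": "Internal id"},
]

master = load_master_definitions(MASTER_CSV)
changes = plan_updates(master, lookml_fields)
```

Here total_sessions would be updated, referrer would get a description injected, and user_id is skipped because the master source says nothing about it.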

We are excited to release lookml-tools:

We hope that others find this useful, and we would love feedback, suggestions, and hopefully contributions. Create a pull request!

@Carl_Anderson

This is a cross post from the original WW post at:


Hey @Carl_Anderson,

This is so good! I just wanted to add a comment about some issues I had getting this working (the grapher specifically) on a Mac, in case anyone else looking at this has similar problems.

Firstly I had an issue with the command pip install -r requirements.txt which was falling over when it got to installing pandas. I couldn’t work out the issue (and it could well be this laptop as it’s a pool laptop and I didn’t set up python myself). Changing the pandas line in requirements.txt to pandas==0.24.0 and re-running the dependency installation solved this one.

Second, I also had to install an older version of lookml-parser (this issue is tracked on GitHub here: https://github.com/ww-tech/lookml-tools/issues):
npm install -g lookml-parser@4.0.0

Finally I got this error running the grapher:
FileNotFoundError: [Errno 2] "dot" not found in path.

It turns out this was because graphviz and its dot executable weren’t installed, so I resolved this by installing graphviz via Homebrew:
brew install graphviz

Now I’m able to run the grapher and the only problem I have is we have waaaaaay too many files to see what’s going on in the graph… :upside_down_face:

Thanks for the great tools & keep up the good work!


@simon_onfido thanks so much for the detailed info. I will incorporate those comments and instructions into the docs to make it easier and clearer for others.

I just found out that the latest version of lookml-parser breaks this tool. However, I am currently working on changes to fix it (the reported issue is here if you want to watch: https://github.com/ww-tech/lookml-tools/issues/2)

Carl


@joshtemple has released a Python LookML parser called lkml. I’ve tested it with lookml-tools and it works great. I plan to swap out the node LookML parser in lookml-tools for this Python lkml parser in the next release, likely next week. Watch this space!
Carl


lookml-tools 2.0.0 has been released.

Release notes:

Given the impact of the following two changes, this is a major release:

  • swapped out the node-based LookML parser with Josh Temple’s new Python lkml parser (https://pypi.org/project/lkml/). This simplifies install, dependency management, and underlying parsed JSON format.
  • added layer of abstraction via LookML and LookMLField classes so that rules and other code can query LookML attributes via methods instead of inspecting raw JSON.

Other changes:

  • lkmltools.RuleFactory is now a singleton so it is easier for users to register their own rules.
  • Can now parameterize any rule in the configuration by adding additional keys to the dictionary for that rule. For instance, if the config defines {"name": "MyAwesomeRule", "run": true, "debug": true, "strict_mode": true, "length": 6} then this whole dictionary is passed into the constructor during rule instantiation.
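
For anyone wondering what "the whole dictionary is passed into the constructor" might look like in practice, here is a small self-contained sketch. It does not use lkmltools.RuleFactory; the REGISTRY dictionary, the instantiate_rules helper, and the behavior of MyAwesomeRule (a minimum name length check) are hypothetical stand-ins for illustration only.

```python
import json


class MyAwesomeRule:
    """Illustrative rule that reads its own parameters from its config dict."""

    def __init__(self, config):
        self.config = config
        # Extra keys beyond "name" and "run" act as rule parameters.
        self.strict_mode = config.get("strict_mode", False)
        self.length = config.get("length", 10)

    def passes(self, field_name):
        # Hypothetical check: field names must be at least `length` characters.
        return len(field_name) >= self.length


# Hypothetical registry mapping rule names in the config to rule classes.
REGISTRY = {"MyAwesomeRule": MyAwesomeRule}


def instantiate_rules(rule_configs, registry):
    """Create one rule per enabled config entry, handing it the whole dict."""
    return [registry[c["name"]](c) for c in rule_configs if c.get("run")]


config_text = (
    '[{"name": "MyAwesomeRule", "run": true, "debug": true, '
    '"strict_mode": true, "length": 6}]'
)
rules = instantiate_rules(json.loads(config_text), REGISTRY)
```

Because the rule receives its full config entry, adding a new parameter only means adding a key to the JSON and reading it in the constructor; nothing in the instantiation code has to change.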