Henry - A Command Line Tool for Looker Instance Cleanup

The Problem

As a Looker model grows in size and sophistication, the number of Explores, Views and Fields it contains keeps increasing. A common side effect is model bloat, which typically degrades the end-user experience.

The Why

Henry is a command line tool that helps determine model bloat in your Looker instance and identify unused content in models and explores. It is meant to provide recommendations that developers can validate in order to clean up models by removing unused explores, and explores by removing unused joins and fields, and so maintain a healthy and user-friendly instance.

The How

The tool currently has three main commands: pulse, analyze and vacuum.

The pulse command runs a number of tests that help determine the overall health of the Looker instance. The tests include: connection checks, which confirm that all connections are in working order; query history checks, which flag queries whose runtime stands out; checks for the use of any legacy features; checks on the health of scheduled plans; and finally, a check on whether the latest version of Looker is being used.

The analyze command scans projects, models and explores. For projects, it scans their content and checks the status of features that are essential for success, such as the git connection status and validation requirements. For models and explores, it provides statistics on unused explores, joins and fields, as well as query counts.

Finally, the vacuum command can be used with models and explores, and it outputs a list of unused content based on predefined criteria. For example, to find out which fields are unused in the cohorts explore of the thelook model, run:

$ henry vacuum explores --model thelook --explore cohorts
which yields:

| model   | explore   | unused_joins   | unused_fields                |
|---------+-----------+----------------+------------------------------|
| thelook | cohorts   | N/A            | order_items.created_date     |
|         |           |                | order_items.id               |
|         |           |                | order_items.total_sale_price |
+---------+-----------+----------------+------------------------------+
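
If you want to review the results in a script rather than by eye, the table above can be turned into a plain data structure. The following is only a minimal sketch, not part of Henry: `parse_vacuum_table` is a hypothetical helper, and it assumes the layout shown in the example (four pipe-delimited columns, continuation rows that leave model and explore blank, and `N/A` meaning "nothing to report"). Other Henry versions or output options may print something different.

```python
def parse_vacuum_table(text):
    """Collect unused joins and fields per (model, explore).

    Hypothetical helper: it assumes the four-column table layout shown
    in the example output above, not a documented Henry format.
    """
    results = {}
    current = None  # (model, explore) of the row group being read
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and border rows such as |----+----| or +----+----+
        if not line.startswith("|") or set(line) <= set("|+-"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "model":
            continue  # unexpected shape, or the header row
        model, explore, join, field = cells
        if model and explore:
            # A row with model and explore filled in starts a new group
            current = (model, explore)
            results[current] = {"unused_joins": [], "unused_fields": []}
        if current is None:
            continue
        # Assumption: "N/A" and empty cells mean there is nothing to record
        if join and join != "N/A":
            results[current]["unused_joins"].append(join)
        if field and field != "N/A":
            results[current]["unused_fields"].append(field)
    return results


sample = """\
| model   | explore   | unused_joins   | unused_fields                |
|---------+-----------+----------------+------------------------------|
| thelook | cohorts   | N/A            | order_items.created_date     |
|         |           |                | order_items.id               |
|         |           |                | order_items.total_sale_price |
+---------+-----------+----------------+------------------------------+
"""

parsed = parse_vacuum_table(sample)
for (model, explore), unused in parsed.items():
    print(model, explore, unused["unused_fields"])
```

The result is still only a list of candidates; each field should be validated by a developer before it is removed from the LookML.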

The Setup

Henry is on PyPI, and the easiest way to install it is by running:

$ pip install henry

If you are interested in the implementation or in contributing, the source code can be found in the Looker Open Source Repo on GitHub. Henry is developed and maintained by Joseph Axisa, aka @jax.

Please note that this tool is open source and is not supported through Looker’s normal support channels. However, you are encouraged to file any issues you encounter while using the tool at https://github.com/looker-open-source/henry/issues, as this will help contributors improve it.


Hi @jax, we have only just got around to looking at this tool after speaking with @Katie_Hindson at a London Looker meetup. We are both trying to automate the cleanup process. I have documented how automated cleanup would work for content, users, spaces, connections and explores, but I was wondering if you could see a way to hide/remove models, projects, PDTs and datagroups.
Also, what do you see as the future/next steps of this tool? Would automating the manual steps that follow an analysis with Henry fit with your vision for this tool?

Thanks!


Thanks Joseph! Henry will be a great tool for developers to clean up their code. I have faced challenges in deciding which blocks of code are used and which can be deleted. I’m really excited to try this out! :smiley:


Hey @iant! Unfortunately, I missed that event, but I will do my best to be at the next one.

I did some research to answer your questions; however, I ended up with more questions myself because of certain behaviour I encountered. I will reach out to you directly via email.

With regard to the future/next steps of this tool, automatically removing or hiding fields is not on the radar, since the results require some analysis by the user before deciding whether to remove a field. Features I am currently working on include counting indirect usage of fields, more actionable content stats, and the ability to generate some form of report from the tool.

Happy to hear that, Nicholas. I’m interested to know how you get on with this. Please feel free to leave feedback here or in the repo, or send it directly to me once you’ve had a chance to try this out.

When I said "automatically take actions", I meant that the actions could be scripted rather than done by the user in the UI.
As I said previously, some of the actions a user would want to complete are possible using the API; the rest, I don’t think are :cry:
