Finding unused fields


(fabio) #1

Update: If you like this article, and want even more, check out Henry, a command-line tool built by another engineer on our team!

It can be tough trying to clean up a large model if you don’t know which fields are not being used. The i__looker explores get you pretty close, since you can look for fields that were used two months ago and not since then. However, since history is discarded after 90 days, there is no way using i__looker alone to find fields that had no activity in the last 90 days.

However, the model API can get you this information. So, we just need to combine the two. I wrote this JS snippet to do it, so you don’t have to. Just get administrative API credentials, head to the interactive API Docs on your instance and paste the script into your JS Console:

Where to use this

https://< your domain here >:19999/api-docs/index.html

The code to use

(async function(){
var config ={
  modelName:"faa_redshift",
  timeframe: "28 days",
  apiCredentials:{
    //Provide API credentials with the admin permission
    client_id:"9gC5bCTmrSP5KHCcvrnf",
    client_secret:"..."
  }
}

console.log("Authenticating")
var auth = await $.post("/login",config.apiCredentials)
var api = (verb,method,data)=>$.ajax({
    method:verb,
    url:"/api/3.0/"+method,
    data:verb=="POST"?JSON.stringify(data):data,
    headers:{"Authorization":"token "+auth.access_token}
  })
var m=config.modelName
console.log("Getting fields in "+m+"...")
var model = await api("GET","lookml_models/"+m)
var exploreNames = model.explores.map(e=>e.name)
var fields = await exploreNames.reduce(async (chain,e) => {
     var priorFields = await chain
     var explore = await api("GET","lookml_models/"+m+"/explores/"+e)
     return explore.fields.dimensions.concat(explore.fields.measures)
             .map(f=>(m+"::"+e+"/"+f.name))
             .concat(priorFields)
        }
     , await [] )
console.log("> Fetched "+fields.length+" field definitions")
console.log("Getting usage for timeframe '"+config.timeframe+"'...")
var queries = await api("POST","queries/run/json",{
    model:"i__looker",
    view:"history",
    fields:["query.model","query.view","query.formatted_fields","query.formatted_filters","query.sorts","query.formatted_pivots","history.query_run_count"],
    filters:{"history.created_date":config.timeframe,"query.model":m},
    limit:50000
})
console.log("> "+queries.length+" queries fetched")
var fieldUsage = queries.map( q => ([]
    .concat((JSON.parse(q["query.formatted_fields"])||[]).map(f => ({fid:q["query.model"]+"::"+q["query.view"]+"/"+f, count:q["history.query_run_count"]})))
    .concat((JSON.parse(q["query.sorts"])||[]).map(f => ({fid:q["query.model"]+"::"+q["query.view"]+"/"+f, count:q["history.query_run_count"]})))
    .concat(Object.keys((JSON.parse(q["query.formatted_filters"])||{})).map(f => ({fid:q["query.model"]+"::"+q["query.view"]+"/"+f, count:q["history.query_run_count"]})))
))
.reduce(((a,b)=>a.concat(b)),[])
.reduce((to,field)=>({[field.fid]:(to[field.fid]||0)+field.count, ...to}),{});
console.table(
  fields
  .map(f=>({field:f,usage:fieldUsage[f]||0}))
  .sort((a,b)=>a.usage-b.usage))
})()

The result should look something like…

Note that this only counts when the fields are used directly. But, the LookML validator already allows you to detect when a field is referenced from another field.


(jeremy.eckman) #2

This is awesome @fabio!

Just to be clear, people utilizing this code will want to update the var config at the top of the code block to their own values. Most importantly, to use the code users need to enter the API credentials (client_id and client_secret) of a user with Admin permissions.

We also need to set the modelName: to the model we want to see the field usage of. We can also set the timeframe: to a different value if we want a timeframe other than the default 28 days (up to the history’s 90 days.)


(aryeh) #3

This is an awesome document @fabio. It never occurred to me to use the API programmatically from the API documentation page (we should maybe build that into the product). Per request from a client, it is also possible to print a CSV string — which can be copied and pasted into a file with the CSV extension to create a CSV document — by doing the following:

Replace:

console.table(
  fields
  .map(f=>({field:f,usage:fieldUsage[f]||0}))
  .sort((a,b)=>a.usage-b.usage))

with:

fieldsAndUsage = fields.map(f=>({field:f,usage:fieldUsage[f]||0})).sort((a,b)=>a.usage-b.usage)
console.log("field,usage\n" + fieldsAndUsage.map(function(x) { return [x.field, x.usage].join(',') }).join('\n'))

(Sonali Digwal) #4

Hi,

It is a bit unclear from the article what the usage column stands for? Is it for the fields that are used actively for creating looks/dashboard or just that they are used for the dashboards that currently run every day?

Thanks,
Sonali


(fabio) #5

The data is based on /explore/i__looker/history. That column is the number of times that the field has been used in query runs (note that a given run of a query can use a field in multiple ways, like as a sort, or filter) in the past 28 days.

28 days is a default, but can easily be changed up to 90 days in the config section of the script.