Data Dictionary Best Practices

I’m not sure if this is the right category for my question, but the description has ‘best practices’ in it, so here goes.

Where do you store your data dictionary, and has it worked? Looker pitched their recently released LookML Model API endpoint as a tool for building a data dictionary, so I’m inferring that at least some users store the dictionary outside of Looker. As of last year, Warby Parker used GitBook. But Looker has come a long way with annotations and the markdown homepage, and in theory the dictionary could live within Looker itself. Here at Managed by Q, we briefly tried using a Google spreadsheet, but no matter how many times and places we linked to it, no one ever used it.

I’d love to hear input from the Looker community on how they’ve approached this problem.

At UpCounsel, almost everyone uses Looker to answer their own questions about the data. It became glaringly obvious that we needed a centralized data dictionary. We had a Google Doc, but we hit the same problem you did: no one knew where it was, so no one visited it. It was also not kept up to date.

When we bought Looker a year ago, I made sure to create a data glossary within the documentation files using markdown, and it also serves as our Looker homepage. The glossary has a table of contents, also written in markdown, that links to more documentation. I point out the Looker homepage (glossary) whenever I onboard new coworkers.

Another thing I do is put descriptions next to every dimension and measure within our Looker instance. Then, when I receive questions, I always ask, “Did you read the description?” People know they can rely on the descriptions.
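Keeping a description on every dimension and measure is easier if something flags the gaps. Here is a minimal Python sketch of that check. It assumes you already have an explore's field metadata as a dictionary with `fields.dimensions` and `fields.measures` lists, where each entry has a `name` and an optional `description`. That shape, the function name, and the sample data are all assumptions for illustration, not something taken from a real instance.

```python
from typing import Dict, List


def fields_missing_description(explore: Dict) -> List[str]:
    """Return the names of dimensions and measures that have no usable description."""
    missing = []
    fields = explore.get("fields", {})
    for kind in ("dimensions", "measures"):
        for field in fields.get(kind, []):
            # Treat None, empty strings, and whitespace-only strings as missing.
            description = (field.get("description") or "").strip()
            if not description:
                missing.append(field["name"])
    return sorted(missing)


# Hypothetical sample payload; the field names are made up.
sample_explore = {
    "fields": {
        "dimensions": [
            {"name": "orders.status", "description": "Current fulfillment status of the order."},
            {"name": "orders.channel", "description": None},
        ],
        "measures": [
            {"name": "orders.count", "description": ""},
        ],
    }
}

print(fields_missing_description(sample_explore))
```

Run on a schedule or before a deploy, a check like this gives you a short list of fields to fix before people start asking about them.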

@weitzenfeld, we have a Discourse post, Generating a Data Dictionary in Google Sheet, that shows how to export a data dictionary from your instance into Google Drive. It uses your instance’s API to populate a Google Sheet.

And @Erin_Breen, since this script pulls the information automatically from your API, it would also keep the data dictionary up to date (provided that newly added fields have a description parameter).

If you have any feedback on the script please let us know!
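For anyone curious what an API-driven export involves in general, here is a rough Python sketch of the flattening step only. It is not the script from the linked post. It assumes the field metadata has already been fetched and looks like a dictionary with `fields.dimensions` and `fields.measures` lists whose entries carry `name`, `label`, `type`, and `description`; check your instance's API reference for the real response shape. The function names, column names, and sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["explore", "kind", "name", "label", "type", "description"]


def explore_to_rows(explore_name: str, explore: Dict) -> List[Dict[str, str]]:
    """Flatten one explore's field metadata into one row per dimension or measure."""
    rows = []
    fields = explore.get("fields", {})
    for kind in ("dimensions", "measures"):
        for field in fields.get(kind, []):
            rows.append({
                "explore": explore_name,
                "kind": kind[:-1],  # "dimensions" -> "dimension", "measures" -> "measure"
                "name": field.get("name", ""),
                "label": field.get("label") or "",
                "type": field.get("type") or "",
                "description": field.get("description") or "",
            })
    return rows


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Render the rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample payload standing in for a fetched API response.
sample_explore = {
    "fields": {
        "dimensions": [
            {"name": "orders.status", "label": "Status", "type": "string",
             "description": "Current fulfillment status of the order."},
        ],
        "measures": [
            {"name": "orders.count", "label": "Count", "type": "count",
             "description": None},
        ],
    }
}

rows = explore_to_rows("orders", sample_explore)
print(rows_to_csv(rows))
```

Because the rows are rebuilt from the metadata on every run, a blank description in the output points to a field that still needs one.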

Would anyone here be interested in a product that does this? It would be a searchable data dictionary that regenerates every time new code is pushed, hosted on a subdomain where users can log in.

Additional features could allow admins to hide certain definitions, override definitions, or add more content.
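To make the idea concrete, here is a small Python sketch of how generated definitions, admin overrides, and hidden fields might combine, plus a naive search. This is only an illustration of the concept described above, not an existing product or a Looker feature; every name and all of the sample data are hypothetical.

```python
from typing import Dict, List, Set


def build_dictionary(generated: Dict[str, str],
                     overrides: Dict[str, str],
                     hidden: Set[str]) -> Dict[str, str]:
    """Combine auto-generated definitions with admin overrides, skipping hidden fields."""
    published = {}
    for name, description in generated.items():
        if name in hidden:
            continue  # Admins chose not to publish this definition.
        # An override, when present, replaces the generated description.
        published[name] = overrides.get(name, description)
    return published


def search(dictionary: Dict[str, str], term: str) -> List[str]:
    """Case-insensitive substring search over field names and descriptions."""
    needle = term.lower()
    return sorted(
        name for name, description in dictionary.items()
        if needle in name.lower() or needle in description.lower()
    )


# Hypothetical sample data.
generated = {
    "orders.count": "Count of orders.",
    "orders.margin": "Internal margin.",
    "users.signup_date": "Date the user signed up.",
}
overrides = {"orders.count": "Number of completed orders, excluding cancellations."}
hidden = {"orders.margin"}

published = build_dictionary(generated, overrides, hidden)
print(search(published, "orders"))
print(search(published, "signed"))
```

Keeping overrides and hidden fields separate from the generated definitions means a regeneration on each push would not wipe out what admins have curated.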

Yes Sam, such a product feature within Looker would be great!