[Analytic Block] Redshift Admin


(Scott Hoover) #1

About This Block

Short of giving everyone access to the AWS Console, it’s difficult for non-AWS administrators to get insight into ETL cycles, recent load failures, table architectures, etc. This block is meant to address this issue, at least in part. While this block is not a perfect substitute for the AWS Console, it provides similar insights. This block is particularly useful for the class of user who needs to know if data is flowing normally and how tables are structured, in order to optimize queries and even the way they model data.

Ideal Data Types

It is highly recommended that a separate database connection to the Redshift cluster is used for this block, where the database user associated with the connection has access to the following tables/views: svv_table_info, stv_tbl_perm, pg_class, pg_namespace, pg_database, stv_blocklist, stl_load_errors, and stl_load_commits. These are tables that Looker generally does not need access to.

Included in the repo for this block are three files:

  1. redshift_admin.model.lookml, which defines the base views/explores.

  2. redshift_admin.view.lookml, which defines the various views, dimensions, and measures used to create the dashboard.

  3. redshift_admin.dashboard.lookml, which defines the visualizations presented in the dashboard.

Expected Output

At present, this block results in five visualizations:

  1. Table Load Summary

  2. Recent Files Loaded

  3. Recent Load Errors

  4. Database Consumption

  5. Table Architecture

Try it Yourself!

This block can be cloned directly from the Github repo (linked below). I also encourage people to contribute by forking the repo and making pull requests.

Update: The block now contains performance optimization views and dashboards as well!


[Analytic Block] Redshift Performance Optimization
(Lucas Thelosen) #2

Scott,

This is great!
A lot of companies think of using Looker to report on the usual suspects: revenue, user growth etc. and only for only a couple departments within a company. However, this really shows how you can use Looker beyond the usual use cases. Go setup a Looker alert for ETL load errors.

One note: On row 284 of the view, you have value_format_name which is not something I am seeing in any of the openly available Looker versions. Looks like it is in learn.looker.com so I am sure the general public will have access to it soon.

Thanks,
Lucas


(Scott Hoover) #3

Thanks, Lucas! Take a look at this Discourse for an explanation on value_format_name.


(Sadhasivam Jayabalaganesan) #4

When i try to install redshift.admin.blocks i get “Model Not Found” Dashboard error ? please help

My Connection having necessary permission to reach out alll redshift tables.


(Scott Hoover) #5

@Sadhasivam_Jayabalag I noticed that, in the block, there’s a model hardcoded into the dashboard LookML—namely, model: bigfdata. Make sure to update all of that code with whatever you named your model.


(Sadhasivam Jayabalaganesan) #6

Thanks @scott.hoover. works charm.


(Arthur Smith) #7

Thanks for this block, it’s very helpful!

A minor note - the “- scoping: true” line in the model file is obsolete in current LookML.


(Scott Hoover) #8

Good find, @apsmith. I’ve got a pull request in.

Thanks!


(Dillon Morrison) #9

Thanks @scott.hoover! Just pushed the changes through.


(Arthur Smith) #10

One other note here - I was puzzled why I wasn’t seeing all the load info, and then discovered something - some of these redshift system tables are not completely visible except to users with superuser status (“create user” privilege). So the account here needs to be a superuser account to see all the load data.


(Michael Erasmus) #11

These are great! We’ve also created our own, very similar model and views to monitor Redshift, but I definitely might steal some ideas from this (love the color coded columns)

Thought I might share two more related views that I find to be very useful:

Table Schemas
Query Alerts


(Dillon Morrison) #12

Thanks @Michael_Erasmus, this is awesome!


(Ryan Kulla) #13

Thanks. I ran into this too. I tried creating a new user called looker_admin and granted it privileges to all those system tables, yet in the SQL Runner, doing things like SELECT COUNT(*) FROM stl_load_commits would return the value 0 (rather than saying it doesn’t have permission, even though select has_schema_privilege('looker_admin', 'public', 'usage'); said 't').

This might be because these system tables aren’t part of the public (or any) schema that looker expects? None the less, using a real admin user resolves it, but isn’t ideal because of how any user with SQL Runner access can run ad hoc queries against everything.


(Dillon Morrison) #14

Hmmm @rkulla not immediately clear without being able to look under the hood in your instance / database. Did you try using Chat Support so our analysts can see a bit more detail?

Thanks!


(Ryan Kulla) #15

@Dillon_Morrison I did, yes. This was what chat support figured as well.


(Dillon Morrison) #16

Okay thanks, just wanted to be sure. Maybe there’s a workaround we can figure out. Just to be clear, your goal is for you (and a certain subset of other users) to be able to query the systems tables, but restrict access for another subset of users (all of whom still have developer privileges within Looker)? Thanks!


(Ryan Kulla) #17

Actually this behavior isn’t because of Looker. Upon further investigation:

production=# select current_user;
current_user
--------------
my_admin_user

production=# select count(*) from stl_load_commits;
count
-------
62966

production=# SET SESSION AUTHORIZATION 'looker_admin';
SET

production=> select count(*) from stl_load_commits;
count
------
 0

production=# revoke select on table stl_load_commits from looker_admin;
REVOKE
production=# select has_table_privilege('looker_admin', 'stl_load_commits', 'select');
has_table_privilege
---------------------
t

Makes no sense to me. Maybe there’s a bug with redshift?


(Dillon Morrison) #18

Hmmmm, yeah that is weird. Not sure what would be causing that but looks like a bug to me on first impression…


(Ryan Kulla) #19

This seems to be why https://docs.aws.amazon.com/redshift/latest/dg/c_visibility-of-data.html

These are virtual tables where Only users with superuser privileges can see the data in those tables that are in the superuser visible category. Regular users can see data in the user visible tables. In most cases, rows generated by another user are invisible to a regular user.


(Dillon Morrison) #20

Found some additional helpful queries in this repo: