[Data Block] - Data Block setup instructions and everything in between

Tags: modeling, blocks, data, extensions, import

(Haarthi Sadasivam) #1

#What are Data Blocks?
They give you quick and easy access to hard-to-obtain datasets containing Weather, Demographic, and other data! We provide the data and the model; you combine that with your own data to come up with meaningful insights that you can take action on!

Types of Data Blocks

  • Demographic Data
  • Weather Data
  • Macro-economic Data
  • Geo Data
  • Exchange Rate Data

Refer to our Blocks Directory for detailed information on the datasets and potential use cases! The Weather and Exchange Rate data blocks are updated nightly; the other datasets are fairly static and will require minimal updates.

#Accessing Data

###BigQuery

If you have an existing BigQuery account, you can access our BigQuery hosted datasets instantly. Skip ahead to the “Implementing Datablocks” section of this post.

If you do not currently have BigQuery, you will first need to spin up a (free) trial of BigQuery with Google. You can then access Looker’s public datasets via BigQuery.
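
As a quick smoke test once connected, you can run a simple query against one of the hosted tables. This is a minimal sketch; the fully qualified table name is a placeholder, so use the name listed in the block’s documentation:

-- Hypothetical smoke test; replace the placeholders with the project,
-- dataset, and table names from the data block's documentation.
select count(*) as row_count
from `<looker_public_project>.<dataset>.<table>`;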

###Snowflake
You can access a Looker-hosted Snowflake data warehouse and use Snowflake data sharing to easily get access to these datasets!

First, reach out to your Looker Analyst with the following information:

Customer Name: <Company1>
Customer Looker Instance: <company1.looker.com>
Snowflake Account Name: <company1.snowflakecomputing.com>

Once our ops team provides your Snowflake account with access to data blocks, run the following commands to get access to the data shared from Looker.

-- Data consumer side (must be run as ACCOUNTADMIN)
create database looker_datablocks from share looker.looker_share_datablocks;

grant imported privileges on database looker_datablocks to role sysadmin;

use role sysadmin;

-- Grant access to any other roles that need the data
grant imported privileges on database looker_datablocks to <other_role>;

-- Verify that the shared database and its tables are visible
show databases;
use looker_datablocks;
show tables;

###Other Databases
Are you on Redshift? MySQL? Postgres? Oracle?
We’ve made the transformed data for each of these datasets publicly available in both GCS and S3 so that you can directly import them into the database of your choice.

We’ve also made the DDL available for each of the datasets in the GitHub Repo (the DDL statements might need to be tweaked for the datatypes in your selected database, but should provide an idea of the column types for each table).

Download data directly from either of these locations:
GCS - gs://looker-datablocks/
S3 - s3://looker-datablocks/
S3 Bucket Web Link - http://looker-datablocks.s3-website-us-east-1.amazonaws.com/
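
For example, on Redshift a load might look like the following. This is a minimal sketch: the table name, S3 prefix, IAM role, and format options are all placeholders, so create the table first with the DDL from the GitHub repo and adjust the options to match the actual files.

-- Hypothetical Redshift load; all bracketed values and format options
-- are placeholders, not tested values.
copy <your_table>
from 's3://looker-datablocks/<dataset_prefix>/'
iam_role 'arn:aws:iam::<account_id>:role/<redshift_role>'
delimiter ',';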

###Accessing the Data Model (LookML)

Demographic Data (American Community Survey) - https://github.com/llooker/datablocks-acs
Weather (GSOD) - https://github.com/llooker/datablocks-gsod
Financial Indicators - https://github.com/llooker/datablocks-finance
Exchange Rates - https://github.com/llooker/datablocks-exchangerate

Note: Fork our GitHub repos into a new GitHub repo (either hosted by Looker or by your company) that you can then extend within your instance.

#Implementing Datablocks

###Import Project from Looker

  1. Add a New Project to your Looker Instance

  2. Fork/copy the GitHub repos mentioned above to access the prebuilt LookML.
    Note: You should create a new GitHub repo.

  3. Remove the files for other database dialects from the repo. Our blocks will typically contain files for BigQuery, Redshift, and Snowflake.
    Example: If you are setting up data blocks on BigQuery, you will only need the BigQuery view files, BigQuery explore file, and BigQuery model file.

  4. Replace the connection name in your model file with the database connection where your data block data lives (or, for BigQuery and Snowflake, the existing connection you will be extending from); see the sketch after this list.

    Note: All join logic lives in an .explore file in each repository. This is the file you will include in the steps below, after you have set up your project manifest.
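
As a minimal sketch, the top of the copied model file might then look like this, where "your_connection" is a placeholder for the connection on your instance:

# A minimal sketch; "your_connection" is a placeholder for the database
# connection on your Looker instance that can reach the data block tables.
connection: "your_connection"

include: "*.view"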

###Enable Project Extensions

  1. Enable Project Import to build on top of our prebuilt Looker datablock. This is currently a labs feature that has to be turned on.

  2. In your main Looker project where you will be extending datablocks, create a Project Manifest.

  3. Add the following LookML to the project manifest file to reference Looker data blocks in your main Looker project.

project_name: "<your_project_name>"

local_dependency: {
  project: "<project_name_of_datablock>"
}
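
For instance, a hypothetical filled-in manifest, assuming your main project is named "ecommerce" and you imported the ACS block under the name "datablocks-acs", would read:

project_name: "ecommerce"

local_dependency: {
  project: "datablocks-acs"
}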

###Model Away

You now have access to the modeled tables. Extend from the explores available or make use of granular joins at the view level.
Note 1: Please use the correct set of modeled files. If you are on BigQuery, reference the files with bq in the file name; for other databases, you may have to adapt our BigQuery model to your own dialect.
Note 2: All of our projects have been set up to allow extension from explore files (since model extension could cause issues with multiple connections).
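
For example, here is a minimal sketch of extending a data block explore from your main project. It assumes you imported the ACS block under the name "datablocks-acs" (which exposes the fast_facts explore in its bq.explore file); the explore name "demographics" is just illustrative.

# A minimal sketch; adjust the project name to whatever you called the
# imported project, and the extended explore to the one in its .explore file.
include: "//datablocks-acs/bq.explore"

explore: demographics {
  extends: [fast_facts]
}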

Feel free to also take a look at our documentation for Native Derived Tables. You can let Looker write SQL for you at different levels of aggregation on our publicly available datasets and join the results into your model (see the demographic example below).

###Data Merge for Data Blocks

You can also choose to merge result sets from our datasets with your data by combining query result sets in the front end! Learn more about Data Merge.

###Example Setup of the Demographic Dataset

  1. Get access to the data, either by downloading the raw data from our S3/GCS buckets or by connecting to a Looker-hosted database.

  2. Import the demographic dataset LookML model as a separate project in your Looker instance.

  3. Enable Project Import and update your project manifest.

  4. Pull in/extend the view file, or make use of native derived tables to get data at the level of aggregation necessary for your explores.
    Note: In this particular example, since the demographic data is at a different level of aggregation than our ecommerce dataset (block group vs. zipcode), we use native derived tables to aggregate stats up to the zipcode level. This eliminates messy many-to-many joins.

include: "/american_community_survey/bq.explore"

view: zipcode_income_facts {
  derived_table: {
    persist_for: "10000 hours"
    explore_source: fast_facts {
      column: ZCTA5 { field: tract_zcta_map.ZCTA5 }
      column: income_household { field: bg_facts.avg_income_house }
      column: total_population { field: bg_facts.total_population }
    }
  }
  dimension: ZCTA5 {}
  dimension: income_household {
    hidden: yes
  }
  dimension: total_population {}
}
  5. Join the view file into the model:
include: "acs*.view"

explore: order_items {
  join: users {
    sql_on: ${users.id} = ${order_items.user_id} ;;
    type: left_outer
    relationship: many_to_one
  }

  join: zipcode_income_facts {
    sql_on: ${users.zip} = ${zipcode_income_facts.ZCTA5} ;;
    type: left_outer
    relationship: many_to_one
  }
}
  6. Explore!
  7. Visualize!

(Seth Newman) #2

I’m having issues with some of the DDLs that have been shared in the links above. For example, most of the financial DDLs for Redshift do not work due to syntax errors and such.

The CREATE EXTERNAL TABLE / COPY commands do not match the conventions from https://github.com/llooker/datablocks-acs/blob/master/readme.md (and also just do not work, period!). Also, there seems to be a mismatch in the type of the id column in the FRED_metadata DDL. The forex DDL only applies to forex_real (I assume the current data), and there is nothing for the historical data.

The financial indicator measures do not take into account that different datasets may be measured differently (e.g. employment rates are not measured in dollars).


(Department of Customer Love) #3

Hey @Seth_Newman,

Thanks for bringing this to our attention! In general, the DDL statements are meant as guidelines since they can vary greatly between dialects and should offer a clue as to the relevant datatype of the underlying field. We will definitely take note of these observations and update where necessary!


(eric.carr) #4

Hi, @Seth_Newman. This DDL was incorrectly using Spectrum format. It has been corrected. Thanks for letting us know!


(Kay) #5

Hi guys, how often are the exchange rates being updated? Could we use it in a pricing tool in near realtime?


(peter.whitehead) #6

Hi @kay,

Currently, Looker’s exchange rate data block contains daily historical closing exchange rates all the way back to the inception of the euro in 1999. Depending on your use case and the frequency of exchange rate updates you will need, you may be able to use our data block in your pricing tool. Please let us know if you have any further questions!


(peter.whitehead) #7

@kay, I have verified with a colleague, and it seems the exchange rate block is updated with a one-day lag, so real-time analysis would not currently be possible.


(Seth Jeffery) #8

I am seeing that the exchange rate data on S3 was last updated on 2018-06-03, which was over a month ago. Is it still being maintained?


(Ryan Dunlavy) #9

Hello @sethjeffery,

Thanks for bringing this to our attention! The data team is looking into this and I will follow up here with updates.

Best,
Ryan


(tyreinsch) #10

I’m also curious to see if this is still being updated.


(peter.whitehead) #11

Hi @tyreinsch and @sethjeffery,

The real exchange rate data on S3 is being updated. There was a change to the API endpoint, but everything should be up to date now. Thanks for checking in!


(Drew Gillson) #12

For anyone wondering how to preview public data on GCP Cloud Storage before loading it into BigQuery, you can use gsutil locally or from a Cloud Shell like this:

# Show the object's metadata (size, storage class, update time)
gsutil stat gs://looker-datablocks/gsod/gsod2011
# Preview the first few lines of the file
gsutil cat gs://looker-datablocks/gsod/gsod2011 | head
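
From there, a hypothetical load into BigQuery could look like this; the dataset and table names are placeholders, and the format flags are a guess that should be matched to the actual file layout:

# Hypothetical load; dataset/table names are placeholders and the
# format flags must match the actual files.
bq mk <your_dataset>
bq load --autodetect --source_format=CSV \
  <your_dataset>.gsod2011 gs://looker-datablocks/gsod/gsod2011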