How to: Creating a View of Your Customer Without a Dedicated Customer ETL

How to: Creating a View of your Customer Without a Dedicated Customer Data Feed

Originally authored for Looker 6.22.12

Business Requirement:

Your business intelligence platform is by necessity a low-cost application. Budgets for enhancements and additional data sources are very limited. Business users have requested insight into customer behavior but your database does not currently have a CRM-sourced data feed. You will need to use the data available in your sales history table to provide the most insight possible.

Technical Pre-Requisites/Assumptions:

This write up is intended for a technical audience. The write up assumes a strong understanding of LookML as well as write privileges for your instance. All SQL is written for BigQuery.

Although this example is taken from a physical retail and ecommerce business, the approach could be applied to any business in which your sales history contains a key for customer (this could be email, customer name, a foreign key to your CRM system, etc.).

Step 1: Writing the query to define your customer PDT

Author the query that will populate your customer PDT. The code below uses a query/sub-query to this effect:

Sub-query: This query pulls the key for customer and all associated transactions in order (notice the application of the dense_rank( function). Pay special attention to the where clause as it helps to cleanse the data somewhat (disregards invalid or null email addresses, etc.).

Query: Based on the results of the sub-query this outer statement aggregates the results at a customer level and begins to expose insightful customer data points.

The code below assumes the following data structure:

  • date and time are held in separate fields
  • email is the Looker primary key for customer
  • only two different ā€˜channelsā€™ of consumption exist (physical retail store vs. ecommerce)
  • each transaction is assigned a unique universal_no

Please feel free to copy this code and adapt it to your data. Although the column names will change, the logic and functions are universal.

SELECT
  xx.email,
  MIN(xx.adate) AS firstpurchasedate,
  MAX(xx.adate) AS lastpurchasedate,
  DATE_DIFF(CAST(MAX(xx.adate) AS DATE), CAST(MIN(xx.adate) AS DATE), day) AS customerdurationdays,
  DATE_DIFF(CAST(CURRENT_DATE() AS DATE), CAST(MAX(xx.adate) AS DATE), day) AS dayslapsed,
  COUNT(DISTINCT(xx.alocation)) AS locationcount,
  COUNT(DISTINCT(xx.channel)) AS channelcount,
  COUNT(DISTINCT(xx.universal_no)) AS transcount
FROM (
 SELECT
    email AS email,
    adate,
    atime,
    alocation,
    universal_no,
    CASE
      WHEN CAST(alocation AS string) LIKE "%ECOMMERCE%" THEN "Yes"
      WHEN CAST(alocation AS string) NOT LIKE "%ECOMMERCE" THEN "No"
  END
    AS channel,
    DENSE_RANK() OVER (PARTITION BY email ORDER BY ADATE ASC, ATIME ASC) AS purchaseorderoldtonew,
    DENSE_RANK() OVER (PARTITION BY email ORDER BY ADATE DESC, ATIME DESC) AS purchaseordernewtoold
  FROM
    twr.salesjournalhistory
  WHERE
    email IS NOT NULL
    AND CAST(email AS string) != ""
    AND CAST(email AS string) != " "
    AND CAST(email AS string) LIKE "%@%"
  GROUP BY
    1, 2, 3, 4, 5
  ORDER BY
    email ASC, adate ASC, atime ASC) AS xx
GROUP BY
  1
ORDER BY
  1 ASC, 2 ASC

To read more about ranking logic in BigQuery, follow this link.

Step 2: Construction of View to Expose Customer

Code sample below:

  dimension: customeremail {
    label: "Cust Email"
    primary_key: yes
    type: string
    sql: ${TABLE}.email ;;
  }

  dimension: customeremailmask {
    label: "Cust Email Masked"
    description: "Customer email encrypted as MD5 string. Anonymizes email for presentation/sharing."
    html: <font size = 3.5px>{{rendered_value}}</font> ;;
    type: string
    sql: MD5(${TABLE}.email) ;;
  }

  dimension: customeremaildomain {
    label: "Cust Email Domain"
    html: <font size = 3.5px>{{rendered_value}}</font> ;;
    type: string
    sql: SUBSTR(CAST(${TABLE}.email AS string), STRPOS(email, '@'), LENGTH(email)) ;;
  }

  dimension: customerfirstpurchasedate {
    label: "Cust Acq Date"
    type: date
    sql: ${TABLE}.firstpurchasedate ;;
  }

  dimension: customeracquisitioncohort {
    label: "Cust Acq Cohort"
    description: "Acquisition date: YYYY-MM. Enables cohort analysis."
    type: string
    sql: substr(cast(${TABLE}.firstpurchasedate as string), 0, 7) ;;
  }

  dimension: customerlastpurchasedate {
    label: "Cust Last Purch Date"
    description: "Maximum date on which email purchased from any channel"
    type: date
    sql: ${TABLE}.lastpurchasedate ;;
  }

  dimension: customerdurationdays {
    label: "Cust Lifespan"
    description: "Days between first purchase date and last purchase date"
    type: number
    sql: ${TABLE}.customerdurationdays ;;
  }

  dimension: customerdayslapsed {
    label: "Cust Lapsed Days"
    type: number
    sql: ${TABLE}.dayslapsed ;;
  }

  dimension: customerlocationlapsed {
    label: "Cust Location Count"
    description: "Count of unique locations through which a customer has purchased"
    type: number
    sql: ${TABLE}.locationcount ;;
  }

  dimension: customerlifetimepurchasecount {
    label: "Cust Lifetime Purchase Count"
    description: "Count of unique transactions completed; includes sales and returns"
    type: number
    sql: ${TABLE}.transcount ;;
  }

  dimension: customerchannelcount {
    label: "Cust OMNI?"
    type: string
    case: {
      when: {
        sql: cast(${TABLE}.channelcount as string) = "1" ;;
        label: "Not Omni"
      }
      when: {
        sql: cast(${TABLE}.channelcount as string) = "2" ;;
        label: "Omni"
      }
    }
  }

  measure: customercount {
    label: "Customer Count"
    value_format: "#,###"
    type: count_distinct
    sql: ${TABLE}.email ;;
    }
  }

Step 3: Building Reporting

If this is a new subject area for your users, it is important to build some easily-understood reports that demonstrate the functionality of this subject area. The view defintion above enables reports such as:

  • Customer Acquisition by Month
  • Lapsed Customers (could be used to prescribe action in stores or by email campaigns)
  • Best Customers (based on life-to-date transaction count)

Step 4: IT Testing, UAT, Deployment to Production, etc.

Step 5: Change Management & End-User Training

Depending on what other data and views you have available in your project, it may be possible to join this explore with a sales explore to begin calculating lifetime value (LTV).

Good luck!

3 Likes