Financial ratios over time with transactional data sets


(Segah Mir@Caura & Co.) #1

Financial data is typically stored in one of two ways: as transactions or as statuses. A transaction is any financial interaction with an account (e.g. debit or credit). Statuses are any snapshots of the current state of an account based on all transactions that have occurred until then.

In this article we will cover how to use Looker to create a bridge between the two types of data sets. Doing so will allow transactional data to be evaluated in the context of the latest status. This enables us to produce valuable financial ratios over time, such as assets-to-debt, 3-month cash flow to debt, or debt-to-income.

We will examine two existing tables: account_table and transactions_table. Our goal is to create two all-time daily rollup derived tables: one that represents transactions, another that represents estimated balances.

Stored data is discontinuous

A common database design for status data is to store actual values (assets, debt, or balance) in one row for each account.

account_table:

id | balance | account_type | default_status | balance_date
--- | --- | --- | --- | ---
1 | $3000 | checking | False | 2014-12-01
2 | $4000 | checking | False | 2014-12-01
3 | $5000 | credit | False | 2014-12-01

But what if we now want to graph average assets over time? Maybe we want to know whether someone’s account looks healthier than, say, 3 weeks ago. Or, if we are tracking the health of our entire portfolio, we need to add up balances even when they were not reported on some particular date. Basically, we need a table that has a value for each date and, consequently, many rows per account. We might even call this table an accounts ledger.

Hypothetical table “accounts ledger”:

date | id | balance | account_type | default_status
--- | --- | --- | --- | ---
2014-12-01 | 1 | $3000 | checking | False
2014-11-30 | 1 | $3000 | checking | False
2014-11-29 | 1 | $3100 | checking | False
2014-11-28 | 1 | $3200 | checking | False
2014-12-01 | 3 | $5000 | credit | False
2014-11-30 | 3 | $5020 | credit | False

For that we need an existing transactional table. A typical transactional table has a timestamp and an amount associated with each account.

transactions_table:

account_id | posted_at | amount
--- | --- | ---
1 | 2014-12-01 12:15:01 | $3
1 | 2014-11-30 13:02:50 | -$10
2 | 2014-12-02 12:10:00 | $40
3 | 2014-12-01 11:00:06 | -$2

Notice that we now have two important ingredients: a single balance_date from our account_table and a series of timestamps describing transactions. The balance_date might be a single date as of today (if every account’s status is updated daily), or it might differ per account, describing the day the account was last updated. We don’t really need hours, minutes, or seconds for our transactions, but this is how a transactional table is typically organized. In any case, having these dates and timestamps, we can now trace every account’s balance back into the past.

First attempt at creating an accounts ledger

SELECT
  transactions_daily_table.account_id AS account_id
  , transactions_daily_table.date AS date
  , (accounts_table.balance - SUM(COALESCE(transactions_daily_table.total_amount,0)) 
    OVER (PARTITION BY 
          transactions_daily_table.account_id
          ORDER BY transactions_daily_table.date DESC ROWS UNBOUNDED PRECEDING
    )) AS balance
FROM transactions_daily_table
LEFT JOIN accounts_table ON accounts_table.id = transactions_daily_table.account_id
  AND (transactions_daily_table.date < accounts_table.balance_date)

And the exposed LookML will be:

  - dimension: account_id
    type: int
    primary_key: true
  
  - dimension: date
    type: datetime
    convert_tz: false
    
  - dimension: balance
    decimals: 2
    type: number

The key part here is a window function that calculates a cumulative sum of daily transactions up to each of the dates in the account’s lifetime. So now we have, for each account, the balance of the account for each date the account existed:

(accounts_table.balance - SUM(COALESCE(transactions_daily_table.total_amount,0)) 
    OVER (PARTITION BY 
          transactions_daily_table.account_id
          ORDER BY transactions_daily_table.date DESC ROWS UNBOUNDED PRECEDING
    )) AS balance

So for the previous day, one daily amount is subtracted from today’s balance; for two days ago, the sum of two daily amounts is subtracted. Therefore, if -$200 in transactions was recorded over a period of two days and the final balance was reported as $1000, we know that the balance two days ago was $1200.
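
The same arithmetic can be sketched outside SQL. Below is a minimal Python sketch (with made-up numbers) of the DESC window frame: walk the days from most recent to oldest, accumulate each day’s total, and subtract the running sum from the reported balance.

```python
# Hypothetical daily totals; the reported balance is as of end of 2014-12-01.
reported_balance = 1000
daily_totals = {
    "2014-12-01": -150,   # net transactions on this day
    "2014-11-30": -50,
}

balances = {}
running = 0
# Walk days in descending order (mirrors ORDER BY date DESC ROWS UNBOUNDED
# PRECEDING): each day's balance is the reported balance minus everything
# that happened from that day forward.
for day in sorted(daily_totals, reverse=True):
    running += daily_totals[day]
    balances[day] = reported_balance - running
```

With -$200 of transactions spread over the two days, the sketch reproduces the $1200 balance from the example above.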

Building daily totals from transactional data

Notice that we skipped one important step above. We relied on transactions_daily_table, a table that does not exist in our database. In the Looker world, this would typically be a derived table that we would then reference with something like

FROM ${transactions_daily_table.SQL_TABLE_NAME} AS daily_transactions

The purpose of this table is twofold. First, to aggregate transactions on a daily basis. Remember, our original transactions table records individual transactions as they occur; we don’t yet have daily total amounts. Second, to span the entire life of an account, producing a continuous time series that varies in length based on how long the account has existed.

transactions_daily_table:

SELECT
  date_series.account_id AS account_id
  , date_series.date AS date
  , SUM(account_transactions.amount) AS total_amount
FROM (
  SELECT
    account_table.id AS account_id
    , generate_series(
        LEAST(account_table.balance_date, fat_date.first_transaction_date)
        , DATE('yesterday'::timestamptz)
        , '1 day'
      )::date AS date
  FROM account_table
  LEFT JOIN (
    SELECT 
      account_id
      , MIN(DATE(posted_at)) as first_transaction_date
    FROM transactions_table
    GROUP BY 1
  ) AS fat_date ON account_table.id = fat_date.account_id
) date_series
LEFT JOIN transactions_table AS account_transactions 
  ON date_series.account_id = account_transactions.account_id
  AND date_series.date = DATE(account_transactions.posted_at)
GROUP BY 1, 2

And the corresponding LookML:

  - dimension: account_id
    type: int
  
  - dimension: date
    type: datetime
    convert_tz: false
  
  - dimension: total_amount
    type: number
    decimals: 2
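
The outer GROUP BY is the rollup step itself: individual transactions collapse into one total per account per day. A Python sketch of the same collapse (hypothetical rows):

```python
from collections import defaultdict

# Hypothetical raw transactions: (account_id, posted_at, amount)
transactions = [
    (1, "2014-12-01 12:15:01", 3),
    (1, "2014-12-01 18:40:00", -10),
    (1, "2014-11-30 13:02:50", -10),
]

# Collapse individual transactions into one total per (account, day),
# mirroring GROUP BY account_id, DATE(posted_at).
daily_totals = defaultdict(float)
for account_id, posted_at, amount in transactions:
    day = posted_at.split(" ")[0]   # truncate the timestamp to a date
    daily_totals[(account_id, day)] += amount
```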

While this SQL seems complicated, what happens is actually simple. First, we make use of Postgres’ generate_series function to build a date series for every account.

SELECT
  account_table.id AS account_id
  , generate_series(
      LEAST(account_table.balance_date, fat_date.first_transaction_date)
      , DATE('yesterday'::timestamptz)
      , '1 day'
    )::date AS date
FROM account_table

We use LEAST to make sure the series starts at the first balance status or the first transaction, whichever came first. We need this because, for one, some accounts might not have any transactions. The following join pulls the first transaction date, if it exists, to be plugged into the time series function:

LEFT JOIN (
  SELECT 
    account_id
    , MIN(DATE(posted_at)) as first_transaction_date
  FROM transactions_table
  GROUP BY 1
) AS fat_date ON account_table.id = fat_date.account_id

Second, we use yesterday as the last day, so that every day in the series has complete daily transaction totals. Essentially, all of this logic builds a continuous lifetime for each account. That way we don’t create date observations for an account before its existence or, even worse, into the future.
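
The series-generation logic can be mirrored in a short Python sketch (hypothetical dates; `today` is passed in rather than read from the clock):

```python
from datetime import date, timedelta

def account_date_series(balance_date, first_transaction_date, today):
    """Build one entry per day of an account's lifetime, ending yesterday."""
    # LEAST(balance_date, first_transaction_date): an account with no
    # transactions falls back to its balance date.
    if first_transaction_date is None:
        start = balance_date
    else:
        start = min(balance_date, first_transaction_date)
    end = today - timedelta(days=1)  # DATE('yesterday'): last complete day
    days = []
    d = start
    while d <= end:
        days.append(d)
        d += timedelta(days=1)
    return days

series = account_date_series(date(2014, 12, 1), date(2014, 11, 28),
                             today=date(2014, 12, 3))
```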

Filling in the missing observations

Alright, so we have covered the simple case where the balance is reported as of today and all transactions happened prior to today. But what if the reported date is old and we need to forecast today’s balance from the transactions that have occurred since? Effectively, we would be filling in the missing balance statuses based on transactions recorded after the last time the accounts_table was updated. Well, we have window functions for this.

For transactions occurring after the reported date:

(accounts_table.balance + SUM(COALESCE(transactions_daily_table.total_amount,0))
  OVER (PARTITION BY 
    account_id
    ORDER BY transactions_daily_table.date ASC ROWS UNBOUNDED PRECEDING
)) AS balance
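
The forward direction mirrors the backward one; a minimal Python sketch with made-up numbers:

```python
# Hypothetical daily totals for days after the reported date.
reported_balance = 1000
daily_totals = {
    "2014-12-02": 100,
    "2014-12-03": -30,
}

balances = {}
running = 0
# Walk days in ascending order (mirrors ORDER BY date ASC ROWS UNBOUNDED
# PRECEDING): each day's balance is the reported balance plus everything
# that has happened up to and including that day.
for day in sorted(daily_totals):
    running += daily_totals[day]
    balances[day] = reported_balance + running
```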

Completing “accounts ledger” in Looker

Finally, combining all of the above logic into LookML, we have:

- explore: accounts_ledger
  joins:
    - join: account_table
      foreign_key: account_id
  
- view: account_balances
  derived_table:
    sql: |
      (
        -- for transactions before the date snapshot balance was recorded
        SELECT
          transactions_daily_table.account_id AS account_id
          , transactions_daily_table.date AS date
          , (accounts_table.balance - SUM(COALESCE(transactions_daily_table.total_amount,0)) 
            OVER (PARTITION BY 
                  transactions_daily_table.account_id
                  ORDER BY transactions_daily_table.date DESC ROWS UNBOUNDED PRECEDING
            )) AS balance
        FROM transactions_daily_table
        LEFT JOIN accounts_table ON accounts_table.id = transactions_daily_table.account_id
          AND (transactions_daily_table.date < accounts_table.balance_date)
      ) 
        UNION ALL 
      (
        -- for transactions on or after the date snapshot balance was recorded
        SELECT
          transactions_daily_table.account_id AS account_id
          , (transactions_daily_table.date + 1) as date
          , (accounts_table.balance + SUM(COALESCE(transactions_daily_table.total_amount,0)) 
            OVER (PARTITION BY 
                  transactions_daily_table.account_id
                  ORDER BY transactions_daily_table.date ASC ROWS UNBOUNDED PRECEDING
            )) AS balance
        FROM transactions_daily_table
        LEFT JOIN accounts_table ON accounts_table.id = transactions_daily_table.account_id
          AND (transactions_daily_table.date >= accounts_table.balance_date)
      ) 
        UNION ALL 
      (
        -- actual balance on the date
        SELECT
          account_id
          , date
          , balance AS balance
        FROM accounts_table
      )
    sql_trigger_value: SELECT DATE(NOW())
    indexes: [account_id,date] 

  fields:
  - dimension: account_id
    type: int
    primary_key: true

  - dimension: balance
    decimals: 2
    type: number  
    sql: ${TABLE}.balance
  
  - dimension: date
    type: datetime
    convert_tz: false
    
- view: account_transactions_daily
  derived_table:
    sql: |
      SELECT
        date_series.account_id AS account_id
        , date_series.date AS date
        , SUM(account_transactions.amount) AS total_amount
      FROM (
        SELECT
          account_table.id AS account_id
          , generate_series(
              LEAST(account_table.balance_date, fat_date.first_transaction_date)
              , DATE('yesterday'::timestamptz)
              , '1 day'
            )::date AS date
        FROM account_table
        LEFT JOIN (
          SELECT 
            account_id
            , MIN(DATE(posted_at)) as first_transaction_date
          FROM transactions_table
          GROUP BY 1
        ) AS fat_date ON account_table.id = fat_date.account_id
      ) date_series
      LEFT JOIN transactions_table AS account_transactions 
      ON date_series.account_id = account_transactions.account_id
      AND date_series.date = DATE(account_transactions.posted_at)
      GROUP BY 1, 2     
    sql_trigger_value: SELECT DATE(NOW())
    indexes: [account_id, date]
    
  fields:
  - dimension: account_id
    type: int
  
  - dimension_group: date
    type: datetime
    convert_tz: false
  
  - dimension: total_amount
    type: number
    decimals: 2

Now that we have the intended derived tables, a number of analyses are possible. The simplest approach is to create a new ratio in the account_balances view:

  - measure: debt_to_assets
    type: number
    decimals: 2
    sql: 1.0 * NULLIF(${total_debt},0) / NULLIF(${total_assets},0)

  - measure: total_debt
    type: sum
    sql: ${balance}
    decimals: 2
    hidden: true
    filters:
       account_type: 'credit'

  - measure: total_assets
    type: sum
    sql: ${balance}
    decimals: 2
    hidden: true
    filters:
       account_type: 'checking'
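
The NULLIF guards are what keep the ratio from dividing by zero: a zero becomes NULL, so the whole expression returns NULL instead of erroring. The same guard in a Python sketch (with None standing in for NULL):

```python
def debt_to_assets(total_debt, total_assets):
    # NULLIF(x, 0): treat a zero as NULL (None) so the division is skipped.
    if not total_debt or not total_assets:
        return None
    return round(1.0 * total_debt / total_assets, 2)
```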

We can then produce a look with this ratio over time for any customer with both credit and checking accounts.

So, there you have it. With the use of our transactional data set we were able to build a report with financial ratios over time straight from a live database.


(Phil O'hagan) #2

This is exactly what we need to do. However, the GENERATE_SERIES line doesn’t work in Redshift. Is there an alternative for that dialect?


(sam) #3

Hey @Phil_O_Hagan, check out Lloyd’s post here for a Redshift alternative to GENERATE_SERIES!


(Segah Mir@Caura & Co.) #4

Glad this was useful. Since originally writing this post over a year ago, I have found a number of uses for daily tables (this being just one). So I would advise keeping a calendar table; that will work in any dialect.

Also, on larger datasets the above approach requires a slight alteration (in its current form, it will require too many resources for a dataset with millions of transactions).

Segah Meer
Data & BI Architect
Caura | Data Musings


(Phil O'hagan) #5

Thanks! We do have a calendar table that we use to pull it in. I’m flummoxed, however, by how to replace the generate_series with an appropriate join to that table.