Deprecation Notice: Redshift Distribution Style (3.36+)


(Brett Sauve) #1

As of Looker 3.36, when users of Amazon Redshift create a persistent derived table, the Redshift distribution style will default to ALL instead of EVEN.

Description

Amazon Redshift allows database rows to be distributed in one of 3 ways:

  • All Distribution: All rows are fully copied to each node.
  • Even Distribution: Rows are distributed to different nodes in a round-robin fashion.
  • Key Distribution: Rows are distributed to different nodes based on unique values within a particular column.

Before 3.42, Looker defaulted to EVEN distribution, but now defaults to ALL if you do not specify a style using the distribution_style parameter, or the distribution parameter.

Going Back to an EVEN Distribution

If you would like to go back to using an EVEN distribution on a derived table, you can do so by using the distribution_style parameter as follows:

- view: customer_order_facts
  derived_table:
    sql: |
      SELECT
        customer_id,
        COUNT(*) AS lifetime_orders
      FROM
        order
    persist_for: 24 hours
    distribution_style: EVEN

Legacy Features End-of-life Schedule
(Rex Gibson) #2

Why is all the choice? I’m trying to understand why join optimization is the better choice vs space optimization?


(Rex Gibson) #3

Is the assumption that most PDTs are “kinda big” but not very big, so the redshift reshuffle for a other than distribution key join is a more common and painful problem all your pdts change and use too much space?


(Scott Hoover) #4

Rex, I’ve spoken to Product and Engineering about the problems posed by this decision. I’ll update when I know more.


(Scott Hoover) #5

Rex,

Here’s the plan: rather than change the default distribution style, Engineering is going to build a warning into the LookML parser. Now, if one omits both the distkey: and the distribution_style: arguments, Looker will present a warning: “By not declaring a distribution style or a key, Looker is going to default to distribution_style: all. This will have a potentially serious implication on database space.” Maybe not that verbose, but something to that effect.


(Ken Yeoh) #6

@sdhoover will there be a way to default to an EVEN distribution for our instance of Looker?

Many of our developers do not follow the Looker releases and don’t update their code according to new best practices - this release will cause the storage requirements on our Redshift instance to increase by a non-trivial factor. A lot of them rely heavily on derived tables, which now will use N times more storage, where N is the number of nodes in the cluster…


(Scott Hoover) #7

@kyeohhubspot Unfortunately there will be no way to default to even. @colin and the Product team may have suggestions for you.