Our Redshift first-run query performance is terrible

redshift
performance
cache

#1

Reaching out to other Looker customers and developers! Any of you out there using Redshift, and experiencing slow performance with first-run queries?

We have several large dashboards with up to 20 looks each, generating fairly hefty SQL queries, and using filters to customize the readout for various contexts.

We’ve noticed that, when first loaded, these dashboards can take 60 seconds or more to render. On subsequent loads, the looks render in only a few seconds. The high query times show up in the AWS Redshift console, so we don’t think it’s an issue on the Looker side.

tl;dr - Our questions for all of you

  • Has anyone else out there experienced problems with Redshift first-run query performance?
  • Has anyone else tried (in response to the problem):
    • Resizing their Redshift cluster
    • Pre-loading dashboards to warm the cache
    • Having Looker do the caching (i.e. data groups)
    • … something else? (including lateral workarounds - we’re interested in any creative approach people have taken.)

Code compilation

As far as we can tell, the problem is slow query compile times. See the AWS documentation on the subject (under “Code Compilation”).

According to various posts and articles, it seems that:

  • The Redshift compilation cache has a finite size.
  • We developers have no control over when cache items are invalidated.
  • Amazon doesn’t even want to give us more details about it - the technology is changing too quickly, apparently - they’d like us to consider it a black box.
  • There could be a performance hit even when querying empty tables (a quick test sketch follows this list).
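
A minimal version of that empty-table check (the table name is made up; pg_last_query_id() can be swapped for a query id looked up in stl_query by hand):

-- Hypothetical check: a first run against an empty table still pays compile time.
create table if not exists public.compile_test (id int, val varchar(32));  -- left empty on purpose

select count(*) from public.compile_test where val like '%x%' and id > 0;

-- Inspect per-segment compile time for the query we just ran:
select query, segment, datediff(ms, starttime, endtime) as compile_ms, compile
from svl_compile
where query = pg_last_query_id()
order by segment;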

Why do we think it’s a compilation issue?

Tests

We ran various tests to diagnose the issue. For example, we took large queries, altered small parts of them, and ran them again.

Our results (details available if you’re interested) do seem to indicate that query compilation is the culprit.
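
The gist of one such test, sketched here with a made-up table name: run structurally identical queries that differ only in a literal, then compare compiled vs. reused segments.

-- Two structurally identical queries that differ only in a literal:
select count(*) from analytics.events where event_name like '%signup%';
select count(*) from analytics.events where event_name like '%login%';

-- Compiled vs. reused segments for each run (query ids found via stl_query):
select c.query,
       sum(c.compile) as compiled_segments,
       count(*) - sum(c.compile) as reused_segments
from svl_compile c
join stl_query q on q.query = c.query
where q.querytxt like 'select count(*) from analytics.events%'
group by c.query
order by c.query;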

Resizing the cluster had no effect

Acting on advice, we resized our cluster from one node to two, but this did nothing to improve query performance. That didn’t surprise us, because compilation occurs only on the leader node of a cluster.

Our Redshift specs

  • Cluster size: one node at the start of this; we now have two nodes (no effect on query performance).
  • Node type: dc2.large

What we have already tried

Resize cluster

See above. Resizing from 1 to 2 nodes had no effect.

Install the Redshift Block on Looker

On Looker’s advice we installed the Redshift Block - it is helpful, but it hasn’t given us any insight into this problem.

Proposed workarounds

Use Looker Data Groups

We’re looking at possibly using Looker data groups to have Looker cache as much as possible on its end. This would effectively be a workaround.

Has anyone else used this as a workaround? Any success?

Schedule daily loads of the dashboards to “warm the cache”

This probably won’t work for us:

  • It’s labor-intensive.
  • Not scalable.
  • Doesn’t work well for dashboards with filters (which all of ours have).

(Timothy Burke) #2

We’ve experienced similar results. We convinced ourselves it’s because all of the queries needed to render each individual dashboard element are kicked off immediately when you open the dashboard, and Redshift isn’t great at managing concurrent queries.

We solved it (kind of?) by adding caching everywhere, but I’d love to get more community feedback.


(fabio) #3

I may have been the source of the recommendation to upgrade the cluster from one node to two. I expected it would help based on these AWS docs.

On a single-node cluster, the node is shared for leader and compute functionality. On a multi-node cluster, the leader node is separate from the compute nodes.

I’ve also seen some of the data you’ve sent over contrasting the run times for 100% cached queries vs. all others, and the difference is quite big. It’s also surprising how small a change to the query can lead to a cache miss.

I’ll see if I can get this on the radar of our Redshift contacts. It would help if you added a link to any relevant AWS forum thread to this post, along with some stats from your system tables comparing compile times between cache hits and misses.


(fabio) #4

Zach, can you provide us the case number you filed with AWS around this? (And if you haven’t already filed one, can you do so?)


(Lars Kamp) #5

Lars here, co-founder of intermix.io, we are a Looker partner.

The performance issue you describe is very common. This is likely because your workload management (WLM) is not aligned with the workloads your dashboards / looks are generating.

Redshift Workload Management

My hunch is that you’re using the default WLM configuration in Redshift, which is one queue with a concurrency of 5. With that configuration, Redshift will only process 5 concurrent queries (per node) at a time; all other queries wait in the queue. That queue wait time is what you experience as “slow”, and it’s usually what people mean when they say “Redshift is slow”.

Queue wait time

Some directional back-of-the-envelope math:

If you have, say, 10 dashboards with 20 looks each, then refreshing all of them triggers 10 * 20 = 200 queries. Your WLM has one queue with 5 concurrency slots, on a 1-node cluster. So right from the start, 195 of those queries are stuck in the queue.

Let’s say your large queries take 10 seconds to process. So after 10 seconds, five of your queries have processed, and then the next five queries run. With that configuration, you’re running 200 / 5 = 40 cycles, and each cycle takes 10 seconds, for a total of 40 * 10 = 400 seconds, i.e. 6 minutes and 40 seconds.

Even if you add a node now, you’re only cutting that time in half, which is still 3 minutes and 20 seconds.

The math doesn’t include any other activity that’s going on in your cluster, e.g. from your data loads or any transformations you’re running. Those are additional queries that also get queued, making everything last longer.
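
You can sanity-check how much of this is queue wait with Redshift’s stl_wlm_query system table - something along these lines (a rough sketch; user-defined queues usually start at service_class 6, so adjust for your own setup):

-- Approximate queue wait vs. execution time per WLM queue over the last day.
select service_class,
       count(*) as queries,
       avg(total_queue_time) / 1000000.0 as avg_queue_seconds,
       avg(total_exec_time) / 1000000.0 as avg_exec_seconds
from stl_wlm_query
where service_class >= 6                               -- user-defined queues (assumption)
  and queue_start_time >= dateadd(day, -1, getdate())
group by service_class
order by service_class;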

Setting up your WLM

The key to making your queries fast is aligning the workloads from Looker with your WLM settings in Amazon Redshift. See this link for how to configure your WLM:

https://docs.aws.amazon.com/redshift/latest/mgmt/workload-mgmt-config.html

The maximum number of concurrency slots is 50, though AWS doesn’t recommend going above 15. We’ve got a whole section on how to configure your WLM as part of our data engineering training - see section 3.

In short (a SQL sketch for the user-group steps follows this list):

  • separate your users
  • group your users by workload type (load, transform, ad-hoc)
  • set up 4 different queues: load, transform, ad-hoc plus the default queue
  • assign the user groups to their corresponding queue, leave the default queue empty
  • fine-tune concurrency / memory for each queue
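
On the SQL side, the user-group part looks roughly like this (group and user names are placeholders; the queues themselves and the group-to-queue mapping are defined in the WLM configuration, not in SQL):

-- Group users by workload type (names are hypothetical):
create group load_users with user etl_loader;
create group transform_users with user dbt_runner;
create group adhoc_users with user looker_user;

-- Add more users to a group later:
alter group adhoc_users add user analyst_jane;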

The last step is non-obvious - it’s hard, if not impossible, to figure out the right concurrency / memory combination in your WLM. Shoot me a note at lars@intermix.io if you need help with that.

Hope this helps!

Lars


#6

Hi Timothy, I’m curious what kind of caching you did. Was it Looker-side caching with data groups, did you “pre-chew” your data into pre-aggregated tables, or something else?

Did you see Lars Kamp’s post above about WLM? Have you tried that approach?


#7

Fabio, I don’t believe we have filed yet - I will pass on the request. We do have an open forum thread: https://forums.aws.amazon.com/thread.jspa?threadID=278652


#8

Fabio, it seems I’m not yet allowed to upload attachments. I’ve just prepared a redacted result (from March 8th, on our cluster) of the query below.

Can you advise me how best to make the attachment available to this thread?

-- Per-query compile stats (from svl_compile) joined to the full query text (from stl_querytext).
select      c.*, q.text
from        (
            select    query,
                      COUNT(*) as segments,
                      MIN(starttime) as starttime,
                      MAX(endtime) as endtime,
                      -- duration in seconds, to one decimal place
                      ROUND(DATEDIFF(ms, MIN(starttime), MAX(endtime)) / 100) / 10 as duration,
                      SUM(compile) as compiled,           -- segments compiled on this run
                      COUNT(*) - SUM(compile) as reused   -- segments served from the compile cache
            from      svl_compile
            group by  query
            ) c
inner join  (
            select    query,
                      -- stl_querytext stores the SQL in 200-character chunks; reassemble it
                      LISTAGG(text) WITHIN GROUP (order by sequence) as text
            from      stl_querytext
            group by  query
            ) q on (q.query = c.query)
order by    c.starttime

#9

Lars - we hadn’t considered the WLM angle. Thanks for taking the time to explain it carefully. We’re going to look at this and see what we can do. Stay tuned.


(bruce.sandell) #10

Zach, we’ve been in touch with the Redshift team and they already have a Database Engineer ready to assist you. We just need to supply them with a case number so that they can get the specifics on your cluster endpoint.


#11

Thank you Bruce,

We are working on our end to get that number. I’ll be posting again to this thread soon.


#12

Case number: 4917952651


#13

Lars, you can see the SQL query I posted earlier in the thread; we used it to help determine whether query compilation was the chief bottleneck we were facing.

Is it your experience that customers often misidentify their performance issues as compilation bottlenecks when the reality is a WLM misconfiguration?

Is the SQL we are using potentially ineffective at isolating one from the other?

P.S. we are looking into the training materials you linked - this is very helpful, thank you.


(fabio) #14

Hi Zach, were you able to rule out WLM concurrency as the limiting factor here? We are still in contact with an AWS rep and would like to provide them with the most current information.


#15

What we have done so far is make a couple of changes to the primary WLM queue:

  1. Increased concurrency slots from 5 to 10.
  2. Enabled short query acceleration in a bid to get the simpler looks on a dashboard to load more quickly.

Given how much performance varies from moment to moment, we have not been able to prove that these changes made a measurable difference. We are still seeing queries waiting in the queue, and occasionally going through lengthy compilation of as much as 25-30 seconds.

So that is a bit disappointing, unfortunately. Everyone I’ve talked to and everything I’ve read has cautioned me against pushing the concurrency too high (say, higher than 10 or 15). So we’re being conservative about that.
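
For what it’s worth, here is roughly how we’ve been splitting out queue time vs. compile time per query (an approximation - summing segment compile times overstates wall-clock time when segments compile in parallel):

-- Rough per-query breakdown: time queued vs. executing vs. (summed) compiling.
select w.query,
       w.total_queue_time / 1000000.0 as queue_seconds,
       w.total_exec_time / 1000000.0 as exec_seconds,
       c.compile_seconds
from stl_wlm_query w
join (
     select query,
            sum(datediff(ms, starttime, endtime)) / 1000.0 as compile_seconds
     from svl_compile
     where compile = 1
     group by query
     ) c on c.query = w.query
order by c.compile_seconds desc
limit 20;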

I hear through the grapevine as well that AWS engineers might be working on improvements to how Redshift caches and stores compiled segments, possibly reducing the overall amount of compilation that has to happen. Has anyone else heard this or seen any literature about it? I’d love to get a timeframe.


#16

Follow-up: on our development instance (where we have also been able to reproduce the slow query compile times), we tried two test configurations:

  1. Enabled short query acceleration and set the SQA threshold to its maximum of 20 seconds.
  2. Disabled short query acceleration but cranked up the default queue concurrency to 50.

In the latter case, we were able to drop queue time for all queries to zero (with only one disk-based query over a 20-minute test period). However, we noticed that on cold-load dashboards, almost every query still goes through a compile time of 20-60 seconds (most over 45 seconds), and usually at least 60% of segments are recompiled in these cases.

Here is a sample query (field names changed to hide our domain model) that took about 50 seconds to compile on first run:

SELECT
    COALESCE(
        COALESCE(
            CAST(
                (
                    SUM(
                        DISTINCT (
                            CAST(
                                FLOOR(
                                    COALESCE(snail.muscle_mass, 0) *(1000000 * 1.0)
                                ) AS DECIMAL(38, 0)
                            )
                        ) + CAST(
                            STRTOL(LEFT(MD5(CONVERT(VARCHAR, snail.snail_id)), 15), 16) AS DECIMAL(38, 0)
                        ) * 1.0e8 + CAST(
                            STRTOL(RIGHT(MD5(CONVERT(VARCHAR, snail.snail_id)), 15), 16) AS DECIMAL(38, 0)
                        )
                    ) - SUM(
                        DISTINCT CAST(
                            STRTOL(LEFT(MD5(CONVERT(VARCHAR, snail.snail_id)), 15), 16) AS DECIMAL(38, 0)
                        ) * 1.0e8 + CAST(
                            STRTOL(RIGHT(MD5(CONVERT(VARCHAR, snail.snail_id)), 15), 16) AS DECIMAL(38, 0)
                        )
                    )
                ) AS DOUBLE PRECISION
            ) / CAST((1000000 * 1.0) AS DOUBLE PRECISION),
            0
        ),
        0
    ) AS "snail.total_muscle_mass_1"
FROM
    body_parts.shell AS shell
    INNER JOIN animals.snail AS snail ON shell.habitat_id = snail.habitat_id
    and shell.snail_set_id = snail.snail_set_id
    and shell.snail_id = snail.snail_id
    INNER JOIN animals.snail_set AS snail_set ON shell.habitat_id = snail_set.habitat_id
    and shell.snail_set_id = snail_set.snail_set_id
WHERE
    (snail_set.habitat_id > 0)
    AND (
        (
            (
                case when snail.snail_eyestalks = '' then 'No snail eyestalks' else snail.snail_eyestalks END
            ) LIKE '%'
        )
    )
    AND (snail.family_group LIKE '%')
    AND (snail.species_number LIKE '%')
LIMIT
    500
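
The “60% of segments recompiled” figure above comes from checks along these lines (the query id is a placeholder):

-- Share of segments recompiled for a given query:
select query,
       sum(compile) as compiled_segments,
       count(*) as total_segments,
       round(100.0 * sum(compile) / count(*), 1) as pct_compiled
from svl_compile
where query = 123456
group by query;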

(Lars Kamp) #17

A few updates to this reply.

I’ve written a detailed post on how to fix your slow Looker dashboards on Amazon Redshift.

3 Steps for Fixing Slow Looker Dashboards with Amazon Redshift

Then I’ve also prepared an in-depth write-up on how to configure your Amazon Redshift workload management (WLM):

4 Simple Steps To Set-up Your WLM in Amazon Redshift For Better Workload Scalability

Both posts together will give you detailed, step-by-step instructions on how to get fast dashboards and keep them fast as you add more users, queries, persistent derived tables, etc.


(Lars Kamp) #18

Hi Zach,

With a single-node cluster, you don’t have enough memory to go above 3-4 slots. By increasing the slot count, you’re increasing concurrency, but at the cost of memory. Each slot will have less memory, and that means your queries are falling back to disk.

That means two things:

  1. they are slower because they don’t execute in-memory
  2. they slow down the entire cluster because they use a lot of i/o

There’s nothing wrong with going above a concurrency of 15. We have customers who run all 50 slots. The key is to have enough memory per slot when you do that, which is a function of your node count and node type.
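
A quick way to see whether your queries are spilling is to count disk-based steps in svl_query_summary - roughly like this (a sketch; the limit is arbitrary):

-- Queries with steps that went disk-based (i.e. didn't fit in their slot's memory):
select query,
       sum(case when is_diskbased = 't' then 1 else 0 end) as disk_steps,
       count(*) as total_steps
from svl_query_summary
group by query
having sum(case when is_diskbased = 't' then 1 else 0 end) > 0
order by disk_steps desc
limit 20;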


#19

Lars, I appreciate the continued feedback. I should mention that we have switched database providers and are no longer using Redshift.

Our diagnostic queries did not indicate that WLM queue waits were the blocking issue; we tried increasing to the maximum number of slots and even then were not seeing many disk-bound queries. We were, in short, able to lower queue times to zero. The problem always came down to query compile time. We tried many things to improve it but ultimately could not bring compile times down to a satisfactory level.


(Gabriel) #20

Yes, we have experienced this issue on our implementation as well. Our cluster has been scaled up to have 15 nodes at this point. Would love to know of any solutions to improve the performance.