Cumulative Sum in MySQL

(Anika Kuesters Smith) #1

Cumulative Sums are a little tricky in MySQL.

The simplest way to tackle this would be to use the running_total measure, or utilize the running_total() function in Table Calculations.

If you need to be able to reference your running total or sum in other fields, these approaches won’t do it. In this case, you could try something like the following:

      SELECT 
        orders.user_id AS user_id
        , DATE_FORMAT(orders.created_at,'%Y-%m') AS orders_created_month
        , SUM(orders.amount) AS this_month_sum
        , ( 
              SELECT SUM(o.amount)
              FROM orders AS o
              WHERE o.user_id = orders.user_id
              AND o.id <= orders.id
          ) AS cumulative_sum
      FROM orders
      GROUP BY 1,2
      ORDER BY 1,2,3

Adapted from this StackOverflow Article.

0 Likes

(Anika Kuesters Smith) #2

Grouping by user_id gives flexibility to join users into this to get more flexible cohorting.

0 Likes

(Colin) #3

Inequality joins are another option that are a bit less flexible, but can be fast.

  SELECT 
    o1.user_id AS user_id
    , o1.order_id AS order_id
    , o1.created_at AS order_created
    , o1.amount AS order_amount
    , SUM(o2.amount) AS orders_prior_total
  FROM orders o1
  JOIN orders o2 ON o2.date < o1.date
  GROUP BY 1,2,3,4
2 Likes

(Noah) #4

What happens when even the inequality join is taking a long time? Our performance is completely lost when trying to execute this method. We have other options nowadays?

0 Likes

(lloyd tabb) #5

@_Noah, yes there are better ways of doing this.

If you are using a SQL dialect that supports window functions, the much better option is to use a SUM() window function in a derived table. Window functions are really fast. Unfortunately MySQL doesn’t have window functions, but there is another fast method using it, see example below.

For dialects that support Window Functions (redshift, bigquery, postgres, etc)

https://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_sum_WF.html

select salesid, dateid, sellerid, qty,
sum(qty) over (order by dateid, salesid rows unbounded preceding) as sum
from winsales
order by 2,1;

salesid |   dateid   | sellerid | qty | sum
---------+------------+----------+-----+-----
30001 | 2003-08-02 |        3 |  10 |  10
10001 | 2003-12-24 |        1 |  10 |  20
10005 | 2003-12-24 |        1 |  30 |  50
40001 | 2004-01-09 |        4 |  40 |  90
10006 | 2004-01-18 |        1 |  10 | 100
20001 | 2004-02-12 |        2 |  20 | 120
40005 | 2004-02-12 |        4 |  10 | 130
20002 | 2004-02-16 |        2 |  20 | 150
30003 | 2004-04-18 |        3 |  15 | 165
30004 | 2004-04-18 |        3 |  20 | 185
30007 | 2004-09-07 |        3 |  30 | 215
(11 rows)

For MySQL

In MySQL in a derived table, you can use variable to approximate the same thing.

 SELECT t.id,
         t.count,
         @running_total := @running_total + t.count AS cumulative_sum
    FROM TABLE t
    JOIN (SELECT @running_total := 0) r
ORDER BY t.id

A decent stack overflow article explains it:

https://stackoverflow.com/questions/2563918/create-a-cumulative-sum-column-in-mysql

0 Likes

(Noah) #6

Thanks @lloydtabb. I will look into utilizing the variable method, as we’re using mysql.

0 Likes