Indexes on multiple columns in derived table



The documentation is a bit unclear about indexes on multiple columns in a derived table. The Derived Tables page says:

add indexes on customer_id and first_order_date, like this:…
indexes: [customer_id, first_order_date]

On the other hand, the indexes page says:

with an index on customer_id and date:…
indexes: [customer_id, date]

Note the difference! According to the Derived Tables page, there are indexes on the two columns, one on each. According to the indexes page, there’s a single index on the two columns. This is a significant difference (in MySQL at least); specifically, a single multicolumn index serves as an index not only on the combination of columns but also on any initial subset of that combination, but not on any other subset. Therefore, if this is a multicolumn index, then the order columns are specified in matters. (Again, this is true for MySQL. I don’t know what other dialects it might be true in.)

So my questions are:

  1. Which documentation page is correct: does the indexes: [foo, bar] syntax create a single multicolumn index or two indexes?
  2. Can the incorrect documentation page be emended?
  3. If indexes: [foo, bar] creates a multicolumn index, then is there a way to create separate indexes on various columns?
  • If there isn’t a way, then can there be, please? (This is an important feature request, because a multicolumn index, as I mentioned, is useful only when using an initial subset of its columns. Again, this is true of MySQL.)
  1. If indexes: [foo, bar] creates separate indexes, then is there a way to create a multicolumn index?
  • If there isn’t a way, then can there be, please?

I propose a fairly mnemonic syntax for specifying a multicolumn index on foo and bar in that order, a multicolumn index on baz and quux in that order, and separate indexes on xyzzy and mumble, as follows: indexes: [foo, bar], [baz, quux], xyzzy, mumble

(lloyd tabb) #2
indexes: [foo,bar]

creates two separate indexes on two separate columns.

Your suggestion

indexes: [[foo,baz],bar]

Is a great one. We should look into supporting it @mtoy .

(Brett Sauve) #3

I’ve clarified the behavior of indexes in both the Derived Table doc and indexes doc.

(Nick Larson) #4

I know this is an old thread, but is there a preferred workaround for this? I can add indexes manually, but wont those get killed for derived tables that get rebuilt automatically?

(sam) #5

@ifightcrime you are correct that manually created indexes will be overwritten when the derived table is rebuilt. There isn’t a preferred workaround at this time. You could concat the desired columns together and use that as an index, but then the index won’t be used if you filter or join on just one of the fields instead of the index.

We appreciate you posting here though - it gives us an idea of who wants this feature. As Lloyd said, this is a great idea and we’re looking into supporting it.

(Nick Larson) #6

Thanks for the reply @sam. I suppose one option is to have three indexes, two for individual fields (if even needed), and one for the combined. I’ll mess around with that and see what I can come up with. If that doesn’t work I think we’ll just pull this out into a cron or something on one of our servers.

(sam) #7

Sounds reasonable - let us know if you come up with something that works really well for your use case!

(Noah) #8

+1 for us on including the option to add a composite index for PDTs

(Peggy Beard) #9

Thanks, Noah! I’ve let our product team know that this is something you’d like to see!

(Thomas Brambor) #10

We are also stuck with a MySQL database for one of our products. Adding a composite index increases the speed of a join by a factor of 1000 for one of our derived tables. So, yes, +1! for the addition of composite indices.

(Xiaolong Cheng) #11

+1 for composite indices on PDT.