Few SQL dialects support robust table-generating functions. Some common use cases for table-generating functions are creating a sequence of numbers or dates, un-nesting a JSON object or array into rows, or splitting a string on a delimiter into rows.
In this brief article, I'll demonstrate the latter example: how to take a string containing n distinct comma-separated values and split it into n unique rows.
Suppose we have a table of products containing
|id ||name ||tags |
|123 ||Scrunch Cloth Pants Set ||Pants, Throwback, 80s |
|1287 ||Le Suit Citrus Breeze Skirt Suit ||Skirts, Dresses, Business Casual |
|486 ||MaxStudio Plaid Romper ||Jumpsuits, Casual, Rompers |
Our goal is to un-nest the tags to create a product-tag mapping table. I am assuming a table of numbers already exists in the database (though this can be created using this pattern).
(please note that the following functions are not available in all dialects.)
Two functions are useful here: first, the
split_part function, which takes a string, splits it on some delimiter, and returns the first, second, ... , nth value specified from the split string; second,
regexp_count, which tells us how many times a particular pattern is found in our string.
Our strategy is to (i) determine how many instances of our delimiter (", ") are found for each string; (ii) add one to that value, telling us how many values are separated by our delimiter; (iii) join in our numbers table on an inequality such that we fan out our tagged products n times, where n is the the value returned from (ii) above for each product; (iv) use the
split_part function and our sequence of numbers to get the 1st, 2nd, nth tags for each product.
select row_number() over(order by 1) as product_tag_id
, products.id as product_id
, split_part(products.tags, ', ', numbers.num) as tag_name
on numbers.num <= regexp_count(products.tags, ',\\s') + 1
| product_tag_id || product_id || tag_name |
| 1 || 123 || Throwback |
| 2 || 123 || Pants |
| 3 || 123 || 80s |
| 4 || 1287 || Skirts |
| 5 || 1287 || Dresses |
| 6 || 1287 || Business Casual |
| 7 || 486 || Casual |
| 8 || 486 || Jumpsuits |
| 9 || 486 || Rompers
That's it! Super easy!