Looker and Vertica: Geospatial Analytics

done
low_priority
reply

(Erin Franz) #1

Vertica offers a suite of geospatial analytics functions with its HP Vertica Place package. This package enables storing and querying of geometric objects - translating locations into geometric points and regions into polygons. Common asks for measuring distance, length, area can be completed directly in Vertica and these requests are optimized at scale using spatial indexes and the underlying power of Vertica’s database.

Because Looker performs all analytical workloads in HP Vertica, Looker can utilize these geospatial analytic functions to enable exploration and visualization of results dynamically. To demonstrate some of what’s possible, we’ll start with a table of bookings in Vertica that have associated latitude and longitude locations. When we query for a sample of the table, called locations, we see the following:

select * from place.locations limit 10

In Looker, we can easily examine where the bookings are coming from by creating a view file, where we define how to query the table through dimensions and measures. In the view file, we can derive a location type field so that Looker can plot the latitude and longitude on a map.

- explore: locations

- view: locations
  sql_table_name: place.locations
  fields:

  - dimension: booking_id
    type: number
    sql: ${TABLE}.booking_id

  - dimension: booking_lat
    type: number
    sql: ${TABLE}.booking_lat

  - dimension: booking_long
    type: number
    sql: ${TABLE}.booking_long
  
  - dimension: booking_location
    type: location
    sql_latitude: ${booking_lat}
    sql_longitude: ${booking_long}

  - measure: count
    type: count
    drill_fields: []

Using these dimensions and measures, we can see where our bookings are coming from in Looker, by selecting the location dimension and count measure.

Though we can visually see where the bookings are coming from, we’d probably want to categorize them further for reporting. Suppose we have 4 sales regions: Northeast, Northwest, Southeast, and Southwest. We can define these regions using Polygon Geometry Objects and store this data in a table called regions.

CREATE TABLE place.regions AS
SELECT '1' as region_id, 'northeast' as region_name, 'POLYGON((-70 50,-95 50,-95 40,-70 40,-70 50))' as region_shape UNION ALL 
SELECT '2', 'southeast', 'POLYGON((-70 40,-95 40,-95 30,-70 30,-70 40))' UNION ALL 
SELECT '3', 'northwest', 'POLYGON((-95 50,-120 50,-120 40,-95 40,-95 50))' UNION ALL 
SELECT '4', 'southwest', 'POLYGON((-95 40,-120 40,-120 30,-95 30,-95 40))'

And create the corresponding view file in LookML to reference the values from table as dimensions:

- view: regions
  sql_table_name: place.regions
  fields:

  - dimension: region_id
    primary_key: true
    type: number
    sql: ${TABLE}.region_id

  - dimension: region_name
    sql: ${TABLE}.region_name

  - dimension: region_shape
    sql: ${TABLE}.region_shape

We can then take advantage of two Vertica Place functions to map our booking locations to regions. First, we’ll use ST_GeomFromText to convert text into a geometry object, for both the polygons defining the sales regions and the points defining the bookings. Then, we can use the ST_Intersects function to determine if the points overlap with the regions at any given point. This can be done via a join between the locations table and the regions table, which we can define in our Explore definition.

- explore: locations
  joins:
    - join: regions
      sql_on: |
        ST_Intersects(
          ST_GeomFromText('POINT(' || ${locations.booking_long} || ' ' || ${locations.booking_lat} || ')'),
          ST_GeomFromText(${regions.region_shape})
        )
      relationship: many_to_one

Now, when we explore locations, we can filter on region. For instance, maybe we are interested in performance in the Southwest region.

We can now associate each location to a region and look at region specific performance, or compare regions side by side like below. It’s easy to see that the Southwest region has the largest number of booking requests overall.

Now that we have an idea of regional performance, we likely want to dig deeper and examine performance even more locally, perhaps at a metro level. Assuming that we have a table called cities with latitude and longitude information, it’s possible to look at the bookings with a certain distance from each city. This mapping is done similarly to our region join, but instead of the ST_Intersects function, we’ll use the STV_DWithin function, which determines if the shortest distance from one object to another object is within a specified distance. We can add this to our explore definition, assuming in our view file for cities, in addition to the dimensions for city name and location, we’ve created a parameter for selecting the distance we’d like to be included in our metro area.

- explore: locations
  joins:
    - join: regions
      sql_on: |
        ST_Intersects(
          ST_GeomFromText('POINT(' || ${locations.booking_long} || ' ' || ${locations.booking_lat} || ')'),
          ST_GeomFromText(${regions.region_shape})
        )
      relationship: many_to_one
    - join: cities
      sql_on: |
        STV_DWithin(
          ST_GeomFromText('POINT(' || ${locations.booking_long} || ' ' || ${locations.booking_lat} || ')'),
          ST_GeomFromText('POINT(' || ${cities.longitude} || ' ' || ${cities.latitude} || ')'),
          {% parameter cities.distance_miles %} / 65)
      relationship: many_to_one

Let’s assume we’re interested in looking at Dallas bookings within a 250 mile radius. Filtering on both City and Distance Miles yields the following data and visualization in Explore:

We can now further examine the bookings isolated to the Dallas Metro, completing all transformation by defining it in the LookML modeling layer and using Vertica Place analytics package by directly querying the database.


[RETIRED] Using map layers in static region maps (3.42+) and interactive maps (3.50+)
#2

It would be nice to be able to set type: geometry for a Looker dimension, corresponding to a geometry-type SQL column. That would allow for spatial indexing.

Edit: Actually, I’ve added a geometry-type SQL column, spatially indexed it, and added it with unspecified type (which defaults to string) in Looker (with hidden: true!). I then joined on it and considerably sped up my queries versus joining on the textified version. (It would still be nice to have type: geometry, though.)

Obviously, this works only in SQL dialects and engines that support spatial indexing.


#3

Hey Michael - I’ll make sure our product team sees this


#4

See also Looker 3.44 Release Notes


#5

Hi Lindsey, I am actually trying to do the same thing for one of my report and was curious about the distance_miles field in cities table. Can you share more details on the cities table?


(Izzy) #6

Hey DeeV, looks like the distance_miles field isn’t actually a field in the underlying table, but rather a parameter, which exist only on the front-end and can be used to accept user input. Here, it allows the user to input the # of miles they’d like to filter on, and then inserts that user-specified number (divided by 65) into the sql_on clause.

Check out that link and see if that illuminates things a bit! Happy to explain more if necessary.


#8

Hi Izzy, gotcha. So the cities file contains only lat, long and city name and the distance_miles is a parameter created in the view file of the cities table correct? that way we can pull them into explore.


(Izzy) #9

That’s what it looks like, yep! I’m just guessing based on what I see up there, I don’t actually have access to the full LookML.


#10

Thanks Izzy. do you know if Erin Franz can get me access to more files on this? I really wanted to check this example as it is something I am currently working on!


(Izzy) #11

We can try and summon her by turning off all the lights and chanting her name 3 times… :mage:

@erin !
@erin !
@erin !

But the original post is from 2015, so my hunch is that those files probably aren’t laying around anymore. I bet with a little experimentation you can figure it out, even without the examples.