Spatialite: Speedup your query with spatial indexing

Spatialite, like any good spatial data management system, can build a spatial index for your layers. Using this index in your spatial queries will dramatically shorten the runtime for that query. The latest version of Spatialite offers a nice compact format for using a spatial index. To demonstrate, I created a point layer of 20,000 theoretical store locations, with sales data for each store, and a polygon layer of over 280 “local councils”. My mission it to sum up the total sales in each local council. So I need to find which stores are located in each local council and aggregate sales for those stores.

We begin with two tables, councils (multipolygon) and stores (point) that look like this:

spatialite> SELECT * FROM councils LIMIT 10;
pk_uid Name Geometry
---------- ---------- ----------
1 Ronneby
2 Sölvesbor
3 Ă„lvdalen
....

spatialite> SELECT * FROM stores LIMIT 10;
pk_uid sales Geometry
---------- ---------------- ----------
1 192617.423921806
2 75489.1390022072
3 163013.639152632
....

Let’s build the query. First, we are using both tables, so the FROM part of our SQL looks like:
FROM councils AS c, stores AS s
and since we want to aggregate total sales for each council, we require a GROUP BY clause:
GROUP BY c.pk_uid
What results do we want? The council name and total sales, so the SELECT part of our query will be:
SELECT c.Name AS Name, SUM(s.sales) AS "Total Sales"
Finally, we formulate the criteria, stores contained in each council, like so:
WHERE ST_Contains(c.Geometry, s.Geometry)

Putting it all together, our “naive” query (no spatial index yet) goes:
SELECT c.Name AS Name, SUM(s.sales) AS "Total Sales"
FROM councils AS c, stores AS s
WHERE ST_Contains(c.Geometry, s.Geometry)
GROUP BY c.pk_uid;

Now this query will run successfully on our two tables as described, but you better go for lunch when you start the query. It might finish on a fast computer by the time you get back… The query loops thru all 20,000 store locations for each of the 280 councils. So that’s over 5,000,000 table scans, and checks for each point in each polygon.

So how do we utilize spatial indexing to speed things up? The R*Tree indexing and its implementation in Spatialite are covered in the Spatialite Cookbook. It’s important to note that Spatialite does NOT make use of a spatial index unless you explicitly incorporate it into the query statement. (BTW, this is unlike PostGIS which automatically uses a spatial index, if it’s available.)
The options to implement a spatial index in a query structure were clearly explained in this thread from the spatialite maillist.

First we create a spatial index for both tables:
spatialite> SELECT CreateSpatialIndex('councils','Geometry');
1
spatialite> SELECT CreateSpatialIndex('stores','Geometry');
1

Now the additional WHERE clause, in it’s new incarnation (requires spatialite >=3.0) looks like this for our query:
s.ROWID IN
(SELECT ROWID FROM SpatialIndex WHERE f_table_name='stores' AND search_frame=c.Geometry)

“SpatialIndex” is a new virtual table, and using the search_frame component of that table we filter out the store locations in the bounding box (AKA Mbr=Minimum Bounding Rectangle). Searching for points in a rectangle is much faster than the full ST_Contains (or equivalent ST_Within) function. After this filtering, the ST_Contains() function runs on only a small subset of the 20,000 store locations.

So the spatial index enabled query will now look like:
SELECT c.Name AS Name, SUM(s.sales) AS "Total Sales"
FROM councils AS c, stores AS s
WHERE ST_Contains(c.Geometry, s.Geometry) AND
s.ROWID IN
(SELECT ROWID FROM SpatialIndex WHERE f_table_name='stores' AND search_frame=c.Geometry)
GROUP BY c.pk_uid;

On my machine, this query, aided by the spatial index, completed in less than 1.5 minutes. The first “naive” query took over 35 minutes. Pretty cool, no?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>