Spatialite- Updating one table from another

Users of SQL databases often get tripped up when trying to update values in one table from another table. The method involves a subquery in the UPDATE statement to extract the values from the other (source) table. We’ll review how it’s done, both with simple attribute values, and with an update based on a spatial query.

In sqlite3 and spatialite the generic UPDATE statement which extracts values from another table looks like

UPDATE target_table SET target_column=(SELECT source_column
FROM source_table

The tricky bit that sometimes gets people confused is the WHERE clause. In a simple SELECT statement you cannot put a “table.column” into the WHERE clause that doesn’t appear also in the FROM part. So a simple SELECT query would have to read:

SELECT source_column
FROM source_table AS s, target_table AS t

Both source and target tables must appear in the FROM portion.

Simple UPDATE example
However, when updating the target table with a subquery you must NOT have the target table in the subquery. The WHERE clause in the subquery must refer to the table being updated outside of the subquery. Let’s look at an example. We have a table of store locations, and we receive an excel spreadsheet file with sales data for each store. The excel table has one row for each store with, and a “StoreNum” that matches the StoreID in our stores spatial layer. See the table list in the image below. We want to get the sales values into our spatial layer. We first add a column for the new data:

ALTER TABLE Stores ADD COLUMN Sales double;

Now we can use the support in spatialite for virtual excel sheets, and add the sales data as a virtual table. In Spatialite GUI there’s a VirtualXL button – indicated in green in the image – to add a sheet from an excel file. Once the sales data is available as a virtual table, we want to update the Stores layer with the values from this excel table. So:

Notice – circled in red – that the subquery refers to the Stores table being updated, and this table does not appear in the FROM clause. The FROM clause contains only the sales_summary table, aliased AS s.


UPDATE with Aggregation
On to the next example, we are given an excel spreadsheet with detailed sales data. Rather than a summary for each store, this table contains all the transactions for all the stores. So each store appears many times in the table. We must now aggregate the sales data for each store, and update the sales column in our Stores spatial layer with a sum of sales for each store. This is accomplished easily with the SQL SUM() function. We first use the VirtualXL support to add a virtual table “sales_details”, then:

UPDATE Statement with SUM
The UPDATE statement with SUM() to aggregate data for each store

Again, the Stores table, being updated, appears in the WHERE part of the subquery, but not in the FROM clause. The FROM portion has only the sales_details table. And in the SELECT subquery we use SUM(sales) to aggregate sales for each store. Since the WHERE clause matches s.StoreNum to Stores.StoreID, the SUM will be done for each store individually.

UPDATE with a spatial query
For the last example, we do an UPDATE on two spatial tables, using a subquery with a spatial function. Our store locations are spread around many counties (a polygon layer). We need to summarize all sales for each county. But wait a sec, the table being updated is in this case is the Counties, and the source data is from the Stores table. So the Stores table will now drop down to the subquery. We’ll again use the SUM() function, and our WHERE clause utilizes the ST_Within() function to find which stores are inside each county, and total up sales for those stores. In addition we deploy a SpatialIndex on each of the tables to speed up the query. See my previous post for an explanation of using spatial indices. Our spatial query now looks like:

UPDATE with a spatial query
An UPDATE with a spatial query, also using SpatialIndex

The WHERE clause of the spatial query – in green -contains both ST_Within function to find stores in each county, and the search for ROWID in the SpatialIndex limiting the search for stores which fall within each counties bounding box, thus saving lots of time. Again we see that the counties table, being updated, appears in the WHERE clause of the subquery, but not in the FROM part.

Leave a Reply

Your email address will not be published. Required fields are marked *