Geostatistics and rainfall with R

With the rainy winter season behind us, it’s time to summarize how much precipitation we got this year. The southern Negev desert enjoyed above average annual rainfall (in stark contrast to the rest of the country, where we saw only 60% of the multi-year average)

I wrote about creating isohyetal contours some years ago. This time I’ll try to improve my interpolation by using the powerful, open source R statistics program.

We begin with the total annual precipitation measurements at discrete locations – rain gauges. Using this point data we interpolate the rainfall everywhere in the region. The interpolation method I choose is kriging, which works with a variance distribution function, the variogram, that we prepare as part of the procedure. Furthermore, the geostatistics procedure called co-kriging allows us to apply a second, independant variable to the analysis to predict even better the interpolated values. Co-kriging is applicable only when we know that there is some correlation between the main, dependant variable and the auxiliary variable. Intuitively, we are aware that in many areas rainfall is stronger in the mountains, and less at lower elevations. So, I will examine elevation of the rain gauges as the independant auxiliary variable.

We begin by starting R, loading some required libraries, and pulling in the rain gauge data.

library(sp)
library(rgdal)
library(gstat)
# Load precipitation data
rain.data <- "raindata_ttl_2014.csv"
gauges <- read.csv(rain.data, col.names=c("gauge_id","precip","x","y"));

Now I load a border line, just for spatial reference, and make a simple bubble plot of the data.

# Put data into a Spatial Data Frame
gauges.xy <- gauges[c("x","y")]
coordinates(gauges) <- gauges.xy
border_il <- readOGR('border_il.shp', 'border_il')
# Set the correct CRS for the gauges data
itm_proj4 <- proj4string(border_il)
proj4string(gauges) <- itm_proj4
class(gauges)
summary(gauges)
# Have a look
plot(gauges, pch=16, col='blue', cex=precip/20)
plot(border_il, add=TRUE)

Here’s what we get:
gauges.2014

Now we need to determine the variogram. We start by plotting some candidate variogram models. The parameters that we need to set are “cutoff” and “width”. By default, the cutoff (how far from each prediction point to go to look for real points) is chosen as one third the diagonal of the whole analysis region. And the width (the variance between real points within the cutoff) is 1/15 of the cutoff. We can get an idea how these parameters influence the model variograms by plotting some possible combinations. Once we have a good looking variogram model, we use those in the gstat variogram() function to create the model variogram:

# Create a variogram
# Display some possible models
plot(gstat::variogram(precip ~ 1 , gauges, cutoff=20000, width=2000))
plot(gstat::variogram(precip ~ 1 , gauges, cutoff=30000, width=4000))
plot(gstat::variogram(precip ~ 1 , gauges, cutoff=70000, width=5000))
# Once we have a visually good variogram, use those parameters:
vg <- gstat::variogram(precip ~ 1 , gauges, cutoff=70000)

Now we have a model, and we want to let R to actually fit a variogram to the point pairs that are included in the variogram model to create a fitted curve. The following code prints the final fitted variogram values and gives us the result below:

vg.fit <- gstat::fit.variogram(vg, vgm(700, "Exp", 20000, 300))
# Here are the fitted variogram parameters:
print(vg.fit)
plot(vg, vg.fit)
  model    psill    range
1   Nug 355.0346     0.00
2   Exp 421.1879 12672.45

variogram.2014

With that we are now prepared to run the kriging and produce a raster of predicted rainfall throughout the region.

# create a grid onto which we will interpolate:
# first get the range in data
x.range <- as.integer(range(gauges@coords[,1]))
y.range <- as.integer(range(gauges@coords[,2]))
# now expand to a grid with 100 meter spacing:
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=100), y=seq(from=y.range[1], to=y.range[2], by=100) )
# convert to SpatialPixel class
coordinates(grd) <- ~ x+y
gridded(grd) <- TRUE
proj4string(grd) <- itm_proj4

# perform ordinary kriging prediction:
# make gstat object to hold the krige result:
g <- gstat(id="Precipitation", formula=precip ~ 1, data=gauges, model=vg.fit)
precip.krige <- predict(g, model=vg.fit, newdata=grd)

The predict() function chooses ordinary kriging, based on the basic parameters we passed to the function. Now we can go ahead and plot the result. I load the RColorBrewer library to take advantage of the full set of color palettes available, and I use the spplot function from the sp package for plotting, since it easily adds contours (isohyetal lines) to the plot.

# Some parameters for plotting
par(font.main=2, cex.main=1.5,cex.lab=0.4, cex.sub=1)
# Use the ColorBrewer library for color ramps
library(RColorBrewer)
precip.pal <- colorRampPalette(brewer.pal(7, name="Blues"))
# plot the krige interpolation
spplot(precip.krige, zcol='Precipitation.pred', col.regions=precip.pal, contour=TRUE, col='black', pretty=TRUE, main="Interpolated Precipitation - 2014", sub="Ordinary Kriging", labels=TRUE)

The plot of krige interpolated precipitation looks like this:
precip_krige_2014

Now, as promised, let’s continue on to elevation data. We import a DEM and use the R over() function to attach an elevation value to each gauge. Then we check correlation between the annual precipitation and elevation. These are the steps:

# Now start with elevation data
elev <- readGDAL('dem_negev.tif')
class(elev)
summary(elev)
# Add elevation values to each gauge
gauges.elev <- over(gauges, elev)
precip.elev  <-  cbind(gauges, gauges.elev)
coordinates(precip.elev) <- gauges.xy
proj4string(precip.elev) <- itm_proj4
# Check that we have the elevations in a spatial data frame
names(precip.elev)
# the column "band1" contains the elevation values
# Now check for correlation between precipitation and elevation
cor.test(precip.elev$band1, precip.elev$precip)

Checking the output of the cor.test() function we see:

	Pearson's product-moment correlation
data:  precip.elev$band1 and precip.elev$precip
t = 0.331, df = 96, p-value = 0.7414
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1657725  0.2306346
sample estimates:
       cor 
0.03375868 

Well, a correlation of 0.033 means essentially no correlation. Values of correlation can range from 1 (prefect correlation) to -1 (perfect inverse correlation). Values near to 0 indicate nearly no correlation. So for this last season the spread of the total annual rainfall had no connection to elevation across the Negev. We will settle with the above kriging map as a good estimate of the distribution of rainfall for 2014. End of story.

But, we don’t give up so easily. Let’s go back a few years to the winter of 2010. Anyone dealing with rainfall and flooding in the Negev desert remembers that season and the storms of January. We had once in 50 year floods in several wadis throughout the south due to very heavy rains in the Negev mountains. Let’s search thru the archives to pull out the annual rainfall data for that year, and try to correlate with elevation. I rerun all the above steps for the gauge data for 2010. Of course the variogram will be different, and I obtain the ordinary kriging result as below:
precip_krige_2010
Now I load the same elevation dataset and as above I test correlation between the precipitation data and elevation values:

cor.test(precip.2010.elev$band1, precip.2010.elev$precip)
	Pearson's product-moment correlation

data:  precip.2010.elev$band1 and precip.2010.elev$precip
t = 6.6778, df = 71, p-value = 4.552e-09
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4562349 0.7447525
sample estimates:
      cor 
0.6211078 

Well, a correlation of 0.62 is nothing to write home about, but is does indicate some reasonable level of correlation. Additionally, the p-value is very low, meaning that there is a high level of confidence that the resulting correlation is accurate. So let’s forge ahead and use the elevation values to (hopefully) improve our precipitation estimation.

We first need to redo the gstat object to hold both variables:

# recreate g, the gstat object with a second variable
rm(g)
g <- gstat(id="Precip", formula=precip ~ 1, data=precip.2010.elev, model=vgm(psill=2000, model="Exp", range=100000, nugget=50))
g <- gstat(g, id="Elevation", formula=band1 ~ 1, data=precip.2010.elev, model=vgm(psill=100000, model="Exp", range=50000, nugget=10))
vg <- gstat::variogram(g)
# Make the multivariable variogram (Linear Model of Coregionalization)
vg.fit <- fit.lmc(vg, g, vgm(psill=2000, model="Exp", range=100000, nugget=50))
# Graph the model and experimental variograms
plot(vg, vg.fit)

Note that we us the fit.lmc() function to create a fitted variogram with multiple variables, each variable with its own set of sill and range parameters. And we must add a non-zero nugget value to insure that the Linear Model of Coregionalization will be chosen.
The plot() shows us three variograms: one for elevation, one for precipitation, and a third for the covariance of the two variables together.
variograms.2010

What’s left is to rerun the predict() function, and note that it chooses co-kriging:

# Now predict() should create a CoKriging interpolation
precip.2010.cokrige <- predict.gstat(vg.fit, newdata=grd)
Linear Model of Coregionalization found. Good.
[using ordinary cokriging]

and here’s our new estimate of distribution of precipitation for 2010. Compared to the ordinary kriging map for 2010 above, we can identify an obvious change of rainfall distribution in the western mountain regions:
precip_cokrige_2010

5 thoughts on “Geostatistics and rainfall with R

  1. I would like to replicate the examples for my country and wish to have the sample datasets or the format the csv should have to use with my own data.

    Thanks for the nice job.

  2. The csv file is very simple:
    gauge_id,precipitation,x_coord,y_coord

    In the case above, the precipitation should be the annual total for each gauge.

    HTH

  3. This is great example! I’ve followed the steps and produced a great isohyet map, however I want to also show the polygon on the final plot.

    How would overlay the “border_il” spatial data to your krigged output?

  4. Hi,

    Thanks for your useful code. But I am stuck in fitting variogram code.

    It always shows this error message:

    error in evaluating the argument ‘x’ in selecting a method for function ‘plot': Error in gstat(formula = object, locations = locations, data = data) :
    argument “data” is missing, with no default

    do you have any idea about this?

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>