Pixel correlations from two raster datasets in R

Question

I am having trouble calculating the pixel by pixel correlation coefficient between two datasets. My current code takes in two folders full of rasters and creates two independent raster stacks. These rasters all have the same cellsize and extent. I then try and take the correlation coefficient (spearman in my case) between the column values of those rasters.

library(raster)

r <- raster()
raster1 <- list.files(path = "data/List1", pattern = "*.tif$", full.names = T)
raster2 <- list.files(path = "data/List2", pattern = "*.tif$", full.names = T)
l1 <- stack(raster1)
l2 <- stack(raster2)

list1Values <- values(l1)
list2Values <- values(l2)
corValues <- vector(mode = 'numeric')

for (i in 1:dim(list1Values)[1]){
  corValues[i] <- cor(x = list1Values[i,], y = list2Values[i,], method = 'spearman')
}

corRaster <- setValues(r, values = corValues)

However at the correlation for loop it gives six error messages saying

In cor(x = list1Values[i, ], y = list2Values[i, ], method = "spearman") :
  the standard deviation is zero

Ignoring that and continuing on, the last line errors out saying

Error in setValues(r, values = corValues) : 
  length(values) is not equal to ncell(x), or to 1

My initial guess was that since the datasets have a lot of NA values (In this case it is a lot of open ocean that there is no data for), that could cause problems, so I added the

use = "complete.obs"

parameter to the cor function. This errors saying

Error in cor(x = list1Values[i, ], y = list2Values[i, ], method = "spearman",  : 
  no complete element pairs

My guess is that this last error is telling me the matrices it created don't line up, and no non-NA cells match. I have no clue how this is possible because I have been working with these rasters for years and they certainly line up. Other than that, I don't know why this isn't working.

This question is not a duplicate of Correlation/relationship between map layers in R? in any way. First I am not using point data, but instead raster. I am also interested in a simple pearson and/or spearman correlation coefficient NOT cross correlation nor spatial correlation. I'm also not using two layers, but two sets of hundreds of layers (making the statistical analysis of the correlation valid). Lastly, this code is doing fundamentally different things than that post and I am asking for specific help with the errors arising.

obrl_soil · Accepted Answer

If I'm reading what you're trying to do correctly, a more robust approach would be to use raster::calc() on your stacked layers, as follows. This allows on-disk calculations as well as the option to speed things up with parallel processing.

# combine your two raster stacks. Now layers 1:(n/2) will be 
# correlated with layers ((n/2)+1):n. Make sure both stacks have the same 
# number of layers, i.e. nlayers(l_all) is even!
l_all <- stack(l1, l2)

# write a little function to do the correlation. The input is the vector
# of values at cell n, which has length() == nlayers(l_all)
corvec <- function(vec = NULL) {
  cor(
    # 'top' half of stack
    x      = vec[1:(length(vec)/2)],
    # 'bottom' half of stack
    y      = vec[((length(vec)/2) + 1):length(vec)],
    use    = 'complete.obs',
    method = 'spearman'
  )
}

corlyrs <- calc(
  l_all,
  fun = function(x) {
    # handle areas where all cell vals are NA, e.g. ocean
    if (all(is.na(x))) {
      NA_real_
    } else {
      corvec(vec = x)
    }
  }
)

The above will return a rasterLayer of correlation values. It took about 30s to correlate 2 sets of 6 300x300 cell rasters on my laptop.

To speed things up with clusterR,

# keep a cpu or two spare
cpus <- max(1, (parallel::detectCores() - 1))
beginCluster(n = cpus)
corlyrs <- clusterR(x         = l_all,
                    fun       = calc,
                    args      = list(fun = function(cell) {
                      if (all(is.na(cell))) {
                        NA_real_
                      } else {
                        corvec(cell)
                      }
                    }),
                    export    = 'corvec'
                    )
endCluster()

The last option may not be worth it unless your rasters are quite large. Saved me about 10 seconds on the actual calc, according to microbenchmark, but it also took about that long to establish the cluster. Benefits are much clearer with big rasters.

Jeffrey Evans · Answer

For reference, here is an approach, in memory, that uses mapply. This is one of the lesser known apply functions in R that lets you apply a function across objects.

Here, when I create objects from the raster stacks, I wrap the stack function call in list and values. This results in list objects, containing data.frames holding the raster values. This makes the data ready for mapply, which expects list objects. The use of sapply allows me to use a numeric row index 1:nrow(s1[[1]]) to aggregate the row-by-row correlations. This results in a vector of correlations, matching the rows in each lists data.frames. This vector is ordered to the cells in the source rasters and can be piped directly back into a source raster.

library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
  s1 <- list(values( stack(r, r*0.15, r/1805.78) ))
  s2 <- list(values( stack(r^2, r/0.87*10, r/sum(values(r),na.rm=T)) ))

r[] <- as.vector(mapply(function(X,Y) { sapply(1:nrow(s1[[1]]), function(row) 
               cor(X[row,], Y[row,]))}, X=s1, Y=s2))
plot(r)

Pixel correlations from two raster datasets in R

2 Answers

Add your own answers!

Ask a Question