How can I understand these variograms?

Cross Validated Asked by Zheyuan Li on January 7, 2022

Using grf function from R package geoR, I simulated 6 replicates (each with 1000 samples) of a Gaussian random field on [0, 1] x [0, 1], with zero mean, zero nugget, and an exponential spatial covariance $$gamma(h) = expleft(frac{h}{phi}right)sigma^2,$$ with $sigma^2 = 1$ and $phi = 0.5$. In the following figure, the blue solid curve gives the semi-variogram of this true model and the blue dashed line gives the sill (which is $sigma^2 = 1$). Then for each replicate, I

1. compute an empirical semi-variogram with function variog (black solid dots);
2. compute a monte-carlo envelope for the empirical semi-variogram (gray shaded polygon);
3. fit an exponential spatial covariance model (black solid curve) to the empirical semi-variogram for $hat{sigma}^2$ and $hat{phi}$ (printed in the tile).

However, I find it difficult to understand the results / behavior as illustrated by the figure.

1. I simulated data from a true model and fit that true model, why does the fitted semi-variogram differ from the true one so much? Or, why does $(hat{sigma}^2, hat{phi})$ differ from $(sigma^2, phi)$ so much? On some replicate the difference is drastic.
2. How can the empirical semi-variogram (black solid dots), as well as the fitted variogram (black solid curve), breach sample variance (black dashed line)? Isn’t sample variance the sill?
3. Why does the black dashed line so much away from the blue dashed line? (the answer is maybe as same as that to question 1)

Is a spatial process expected to behave like this? If I examine an AR(1) process by simulating a time series and fitting it using Yule-Walker method, I can get back to the truth, i.e., true autocorrelation coefficient and variance. This is not possible for a spatial process?

Reproducible R code

set.seed(0)

phi <- 0.5
sigmasq <- 1

## unconditional simulation of Gaussian random field with exponential covariance
EXP_GRF <- grf(1000, cov.pars = c(sigmasq, phi), cov.model = "exponential",
nsim = 6)

## estimation of semi-variogram
semi_variog <- variog(EXP_GRF)

semi_variog_i <- semi_variog
semi_variog_i$v <- NULL par(mfrow = c(2, 3), mar = c(2.5, 2.5, 1, 1)) for (i in 1:6) { ## the i-th empirocal semi-variogram semi_variog_i$v <- semi_variog$v[, i] ## mc envelops env <- variog.mc.env(coords = EXP_GRF$coords, data = EXP_GRF$data[, i], obj.variog = semi_variog_i, nsim = 99, messages = FALSE) ## plotting range MAX <- max(env$v.lower, env$v.upper, semi_variog_i$v, sigmasq)
with(semi_variog_i, plot(u, v, ylim = c(0, MAX), type = "n"))

## empirical semi-variogram
polygon(c(semi_variog_i$u, rev(semi_variog_i$u)),
c(env$v.lower, rev(env$v.upper)), col = 8, border = NA)
with(semi_variog_i, points(u, v, pch = 19))

## variance of the data
var_data <- semi_variog\$var.mark[i]
abline(h = var_data, lty = 2)

## semi-variogram of the true model
lines.variomodel(EXP_GRF, col = 4, lwd = 2)
abline(h = sigmasq, lwd = 2, col = 4, lty = 2)

## fit an variogram for parameter estimation
model <- variofit(semi_variog_i, c(var_data, phi), "exponential",
fix.nugget = TRUE, nugget = 0, message = FALSE)

## fitted variogram
lines.variomodel(model)

## add estimated sigmasq and phi as title
title(sprintf("sigmasq = %.2f, phi = %.2f", model[[2]][1], model[[2]][2]))

}


I didn't have a chance to look carefully into the code, but I can already guess what you are experiencing. It may have to due with the fact that you are simulating random fields on finite domains (in a computer).

Try to increase the domain to a larger box [0,10]x[0,10] or try to decrease the range of the variogram to something that is at least smaller than half of your domain size, say 0.1. Then plot the variograms up to a lag that is at most half of your domain. Everything that is separated by a lag that is equal or larger to half of your domain is not useful in empirical variogram calculations (too few samples, and hence spurious oscillations).

Geostatistics theory is more general than AR models, and there are theorems related to the issue you are experiencing, but I have no time at the moment to write down a more detailed answer.

Answered by juliohm on January 7, 2022

Related Questions

How to model count data with decay

0  Asked on October 15, 2020 by learning-stats-by-example

Seeking Explanation of how this ANCOVA result in R shows the group means are not equal

2  Asked on October 13, 2020 by kyle

Why are my coefficients too large when control variables are not added?

1  Asked on October 10, 2020 by kyrhee

Statistical line comparison

1  Asked on October 6, 2020 by mobeus-zoom

In a 2 class problem how do I compute the normalization constant for finding the posterior distributions?

0  Asked on October 2, 2020 by anonymous

Using percentage change with different sample sizes, which is more significant?

0  Asked on September 30, 2020 by katie-fenton

Python Random Forest Prediction Probabilities Reliability, Overfitting?

0  Asked on September 24, 2020 by rkhan8

Euclidean distance score and similarity

4  Asked on September 24, 2020 by navige

How could I understand the Self-critical sequence training (SCST) model?

1  Asked on September 22, 2020

Looking for feedback on my approach to split data into validation and test set?

0  Asked on September 22, 2020 by s_am

How to set up first differences model?

1  Asked on September 21, 2020 by fabio

Precision and recall for clustering?

3  Asked on September 20, 2020 by learner

Mixed Effects Model

1  Asked on September 20, 2020 by seydou-goro

Which Regression scoring to use when have small dataset?

0  Asked on September 19, 2020 by user293111

Is my logistic regression model correct?

2  Asked on September 15, 2020 by mustapha-hakkou-asz

Gibbs entropy and Shannon entropy

0  Asked on September 14, 2020 by alhayer

Why small values produce undulating densities when ploting logarithm of a loguniform prior (in R)?

1  Asked on September 13, 2020 by prolix