Chi-squared test, Poisson distribution, type I error overestimated - well-suited test for discrete distributions?

Question

UPDATE: I edited my original question to make it as clear as possible.

My goal is to find a reliable goodness-of-fit test for Poisson-distributed samples. There are a few discussions here related to goodness-of-fit tests for discrete distributions such as the Poisson distribution (for example, here and here).

I have created a simulation to understand what happens to the type I error rate of the chi-squared test. I am working with a sum of independent Poisson-distributed variables (which is itself Poisson-distributed):

    set.seed(123)
    n <- 100000
    alpha <- 0.05  # significance level
    n_sim <- 1000  # number of simulations (the results reported below are for 1000 runs)
    res_chi2 <- vector(mode = "list", length = n_sim)
    res_ks <- vector(mode = "list", length = n_sim)
    lambda_i <- 10^sample(-10:-2, 100, replace = TRUE)  # 100 Poisson-distributed variables
    total_lambda <- sum(lambda_i)  # the random variable of interest is a sum of Poisson-distributed variables

    for (i in 1:n_sim) {
      set.seed(i)

      # observed frequencies
      # generate a sample by aggregating event counts of subsamples
      my_sample <- rowSums(sapply(lambda_i, function(x) rpois(n, x)))
      sample_freq <- table(my_sample)

      # expected frequencies,
      # calculated using the probability mass function of the aggregate Poisson distribution
      theor_freq <- dpois(as.numeric(names(sample_freq)), total_lambda) * n

      # add the missing expected count (the upper tail and any unobserved values) to the last bin
      # now the expected frequencies sum to n (sample size)
      theor_freq[length(theor_freq)] <- theor_freq[length(theor_freq)] + n - sum(theor_freq)

      # test statistic, the first formula at
      # https://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
      test_statistic <- sum((theor_freq - sample_freq)^2 / theor_freq)

      # no estimated parameters, df = number of categories - 1
      p_value <- 1 - pchisq(test_statistic, df = length(theor_freq) - 1)

      # if TRUE, the null is not rejected
      res_chi2[[i]] <- p_value >= alpha
    }

    sum_passed_chi2 <- Reduce(`+`, res_chi2)

    # 1000 simulations
    1000 - sum_passed_chi2
    # 92, i.e. the null was rejected 92 times

The null was rejected in 92 of 1000 simulations, so the empirical type I error rate of the chi-squared test is about 9%.

Why is it inflated above the nominal level? Can I assume that a well-suited goodness-of-fit test will give a type I error rate of approximately 5% (my significance level)? How do I implement or design a proper goodness-of-fit test to check whether a sample follows a Poisson distribution with known parameters?

UPDATE 2: I also ran a simulation with a single sample drawn directly from a Poisson distribution, i.e.:

    my_sample <- rpois(n, total_lambda)

In this case, the type I error rate is 8%.
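For reference, the UPDATE 2 variant can be written compactly as below. This is a minimal sketch under stated assumptions, not necessarily the exact script behind the 8% figure: the helper name `chi2_rejects` is hypothetical and introduced only for illustration, `n_sim = 1000` is assumed to match the run above, and the binning rule is the same one used in the main simulation.

```r
# Sketch of the UPDATE 2 variant: each sample is drawn directly from
# Poisson(total_lambda) instead of summing 100 component samples.
# chi2_rejects() is a hypothetical helper name, not part of any package.
set.seed(123)
n <- 100000
alpha <- 0.05
n_sim <- 1000  # assumed number of simulations
lambda_i <- 10^sample(-10:-2, 100, replace = TRUE)
total_lambda <- sum(lambda_i)

chi2_rejects <- function(my_sample, lambda, alpha) {
  n_obs <- length(my_sample)
  sample_freq <- table(my_sample)

  # expected counts for the observed values only
  theor_freq <- dpois(as.numeric(names(sample_freq)), lambda) * n_obs
  k <- length(theor_freq)

  # fold the remaining expected count into the last bin so the total is n_obs
  theor_freq[k] <- theor_freq[k] + n_obs - sum(theor_freq)

  test_statistic <- sum((theor_freq - as.numeric(sample_freq))^2 / theor_freq)
  p_value <- pchisq(test_statistic, df = k - 1, lower.tail = FALSE)

  p_value < alpha  # TRUE if the null is rejected
}

rejections <- vapply(seq_len(n_sim), function(i) {
  set.seed(i)
  chi2_rejects(rpois(n, total_lambda), total_lambda, alpha)
}, logical(1))

mean(rejections)  # empirical type I error rate
```

Note that the bins are defined by the values that happen to occur in each sample, so both the number of categories and the degrees of freedom vary from one simulation to the next.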
