I am trying to simulate a set of times for the below problem.

There are N nodes. Each node generates a random number($R$) in the range $[0,K]$ per second. Guess the time it takes by each node to generate $R$ such that $R <= frac{K}{(N∗X)}$ . The time should be in terms of $X$.

In my experiment $N> 2^{20}$ and $K = 2^{256}$, so brute forcing is not possible.

Do the times fall under any distribution? If so, then how can I simulate the times?

Cross Validated Asked by SlowMountain on December 3, 2021

2 AnswersThanks to @stephan for pointing it out as a geometric distribution. I found that we can generate a random number in geometric distribution with $log(u)/log(1-p)$. Where $u$ is a uniform distributed number in $[0,1]$.

In this case, I have to run the above formula for $N$ times. To get the set of times in terms of $X$, the possible values of $u$ should be limited to $N$.

Answered by SlowMountain on December 3, 2021

At each time point, each node performs a Bernoulli experiment, with a success probability of $p=frac{1}{XN}$, since success is defined as an outcome lower than $frac{K}{XN}$ for a $U[0,K]$ random variable.

For a single Bernoulli experiment, the waiting time until the first success follows a geometric distribution with parameter $p$.

You have $N$ such experiments running concurrently, and you are interested in the distribution of the time until *all* experiments have encountered at least one success. Thus, you are interested in the distribution of the *maximum* of $N$ iid geometric distributions.

Happily, COOLSerdash pointed out that there is a very simple closed form for the probability mass function (PMF) of this maximum. Let

$$ F(y) = P(Yleq y) $$

denote the CDF of a single random variable $Y$, then the CDF of the maximum of $N$ IID random variables $Y_1, dots, Y_N$ is simply

$$ P(Y_1leq y wedge dots wedge Y_Nleq y) = Pbig(max{Y_i}leq ybig) = F(y)^N. $$

Thus, the PMF of a *discrete* maximum is

$$begin{align*} Pbig(max{Y_i}= ybig) &= Pbig(max{Y_i}leq ybig)-Pbig(max{Y_i}leq y-1big) \ &=F(y)^N-F(y-1)^N. end{align*}$$

In the present case of geometric distributions with parameter $p=frac{1}{XN}$, we have $$ F(y) = P(Yleq y) = sum_{j=1}^y(1-p)^{j-1}p = psum_{j=0}^{y-1}(1-p)^j = 1-(1-p)^y $$ as a geometric series. We thus get $$Pbig(max{Y_i}= ybig) =F(y)^N-F(y-1)^N = big(1-(1-p)^ybig)^N-big(1-(1-p)^{y-1}big)^N.$$

We can simulate draws from this distribution by drawing your $R$s until you have $N$ successes. Here is R code for multiple $N$s, with histograms and the PMFs overlaid in red:

```
n_nodes <- c(1,2,3,5,7,10)
KK <- 3
XX <- 2
n_sims <- 1e5
sims <- matrix(NA,nrow=n_sims,ncol=length(n_nodes),dimnames=list(1:n_sims,n_nodes))
for ( jj in seq_along(n_nodes) ) {
pb <- winProgressBar(max=n_sims)
for ( ii in 1:n_sims ) {
setWinProgressBar(pb,ii,paste(n_nodes[jj],"nodes,",ii,"of",n_sims))
set.seed(ii) # for reproducibility
RR <- matrix(runif(n_nodes[jj],0,KK),nrow=1,ncol=n_nodes[jj])
while ( TRUE ) {
success_per_column <- apply(RR,2,function(foo)any(foo<=KK/(n_nodes[jj]*XX)))
if ( all(success_per_column) ) break
RR <- rbind(RR,runif(n_nodes[jj],0,KK))
}
sims[ii,jj] <- nrow(RR)
}
close(pb)
}
pmf <- function(yy,nn,XX) {
qq <- 1-1/(nn*XX)
(1-qq^yy)^nn-(1-qq^(yy-1))^nn
}
opar <- par(mfrow=c(2,3))
for ( jj in seq_along(n_nodes) ) {
hist(sims[,jj],breaks=seq(0.5,max(sims[,jj]+0.5)),
main=paste(n_nodes[jj],"nodes"),freq=FALSE,xlab="")
yy <- seq(min(sims[,jj]),max(sims[,jj]))
lines(yy,pmf(yy,n_nodes[jj],XX),col="red",lwd=2)
}
par(opar)
```

Bennett Eisenberg ("On the expectation of the maximum of IID geometric random variables", *Statistics & Probability Letters*, 2008), which I got from Expectation of the maximum of i.i.d. geometric random variables at Math.SE, gives formulas for the expectation and higher moments, e.g. for the expectation (called $E(M_n^ast)$ in his paper, right above section 2) for $N$ nodes:

$$ E(M_N^ast) =sum_{j=1}^N{N choose j}frac{(-1)^{j+1}}{1-q^j}, $$ where $q=1-p=1-frac{1}{XN}$.

This corresponds well with the expectations we get from our simulations above:

```
expectations <- rep(NA,length(n_nodes))
for ( jj in seq_along(n_nodes) ) {
qq <- 1-1/(n_nodes[jj]*XX)
ii <- 1:n_nodes[jj]
expectations[jj] <- sum(choose(n_nodes[jj],ii)*(-1)^(ii+1)/(1-qq^ii))
}
expectations
colMeans(sims)
```

With the following result:

```
> expectations
[1] 2.000000 5.714286 10.555445 22.171623 35.487560 57.602362
> colMeans(sims)
1 2 3 5 7 10
1.99875 5.72014 10.55815 22.17125 35.57198 57.66147
```

EDIT: I just saw your edit with your values of $N$ and $K$. First off, $K$ is irrelevant - whether you draw $Rsim U[0,K]$ and check whether $Rleqfrac{K}{NX}$, or draw $Rsim U[0,1]$ and check whether $Rleqfrac{1}{NX}$ is *exactly the same*. So we can completely disregard $K$.

In contrast, your large $N$ may be more of a numerical problem. If all you are interested in are the moments of your distribution, then I would *really* recommend you look at the Eisenberg (2008) paper, because he explicitly addresses asymptotics for the moments as $Ntoinfty$. If you are actually interested in sampling from the distribution, then there may also be some asymptotical possibilities involved. Judging from the histograms, it may well be that the maximum is asymptotically normal.

Answered by Stephan Kolassa on December 3, 2021

0 Asked on December 29, 2021 by bioinformatics_student

machine learning neural networks prediction interval regression

0 Asked on December 29, 2021 by tjaqu787

4 Asked on December 29, 2021 by nicolas-molano

1 Asked on December 29, 2021

2 Asked on December 29, 2021 by ahsan

3 Asked on December 27, 2021 by tsteatime

correlation matrix matrix inverse multicollinearity regression

1 Asked on December 27, 2021 by string_is_hard

2 Asked on December 27, 2021 by sgtbp

3 Asked on December 27, 2021

feature selection generalized linear model multiple regression r regression

2 Asked on December 27, 2021 by baktaawar

0 Asked on December 27, 2021 by 1111ktq

binary data clustering sample size statistical significance sufficient statistics

1 Asked on December 27, 2021

2 Asked on December 27, 2021

1 Asked on December 27, 2021 by af1402

lme4 nlme multilevel analysis panel data r repeated measures

0 Asked on December 27, 2021

0 Asked on December 27, 2021 by ruffybeo

0 Asked on December 27, 2021

0 Asked on December 27, 2021 by lakshman

Get help from others!

Recent Questions

- MouseLook Script “Pops” back to the last value when the script is enabled after being disabled or destroyed
- Unity app crashes when using unmodified custom Android manifest (didn’t find class “UnityPlayerActivity”)
- How do i draw a ray in unity
- How to test consistency of responses?
- How can I understand these variograms?

Recent Answers

- kjetil b halvorsen on How to test consistency of responses?
- Philipp on How do i draw a ray in unity
- Justin Markwell on Unity app crashes when using unmodified custom Android manifest (didn’t find class “UnityPlayerActivity”)
- eric_kernfeld on How to test consistency of responses?
- DMGregory on MouseLook Script “Pops” back to the last value when the script is enabled after being disabled or destroyed

© 2022 AnswerBun.com. All rights reserved.