# Estimating the blockchain mining time for $N$ nodes

I am trying to simulate a set of times for the below problem.

There are N nodes. Each node generates a random number($$R$$) in the range $$[0,K]$$ per second. Guess the time it takes by each node to generate $$R$$ such that $$R <= frac{K}{(N∗X)}$$ . The time should be in terms of $$X$$.

In my experiment $$N> 2^{20}$$ and $$K = 2^{256}$$, so brute forcing is not possible.

Do the times fall under any distribution? If so, then how can I simulate the times?

Cross Validated Asked by SlowMountain on December 3, 2021

Thanks to @stephan for pointing it out as a geometric distribution. I found that we can generate a random number in geometric distribution with $$log(u)/log(1-p)$$. Where $$u$$ is a uniform distributed number in $$[0,1]$$.

In this case, I have to run the above formula for $$N$$ times. To get the set of times in terms of $$X$$, the possible values of $$u$$ should be limited to $$N$$.

Answered by SlowMountain on December 3, 2021

At each time point, each node performs a Bernoulli experiment, with a success probability of $$p=frac{1}{XN}$$, since success is defined as an outcome lower than $$frac{K}{XN}$$ for a $$U[0,K]$$ random variable.

For a single Bernoulli experiment, the waiting time until the first success follows a geometric distribution with parameter $$p$$.

You have $$N$$ such experiments running concurrently, and you are interested in the distribution of the time until all experiments have encountered at least one success. Thus, you are interested in the distribution of the maximum of $$N$$ iid geometric distributions.

Happily, COOLSerdash pointed out that there is a very simple closed form for the probability mass function (PMF) of this maximum. Let

$$F(y) = P(Yleq y)$$

denote the CDF of a single random variable $$Y$$, then the CDF of the maximum of $$N$$ IID random variables $$Y_1, dots, Y_N$$ is simply

$$P(Y_1leq y wedge dots wedge Y_Nleq y) = Pbig(max{Y_i}leq ybig) = F(y)^N.$$

Thus, the PMF of a discrete maximum is

begin{align*} Pbig(max{Y_i}= ybig) &= Pbig(max{Y_i}leq ybig)-Pbig(max{Y_i}leq y-1big) \ &=F(y)^N-F(y-1)^N. end{align*}

In the present case of geometric distributions with parameter $$p=frac{1}{XN}$$, we have $$F(y) = P(Yleq y) = sum_{j=1}^y(1-p)^{j-1}p = psum_{j=0}^{y-1}(1-p)^j = 1-(1-p)^y$$ as a geometric series. We thus get $$Pbig(max{Y_i}= ybig) =F(y)^N-F(y-1)^N = big(1-(1-p)^ybig)^N-big(1-(1-p)^{y-1}big)^N.$$

We can simulate draws from this distribution by drawing your $$R$$s until you have $$N$$ successes. Here is R code for multiple $$N$$s, with histograms and the PMFs overlaid in red:

n_nodes <- c(1,2,3,5,7,10)
KK <- 3
XX <- 2

n_sims <- 1e5
sims <- matrix(NA,nrow=n_sims,ncol=length(n_nodes),dimnames=list(1:n_sims,n_nodes))
for ( jj in seq_along(n_nodes) ) {
pb <- winProgressBar(max=n_sims)
for ( ii in 1:n_sims ) {
setWinProgressBar(pb,ii,paste(n_nodes[jj],"nodes,",ii,"of",n_sims))
set.seed(ii)    # for reproducibility
RR <- matrix(runif(n_nodes[jj],0,KK),nrow=1,ncol=n_nodes[jj])
while ( TRUE ) {
success_per_column <- apply(RR,2,function(foo)any(foo<=KK/(n_nodes[jj]*XX)))
if ( all(success_per_column) ) break
RR <- rbind(RR,runif(n_nodes[jj],0,KK))
}
sims[ii,jj] <- nrow(RR)
}
close(pb)
}

pmf <- function(yy,nn,XX) {
qq <- 1-1/(nn*XX)
(1-qq^yy)^nn-(1-qq^(yy-1))^nn
}

opar <- par(mfrow=c(2,3))
for ( jj in seq_along(n_nodes) ) {
hist(sims[,jj],breaks=seq(0.5,max(sims[,jj]+0.5)),
main=paste(n_nodes[jj],"nodes"),freq=FALSE,xlab="")
yy <- seq(min(sims[,jj]),max(sims[,jj]))
lines(yy,pmf(yy,n_nodes[jj],XX),col="red",lwd=2)
}
par(opar)


Bennett Eisenberg ("On the expectation of the maximum of IID geometric random variables", Statistics & Probability Letters, 2008), which I got from Expectation of the maximum of i.i.d. geometric random variables at Math.SE, gives formulas for the expectation and higher moments, e.g. for the expectation (called $$E(M_n^ast)$$ in his paper, right above section 2) for $$N$$ nodes:

$$E(M_N^ast) =sum_{j=1}^N{N choose j}frac{(-1)^{j+1}}{1-q^j},$$ where $$q=1-p=1-frac{1}{XN}$$.

This corresponds well with the expectations we get from our simulations above:

expectations <- rep(NA,length(n_nodes))
for ( jj in seq_along(n_nodes) ) {
qq <- 1-1/(n_nodes[jj]*XX)
ii <- 1:n_nodes[jj]
expectations[jj] <- sum(choose(n_nodes[jj],ii)*(-1)^(ii+1)/(1-qq^ii))
}
expectations
colMeans(sims)


With the following result:

> expectations
[1]  2.000000  5.714286 10.555445 22.171623 35.487560 57.602362
> colMeans(sims)
1        2        3        5        7       10
1.99875  5.72014 10.55815 22.17125 35.57198 57.66147


EDIT: I just saw your edit with your values of $$N$$ and $$K$$. First off, $$K$$ is irrelevant - whether you draw $$Rsim U[0,K]$$ and check whether $$Rleqfrac{K}{NX}$$, or draw $$Rsim U[0,1]$$ and check whether $$Rleqfrac{1}{NX}$$ is exactly the same. So we can completely disregard $$K$$.

In contrast, your large $$N$$ may be more of a numerical problem. If all you are interested in are the moments of your distribution, then I would really recommend you look at the Eisenberg (2008) paper, because he explicitly addresses asymptotics for the moments as $$Ntoinfty$$. If you are actually interested in sampling from the distribution, then there may also be some asymptotical possibilities involved. Judging from the histograms, it may well be that the maximum is asymptotically normal.

Answered by Stephan Kolassa on December 3, 2021

## Related Questions

### Prediction Intervals (Conformal Predictions) for Regression Problems

0  Asked on December 29, 2021 by bioinformatics_student

### Why not use % change in regression instead of log diff?

0  Asked on December 29, 2021 by tjaqu787

### Estimating expected values for correlated data using random effects models

4  Asked on December 29, 2021 by nicolas-molano

### Should multiple testing correct with bonferroni ever reduce a p value’s size?

1  Asked on December 29, 2021

### Compute Mean of a Clipped Normal Distribution

2  Asked on December 29, 2021 by ahsan

### What is an example of perfect multicollinearity?

3  Asked on December 27, 2021 by tsteatime

### Should I use a seasonal arima or stl decomposition and model residuals only?

1  Asked on December 27, 2021 by string_is_hard

### Endogenous controls in linear regression – Alternative approach?

2  Asked on December 27, 2021 by sgtbp

### Variable selection in logistic regression model

3  Asked on December 27, 2021

### EM Algorithm Derivation, Discrete Case

1  Asked on December 27, 2021

### Should we really do Re-Sampling in Class Imbalance data?

2  Asked on December 27, 2021 by baktaawar

### How to determine data size is statistically efficient?

0  Asked on December 27, 2021 by 1111ktq

### (Non-limit) distribution of maxima from different univariate, discrete and stationary time series

1  Asked on December 27, 2021

### Confused about stationarity and ARIMA processes

2  Asked on December 27, 2021

### How to remove correlated features?

1  Asked on December 27, 2021 by ichait

### Between- and within-person level effects when using multilevel modelling for longitudinal data in R

1  Asked on December 27, 2021 by af1402

### How can I separate the abundance factor of the incidence results?

0  Asked on December 27, 2021

### Chi Square Test on non-numeric data in R

0  Asked on December 27, 2021 by ruffybeo

### How to make correlation test with compositional data?

0  Asked on December 27, 2021

### What are the parameters in signal recovery? Whether source of these parameters are the sampling property of impulse response?

0  Asked on December 27, 2021 by lakshman