
Estimating the blockchain mining time for $N$ nodes

Cross Validated. Asked by SlowMountain on December 3, 2021

I am trying to simulate a set of times for the below problem.

There are $N$ nodes. Each node generates a random number $R$ in the range $[0,K]$ once per second. Estimate the time it takes each node to generate an $R$ such that $R \leq \frac{K}{NX}$. The time should be expressed in terms of $X$.

In my experiment $N> 2^{20}$ and $K = 2^{256}$, so brute forcing is not possible.

Do the times fall under any distribution? If so, then how can I simulate the times?

2 Answers

Thanks to @Stephan for pointing out that this is a geometric distribution. I found that we can generate a geometrically distributed random number as $\lceil\log(u)/\log(1-p)\rceil$, where $u$ is a uniformly distributed number in $[0,1]$.

In this case, I have to evaluate the above formula $N$ times, drawing a fresh $u$ for each node, to get the set of $N$ times in terms of $X$; a sketch follows below.
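A minimal sketch of that recipe in R (NN and XX are assumed example values; log1p(-p) is a numerically stabler version of log(1-p) when $p$ is tiny):

NN <- 2^20    # assumed example number of nodes
XX <- 4       # assumed example difficulty factor
p <- 1/(NN*XX)                        # per-second success probability (see the answer below)
u <- runif(NN)                        # one uniform draw per node
times <- ceiling(log(u)/log1p(-p))    # geometric waiting time per node, in seconds
summary(times)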

Answered by SlowMountain on December 3, 2021

At each time point, each node performs a Bernoulli experiment, with a success probability of $p=\frac{1}{XN}$, since success is defined as an outcome lower than $\frac{K}{XN}$ for a $U[0,K]$ random variable.

For a single Bernoulli experiment, the waiting time until the first success follows a geometric distribution with parameter $p$.

You have $N$ such experiments running concurrently, and you are interested in the distribution of the time until all experiments have encountered at least one success. Thus, you are interested in the distribution of the maximum of $N$ iid geometric distributions.
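As a quick empirical check of the geometric claim, here is a sketch with small assumed values of $N$ and $X$ and a hypothetical helper one_wait:

NN <- 10; XX <- 2
p <- 1/(NN*XX)
one_wait <- function() {    # seconds until a single node's first success
    tt <- 1
    while ( runif(1,0,1) > p ) tt <- tt+1
    tt
}
waits <- replicate(1e4,one_wait())
c(empirical=mean(waits),theoretical=1/p)    # a geometric's mean waiting time is 1/p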

Happily, COOLSerdash pointed out that there is a very simple closed form for the probability mass function (PMF) of this maximum. Let

$$ F(y) = P(Y\leq y) $$

denote the CDF of a single random variable $Y$; then the CDF of the maximum of $N$ IID random variables $Y_1, \dots, Y_N$ is simply

$$ P(Y_1\leq y \wedge \dots \wedge Y_N\leq y) = P\big(\max\{Y_i\}\leq y\big) = F(y)^N. $$

Thus, the PMF of a discrete maximum is

$$\begin{align*} P\big(\max\{Y_i\}= y\big) &= P\big(\max\{Y_i\}\leq y\big)-P\big(\max\{Y_i\}\leq y-1\big) \\ &=F(y)^N-F(y-1)^N. \end{align*}$$

In the present case of geometric distributions with parameter $p=\frac{1}{XN}$, we have $$ F(y) = P(Y\leq y) = \sum_{j=1}^y(1-p)^{j-1}p = p\sum_{j=0}^{y-1}(1-p)^j = 1-(1-p)^y $$ as a geometric series. We thus get $$P\big(\max\{Y_i\}= y\big) =F(y)^N-F(y-1)^N = \big(1-(1-p)^y\big)^N-\big(1-(1-p)^{y-1}\big)^N.$$
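As a numerical sanity check (a sketch, not part of the original answer, with a hypothetical helper pmf_total): the PMF telescopes when summed over $y$, so the total should be $F(y_{\max})^N\approx 1$ for large $y_{\max}$:

pmf_total <- function(NN,XX,y_max=1e4) {
    qq <- 1-1/(NN*XX)
    sum((1-qq^(1:y_max))^NN-(1-qq^(0:(y_max-1)))^NN)
}
pmf_total(10,2)    # numerically indistinguishable from 1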

We can simulate draws from this distribution by drawing the $R$s second by second until every node has had at least one success. Here is R code for multiple values of $N$, with histograms and the PMFs overlaid in red:

n_nodes <- c(1,2,3,5,7,10)
KK <- 3
XX <- 2

n_sims <- 1e5
sims <- matrix(NA,nrow=n_sims,ncol=length(n_nodes),dimnames=list(1:n_sims,n_nodes))
for ( jj in seq_along(n_nodes) ) {
    pb <- txtProgressBar(max=n_sims,style=3)    # cross-platform; the original winProgressBar is Windows-only
    for ( ii in 1:n_sims ) {
        setTxtProgressBar(pb,ii)
        set.seed(ii)    # for reproducibility
        # each row of RR holds one second's draws, one column per node
        RR <- matrix(runif(n_nodes[jj],0,KK),nrow=1,ncol=n_nodes[jj])
        while ( TRUE ) {
            success_per_column <- apply(RR,2,function(foo)any(foo<=KK/(n_nodes[jj]*XX)))
            if ( all(success_per_column) ) break
            RR <- rbind(RR,runif(n_nodes[jj],0,KK))
        }
        sims[ii,jj] <- nrow(RR)    # seconds until every node has succeeded at least once
    }
    close(pb)
}

# closed-form PMF of the maximum of nn iid geometric(1/(nn*XX)) waiting times
pmf <- function(yy,nn,XX) {
    qq <- 1-1/(nn*XX)
    (1-qq^yy)^nn-(1-qq^(yy-1))^nn
}

opar <- par(mfrow=c(2,3))
for ( jj in seq_along(n_nodes) ) {
    hist(sims[,jj],breaks=seq(0.5,max(sims[,jj])+0.5),
        main=paste(n_nodes[jj],"nodes"),freq=FALSE,xlab="")
    yy <- seq(min(sims[,jj]),max(sims[,jj]))
    lines(yy,pmf(yy,n_nodes[jj],XX),col="red",lwd=2)
}
par(opar)

[Figure: histograms of the simulated times for 1, 2, 3, 5, 7, and 10 nodes, with the closed-form PMF overlaid in red]

Bennett Eisenberg's paper "On the expectation of the maximum of IID geometric random variables" (Statistics & Probability Letters, 2008), which I found via Expectation of the maximum of i.i.d. geometric random variables at Math.SE, gives formulas for the expectation and higher moments; e.g., for the expectation (called $E(M_n^\ast)$ in his paper, right above section 2) for $N$ nodes:

$$ E(M_N^\ast) = \sum_{j=1}^N{N \choose j}\frac{(-1)^{j+1}}{1-q^j}, $$ where $q=1-p=1-\frac{1}{XN}$.

This corresponds well with the expectations we get from our simulations above:

# Eisenberg's closed-form expectation, evaluated for each number of nodes
expectations <- rep(NA,length(n_nodes))
for ( jj in seq_along(n_nodes) ) {
    qq <- 1-1/(n_nodes[jj]*XX)
    ii <- 1:n_nodes[jj]
    expectations[jj] <- sum(choose(n_nodes[jj],ii)*(-1)^(ii+1)/(1-qq^ii))
}
expectations
colMeans(sims)

With the following result:

> expectations
[1]  2.000000  5.714286 10.555445 22.171623 35.487560 57.602362
> colMeans(sims)
       1        2        3        5        7       10 
 1.99875  5.72014 10.55815 22.17125 35.57198 57.66147

EDIT: I just saw your edit with your values of $N$ and $K$. First off, $K$ is irrelevant: whether you draw $R\sim U[0,K]$ and check whether $R\leq\frac{K}{NX}$, or draw $R\sim U[0,1]$ and check whether $R\leq\frac{1}{NX}$, is exactly the same. So we can completely disregard $K$.

In contrast, your large $N$ may be more of a numerical problem. If all you are interested in is the moments of your distribution, then I would really recommend you look at the Eisenberg (2008) paper, because he explicitly addresses asymptotics for the moments as $N\to\infty$. If you are actually interested in sampling from the distribution, then asymptotic approximations may also be possible. Judging from the histograms, it may well be that the maximum is asymptotically normal.
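In fact, for $N$ as large as $2^{20}$ one can sample the maximum directly, without per-second simulation, by inverting the closed-form CDF $F(y)^N$ derived above. The helper rmaxgeom below is a sketch of that idea (not part of the original answer); expm1 and log1p guard against underflow when $p$ is tiny:

rmaxgeom <- function(n_draws,NN,XX) {
    # inverse-CDF sampling: smallest y with (1-(1-p)^y)^NN >= u, i.e.
    # y = ceiling(log(1-u^(1/NN))/log(1-p)), computed stably
    p <- 1/(NN*XX)
    u <- runif(n_draws)
    ceiling(log(-expm1(log(u)/NN))/log1p(-p))
}
rmaxgeom(5,NN=2^20,XX=4)    # five draws of the time until every node has a success

Equivalently, since $u^{1/N}$ is distributed like the maximum of $N$ uniforms, this is just the single-geometric inverse transform applied to that maximum.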

Answered by Stephan Kolassa on December 3, 2021
