TransWikia.com

How many combinations and how to find a (standard?) distribution that matches?

Cross Validated Asked by d-b on February 10, 2021

Take a string ABCDE and insert dots in it according to these rules:

  1. First and last character must not be a dot.
  2. 0-4 dots
  3. A dot may not follon upon another dot. In other words, a dot must always be preceded by a letter.

Valid strings:

ABCDE
AB.CD.E
A.B.C.D.E

Invalid strings:

.ABCDE
ABC..DE

With 0 dots, only one combination is possible (ABCDE), with 1 dot four combinations are possible (A.BCDE, AB.CDE and so on). With 2 dots I think there are 6 possible combinations:

A.B.CDE
A.BC.DE
A.BCD.E
AB.C.DE
AB.CD.E
ABC.D.E

3 dots:

A.B.C.DE
A.B.CD.E
A.BC.D.E
AB.C.D.E

4 dots – 1 combination.

If I made this right, the total number of combination with a 5 character long string to start with is 1 + 4 + 6 + 4 + 1 = 16 (for a 4 character string it is 8, 3 character -> 4). It seems the formula to calculate the number of possible combinations is 2^(n-1) where n is the number of characters in the initial string.

What kind of distribution is this? I want to create random numbers distributed like this (that is, if I draw 16 numbers between 0 and 4, on average 6 of them must be 2, 4 of them 1, and 4 of them 3 and so on) in Java.

And, if you have a longer string (the actual string I work with is 16 letters), how do you calculate the number of combinations you get if you, e.g., insert 5 dots into these 16 letters according to the rules above?

2 Answers

The questions appears to be a XY problem. You asked in the question "What kind of distribution is this [the number of dots in a string of length $n$]?", and "how I can create random numbers [the number of dots] using that [the distribution]?" but from the comments, you actually wanted to "generate strings with a random number of dots as input for something I am working on", with the random number of dots following the said distribution.

We will address both questions here, perhaps using a more procedural, CS-friendly perspective (spoiler: the solution to the second problem is easier than the first).

Inserting $k$ dots is a "$n$ choose $k$" problem

Consider a string with $n+1$ characters, the rules you gave mean that one can either insert zero or one dot for each of the $n$ "space" between characters. If one were to insert $k in {0, ..., n}$ dots in total, the number of ways to do that is $binom{n}{k}$, or "$n$ choose $k$".

If all possible combinations are equal, the chance of seeing $k$ dots follows a binomial distribution:

Across all possible values of $k$, one would have $ binom{n}{0} + binom{n}{1} + ... binom{n}{n} = 2^n $ possible ways to insert the dots. The chance in seeing a way with $k$ dots inserted is

$$ frac{binom{n}{k}}{2^n} = binom{n}{k} left(frac{1}{2}right)^{k}left(frac{1}{2}right)^{n-k}, $$

which is the probability mass function for a binomial distribution with $n$ trials and a success probability of 1/2. It is not a coincidence - it is a common way binomial distributions are constructed.

You can generate the random strings in a 2-step process...

You are then interested in creating random numbers, presumably the number of dots, so that they can generate strings with the said number of dots for downstream applications.

Given the above, one way to do it is:

  1. Generating the number of dots $k$: this can be done by taking a sample from $Bin(n,1/2)$ (i.e. binomial distribution with $n$ trials and success probability 1/2).
  2. Given $k$, generate the positions of the dots, randomly taking $k$ elements from the list/array [0, 1, 2, ..., n-1] should get what you want.

The random samples are easily obtainable in many programming languages - Java has Java.util.Random, Python has numpy.random, and R does that natively.

...but generating them in one step is cooler.

However, you are actually interested in generating the strings with dots in random location (or at least you appear to be interested). This means the value of $k$ doesn't really matter as long as it fits the binomial distribution in the long run.

In this case, we should take a step back, and recall we can get a random $k$ that follows a binomial distribution by flipping a fair coin $n$ times, and count how many time it landed on heads (which we denote by 1, and 0 otherwise).

The good news is, you now have a sequence of 0s and 1s for free:

  • Each sequence of length $n$ are equally likely to be generated, and thus for a known $k$ we will have $binom{n}{k}$ equally likely sequences with $k$ 1s that does the random choice for us. This is done by taking the indices of the 1s. Step 2 above sorted.

  • We also established above that the number of 1s in a randomly generated sequence follows the binomial distribution. This means Step 1 above is also sorted.

The even better news? You can generate such a sequence quickly with a (pseudo) random number generator - for a sequence of length $n$, one simply needs to generate a random integer between $0$ and $2^n - 1$ and convert the generated integer into its binary representation.

In short, if you have has a string with 16 characters, they will generate a random integer between $0$ and $32767$, say $19342$. $19342$ has a binary representation of 100101110001110, and thus you will insert a dot after the 0th, 3rd, 5th, 6th, ... characters to get a string with random dots inserted. The number of dots will then follow the binomial distribution in the long run.

Answered by B.Liu on February 10, 2021

With 5 letters there are 4 positions for dots $sum_{i=0}^4 {4choose i} = 1+4+6+4+1 = 16 = 2^4.$

In general, if I understand your problem correctly, with $n$ letters there are $n=1$ potential locations for $0$ through $n-1$ dots, and the number of possibilities is $2^{n-1}.$

The identity $sum_{i=0}^m {mchoose i} = 2^m.$ is proved from the binomial theorem $(a+b)^m = sum_{i=0}^m {mchoose i}a^ib^{m-i},$ by letting $a=b=1.$

Intuitively, the number of subsets of a set with $m$ elements (including 'improper' null subset and whole set itself) is $2^m$ because there are two choices for each element--in or out. Also, there are ${mchoose i}$ subsets of order $i.$

Answered by BruceET on February 10, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP