
Encoding sequence of unfair coin flips

Signal Processing Asked by Certusic on November 5, 2021

The question: Consider transmitting the results of $1000$ flips of an unfair coin where the probability of heads is given by $p_H$. The information contained in an unfair coin flip can be computed:

$p_H\log_{2}(1/p_H)+(1-p_H)\log_{2}(1/(1-p_H))$

For $p_H=0.999$, this entropy evaluates to $0.0114$ bits per flip. Can you think of a way to encode $1000$ unfair coin flips using, on average, just $11.4$ bits? (question from https://web.mit.edu/6.02/www/f2011/handouts/2.pdf)
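As a quick numerical check of this figure, here is a minimal Python sketch (the helper name entropy_bits is just for illustration):

```python
import math

def entropy_bits(p):
    # Binary entropy in bits per flip, assuming 0 < p < 1
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

p_H = 0.999
print(entropy_bits(p_H))         # ~0.0114 bits per flip
print(1000 * entropy_bits(p_H))  # ~11.4 bits for 1000 flips
```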

My wrong answer: I thought that I could encode the locations of the flips that turn up tails. Since there are $1000$ flips, each location can be encoded using $10$ bits ($2^{10}=1024$). Taking the expected number of bits per flip and multiplying by $1000$ for all the flips gives:

$$1000\left[(0.999)(0)+(0.001)(10)\right] = 1000(0.001)(10) = 10$$

But I know that any encoding whose average length in bits is smaller than the entropy must leave some ambiguity in the message, so since $10<11.4$, what information is my coding system missing?

3 Answers

One possible encoding scheme is to take a realization of your random process, pick the positions of the "tails", and encode those positions.

Intuitively, for this code, the length of the code is $10$ bits times the number of tails. This results in codes of different lengths depending on the number of occurrences of tails. As the $1000$ draws are independent, you can compute the probability of each count $N$ of tails using the binomial distribution: $p(N) = \binom{1000}{N} \cdot p_T^N \cdot (1-p_T)^{1000-N}$, where $p_T = 1 - p_H = 1 - 0.999 = 0.001$ is the probability of "tails".

On average, you obtain a code length of $$\mathcal{C} = \sum_{N=0}^{1000} 10 \cdot N \cdot p(N)$$

It follows that $$\mathcal{C} = 10 \sum_{N=0}^{1000} N \cdot \frac{1000!}{N!\,(1000-N)!} \cdot p_T^N \cdot (1-p_T)^{1000-N}$$

that is, $10$ times the mean of the binomial distribution:

$$\mathcal{C} = 10 \cdot 1000 \cdot p_T = 10 \cdot 1000 \cdot 0.001 = 10$$
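The same average can be checked numerically with a small Python sketch (standard library only; the variable names are illustrative):

```python
from math import comb

p_T = 0.001         # probability of tails, 1 - p_H
n = 1000            # flips per block
bits_per_tail = 10  # one 10-bit position per tail

# Expected code length: average of 10*N over the binomial distribution of N
expected_bits = sum(
    bits_per_tail * N * comb(n, N) * p_T**N * (1 - p_T)**(n - N)
    for N in range(n + 1)
)
print(expected_bits)  # ~10.0, i.e. 10 * n * p_T
```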

The extra information comes from the fact that you know a priori that the probability of heads is close to one. A similar encoding with $p_H=0.5$ would result in a code about $5$ times longer than the source entropy (roughly $5000$ bits instead of $1000$).

Answered by meduz on November 5, 2021

The problem is the assumed knowledge that the receiver needs to have. In your coding scheme you assume that the receiver knows that you transmit exactly $1000$ symbols. If the receiver didn't know that, there is no way to distinguish the following two cases:

  1. $2$ tails at certain positions inside one block of $1000$ symbols
  2. $2$ tails at the same positions as in case 1, but in two different blocks

So your coding scheme is incomplete and that's why you end up with an average bit rate less than what we would expect from the entropy of the source.
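To make the ambiguity concrete, here is a small Python sketch (the positions $200$ and $800$ are just example inputs) showing that the two cases produce identical bit streams:

```python
def encode_block(tail_positions):
    # Encode one block of 1000 flips as the 10-bit positions of its tails
    return "".join(format(pos, "010b") for pos in sorted(tail_positions))

# Case 1: one block of 1000 flips with tails at positions 200 and 800
case_1 = encode_block([200, 800])

# Case 2: two consecutive blocks of 1000 flips, one tail in each
case_2 = encode_block([200]) + encode_block([800])

print(case_1 == case_2)  # True: the receiver cannot tell the two cases apart
```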

Also note that your scheme, even though incomplete, will exceed the minimum possible bit rate for sufficiently large block lengths. The block length beyond which the required rate becomes larger than the minimum possible rate can be computed as the smallest integer value of $N$ satisfying

$$p\lceil\log_2(N)\rceil > -p\log_2(p)-(1-p)\log_2(1-p)\tag{1}$$

where $p$ denotes the probability of a tail.
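For the $p=0.001$ of this question, the smallest such $N$ can be found by direct search over inequality $(1)$ (a minimal Python sketch; the function name is illustrative):

```python
import math

def smallest_block_length(p):
    # Smallest N for which p * ceil(log2(N)) exceeds the per-flip entropy
    entropy = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    N = 2
    while p * math.ceil(math.log2(N)) <= entropy:
        N += 1
    return N

print(smallest_block_length(0.001))  # 2049: ceil(log2(N)) first reaches 12 bits there
```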

Apart from the above, you would also need to assign a codeword to the case that there is no tail inside a given block. Of course, for this likely case it would be wise to choose a short codeword.

Answered by Matt L. on November 5, 2021

I will only answer the first part, why your encoding doesn't work.

Let $X_1, \dots, X_{1000} \sim \text{Bernoulli}(0.999)$ and let $Z = (X_1, \dots, X_{1000})$.

Note that all random variables are i.i.d. Then

\begin{align*}
E[-\log_2(P(Z))] &= E[-\log_2(P(X_1)) - \cdots - \log_2(P(X_{1000}))]\\
&= E[-\log_2(P(X_1))] + \cdots + E[-\log_2(P(X_{1000}))]\\
&= 1000\,E[-\log_2(P(X_1))]\\
&= 1000\left(-0.999\log_2(0.999) - 0.001\log_2(0.001)\right)\\
&\approx 11.4078
\end{align*}

This is the average number of bits we should be able to achieve. Next, let's concatenate all coin flips into a string in $\{0, 1\}^{1000}$, where each position is one flip:

$C(\text{1st flip heads}, \dots, \text{999th flip heads}, \text{1000th flip tails}) = 0 \cdots 01$ (length: $1000$)

This is a single binary number, but it requires a length of $1000$ bits, which would be a bit too long.

Your solution would be to encode only the positions of the tails. For example, tails at position $200$ and at position $800$. Then $800 = 1100100000$ and $200 = 0011001000$. We ignore heads. Let's write this more formally.

A code is a function $C : \mathcal{X} \to \Sigma^*$ where $\Sigma = \{0, 1\}$ and $\mathcal{X} = \{0, \dots, 999\}$. Then $C(800) = 1100100000$ and $C(200) = 0011001000$. Each $x \in \mathcal{X}$ appears with probability $\mathbb{P}(X = x)$.
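As a quick sketch, these codewords can be generated directly in Python (fixed-width 10-bit binary formatting):

```python
for pos in (200, 800):
    print(pos, format(pos, "010b"))  # 200 -> 0011001000, 800 -> 1100100000
```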

For heads we want a code length of $0$, e.g. $C(12) = C(56) = \epsilon$. However, here the problem starts: this code is singular (not non-singular), because multiple source symbols map to $\epsilon$ (the map is not injective). Often one assumes injectivity and/or codewords in $\Sigma^{+}$ (i.e., without $\epsilon$).

In Elements of Information Theory (Cover and Thomas), we find the following theorem:

Let $l_1^*, l_2^*, \dots, l_m^*$ be optimal codeword lengths for a source distribution $\mathbf{p}$ and a $D$-ary alphabet, and let $L^{*}$ be the associated expected length of an optimal code ($L^* = \sum p_i l_i^*$). Then $$H_D(X) \leq L^* < H_D(X) + 1$$

But this theorem holds only for uniquely decodable codes / prefix codes / instantaneous codes. So you need injectivity and can't ignore heads (if you want to use the theorem).

Answered by displayname on November 5, 2021
