So I assume you know the old problem of "convert a sequence of biased coin flips (with unknown bias) into a (not necessarily same length) sequence of unbiased coin flips". I recently learned two interesting things about this problem: that it was apparently first posed (& of course solved) by Johnny von Neumann (his trick: take the flips two at a time and map 01 -> 0, 10 -> 1, discarding 00 and 11), and that it is used in certain Intel chips as part of a hardware random number generator. (See http://www.cryptography.com/resources/whitepapers/IntelRNG.pdf for details.)

While looking at this, I wondered if there was a more efficient scheme that processed more than two bits at a time. A little thought revealed that there is, for example this 4-bit code:

    0000, 1111, 1100, 0011 -> output nothing
    0001, 1110, 0101       -> 00
    0010, 1101, 0110       -> 01
    0100, 1011, 1001       -> 10
    1000, 0111, 1010       -> 11

This is a win because the 2-bit code would only output 1 bit for sequences like 0001, while here we output 2 bits. (The output is unbiased because all blocks with the same number of 1's are equally likely, and each 2-bit output value is assigned the same number of blocks from each such class.)

In general, good codes can be obtained by taking the choose(n,k) equally likely n-bit blocks with k 1's in them, mapping 2^b of them to the 2^b distinct outputs of b = round_down(log_2(choose(n,k))) bits, and mapping the leftover blocks to nothing. These codes, however, are more efficient than the 2-bit code for all biases only when n is a power of two. The reason is that for very low or very high biases, the efficiency depends mostly on what you do when there is only one 0 or 1 in the input block. The 2-bit code will always output one bit in this circumstance, while most of the other codes (including those for all odd n) will sometimes output no bits.

In fact, if we define the efficiency of a code as the ratio of the expected output length to the input length, then the efficiency of the 2^m block codes keeps improving as m goes to infinity (the entropy of the source is a ceiling on the efficiency of any extractor, and that ceiling is 1 only for a fair coin). The worst case is when the bias p (the probability of a 1) is very small or very large; there the efficiency is approximately pm, since essentially all of the output comes from blocks containing a single 1, which occur with probability about 2^m p and yield m bits each. On the other hand, if the bias is known to be near 0.5, then odd-length codes seem to be more efficient. (The length-7 code has efficiency 29/56 at p = 0.5, which is not beaten by any code of length less than 15.)

I'm a little bit curious as to why Intel didn't implement a more efficient code. One guess is that they didn't know about these codes. Another is that the 2-bit code was good enough that spending silicon on a more complicated code wasn't worth it. Still another possible reason is that the bias of the input coin flipper isn't exactly constant, but has (say) a small linear trend over time which is small enough to ignore over 2 bits but maybe not over longer blocks.
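In case anyone wants to experiment, here is a quick Python sketch of the construction (my own illustration, not Intel's circuit; the names von_neumann, make_block_code, and efficiency are just made up for this post). Within each weight class it assigns outputs to blocks in lexicographic order, which differs from the particular 4-bit table above but has the same efficiency.

    # Sketch of the block codes described above.  Within each class of
    # n-bit blocks containing the same number of 1's (all equally likely),
    # the first 2^b blocks get the 2^b distinct b-bit outputs, where
    # b = round_down(log_2(choose(n,k))); the rest of the class outputs
    # nothing.
    from fractions import Fraction
    from itertools import product
    from math import comb

    def von_neumann(bits):
        """The classic 2-bit code: 01 -> 0, 10 -> 1, 00/11 -> nothing."""
        return ''.join(a for a, b in zip(bits[0::2], bits[1::2]) if a != b)

    def make_block_code(n):
        """Map each n-bit block (as a string) to its output bit string."""
        classes = {}
        for block in product('01', repeat=n):
            classes.setdefault(block.count('1'), []).append(''.join(block))
        code = {}
        for k, blocks in classes.items():
            b = comb(n, k).bit_length() - 1    # floor(log2(choose(n, k)))
            for i, blk in enumerate(sorted(blocks)):
                # First 2^b blocks get distinct b-bit outputs; rest emit nothing.
                code[blk] = format(i, '0%db' % b) if b and i < 2 ** b else ''
        return code

    def efficiency(n, p):
        """Expected output bits per input bit when each flip is 1 with prob p."""
        total = sum(p ** blk.count('1') * (1 - p) ** blk.count('0') * len(out)
                    for blk, out in make_block_code(n).items())
        return total / n

    print(von_neumann('01101000'))         # -> '011'
    print(efficiency(2, Fraction(1, 2)))   # 1/4, the plain von Neumann rate
    print(efficiency(4, Fraction(1, 2)))   # 3/8 for the 4-bit code
    print(efficiency(7, Fraction(1, 2)))   # 29/56, the figure quoted above

Running it reproduces the 29/56 figure for the length-7 code at p = 0.5 (and shows the 4-bit code at 3/8 versus the 2-bit code's 1/4).

-Thomas C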