The Blum Blum Shub "provably unpredictable" pseudo-random number generator is just repeated squaring modulo p*q, where p and q are large primes, each congruent to 3 mod 4. Output just one bit per iteration; the bits are provably hard to predict in the backwards-time direction, in the sense that any algorithm that could predict them would also be able to factor p*q.

Well, it seems to me, abstracting what is going on and regarding the prime-factoring stuff as a red herring... that what we have here is information loss. That is, squaring mod p*q is a many-to-one map (four-to-one, in fact, on the invertible residues). It loses information. If you try to go backward in time, there are several "correct answers"... except only one of them is itself a square (i.e. could have been the result of a forward iteration) and hence is the "really correct" answer.

So let us contrast iterating a function x := F(x) in two cases: F a random bijection, and F a random function, both having domain and range equal to a universe of cardinality U.

Random bijections F: After iterating F a total of T times in a U-element universe from a random starting point, the chance the T-th step closes a cycle is 1/(U+1-T). So the chance that we evade creating a cycle during steps 1,2,3,...,T is

  (U-1)/U * (U-2)/(U-1) * ... * (U-T)/(U+1-T) = (U-T)/U

(the product telescopes), so the chance that we are in a giant cycle missing <=M elements of the universe is M/U. In other words, in the best 1% of such experiments, our cycle will miss <=1% of the universe.

Random functions F: After iterating F a total of T times in a U-element universe from a random starting point, the chance the T-th step creates a cycle, by returning to a previously visited element, is T/U. So the chance that we evade creating a cycle during steps 1,2,3,...,T is

  (U-1)/U * (U-2)/U * (U-3)/U * ... * (U-T)/U

which is <= (1-(T+1)/(2*U))^T by Jensen's inequality (AM-GM), and >= (1-T/U)^T since each factor is at least (U-T)/U.
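For concreteness, here is a minimal sketch of the Blum Blum Shub iteration just described. The particular primes (p=499, q=547, both 3 mod 4) and the seed are arbitrary toy choices of mine, far too small to be secure:

```python
# Toy Blum Blum Shub sketch.  Parameters are illustrative only --
# real use requires p, q to be large secret primes, each 3 mod 4.

def bbs_bits(seed, p=499, q=547, nbits=16):
    """Generate nbits pseudo-random bits by repeated squaring mod p*q."""
    assert p % 4 == 3 and q % 4 == 3
    n = p * q
    x = seed % n
    out = []
    for _ in range(nbits):
        x = (x * x) % n      # many-to-one step: each square has 4 roots mod n
        out.append(x & 1)    # emit one bit per iteration
    return out

print(bbs_bits(seed=159201))
```

Each squaring step is many-to-one, which is exactly the information loss at issue: a backwards predictor must decide which of the four square roots was the true predecessor.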
Hence if U is large, the chance of no repeat within T steps is roughly exp(-T^2/(2*U)), so the time to first repeat (tail plus cycle length, which bounds the eventual period) is more than 63% likely to be below sqrt(2*U), but also more than 50% likely to be above its median, which is about sqrt(2*ln(2)*U) = 1.18*sqrt(U). The chance it exceeds K*sqrt(2*U) is about exp(-K^2). So the longest-running 1% of experiments should have repeat times exceeding roughly 3*sqrt(U); more precisely, the two bounds above pin the 1% threshold between sqrt(ln(100)*U) = 2.1*sqrt(U) and sqrt(2*ln(100)*U) = 3.0*sqrt(U).

Point I want to make is: iterating a random function F which is NOT bijective loses information, in fact about 1 bit of information per iteration. One might therefore expect that if we output 1 bit from each iterate, then essentially ANY such iteration ought to be "hard to predict" in the backwards-time direction, by any algorithm which doesn't work hard enough to know which elements of the universe are cycle members and which aren't.

In the Blum-Blum-Shub case, the "cycle members" were the "squares," and recognizing a cycle member with 51% accuracy was equivalent to recognizing squares, which was computationally polynomial-time-equivalent to factoring p*q. But I think that in general, where F is implemented using a boolean circuit, say, distinguishing cycle members from non-members is going to be a hell of a lot harder than integer factoring. Integer factoring is soluble in polynomial time if you own a quantum computer. But cycle-member recognition should be beyond quantum assistance, since information loss is, generally speaking, a quantum-computer killer.

Essentially the only way I know to test cycle membership for X is just to iterate F until it cycles, then ask whether X was on that cycle. That takes exponential expected time for random functions F (the expected rho-length is of order sqrt(U) = 2^(N/2) for N-bit strings), and also empirically takes exponential expected time for plenty of actual F.

Now let me note that the "boolean circuit cycle recognition problem" is COMPLETE.
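The brute-force membership test just described (iterate F until it cycles, then check whether X was on that cycle) can be sketched as follows; the quadratic map on a 16-element universe is my own arbitrary toy example:

```python
# Cycle-membership test by iteration: walk x, F(x), F(F(x)), ... remembering
# everything seen, until some element repeats.  The first repeated element is
# the point where the walk re-enters its cycle, so X is a cycle member iff
# that element is X itself (i.e. the "rho" starting at X has no tail).
# For a random F this walk takes ~sqrt(U) expected steps: exponential in N
# when the universe is all N-bit strings.

def is_cycle_member(F, x):
    seen = set()
    y = x
    while y not in seen:
        seen.add(y)
        y = F(y)
    return y == x

# Toy example on the universe {0,...,15}:
F = lambda x: (x * x + 1) % 16
print(sorted(x for x in range(16) if is_cycle_member(F, x)))  # -> [5, 10]
```

Here every starting point falls into the single 2-cycle {5, 10}, so exactly those two elements pass the test.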
PROBLEM STATEMENT: Given a boolean circuit F that inputs N bits and outputs N bits, and given an N-bit string X, decide: "is X a cycle member?" (That is, does F^k(X) = X for some k >= 1?)

Fairly obviously, any oracle capable of answering that with >=51% accuracy (both false-positive and false-negative error rates below 49%) would also be capable of solving cycle membership for any function F computable in polynomial time, and I think it also ought to be capable of 99.9999% accuracy for "most" F. So regard this as a new (?) computational complexity class. Can we now prove that interesting/useful subclasses of it remain complete? (Such as, as a first step, restricted classes of circuits?)

-- Warren D. Smith
   http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
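As a sanity check on the problem statement, here is an exhaustive "oracle" (feasible only for tiny N; exponential time and space in general) that computes the full set of cycle members at once. It repeatedly discards every element with no surviving preimage, since such an element cannot lie on a cycle; what remains is the F-invariant cyclic core. The toy F below is my own arbitrary example, not from the post:

```python
# Exhaustive cycle-member "oracle" for a function F on N-bit strings.
# The cycle members are exactly the intersection of the images of all
# iterates F^k, obtained here by iterated trimming.

def cycle_members(F, N):
    """Return the set of all cycle members of F on {0, ..., 2^N - 1}."""
    alive = set(range(1 << N))
    while True:
        survivors = alive & {F(x) for x in alive}  # drop anything with no preimage left
        if survivors == alive:
            return alive        # alive is now F-invariant: the cyclic core
        alive = survivors

# Toy example (N=4) with a non-bijective quadratic map:
F = lambda x: (x * x + 1) % 16
print(sorted(cycle_members(F, 4)))  # -> [5, 10]
```

For a bijective F every element survives (every cycle has full preimages), while for a random function only on the order of sqrt(U) of the 2^N strings are expected to remain, matching the bijection-versus-function contrast above.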