Thu, 19 Apr 2007 18:12:21 +0100 "Fred lunnon" <fred.lunnon@gmail.com>
On 4/16/07, James Propp <propp@math.wisc.edu> wrote:
This topic provides me with a good opportunity to raise a question that has been on my mind for a year or two:
Can anyone find an explicit construction of an infinite sequence S of 0's and 1's with the property that, for any finite bit-string s, the relative frequency with which s occurs in the first N bits of S converges to (1/2)^(length of s), with error that falls like (log N)/N?
Note that the Champernowne sequence A030190 has error 1/(log N). An explicit construction of Alon, Kohayakawa, Mauduit, Moreira and Rödl allows one to achieve error 1/N^c for any exponent c greater than 2/3 (which beats what one would get from a random sequence of bits). [*Less than* 2/3? Gareth McCaughan]
Jim Propp
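[To make the target concrete, here is a minimal Python sketch (mine, not from the thread, with invented helper names) that measures the worst deviation of substring frequencies from (1/2)^len(s) over a prefix of the binary Champernowne sequence A030190. Watching the printed values shrink illustrates the slow 1/(log N) decay noted above.]

    # A minimal sketch (not from the thread; names are mine) of the error
    # metric Propp describes: the worst deviation, over all bit-strings s
    # of length <= max_len, of s's relative frequency in the first N bits
    # of S from its fair share (1/2)**len(s).

    from collections import Counter
    from itertools import count, islice

    def champernowne_bits():
        """Binary Champernowne sequence A030190: binary expansions of
           0, 1, 2, 3, ... concatenated."""
        yield "0"
        for k in count(1):
            yield from bin(k)[2:]

    def max_error(prefix, max_len=4):
        """max over 1 <= len(s) <= max_len of |freq(s) - 2**-len(s)|."""
        worst = 0.0
        for L in range(1, max_len + 1):
            windows = len(prefix) - L + 1
            counts = Counter(prefix[i:i + L] for i in range(windows))
            for w in range(2 ** L):
                s = format(w, "0%db" % L)
                worst = max(worst, abs(counts[s] / windows - 2.0 ** -L))
        return worst

    # The error should shrink only like 1/log N for Champernowne.
    for e in (10, 14, 18):
        N = 1 << e
        prefix = "".join(islice(champernowne_bits(), N))
        print(N, round(max_error(prefix), 5))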
Suppose we lower our sights and just try to improve the dismal performance of the Champernowne sequence. Here are a couple of plausible-looking upgrades: firstly, instead of cheerfully dropping initial zeros, at iteration n we append a block in which every binary n-bit word is concatenated in full; secondly, instead of the conventional binary ordering, we arrange these words in Gray-code order. Also, I've just noticed that my program reverses the conventional digit order within a word: least significant bit on the left! ... At first sight these improvements appear singularly unsuccessful: the frequency of words of size n in block n still varies wildly, from 1 to 2n-1. But a closer look reveals the astonishing fact that the frequencies of all smaller sizes m < n in block n are constant! (A generating sketch appears after this exchange.)

I have some naive/ignorant questions while trying to understand this.

First, what does this construction buy us that the de Bruijn sequence doesn't? I see how the de Bruijn sequence gets us an even distribution of all n-bit sequences in the first N' = 2^n bits (actually 2^n + n - 1 bits, if we want to be picky), but I don't see how we know what the distribution of n-bit substrings will be *after* that; I'm missing the part that tells us that the error (|d(N)| as defined in the earlier mail, but restricted to n-bit substrings) even drops after those first N' bits. This construction seems to try to do the same thing, but in sequences that are longer by roughly a factor of n. From this I am guessing that I am missing the point of this construction. (A de Bruijn sketch also appears after this message.)

Second, I am wondering about the intuition behind the error metric outlined in James Propp's email. If I understand it correctly, the idea is to find a construction for an infinite string S in which the "distributional error" [I don't know any term, let alone a *better* term, for the error computation used] over substrings of *all* sizes n < N decreases asymptotically as fast as possible with the length of the string prefix (I think someone said that (log N)/N is optimal, no?). So I'm wondering whether there's any intuition about why we might care about substrings of all lengths equally, even substrings pretty close to N in length, rather than focusing on strings that are small compared to N. For a concrete example, suppose we define a "good enough" frequency distribution; to pick a random example, perhaps the distribution is "good enough" if the frequency of every string of size n lies between 1/(k (log n) 2^n) and k (log n)/2^n for some constant k. It then makes sense to me to define a function F_S(n) over a string S that returns an N' for each n such that for all prefixes of S of length N > N', the distributional error for substrings of size n is "good enough". Then maybe we want to find a string S such that F_S(n) increases slowly. (A sketch of this F_S(n) idea appears below as well.) For S the (rotated) de Bruijn sequence, and assuming that substrings of size n behave well even after the initial 2^n-bit de Bruijn prefix (as I said, I have to confess that I don't see how we can guarantee that they stay evenly distributed after that), F_S(n) should grow like O(2^n), while for S the Champernowne-Gray string I think F_S(n) would grow like O(n 2^n). (But I'm out of my depth here, I fear.)

My question is really whether the two error metrics are related. Assuming they're not, why is the first more interesting than the second? Is my confusion just a symptom of my not understanding the basic thing we are trying to do?
My apologies if these questions should have obvious answers, or if I'm just grabbing hold of this problem from the wrong end. I'm just feeling dense. Thanks.
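[A minimal generating sketch of the construction Lunnon describes, as promised above. This is my reading of his description (his actual program may differ), and I count occurrences cyclically around the block, which seems to be the convention under which the smaller-size frequencies come out exactly constant; with linear counting, boundary effects perturb the counts slightly.]

    # My reading of Lunnon's construction: block n concatenates all 2**n
    # binary n-bit words in Gray-code order, each written with its least
    # significant bit on the left.

    from collections import Counter

    def gray_block(n):
        """Block n: the 2**n n-bit Gray-code words, LSB on the left."""
        bits = []
        for i in range(2 ** n):
            g = i ^ (i >> 1)                       # i-th Gray-code word
            bits.extend(str((g >> b) & 1) for b in range(n))
        return "".join(bits)

    def cyclic_counts(block, m):
        """Counts of m-bit words over all len(block) cyclic windows."""
        ext = block + block[:m - 1]
        return Counter(ext[i:i + m] for i in range(len(block)))

    n = 6
    block = gray_block(n)
    for m in range(1, n + 1):
        values = sorted(set(cyclic_counts(block, m).values()))
        # Per the observation above: constant for m < n, uneven at m = n.
        print(m, values)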
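[For the first question: a standard Lyndon-word construction of the lexicographically least binary de Bruijn sequence (the usual textbook algorithm; the code is mine). It makes the "every n-bit word exactly once in the first 2^n + n - 1 bits" point concrete, but says nothing about what happens after that prefix, which is exactly the part the question worries about.]

    # Lexicographically least binary de Bruijn sequence of order n, via
    # the standard Lyndon-word (FKM) construction.

    from collections import Counter

    def de_bruijn(n):
        """Lexicographically least binary de Bruijn sequence of order n."""
        seq, a = [], [0] * (n + 1)
        def db(t, p):
            if t > n:
                if n % p == 0:
                    seq.extend(a[1:p + 1])
            else:
                a[t] = a[t - p]
                db(t + 1, p)
                if a[t - p] == 0:
                    a[t] = 1
                    db(t + 1, t)
        db(1, 1)
        return "".join(map(str, seq))

    n = 5
    s = de_bruijn(n)
    lin = s + s[:n - 1]                  # linearized: 2**n + n - 1 bits
    counts = Counter(lin[i:i + n] for i in range(2 ** n))
    print(sorted(set(counts.values())))  # [1]: each n-bit word once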
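[And a sketch of the F_S(n) idea; the code, the constant k = 2, and the guard replacing log n by 1 for small n (since log 1 = 0) are my own choices, not from the thread. A return value of None means the "good enough" band was never stably satisfied over the tested prefix.]

    # Empirical F_S(n) over a finite prefix: the smallest N' such that for
    # every N in [N', len(prefix)], each n-bit word's frequency lies in the
    # band [1/(k (log n) 2**n), k (log n)/2**n] proposed above.

    import math
    from collections import Counter

    def F_S(prefix, n, k=2.0):
        """Least N' such that the size-n frequencies stay 'good enough'
           from N' to the end of prefix; None if that never happens."""
        log_n = max(math.log(n), 1.0)        # guard: log 1 = 0
        lo = 1.0 / (k * log_n * 2 ** n)
        hi = k * log_n / 2 ** n
        counts = Counter()
        good_from = None
        for N in range(n, len(prefix) + 1):
            counts[prefix[N - n:N]] += 1     # new window ending at N
            windows = N - n + 1
            ok = all(lo <= counts[format(w, "0%db" % n)] / windows <= hi
                     for w in range(2 ** n))
            good_from = (good_from or N) if ok else None
        return good_from

    # Example: how fast does F_S(n) grow for a Champernowne-style prefix?
    prefix = "".join(bin(k)[2:] for k in range(1, 2000))
    print({n: F_S(prefix, n) for n in (2, 3, 4)})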