[math-fun] Re: Path Question
Sorry to take so long responding! Michael Greenwald writes:
Second, I am wondering about the intuition behind the error metric that was outlined in James Propp's email. If I understand it correctly, the idea is to find a construction for an infinite string S in which the "distributional error" [I don't know any term (let alone a *better* term) for the error computation used] over substrings of *all* sizes n < N decreases asymptotically fastest (I think someone said that log(N)/N is optimal, no?) with the length of the string prefix.
I probably said it wrong, so let me try to say it a different way. The thing I want is an infinite sequence S of 0's and 1's with the following countably many properties:
1) The number of times "0" occurs in the first N bits of S goes to infinity like N/2 + O(log N).
2) The number of times "1" occurs in the first N bits of S goes to infinity like N/2 + O(log N).
3) The number of times "00" occurs in the first N bits of S goes to infinity like N/4 + O(log N).
4) The number of times "01" occurs in the first N bits of S goes to infinity like N/4 + O(log N).
etc. (with one property for each finite string of 0's and 1's). So the quantification is: for each n, and for each bit-string of length n, there exists a constant C such that for all N > 1, the number of times the bit-string of length n occurs in the first N bits of S differs from (N-n+1)/2^n by less than C times log N. Note that the constant C may depend on the particular bit-string of length n one is looking at. Note also that I've used N > 1 rather than N > N_0 (for some N_0); this cuts down on the number of quantified variables with no real change in meaning (the only price you pay is that you might have to use a bigger C).
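[The property described above can be checked empirically on any finite prefix. Here is a minimal sketch -- the function names and the toy string are mine, not from the thread: count the overlapping occurrences of every length-n block in the first N bits and subtract the expected (N-n+1)/2^n.]

```python
from itertools import product

def block_counts(s, n):
    """Count overlapping occurrences of each length-n block in bit-string s."""
    counts = {"".join(b): 0 for b in product("01", repeat=n)}
    for i in range(len(s) - n + 1):
        counts[s[i:i + n]] += 1
    return counts

def discrepancies(s, n):
    """For each length-n block, the deviation of its count in s from the
    expected (N - n + 1) / 2^n, where N = len(s)."""
    N = len(s)
    expected = (N - n + 1) / 2 ** n
    return {b: c - expected for b, c in block_counts(s, n).items()}
```

[Running discrepancies("01" * 8, 2) shows why length-1 balance alone is not enough: the blocks "00" and "11" never occur in the alternating string, so their deviations are large even though "0" and "1" are perfectly balanced.]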
So I'm wondering whether there's any intuition about why we might care equally about substrings of all lengths, even substrings pretty close to N in length, rather than focusing on substrings that are small compared to N.
I don't see any reason to think that a sequence of the kind I want will have good properties for the initial N terms vis-a-vis substrings of length n when n is a lot bigger than log N. Steve Witham writes:
Wouldn't it be easier to look at the minimum frequency instead of the distribution of frequencies? I realize it's not the same question, but as long as you're hunting in the dark.
I agree: for fixed n and N, one could look at the sequence s of length n that occurs the least number of times in one's sequence S of length N, and try to make this quantity as large as possible by choosing S cleverly. One candidate for S that deserves more study is the very pretty Ehrenfeucht-Mycielski sequence: for an introduction see Klaus Sutner's paper www.cs.cmu.edu/~sutner/papers/em-sequence.pdf . It's mildly scandalous that we still don't know whether 0's and 1's occur in the E-M sequence with asymptotic frequency 1/2 (let alone finite subsequences of length greater than one!). Jim
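[For anyone who wants to experiment, here is a naive sketch of the E-M rule -- the seed 010 and the function names are my choices; see Sutner's paper for the precise conventions. Repeatedly find the longest suffix of the sequence so far that also occurs earlier, and append the complement of the bit that followed its most recent earlier occurrence. A helper computes Witham's minimum-frequency statistic.]

```python
def em_sequence(n):
    """First n bits of the Ehrenfeucht-Mycielski sequence (naive, cubic time)."""
    s = [0, 1, 0]  # conventional seed 010
    while len(s) < n:
        m = len(s)
        nxt = 1 - s[m - 1]  # empty-suffix fallback (never needed after the seed)
        for k in range(m - 1, 0, -1):           # try the longest suffix first
            suffix = s[m - k:]
            for i in range(m - k - 1, -1, -1):  # most recent earlier occurrence
                if s[i:i + k] == suffix:
                    nxt = 1 - s[i + k]          # complement the following bit
                    break
            else:
                continue
            break
        s.append(nxt)
    return s[:n]

def min_block_count(s, n):
    """The least number of (overlapping) occurrences over all 2^n length-n
    blocks -- the quantity one would try to make large by choosing s cleverly.
    Returns 0 if some block is missing entirely."""
    counts = {}
    for i in range(len(s) - n + 1):
        w = tuple(s[i:i + n])
        counts[w] = counts.get(w, 0) + 1
    return min(counts.values()) if len(counts) == 2 ** n else 0
```

[The first bits produced are 010011010111..., matching the published values of the E-M sequence.]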
On 4/23/07, James Propp <propp@math.wisc.edu> wrote:
Steve Witham writes:
Wouldn't it be easier to look at the minimum frequency instead of the distribution of frequencies? I realize it's not the same question, but as long as you're hunting in the dark.
I agree: for fixed n and N, one could look at the sequence s of length n that occurs the least number of times in one's sequence S of length N, and try to make this quantity as large as possible by choosing S cleverly.
There seems to be some confusion here --- serving to emphasise the point I was trying to make earlier about considering the standard deviation of the distances between successive occurrences of a given word, rather than dragging in unnecessary sums to N. Unless the (asymptotic) density of every word of fixed length n is 2^(-n) --- i.e. the sequence is oo-distributed --- you aren't even going to consider it. What will affect your discrepancy measure is _not_ the number of occurrences of the word in some interval of interest, but how irregularly they occur --- effectively, the standard deviation.
Fred Lunnon
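[The statistic Lunnon proposes is easy to compute for a finite prefix. A sketch -- the function name is mine -- using the population standard deviation of the gaps between successive overlapping occurrences of a word:]

```python
from statistics import pstdev

def gap_stdev(s, w):
    """Standard deviation of the distances between successive (overlapping)
    occurrences of the word w in the string s. For a well-distributed
    sequence the mean gap for a length-n word is about 2^n; this measures
    how irregularly the occurrences are spaced around that mean."""
    k = len(w)
    positions = [i for i in range(len(s) - k + 1) if s[i:i + k] == w]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) if gaps else 0.0
```

[For the perfectly periodic string "0101010101" the gaps between occurrences of "01" are all equal, so the statistic is 0; any irregularity in the spacing makes it positive.]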
participants (2)
- Fred Lunnon
- James Propp