[math-fun] Re: Path Question
Sorry to take so long responding! Michael Greenwald writes:
Second, I am wondering about the intuition behind the error metric that was outlined in James Propp's email. If I understand it correctly, the idea is to find a construction for an infinite string S in which the "distributional error" [I don't know any term (let alone a *better* term) for the error computation used] over substrings of *all* sizes n < N decreases asymptotically fastest (I think someone said that log(N)/N is optimal, no?) with the length of the string prefix.
I probably said it wrong, so let me try to say it a different way. The thing I want is an infinite sequence S of 0's and 1's with the following countably many properties:
1) The number of times "0" occurs in the first N bits of S goes to infinity like N/2 + O(log N).
2) The number of times "1" occurs in the first N bits of S goes to infinity like N/2 + O(log N).
3) The number of times "00" occurs in the first N bits of S goes to infinity like N/4 + O(log N).
4) The number of times "01" occurs in the first N bits of S goes to infinity like N/4 + O(log N).
etc. (with one property for each finite string of 0's and 1's). So the quantification is: for each n, and for each bit-string of length n, there exists a constant C such that for all N > 1, the number of times the bit-string of length n occurs in the first N bits of S differs from (N-n+1)/2^n by less than C times log N. Note that the constant C may depend on the particular bit-string of length n one is looking at. Note also that I've used N > 1 rather than N > N_0 (for some N_0); this cuts down on the number of quantified variables with no real change in meaning (the only price you pay is that you might have to use a bigger C).
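[The property described above can be checked empirically on any finite prefix. Here is a minimal sketch -- the function names and the toy string are mine, not from the thread: count the overlapping occurrences of every length-n block in the first N bits and subtract the expected (N-n+1)/2^n.]

```python
from itertools import product

def block_counts(s, n):
    """Count overlapping occurrences of each length-n block in bit-string s."""
    counts = {"".join(b): 0 for b in product("01", repeat=n)}
    for i in range(len(s) - n + 1):
        counts[s[i:i + n]] += 1
    return counts

def discrepancies(s, n):
    """For each length-n block, the deviation of its count in s from the
    expected (N - n + 1) / 2^n, where N = len(s)."""
    N = len(s)
    expected = (N - n + 1) / 2 ** n
    return {b: c - expected for b, c in block_counts(s, n).items()}
```

[Running discrepancies("01" * 8, 2) shows why length-1 balance alone is not enough: the blocks "00" and "11" never occur in the alternating string, so their deviations are large even though "0" and "1" are perfectly balanced.]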
So I'm wondering whether there's any intuition about why we might care equally about substrings of all lengths, even substrings pretty close to N in length, rather than focusing on substrings that are small compared to N.
I don't see any reason to think that a sequence of the kind I want will have good properties for the initial N terms vis-a-vis substrings of length n when n is a lot bigger than log N. Steve Witham writes:
Wouldn't it be easier to look at the minimum frequency instead of the distribution of frequencies? I realize it's not the same question, but as long as you're hunting in the dark.
I agree: for fixed n and N, one could look at the sequence s of length n that occurs the least number of times in one's sequence S of length N, and try to make this quantity as large as possible by choosing S cleverly. One candidate for S that deserves more study is the very pretty Ehrenfeucht-Mycielski sequence: for an introduction see Klaus Sutner's paper www.cs.cmu.edu/~sutner/papers/em-sequence.pdf . It's mildly scandalous that we still don't know whether 0's and 1's occur in the E-M sequence with asymptotic frequency 1/2 (let alone finite subsequences of length greater than one!). Jim
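[For anyone who wants to experiment, here is a naive sketch of the E-M rule -- the seed 010 and the function names are my choices; see Sutner's paper for the precise conventions. Repeatedly find the longest suffix of the sequence so far that also occurs earlier, and append the complement of the bit that followed its most recent earlier occurrence. A helper computes Witham's minimum-frequency statistic.]

```python
def em_sequence(n):
    """First n bits of the Ehrenfeucht-Mycielski sequence (naive, cubic time)."""
    s = [0, 1, 0]  # conventional seed 010
    while len(s) < n:
        m = len(s)
        nxt = 1 - s[m - 1]  # empty-suffix fallback (never needed after the seed)
        for k in range(m - 1, 0, -1):           # try the longest suffix first
            suffix = s[m - k:]
            for i in range(m - k - 1, -1, -1):  # most recent earlier occurrence
                if s[i:i + k] == suffix:
                    nxt = 1 - s[i + k]          # complement the following bit
                    break
            else:
                continue
            break
        s.append(nxt)
    return s[:n]

def min_block_count(s, n):
    """The least number of (overlapping) occurrences over all 2^n length-n
    blocks -- the quantity one would try to make large by choosing s cleverly.
    Returns 0 if some block is missing entirely."""
    counts = {}
    for i in range(len(s) - n + 1):
        w = tuple(s[i:i + n])
        counts[w] = counts.get(w, 0) + 1
    return min(counts.values()) if len(counts) == 2 ** n else 0
```

[The first bits produced are 010011010111..., matching the published values of the E-M sequence.]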
On 4/23/07, James Propp <propp@math.wisc.edu> wrote:
Steve Witham writes:
Wouldn't it be easier to look at the minimum frequency instead of the distribution of frequencies? I realize it's not the same question, but as long as you're hunting in the dark.
I agree: for fixed n and N, one could look at the sequence s of length n that occurs the least number of times in one's sequence S of length N, and try to make this quantity as large as possible by choosing S cleverly.
There seems to be some confusion here --- serving to emphasise the point I was trying to make earlier about considering the standard deviation of the distances between successive occurrences of a given word, rather than dragging in unnecessary sums to N. Unless the (asymptotic) density of every word of fixed length n is 2^(-n) --- i.e. the sequence is oo-distributed --- you aren't even going to consider it. What will affect your discrepancy measure is _not_ the number of occurrences of the word in some interval of interest, but how irregularly they occur --- effectively, the standard deviation.
Fred Lunnon
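[The statistic Lunnon proposes is easy to compute for a finite prefix. A sketch -- the function name is mine -- using the population standard deviation of the gaps between successive overlapping occurrences of a word:]

```python
from statistics import pstdev

def gap_stdev(s, w):
    """Standard deviation of the distances between successive (overlapping)
    occurrences of the word w in the string s. For a well-distributed
    sequence the mean gap for a length-n word is about 2^n; this measures
    how irregularly the occurrences are spaced around that mean."""
    k = len(w)
    positions = [i for i in range(len(s) - k + 1) if s[i:i + k] == w]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) if gaps else 0.0
```

[For the perfectly periodic string "0101010101" the gaps between occurrences of "01" are all equal, so the statistic is 0; any irregularity in the spacing makes it positive.]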
participants (2)
- Fred Lunnon
- James Propp