On 6/15/06, Henry Baker <hbaker1@pipeline.com> wrote:
Zipf's Law says that the probability of the n'th most popular thingy is proportional to 1/n. Clearly this probability exists only if the universe of thingy's is finite, since the harmonic series sum 1/n diverges and the probabilities could not otherwise be normalized.
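A quick sketch of that normalization: for a finite universe of m thingy's, the n'th probability is (1/n)/H_m, where H_m is the m'th harmonic number (the function name below is just illustrative).

    from fractions import Fraction

    def zipf(m):
        """Zipf probabilities p(n) = (1/n)/H_m for n = 1..m."""
        h_m = sum(Fraction(1, n) for n in range(1, m + 1))
        return [Fraction(1, n) / h_m for n in range(1, m + 1)]

    p = zipf(5)      # H_5 = 137/60, so p = [60/137, 30/137, 20/137, 15/137, 12/137]
    assert sum(p) == 1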
I'm interested in efficient encodings of thingy's. The Shannon-Fano code says to split the thingy's into two groups having approximately equal total probabilities, assign one group an initial bit "0" and the other group an initial bit "1", and then recurse within the groups. This means to me that one must express the fraction 1/2 as a sum of probabilities, each of which is proportional to 1/k for some integer k.
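A minimal sketch of that recursion (assuming the probabilities are given in decreasing order; the split point minimizes the difference between the two groups' masses, and the names are illustrative):

    def shannon_fano(probs, lo=0, hi=None, prefix=""):
        """Return {index: codeword} for probs[lo:hi], assumed sorted decreasing,
        by splitting into two near-equal-mass groups and recursing."""
        if hi is None:
            hi = len(probs)
        if hi - lo == 1:
            return {lo: prefix or "0"}
        total = sum(probs[lo:hi])
        run, best_gap, split = 0.0, None, lo + 1
        for k in range(lo + 1, hi):
            run += probs[k - 1]              # mass of probs[lo:k]
            gap = abs(total - 2 * run)       # |mass(left) - mass(right)|
            if best_gap is None or gap < best_gap:
                best_gap, split = gap, k
        codes = shannon_fano(probs, lo, split, prefix + "0")
        codes.update(shannon_fano(probs, split, hi, prefix + "1"))
        return codes

    m = 8
    h_m = sum(1.0 / n for n in range(1, m + 1))
    p = [1.0 / (n * h_m) for n in range(1, m + 1)]
    print(shannon_fano(p))      # rank 1 gets a short codeword, rank m a long one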
Is there a general way to do this for arbitrary m, where m is the number of thingy's? I.e., is there a general scheme for splitting the infinite set {1/k} into two pieces that gives an optimal encoding for every m? If such a general scheme isn't exactly optimal, what about a scheme that is within epsilon of being optimal?
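One empirical handle on that question, as a sketch: if the first split puts ranks 1..k against ranks k+1..m, we want H_k ~ H_m/2, and since H_k ~ ln k + gamma (with gamma the Euler-Mascheroni constant), the split point should grow roughly like exp(-gamma/2)*sqrt(m), about 0.75*sqrt(m). So there is no single m-independent split, but its asymptotic shape is simple. The code below just tabulates k(m) against that prediction.

    from math import exp, sqrt

    GAMMA = 0.5772156649015329      # Euler-Mascheroni constant

    def split_point(m):
        """Smallest k with H_k >= H_m / 2: the top group {1..k} first
        reaches half of the total Zipf mass."""
        h_m = sum(1.0 / n for n in range(1, m + 1))
        h_k, k = 0.0, 0
        while h_k < h_m / 2:
            k += 1
            h_k += 1.0 / k
        return k

    for m in (10, 100, 1000, 10**4, 10**5):
        print(m, split_point(m), round(exp(-GAMMA / 2) * sqrt(m), 1))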
Don't you also want to optimise the assignments within each half, and so on recursively down to the leaves of the (search) tree; in which case an appropriate weight function is required to combine the discrepancies at each node? Dunno whether maybe Huffman coding might be at all relevant to this problem --- but you've presumably considered that possibility. Oh, and shouldn't it be "thingies"? Fred Lunnon
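For reference on Lunnon's Huffman suggestion, a minimal sketch over the same Zipf weights (names illustrative): Huffman coding produces the optimal prefix code for any known finite distribution, so its average length is a floor against which a Shannon-Fano-style split can be judged.

    import heapq
    from math import log2

    def huffman_lengths(probs):
        """Optimal prefix-code lengths for probs, via Huffman merging."""
        heap = [(p, [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        depth = [0] * len(probs)
        while len(heap) > 1:
            p1, g1 = heapq.heappop(heap)
            p2, g2 = heapq.heappop(heap)
            for i in g1 + g2:           # each merge pushes both groups one level deeper
                depth[i] += 1
            heapq.heappush(heap, (p1 + p2, g1 + g2))
        return depth

    m = 8
    h_m = sum(1.0 / n for n in range(1, m + 1))
    p = [1.0 / (n * h_m) for n in range(1, m + 1)]
    lengths = huffman_lengths(p)
    avg = sum(pi * li for pi, li in zip(p, lengths))
    entropy = -sum(pi * log2(pi) for pi in p)
    print(lengths, round(avg, 4), round(entropy, 4))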