[math-fun] Forgot to say: Zipf, etc.
Henry,

Forgot to say that, once you split a Zipf distribution into a nice 50/50 split, you now have two groups of probabilities that aren't Zipf distributions, but you still have to split them, too! That makes a method specialized for Zipf distributions seem unlikely to me. If you were instead to take as many of the least-likely symbols as got closest to 1/2 total probability, you'd have two Zipf-*like* distributions, and so you could recurse on those, but I don't think it would give you a code as efficient as Huffman's. (I can't see how it could be better than the Shannon-Fano top-down method, which can be less efficient than Huffman's bottom-up method.)

The thing about Huffman codes of sorted lists (or monotonic functions) is that, once the Huffman tree is done, contiguous ranges of thingies have the same number of bits, so you can throw away the tree and code using a table something like this (for the example where 1 <= n <= 16 is the number of the thingy with probability proportional to 1/n):

   1 <= n <=  1  -->  .00      + ( n -  1 ) x .01       (binary)
   2 <= n <=  3  -->  .010     + ( n -  2 ) x .001
   4 <= n <=  7  -->  .1000    + ( n -  4 ) x .0001
   8 <= n <= 14  -->  .11000   + ( n -  8 ) x .00001
  15 <= n <= 16  -->  .111110  + ( n - 15 ) x .000001

(I calculated this table by hand and it may be wrong, but the correct table definitely has that form.)

When I looked up the history of the Huffman vs. Shannon-Fano codes, I found

  http://en.wikipedia.org/wiki/Huffman_coding

which covers the history and almost everything I mentioned, including arithmetic codes and patents, except this way of representing the code in a log-sized table.

--Steve
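A sketch in Python of how such a log-sized table can be built and used. This is the standard "canonical Huffman" construction rather than anything from the post itself, and the function names (`huffman_lengths`, `code_table`, `encode`) are my own: compute the Huffman code *lengths* for the sorted weights, then keep only one row per run of equal lengths, each row holding (first symbol, first codeword, length). Encoding is then the table lookup described above: first codeword of the row plus the symbol's offset into the row.

```python
import heapq
from itertools import groupby

def huffman_lengths(weights):
    # Standard Huffman construction, tracking only code lengths.
    # Heap entries: (total weight, tiebreak id, symbol indices in subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tiebreak = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol in the merged subtree
            lengths[s] += 1        # gets one bit deeper
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

def code_table(lengths):
    # Canonical-code table: one row per run of equal lengths,
    # as (first symbol index, first codeword value, codeword length).
    # Assumes lengths are nondecreasing, which holds when the symbols
    # were sorted by decreasing probability.
    rows, code, start, prev_len = [], 0, 0, 0
    for length, group in groupby(lengths):
        count = len(list(group))
        code <<= (length - prev_len)   # extend to the new code length
        rows.append((start, code, length))
        code += count                  # skip past this row's codewords
        start += count
        prev_len = length
    return rows

def encode(n, rows):
    # n is the 1-based symbol number; find its row and add its offset.
    for start, first, length in reversed(rows):
        if n - 1 >= start:
            return format(first + (n - 1 - start), '0{}b'.format(length))

if __name__ == "__main__":
    weights = [1.0 / n for n in range(1, 17)]
    rows = code_table(huffman_lengths(weights))
    for start, first, length in rows:
        print(start + 1, format(first, '0{}b'.format(length)))
```

The table has one row per distinct code length, so it is logarithmic in the alphabet size rather than linear, which is the point: the tree can be discarded once the lengths are known.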
Steve Witham