[math-fun] Forgot to say: Zipf, etc.
Henry,

Forgot to say that, once you split a Zipf distribution into a nice 50/50 split, you now have two groups of probabilities that aren't Zipf distributions, but you still have to split them, too! That makes a method specialized for Zipf distributions seem unlikely to me. If you were instead to take as many of the least-likely symbols as got closest to 1/2 total probability, you'd have two Zipf-*like* distributions, and so you could recurse on those, but I don't think it would give you a code as efficient as Huffman's. (I can't see how it could be better than the Shannon-Fano top-down method, which can be less efficient than Huffman's bottom-up method.)

The thing about Huffman codes of sorted lists (or monotonic functions) is that, once the Huffman tree is done, contiguous ranges of thingies have the same number of bits, so you can throw away the tree and code using a table something like this (for the example where 1 <= n <= 16 is the number of the thingy with probability proportional to 1/n):

   1 <= n <=  1  -->  .00      + ( n -  1 ) x .01       (binary)
   2 <= n <=  3  -->  .010     + ( n -  2 ) x .001
   4 <= n <=  7  -->  .1000    + ( n -  4 ) x .0001
   8 <= n <= 14  -->  .11000   + ( n -  8 ) x .00001
  15 <= n <= 16  -->  .111110  + ( n - 15 ) x .000001

(I calculated this table by hand and it may be wrong, but the correct table definitely has that form.)

When I looked up the history of the Huffman vs. Shannon-Fano codes, I found

  http://en.wikipedia.org/wiki/Huffman_coding

which covers the history and almost everything I mentioned, including arithmetic codes and patents, except this way of representing the code in a log-sized table.

--Steve
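A sketch in Python of how such a log-sized table can be built and used. This is the standard "canonical Huffman" construction rather than anything from the post itself, and the function names (`huffman_lengths`, `code_table`, `encode`) are my own: compute the Huffman code *lengths* for the sorted weights, then keep only one row per run of equal lengths, each row holding (first symbol, first codeword, length). Encoding is then the table lookup described above: first codeword of the row plus the symbol's offset into the row.

```python
import heapq
from itertools import groupby

def huffman_lengths(weights):
    # Standard Huffman construction, tracking only code lengths.
    # Heap entries: (total weight, tiebreak id, symbol indices in subtree).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    tiebreak = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:          # every symbol in the merged subtree
            lengths[s] += 1        # gets one bit deeper
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

def code_table(lengths):
    # Canonical-code table: one row per run of equal lengths,
    # as (first symbol index, first codeword value, codeword length).
    # Assumes lengths are nondecreasing, which holds when the symbols
    # were sorted by decreasing probability.
    rows, code, start, prev_len = [], 0, 0, 0
    for length, group in groupby(lengths):
        count = len(list(group))
        code <<= (length - prev_len)   # extend to the new code length
        rows.append((start, code, length))
        code += count                  # skip past this row's codewords
        start += count
        prev_len = length
    return rows

def encode(n, rows):
    # n is the 1-based symbol number; find its row and add its offset.
    for start, first, length in reversed(rows):
        if n - 1 >= start:
            return format(first + (n - 1 - start), '0{}b'.format(length))

if __name__ == "__main__":
    weights = [1.0 / n for n in range(1, 17)]
    rows = code_table(huffman_lengths(weights))
    for start, first, length in rows:
        print(start + 1, format(first, '0{}b'.format(length)))
```

The table has one row per distinct code length, so it is logarithmic in the alphabet size rather than linear, which is the point: the tree can be discarded once the lengths are known.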
Steve Witham