You can improve this using Fibonacci (or Zeckendorf) encoding - every
codeword ends in 11. So we have:

1   11      2 bits
2   011     3 bits
3   0011    4 bits
4   1011    4 bits
5   00011   5 bits
6   10011   5 bits
7   01011   5 bits
8   000011  6 bits
etc.

Note, by the way, that changing the radix as the number of symbols
increases is *not* using the same encoding regardless of the number of
symbols.

Franklin T. Adams-Watters

-----Original Message-----
From: Henry Baker <hbaker1@pipeline.com>

Franklin, Steve, David, et al:

After a little thinking, I reinvented ternary encoding (or, more
generally, arbitrary radix encoding for radices 2^k-1). The idea is
similar to ordinary decimal notation, except that we use a radix of 3
or 7 or 15 or ... The reason for not using binary encoding is that we
have to encode a "comma" or something to delimit the end of the
integer, so we use up one of the "digits" for this purpose. The larger
the radix, the smaller the coding loss.

Here are ternary encodings for small integers:

1   1_    4 bits (2 bits per character)
2   2_    4 bits
3   10_   6 bits
4   11_   6 bits
5   12_   6 bits
6   20_   6 bits
7   21_   6 bits
8   22_   6 bits
9   100_  8 bits
...

Actually, since the first digit is always 1 or 2, we can subtract 1 bit
from the (ternary) encoding without any loss. The calculation below did
not make this optimization.

For a Zipf alphabet of 2^20 symbols, this ternary encoding scheme
achieves an efficiency of 88.4%; a radix-7 (septenary) scheme achieves
97%. Once again, the smoothness of the Zipf distribution allows us to
ignore some of the inefficiencies for the first few symbols.

At 02:48 PM 6/20/2006, franktaw@netscape.net wrote:
I'm not sure exactly how you define efficiency, but I'm pretty sure
that this is impossible. If the first k values use half the available
code space, then as n goes to infinity that half of the code space is
used for values carrying an asymptotically vanishing fraction,
O(1/log(n)), of the probability distribution. I don't see how this can
be anything but inefficient.
Franklin T. Adams-Watters
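Henry Baker's "comma" code quoted further down in the thread can be
sketched as follows (a minimal Python sketch under assumed details:
digits 0..radix-1 plus a terminator written "_", with each character
costing ceil(log2(radix+1)) bits; `comma_encode` is a name invented
here):

```python
import math

def comma_encode(n, radix=3):
    """Encode a positive integer in base `radix` and append a comma.

    With radix 3 the alphabet is {0, 1, 2, _}, so each character
    packs into 2 bits; radix 7 uses 3 bits per character, and so on.
    """
    assert n >= 1
    digits = []
    while n > 0:
        digits.append(str(n % radix))
        n //= radix
    return "".join(reversed(digits)) + "_"

def comma_code_bits(n, radix=3):
    """Total bit cost of the codeword for n (characters times bits/char)."""
    bits_per_char = math.ceil(math.log2(radix + 1))
    return len(comma_encode(n, radix)) * bits_per_char
```

For example, `comma_encode(9)` gives `"100_"`, four characters at 2
bits each, matching the 8 bits in the table above.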
-----Original Message----- From: Henry Baker <hbaker1@pipeline.com>
Is there a better encoding of Zipf alphabets such that the encoding of the symbol of relative probability 1/i is _independent_ of the alphabet size and the encoding tends towards 100% efficiency asymptotically?
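The Fibonacci (Zeckendorf) encoding proposed at the top of the thread
can be sketched like this (a minimal Python sketch; `fib_encode` is a
name invented here, and the bit convention - least significant
Fibonacci number first, with a final 1 appended so every codeword ends
in "11" - follows the table above):

```python
def fib_encode(n):
    """Fibonacci (Zeckendorf) code for a positive integer n.

    Writes n as a sum of non-consecutive Fibonacci numbers (greedy,
    largest first), emits one bit per Fibonacci number with the least
    significant first, then appends a final 1 so the codeword ends in 11.
    """
    assert n >= 1
    # Fibonacci numbers F(2)=1, F(3)=2, ... not exceeding n.
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    if fibs[-1] > n:
        fibs.pop()
    # Greedy subtraction from the largest Fibonacci number downward;
    # this automatically never selects two consecutive Fibonacci numbers.
    bits = []
    for f in reversed(fibs):
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    bits.reverse()  # least significant Fibonacci number first
    return "".join(bits) + "1"
```

For example, `fib_encode(4)` returns `"1011"` (4 = 1 + 3), matching the
4-bit entry in the table, and no codeword is a prefix of another
because "11" occurs only at the end.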