How about the Catalan numbers?

  0. 0 1 0 2 0 5 0 14 0 42 ...

or similarly

  0. 1 0 1 0 2 0 5 0 14 0 42 ...

or combine and shift

  0. 0 1 1 1 2 2 5 5 14 14 42 42 ...

or another combination

  0. 0 0 1 2 2 4 5 10 14 28 42 84 ...

Does sum_{k>0} C_k/4^k = 1?  It definitely converges (C_k ~ 4^k/(sqrt(pi)*k^(3/2))), and the limit is close to 1.

Franklin T. Adams-Watters

-----Original Message-----
From: Steve Witham <sw@tiac.net>

A nice way to look at and compare the binary codes we've been talking about is as binary representations (!) of the number one.  This corresponds to a histogram of the codeword lengths.

The most trivial code is for an alphabet of one symbol, whose probability is 1.  I'll write it like this:

  1.

The other reading of that is: there is one symbol, and it encodes with no bits.  (The question of how you know how many of this symbol have been transmitted is outside the scope of everything we've been talking about!)  It probably seems strange to include zero as a possible code length, but I had an influential boss who said always to handle the zero case.

Unary means zero or more zero bits followed by a one bit (or, zero or more one bits followed by a zero):

  0. 1 1 1 1 1...

The codes are 1, 01, 001, 0001, etc. -- in other words, one code of every length starting at 1.  The binary interpretation is

  0/1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 1.

Eight-bit binary has 256 symbols that use eight bits each:

  0. 0 0 0 0 0 0 0 256

Ternary uses three of the four two-bit combinations as digits and the fourth as a terminator, so the lengths are always even:

  0. 0 1 0 3 0 9 0 27 0 81 0 243...

"Septary" or radix seven uses groups of three bits:

  0. 0 0 1 0 0 7 0 0 49 0 0 343...

Fibonacci or Zeckendorf coding is very interesting written this way:

  0. 0 1 1 2 3 5 8 13 21 34 55 89...

This works because 1/4 + 1/8 + 2/16 + 3/32 + 5/64 + 8/128 + ... = 1.  Ternary and septary are lumpy and Fibonacci is smooth--just looking at this you can guess that Fibonacci is more efficient with small numbers.

Here are the optimal codes for Zipfian distributions over alphabets of 3, 15, 255 and 65535 symbols:

  0. 1 2
  0. 0 1 2 4 8
  0. 0 0 1 2 4 8 16 32 64 128

In general the optimal encoding looks like that when the size of the alphabet is 2^(2^n) - 1.

The awful combination code I talked about the other day uses n zero bits, then n bits to tell the number of data bits (the "exponent"), then the data bits themselves (the "mantissa"), so, e.g., the number 9 codes as 000 100 1001.

  0. 0 0 1 0 0 2 4 0 0 8 16 32...

This lumpy duckling doesn't even make use of all the available codes, so it doesn't add up to 1.  But it's the first code in this message that approaches 100% swanlike efficiency.

Henry Baker (earlier than that) tested a combination of Fibonacci for the size, plus a binary mantissa with the leading one removed:

  0. 0 0 1 0 2 0 4 8 0 16 32 64...

One way to smooth this out would be to (roughly) subtract the size of the size from the size:

  0. 0 0 1 1 1 2 2 4 8 8 16 32...

(The Fibonacci there is (1) (1) (1 2) (2 4 8) (8 16 32 64 128)...)

These three combination codes all have runs of doublings that are interrupted exponentially less and less often, which is why they approach 100% efficiency.  I imagine a code where the interruptions slowed down more slowly would also work.  Also, I've been trying codes where the uplings gradually approach doublings, e.g.,

  0. 0 1 2 1 3 4 6 11 16 27 45 76...

This histogram representation can be used for coding and decoding, and as the output of a Huffman code generator.
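A minimal Python sketch of that idea (illustrative only; the name codes_from_histogram is made up, not from the thread): with hist[i] giving the number of codewords of length i, assign codewords in counting order, shifting left by one bit each time the length grows.

    # Sketch: build sorted ("canonical") codewords from a length histogram.
    # hist[i] = number of codewords that are i bits long; hist[0] covers the
    # zero-length case from the trivial one-symbol code above.
    def codes_from_histogram(hist):
        codes = []
        next_code = 0                  # next codeword value at the current length
        for length, count in enumerate(hist):
            for _ in range(count):
                codes.append(format(next_code, "b").zfill(length) if length else "")
                next_code += 1         # codewords of equal length just count upward
            next_code <<= 1            # moving to length+1 doubles the code space
        return codes

    # The 15-symbol Zipf-optimal histogram from above:
    print(codes_from_histogram([0, 0, 1, 2, 4, 8]))
    # -> ['00', '010', '011', '1000', '1001', '1010', '1011',
    #     '11000', '11001', '11010', '11011', '11100', '11101', '11110', '11111']

Summing 2^(-length) over those fifteen codewords gives exactly 1, which is just the histogram read as a binary fraction.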
Codes generated this way end up with different bit patterns than Huffman generates (they're sorted instead of scrambled), but the efficiencies of the codes are the same.

--Steve
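The "binary representation of one" reading is the Kraft sum of the length histogram, so the claims above are easy to check numerically.  A minimal sketch (the helper name kraft_sum is just illustrative):

    # Kraft sum of a length histogram: sum over lengths of count / 2^length.
    def kraft_sum(hist):
        return sum(count / 2.0 ** length for length, count in enumerate(hist))

    print(kraft_sum([0, 0, 0, 0, 0, 0, 0, 0, 256]))   # 8-bit binary: 1.0
    print(kraft_sum([0, 0, 1, 2, 4, 8]))              # 15-symbol Zipf code: 1.0
    print(kraft_sum([0, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]))
    # Fibonacci, truncated: ~0.91, approaching 1 as more terms are added

    # Partial sums of Franklin's Catalan series, sum_{k>0} C_k / 4^k,
    # using the term ratio t_{k+1}/t_k = (2k+1) / (2(k+2)):
    t, total = 0.25, 0.0           # t = C_k / 4^k, starting at k = 1
    for k in range(1, 2001):
        total += t
        t *= (2 * k + 1) / (2 * (k + 2))
    print(total)                   # about 0.97; the partial sums creep slowly toward 1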