How about the Catalan numbers?

  0. 0 1 0 2 0 5 0 14 0 42 ...

or similarly

  0. 1 0 1 0 2 0 5 0 14 0 42 ...

or combine and shift

  0. 0 1 1 1 2 2 5 5 14 14 42 42 ...

or another combination

  0. 0 0 1 2 2 4 5 10 14 28 42 84 ...

Does sum_{k>0} C_k/4^k = 1?  It definitely converges (C_k ~ 4^k/(sqrt(pi)*k^(3/2))), and the limit is close to 1.

Franklin T. Adams-Watters

-----Original Message-----
From: Steve Witham <sw@tiac.net>

A nice way to look at and compare the binary codes we've been talking about is as binary representations (!) of the number one.  This corresponds to a histogram of the codeword lengths.

The most trivial code is for an alphabet of one symbol, whose probability is 1.  I'll write it like this:

  1.

The other reading of that is: there is one symbol, and it encodes with no bits.  (The question of how you know how many of this symbol have been transmitted is outside the scope of everything we've been talking about!)  It probably seems strange to include zero as a possible code length, but I had an influential boss who said always to handle the zero case.

Unary means zero or more zero bits followed by a one bit (or, zero or more one bits followed by a zero):

  0. 1 1 1 1 1...

The codes are 1, 01, 001, 0001, etc. -- in other words, one code of every length starting at 1.  The binary interpretation is

  0/1 + 1/2 + 1/4 + 1/8 + 1/16 + ... = 1.

Eight-bit binary has 256 symbols that use eight bits each:

  0. 0 0 0 0 0 0 0 256

Ternary uses three of the four two-bit combinations as digits and the fourth as a terminator, so the lengths are always even:

  0. 0 1 0 3 0 9 0 27 0 81 0 243...

"Septary" or radix seven uses groups of three bits:

  0. 0 0 1 0 0 7 0 0 49 0 0 343...

Fibonacci or Zeckendorf coding is very interesting written this way:

  0. 0 1 1 2 3 5 8 13 21 34 55 89...

This works because 1/4 + 1/8 + 2/16 + 3/32 + 5/64 + 8/128 + ... = 1.  Ternary and septary are lumpy and Fibonacci is smooth--just looking at this you can guess that Fibonacci is more efficient with small numbers.

Here are the optimal codes for Zipfian distributions over alphabets of 3, 15, 255 and 65535 symbols:

  0. 1 2
  0. 0 1 2 4 8
  0. 0 0 1 2 4 8 16 32 64 128

In general the optimal encoding looks like that when the size of the alphabet is 2^(2^n) - 1.

The awful combination code I talked about the other day uses n zero bits, then n bits to tell the number of data bits (the "exponent"), then the data bits themselves (the "mantissa"), so, e.g., the number 9 codes as 000 100 1001.

  0. 0 0 1 0 0 2 4 0 0 8 16 32...

This lumpy duckling doesn't even make use of all the available codes, so it doesn't add up to 1.  But it's the first code in this message that approaches 100% swanlike efficiency.

Henry Baker (earlier than that) tested a combination of Fibonacci for the size, plus a binary mantissa with the leading one removed:

  0. 0 0 1 0 2 0 4 8 0 16 32 64...

One way to smooth this out would be to (roughly) subtract the size of the size from the size:

  0. 0 0 1 1 1 2 2 4 8 8 16 32...

(The Fibonacci there is (1) (1) (1 2) (2 4 8) (8 16 32 64 128)...)

These three combination codes all have runs of doublings that are interrupted exponentially less and less often, which is why they approach 100% efficiency.  I imagine a code where the interruptions slowed down more slowly would also work.  Also, I've been trying codes where the uplings gradually approach doublings, e.g.,

  0. 0 1 2 1 3 4 6 11 16 27 45 76...

This histogram representation can be used for coding and decoding, and as the output of a Huffman code generator.
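A minimal Python sketch of that idea (illustrative only; the name codes_from_histogram is made up, not from the thread): with hist[i] giving the number of codewords of length i, assign codewords in counting order, shifting left by one bit each time the length grows.

    # Sketch: build sorted ("canonical") codewords from a length histogram.
    # hist[i] = number of codewords that are i bits long; hist[0] covers the
    # zero-length case from the trivial one-symbol code above.
    def codes_from_histogram(hist):
        codes = []
        next_code = 0                  # next codeword value at the current length
        for length, count in enumerate(hist):
            for _ in range(count):
                codes.append(format(next_code, "b").zfill(length) if length else "")
                next_code += 1         # codewords of equal length just count upward
            next_code <<= 1            # moving to length+1 doubles the code space
        return codes

    # The 15-symbol Zipf-optimal histogram from above:
    print(codes_from_histogram([0, 0, 1, 2, 4, 8]))
    # -> ['00', '010', '011', '1000', '1001', '1010', '1011',
    #     '11000', '11001', '11010', '11011', '11100', '11101', '11110', '11111']

Summing 2^(-length) over those fifteen codewords gives exactly 1, which is just the histogram read as a binary fraction.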
Codes generated this way end up with different bit patterns than Huffman generates (they're sorted instead of scrambled), but the efficiencies of the codes are the same.

--Steve
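The "binary representation of one" reading is the Kraft sum of the length histogram, so the claims above are easy to check numerically.  A minimal sketch (the helper name kraft_sum is just illustrative):

    # Kraft sum of a length histogram: sum over lengths of count / 2^length.
    def kraft_sum(hist):
        return sum(count / 2.0 ** length for length, count in enumerate(hist))

    print(kraft_sum([0, 0, 0, 0, 0, 0, 0, 0, 256]))   # 8-bit binary: 1.0
    print(kraft_sum([0, 0, 1, 2, 4, 8]))              # 15-symbol Zipf code: 1.0
    print(kraft_sum([0, 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]))
    # Fibonacci, truncated: ~0.91, approaching 1 as more terms are added

    # Partial sums of Franklin's Catalan series, sum_{k>0} C_k / 4^k,
    # using the term ratio t_{k+1}/t_k = (2k+1) / (2(k+2)):
    t, total = 0.25, 0.0           # t = C_k / 4^k, starting at k = 1
    for k in range(1, 2001):
        total += t
        t *= (2 * k + 1) / (2 * (k + 2))
    print(total)                   # about 0.97; the partial sums creep slowly toward 1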