Re: [math-fun] Encoding sorted lists with duplicates
Okay, encoding the differences between successive elements (deltas) of a sorted list of r integers in { 0 .. n-1 }. This came from a programming puzzle I saw floating around the net: sort 1 million 32-bit ints in 2 million bytes of memory. Stored raw, that's 32 million bits = 4 million bytes, but for r = 1,000,000 and n = 2^32 a sorted list ought to be representable in

  log2( (r + n - 1)! / ( r! (n - 1)! ) ) ~= 13,511,284 bits = 1,688,910.5 bytes.

The key is that the total of the deltas is less than 2^32. Once I started thinking about limiting how fast you use up bits vs. how fast you close in on that total--as a ratio to apply to the individual codewords--I was driven against my will to encode a delta d as floor( d / 2^12 ) one bits, a zero bit, and twelve binary bits (the low twelve bits of d). With this mixture of unary and binary,

  sizeof(d) <= 13 + d / 2^12,

which makes it easy to see that for a million deltas totalling less than 2^32,

  sum( sizeof(d) ) < 13,000,000 + 2^32 / 2^12 = 13,000,000 + 2^20 = 14,048,576 bits.

That's about 6/10 of a bit more per codeword than the best possible (roughly 537,000 extra bits spread over a million codewords). Just the fact that it uses a whole number of bits for each codeword accounts for half a bit.

I don't know whether this is an old trick; I haven't found another solution to the problem (as I understand it) on the web.
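Here is a minimal sketch in C of the delta code described above. The bit-buffer plumbing, buffer size, and function names are illustrative choices made for this example, not anyone's actual solution; only the rule itself--floor( d / 2^12 ) one bits, a zero bit, then the twelve low bits of d--comes from the description above.

/* Sketch of the unary-plus-12-bit delta code described above.
   Bit buffer, sizes, and names are illustrative, not a real solution. */

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define K 12                      /* number of low bits kept in binary */

typedef struct { uint8_t *buf; size_t pos; } BitStream;   /* pos in bits */

static void put_bit(BitStream *s, int b) {
    if (b) s->buf[s->pos >> 3] |= (uint8_t)(1u << (7 - (s->pos & 7)));
    s->pos++;
}

static int get_bit(BitStream *s) {
    int b = (s->buf[s->pos >> 3] >> (7 - (s->pos & 7))) & 1;
    s->pos++;
    return b;
}

/* floor(d / 2^K) one bits, a zero bit, then the K low-order bits of d. */
static void encode_delta(BitStream *s, uint32_t d) {
    for (uint32_t q = d >> K; q > 0; q--) put_bit(s, 1);
    put_bit(s, 0);
    for (int i = K - 1; i >= 0; i--) put_bit(s, (d >> i) & 1);
}

static uint32_t decode_delta(BitStream *s) {
    uint32_t q = 0, r = 0;
    while (get_bit(s)) q++;                      /* unary quotient       */
    for (int i = 0; i < K; i++) r = (r << 1) | (uint32_t)get_bit(s);
    return (q << K) | r;                         /* q * 2^K + low bits   */
}

int main(void) {
    uint32_t deltas[] = { 0, 1, 4095, 4096, 70000, 1u << 20 };
    size_t n = sizeof deltas / sizeof deltas[0];

    uint8_t *mem = calloc(1 << 16, 1);          /* plenty for this demo */
    if (!mem) return 1;

    BitStream w = { mem, 0 };
    for (size_t i = 0; i < n; i++) encode_delta(&w, deltas[i]);

    BitStream r = { mem, 0 };
    for (size_t i = 0; i < n; i++)
        printf("%u -> %u\n", (unsigned)deltas[i], (unsigned)decode_delta(&r));

    printf("total bits used: %zu\n", w.pos);    /* <= sum of 13 + d/2^K */
    free(mem);
    return 0;
}

Changing K trades the unary part against the fixed binary part: the worst case is r * (K + 1) + 2^32 / 2^K bits, which for r = 1,000,000 comes to 14,097,152 bits at K = 11 and 14,524,288 bits at K = 13, so K = 12 is the integer that minimizes the bound.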
--Steve