Pat Gallagher, in 1976 (mathematika "primes in short intervals") showed, under the prime k-tuple hypothesis, that the gaps between succesive primes were distributed in a Poisson distribution. So I would try arithmetic coding with that distribution. Victor Sent from my iPhone On May 23, 2012, at 13:50, Henry Baker <hbaker1@pipeline.com> wrote:
Wow! Thanks for all this interesting data. Yes, I'm also amazed that these programs didn't do a bit better.
At 09:45 AM 5/23/2012, Tom Rokicki wrote:
As near as I can tell, bzip2 defaults to -9 (and there's no higher setting, even though it is fast). gzip -9 is very slow. I also added 7z.
First, for the 30-integers-in-one-byte representation, they all fall far short of arithmetic coding, with not too much variation. The slightly improved compression of 7z is offset by its greatly increased CPU requirements in this case.
745501238 primes.7z 761333494 primes.dat.gz9 761814389 primes.dat.gz 761814523 primes.zip 798639547 primes.dat.bz2 1073741824 primes.dat
When using a one-bit-per-integer representation, all files increase substantially in size. I was surprised at this; I had expected such a simple, almost symbol-for-symbol transformation to not affect the final sizes of the compressed files very much. But for this case, the greater redundancy appeared to permit the different compression programs to show there differences:
886586498 primesb.7z 975278536 primesb.bz2 1038286643 primesb.gz9 1130861176 primesb.gz 1130861312 primesb.zip 4026531840 primesb.dat
If it were indeed important to store a compressed bitmap of the primes, a simple technique of omitting multiples of (2,3,5,7,11,13,17,19,23) (or some such) would do better than any of these programs, showing once again that domain-specific compression that takes advantage of domain knowledge usually wins.
I often use this technique when looking for patterns in things like pruning tables for permutation puzzles and the like; try compressing different representations and see what happens. Almost always the results are obvious once you understand why, but gaining that understanding through empirical compression can help the process of enlightenment.
-tom
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun