I just found that the canonical name for this type of coding is "Golomb coding". We choose a suitable a priori fixed M based on the (fixed) statistics of the integer N being encoded, and compute (q,r) = (floor(N/M), mod(N,M)). The code for N is then unary(q)|binary(r). More precisely, we actually encode N-1: (q,r) = (floor((N-1)/M), mod((N-1),M)), and the code is unary(q)|1|binary(r).

https://en.wikipedia.org/wiki/Golomb_coding

Here's a page with worked-out examples:

https://w3.ual.es/~vruiz/Docencia/Apuntes/Coding/Text/03-symbol_encoding/09-...

Once again, I claim that the Roman numeral requirement for unary coding of thousands, e.g., "MMMMMM" for 6,000, was an *advantage*, since the mere size of the representation would more quickly distinguish it from the representation for 3,000. The Roman numeral system *approximated Golomb coding* thousands of years before Golomb.

https://en.wikipedia.org/wiki/Roman_numerals

At 02:33 AM 1/26/2019, Dave Dyer wrote:
Assuming I had a large number of integers to protect, my thought would be to feed them through a good compressor, maybe LZW, and then apply ECC to the resulting high entropy strings. The initial representation of the integers wouldn't matter much.
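P.S. The encode/decode recipe described above can be sketched in a few lines of Python. This is just an illustration of the scheme as stated, i.e., encode N-1 as unary(q)|1|binary(r) with r written in fixed-width plain binary; classic Golomb coding would use truncated binary for r when M is not a power of two, so this sketch is only exact for power-of-two M (the Rice-coding special case).

```python
def golomb_encode(n, m):
    # Encode integer n >= 1 with parameter m >= 1, per the recipe above:
    # (q, r) = divmod(n - 1, m); emit q zeros, a '1' terminator, then r
    # in fixed-width plain binary (ceil(log2(m)) bits).
    q, r = divmod(n - 1, m)
    width = (m - 1).bit_length()          # bits needed for r in [0, m)
    tail = format(r, "0{}b".format(width)) if width else ""
    return "0" * q + "1" + tail

def golomb_decode(code, m):
    # Inverse: count leading zeros to recover q, skip the '1' terminator,
    # then read r from the remaining fixed-width bits.
    i = 0
    while code[i] == "0":
        i += 1
    q = i
    i += 1                                # skip the '1'
    width = (m - 1).bit_length()
    r = int(code[i:i + width], 2) if width else 0
    return q * m + r + 1
```

For example, with m = 4, n = 9 gives (q, r) = (2, 0), so the codeword is "00" + "1" + "00" = "00100"; the longer unary prefix for larger q is exactly the property being praised in the Roman-numeral analogy above.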