[math-fun] Error correction for integers (bignums)
The problem with standard ECC is that *all bits are treated the same*, regardless of bit position. Too bad if the corrupted will & testament reads "Mallory has just inherited $XYZ,000.00"; X matters a heck of a lot more than Y, which matters a heck of a lot more than Z.

Clearly, one would like to put more redundancy into X than Y, and more into Y than Z. Alternatively, encode X into more bits than Y, which enables "standard" ECC to better deal with errors in X than in Y.

As a warmup, consider mapping integers n in the range [0,2^k) into k-bit bit-strings; alternatively, use *unary* coding 1^n (of length up to 2^k) for these same integers. Although the unary coding has the desired property, it is quite inefficient.

If I understand his papers, Gallager solved this problem back in the 1970's, by choosing a fixed large number M and unary encoding floor(n/M) and Huffman encoding mod(n,M) (almost always standard binary encoding of mod(n,M)).

Although Gallager considered *variable-length codes*, so these unary codes don't have to be full length, I suspect that the answer doesn't change by more than a constant factor (??) if a fixed maximum length is specified for the unary coding of floor(n/M).

Is there a better method of protecting the high-order bits of n using a more sophisticated type of ECC than this unary encoding hack of Gallager's?
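The M-based scheme described above can be sketched in a few lines. Here is a minimal Python sketch (function names are mine), assuming M is a power of two so that mod(n,M) fits in a fixed number of bits:

```python
def encode_hybrid(n, M):
    """Unary-code floor(n/M), then fixed-width binary-code mod(n, M).
    Assumes M is a power of two, so the remainder occupies exactly
    log2(M) bits.  Larger n -> longer codeword, so the high-order
    information shows up in the sheer length of the code."""
    q, r = divmod(n, M)
    width = M.bit_length() - 1            # log2(M) when M is a power of two
    return "1" * q + "0" + format(r, "0{}b".format(width))

def decode_hybrid(code, M):
    q = code.index("0")                   # length of the unary prefix
    width = M.bit_length() - 1
    r = int(code[q + 1:q + 1 + width], 2)
    return q * M + r
```

A bit flip in the unary prefix changes n by a multiple of M (and shifts the parse), while a flip in the binary tail changes n by less than M -- roughly the asymmetry being asked for.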
I forgot to add: based upon Gallager's results, those who've been laughing at Roman numerals for the past 1,000 years for their "inefficiency" at representing large numbers might want to rethink their mirth.

At 05:15 PM 1/25/2019, Henry Baker wrote:
Assuming I had a large number of integers to protect, my thought would be to feed them through a good compressor, maybe LZW, and then apply ECC to the resulting high-entropy strings. The initial representation of the integers wouldn't matter much.
This scheme wouldn't work. For example, suppose you wanted to protect a large number of initial digits of pi. According to current knowledge, pi isn't compressible by any standard compressor, so its bits will be represented just by themselves. Any ECC applied to the "compressed" bitstring will still have the problem that the high-order bits are treated the same as the low-order bits (reason: the Hamming-weight decoding assumption doesn't distinguish bit positions -- all are equally important).

At 02:33 AM 1/26/2019, Dave Dyer wrote:
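To make the position-blindness argument concrete, here is a toy Hamming(7,4) sketch (names mine, not anyone's production code): the syndrome locates a single flipped bit, but the correction machinery is identical whether the flip hit the most significant data bit or the least.

```python
def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits into the classic Hamming(7,4) codeword,
    with parity bits at positions 1, 2 and 4 (1-indexed)."""
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits.
    The syndrome names a *position*; it carries no notion of which
    positions matter more."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    s = s1 + 2 * s2 + 4 * s4              # 1-indexed error position, 0 if none
    c = list(c)
    if s:
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Every one of the seven positions gets exactly the same protection -- which is precisely the complaint about applying standard ECC to an incompressible bitstring.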
I just found that the canonical name for this type of coding is "Golomb coding": we choose a suitable a priori fixed M based upon the (fixed) statistics of the integer N being encoded.

Compute (q,r) = (floor(N/M), mod(N,M)). Then the code for N is unary(q)|binary(r).

More accurately, we actually encode N-1:

(q,r) = (floor((N-1)/M), mod(N-1,M))

and the code is unary(q)|1|binary(r).

https://en.wikipedia.org/wiki/Golomb_coding

Here's a page with worked-out examples:

https://w3.ual.es/~vruiz/Docencia/Apuntes/Coding/Text/03-symbol_encoding/09-...

Once again, I claim that the Roman numeral requirement for unary coding of thousands, e.g., "MMMMMM" for 6,000, was an *advantage*, since the mere size of the representation would more quickly distinguish it from the representation for 3,000. The Roman numeral system *approximated Golomb coding* thousands of years before Golomb.

https://en.wikipedia.org/wiki/Roman_numerals

At 02:33 AM 1/26/2019, Dave Dyer wrote:
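A minimal Python sketch of the Golomb code described above (function names mine; it codes nonnegative n directly, the zero-based convention, rather than N-1). The remainder uses truncated binary, which reduces to plain fixed-width binary when M is a power of two -- the Rice-code case:

```python
def golomb_encode(n, M):
    """Golomb code for nonnegative n with parameter M:
    floor(n/M) in unary (1s terminated by a 0), then
    n mod M in truncated binary."""
    q, r = divmod(n, M)
    k = M.bit_length() - 1                # floor(log2 M)
    u = (1 << (k + 1)) - M                # remainders below u use only k bits
    prefix = "1" * q + "0"
    if r < u:
        return prefix + (format(r, "0{}b".format(k)) if k else "")
    return prefix + format(r + u, "0{}b".format(k + 1))

def golomb_decode(code, M):
    q = code.index("0")                   # unary quotient
    k = M.bit_length() - 1
    u = (1 << (k + 1)) - M
    r = int(code[q + 1:q + 1 + k], 2) if k else 0
    if r >= u:                            # remainder needs k+1 bits
        r = int(code[q + 1:q + 2 + k], 2) - u
    return q * M + r
```

Note that M = 1 degenerates to pure unary -- the Roman-numeral-thousands end of the spectrum -- while large power-of-two M approaches plain binary.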
https://en.m.wikipedia.org/wiki/Elias_omega_coding

On Sat, Jan 26, 2019 at 10:32 Henry Baker <hbaker1@pipeline.com> wrote:
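The Elias omega code linked above gets a similar effect recursively: the binary form of n is prefixed by the encoding of its own length, and that length's length, and so on. A minimal Python sketch (names mine):

```python
def elias_omega_encode(n):
    """Elias omega code for a positive integer n: write n in binary,
    recursively prefix it with the encoding of (bit-length - 1),
    and terminate the whole code with a '0'."""
    code = "0"
    while n > 1:
        b = bin(n)[2:]
        code = b + code
        n = len(b) - 1
    return code

def elias_omega_decode(code):
    n, i = 1, 0
    while code[i] == "1":                 # a leading 1 means "another group"
        length = n + 1
        n = int(code[i:i + length], 2)
        i += length
    return n
```

Unlike Golomb coding, no parameter M needs to be agreed on in advance; the price is a slightly longer code for small n.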
_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
Starting from a single bit for 0 or 1 - surely no difference here...
On 26 Jan 2019, at 01:15, Henry Baker <hbaker1@pipeline.com> wrote:
If you want to move this information around and keep the error-correcting properties, it's worth remembering that communication channels require carefully coded bit streams that (for Shannon) look like white Gaussian noise and, for electrical efficiency in the transmitter, are free of long strings of 1's or 0's. Can that be folded into the ideas of targeted bit error correction?

--R.

On Mon, Jan 28, 2019 at 6:03 AM David John Makin via math-fun <math-fun@mailman.xmission.com> wrote:
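On the question of keeping long runs out of the transmitted stream: one classical answer is bit stuffing, as in HDLC framing. A toy sketch (it bounds runs of 1's only; real line codes such as 8b/10b also bound runs of 0's and shape the spectrum):

```python
def stuff(bits, run=5):
    """Insert a 0 after every `run` consecutive 1s (HDLC-style bit
    stuffing), so the channel never sees a longer run of 1s."""
    out, count = [], 0
    for b in bits:
        out.append(b)
        count = count + 1 if b else 0
        if count == run:
            out.append(0)
            count = 0
    return out

def unstuff(bits, run=5):
    """Undo stuff(): drop the 0 that follows every run of `run` 1s."""
    out, count, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        count = count + 1 if b else 0
        i += 1
        if count == run:
            i += 1                        # skip the stuffed 0
            count = 0
    return out
```

Whether such channel constraints can be co-designed with position-weighted ECC, rather than layered underneath it, seems to be exactly the open question posed above.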
participants (5)
- Dave Dyer
- David John Makin
- Henry Baker
- Richard Howard
- Victor Miller