Re: [math-fun] Error correction for integers (bignums)

26 Jan 2019

      Hi Victor:

Yes, the Elias omega coding is interesting, but it's still
going in the exactly wrong direction because a single-bit
error can produce catastrophic results in the value decoded.

Indeed, any of the usual "lossless compression" algorithms
assume that *every individual bit* is accurate, with no
errors possible.

This is what is called the "digital cliff effect", whereby
a cellphone or a digital TV which reaches the edge of its
range suddenly finds *no signal at all*, rather than a
smoothly degrading signal a la "analog TV" or "analog
radio".

The only way I can think of to encode integer N in a
range [0,M^k) for some "large" M>0 is to utilize a
sort of unary code for m bits, followed by a binary
code for the remainder mod(N,M), in a manner very
Golumb-like.

The unary code has a fixed length dependent upon
k, and decoding is easy:

HammingWeight(unary code portion)*M+Binarycode(remainderbits)

Thus a single-bit error in the unary part can cost
at most M (out of M^k), and a single-bit error in
the remainder costs at most M/2.  Thus, no single
bit can cause a catastrophic loss.

At 09:27 AM 1/26/2019, Victor Miller wrote:
...
https://en.m.wikipedia.org/wiki/Elias_omega_coding
On Sat, Jan 26, 2019 at 10:32 Henry Baker <hbaker1@pipeline.com> wrote:
...
I just found that the canonical name
for this type of coding is called
"Golomb coding":
We choose a suitable a priori fixed M
based upon the (fixed) statistics of
the integer N being encoded.
Compute (q,r) = (floor(N/M),mod(N,M)).
Then the code for N is unary(q)|binary(r).
More accurately, we actually encode N-1:
(q,r) = (floor((N-1)/M),mod(N-1,M))
code is unary(q)|1|binary(r).
https://en.wikipedia.org/wiki/Golomb_coding
Here's a page with worked-out examples:
https://w3.ual.es/~vruiz/Docencia/Apuntes/Coding/Text/03-symbol_encoding/09-...
Once again, I claim that the Roman numeral requirement
for unary coding of thousands, e.g., "MMMMMM" for 6,000,
was an *advantage*, since the mere size of the representation
would more quickly distinguish it from the representation
for 3,000.  The Roman numeral system *approximated Golumb
coding* thousands of years before Golomb.
https://en.wikipedia.org/wiki/Roman_numerals
At 02:33 AM 1/26/2019, Dave Dyer wrote:
...
Assuming I had a large number of integers to protect, my thought
would be to feed them through a good compressor, maybe LZW, and
then apply ECC to the resulting high entropy strings.  The initial
representation of the integers wouldn't matter much.