[math-fun] "hate speech" coding, aka "escape sequence" coding

26 Sep 2018

      (Regardless of the subject heading, this is not a political post.)

Governments around the world, most obviously the EU, China and
Thailand, require the "scrubbing"/"deletion" of certain "speech",
which boils down to the scrubbing/elision of certain subsequences.
The recent EU decision also requires the scrubbing/elision of
all *copyrighted* subsequences, thus vastly increasing the
fraction of subsequences that may not be transmitted or
processed on networked computer systems.

Computer scientists have long had to deal with this sort of thing
on an ad hoc basis, e.g., how to represent the quote character
within a quoted string, or how to represent, inside the data,
the bit sequence that serves as the *preamble* for a block on a
disk drive or for a data block sent over the radio.
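The quoted-string case is the simplest instance: reserve an
escape character, double it when it appears literally, and put
it in front of every quote, so that a bare quote can never occur
inside the body. A minimal sketch, assuming backslash as the
escape character (the C-family convention); `quote` and
`unquote` are illustrative names, not any particular library's
API.

```python
def quote(s: str) -> str:
    """Wrap s in double quotes so that no bare quote occurs inside."""
    # Escape the escape character first, so "\" itself stays unambiguous.
    body = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + body + '"'


def unquote(q: str) -> str:
    """Invert quote(): strip the outer quotes and undo the escapes."""
    if len(q) < 2 or q[0] != '"' or q[-1] != '"':
        raise ValueError("not a quoted string")
    out = []
    i, end = 1, len(q) - 1
    while i < end:
        c = q[i]
        if c == "\\":
            # An escape: the next character is taken literally.
            i += 1
            if i >= end:
                raise ValueError("dangling escape")
            out.append(q[i])
        elif c == '"':
            # The "prohibited subsequence": a bare quote in the body.
            raise ValueError("unescaped quote inside string")
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

For instance, `quote('say "hi"')` yields the 12-character string
`"say \"hi\""`.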

Indeed, for disk blocks and radio blocks, one can still find
data sequences that are inadvertently treated as preambles and
thereby reset the disk/radio circuitry.  This weakness has been
exploited to hack these systems.
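For the preamble problem, one standard remedy (named here as an
example, not as what any particular disk or radio actually uses)
is HDLC-style bit stuffing: the frame delimiter is `01111110`,
and the sender inserts a `0` after every run of five `1`s in the
data.  The stuffed payload therefore never contains six `1`s in
a row, so it can never mimic the delimiter.  A minimal sketch
over bit strings; `stuff`, `unstuff`, and `FLAG` are illustrative
names.

```python
FLAG = "01111110"  # HDLC frame delimiter; must never appear in the payload


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                out.append("0")  # break the run before it can reach six
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the 0 that stuff() placed after each run of five 1s."""
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        if b == "1":
            run += 1
            if run == 5:
                i += 1  # skip the stuffed bit, which must be a 0
                if i < len(bits) and bits[i] != "0":
                    raise ValueError("six 1s in a row: not stuffed data")
                run = 0
        else:
            run = 0
        i += 1
    return "".join(out)
```

For instance, `stuff("0111111")` returns `"01111101"`.  The cost
is at most one extra bit per five data bits.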

However, with a large and growing database of prohibited
subsequences, we need coding systems that can somehow avoid
every prohibited subsequence while still permitting every
non-prohibited one.
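One classical answer for a fixed, finite list, offered as an
illustration rather than as the author's proposal, is
enumerative coding over a constrained system: count the strings
of length n that avoid every prohibited word, then send message
number m as the m-th such string in lexicographic order.  The
sketch assumes "subsequence" means a contiguous substring (as
with the preambles above); `make_coder`, `count`, `encode`, and
`decode` are hypothetical names.

```python
from functools import lru_cache


def make_coder(alphabet, forbidden):
    """Enumerative coder for fixed-length strings over `alphabet` that
    contain none of the `forbidden` words as contiguous substrings."""
    symbols = sorted(set(alphabet))
    words = tuple(forbidden)
    # Any forbidden word ending at the newest symbol fits within the last
    # (longest word length) symbols, so only keep = that - 1 are remembered.
    keep = max((len(w) for w in words), default=1) - 1

    def ok(tail):
        # True if no forbidden word ends at the last symbol of `tail`.
        return not any(tail.endswith(w) for w in words)

    def step(ctx, c):
        # Context remembered after appending symbol c.
        tail = ctx + c
        return tail[-keep:] if keep else ""

    @lru_cache(maxsize=None)
    def count(ctx, n):
        # Number of clean length-n continuations after context ctx.
        if n == 0:
            return 1
        return sum(count(step(ctx, c), n - 1)
                   for c in symbols if ok(ctx + c))

    def encode(m, n):
        # The m-th (0-based, lexicographic) clean string of length n.
        if not 0 <= m < count("", n):
            raise ValueError("message index out of range")
        ctx, out = "", []
        for remaining in range(n, 0, -1):
            for c in symbols:
                if not ok(ctx + c):
                    continue
                k = count(step(ctx, c), remaining - 1)
                if m < k:
                    out.append(c)
                    ctx = step(ctx, c)
                    break
                m -= k
        return "".join(out)

    def decode(s):
        # Inverse of encode: the lexicographic rank of s among clean strings.
        m, ctx = 0, ""
        for i, ch in enumerate(s):
            remaining = len(s) - i
            if ch not in symbols or not ok(ctx + ch):
                raise ValueError("not a valid codeword")
            for c in symbols:
                if c == ch:
                    break
                if ok(ctx + c):
                    m += count(step(ctx, c), remaining - 1)
            ctx = step(ctx, ch)
        return m

    return count, encode, decode
```

For example, with alphabet `01` and the single prohibited word
`11`, the number of clean strings of length 1, 2, 3, ... is 2, 3,
5, 8, 13, ... (the Fibonacci numbers), and `encode(7, 4)` gives
`1010`.  Each word added to the prohibited list can only shrink
`count("", n)`, so a block of length n carries fewer bits (about
log2 of that count), and sender and receiver would presumably
have to agree on which version of the list is in force.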

Leaving aside whether the set of prohibited subsequences is
decidable (let's assume that it is), what sort of encoding
schemes are possible in a world where the database of
prohibited sequences is not constant, but ever-growing??

Henry Baker
