[math-fun] "hate speech" coding, aka "escape sequence" coding
(Regardless of the subject heading, this is not a political post.)

Governments around the world, most obviously the EU, China and Thailand, require the "scrubbing"/"deletion" of certain "speech", which boils down to the scrubbing/elision of certain subsequences. The recent EU decision also requires the scrubbing/elision of all *copyrighted* subsequences, thus vastly increasing the percentage of subsequences prohibited from transmission and processing on networked computer systems.

Computer scientists have long had to deal with this sort of thing on an ad hoc basis -- e.g., how to represent the quote character within a quoted string, or how to represent the *preamble* sequence of bits for a block on a disk drive, or the *preamble* sequence of bits for a data block sent over radio. Indeed, in the case of disk blocks and radio blocks, one can still find data sequences that are inadvertently treated as preamble sequences and reset the disk/radio circuitry. This flaw has been used to successfully hack these systems.

However, with a large and growing database of prohibited subsequences, there is a need for coding systems that can somehow avoid all the prohibited subsequences while permitting non-prohibited subsequences. Leaving aside the issue of whether the set of prohibited subsequences is decidable -- let's assume that it is -- what sort of encoding schemes are possible in a world where the database of prohibited sequences is not constant, but is ever growing?
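As one possible starting point (not a scheme proposed in the post), the quote-character and preamble examples generalize to a form of bit stuffing over an arbitrary finite set of forbidden bit strings. The sketch below is in Python; the names `build_safe_moves`, `encode`, `decode` and the sample forbidden set are hypothetical, chosen only for illustration. It assumes the forbidden set is finite, known to both sender and receiver, and small enough that all contexts shorter than the longest forbidden word can be enumerated. Whenever both next bits are safe, the encoder emits a data bit; when only one is safe, it emits that forced bit, which carries no data.

```python
from itertools import product

def build_safe_moves(forbidden):
    """Map each live context (recent output bits) to the bits that may follow it."""
    k = max(len(f) for f in forbidden)
    # Contexts: every bit string shorter than k (short ones occur only at the start).
    contexts = ["".join(p) for n in range(k) for p in product("01", repeat=n)]

    def step(ctx, bit):
        s = ctx + bit
        if any(s.endswith(f) for f in forbidden):
            return None                      # this bit would complete a forbidden word
        return s[-(k - 1):] if k > 1 else ""

    # Prune contexts from which every continuation eventually hits a forbidden word.
    live = set(contexts)
    changed = True
    while changed:
        changed = False
        for ctx in list(live):
            if not any(step(ctx, b) in live for b in "01"):
                live.discard(ctx)
                changed = True
    moves = {ctx: [b for b in "01" if step(ctx, b) in live] for ctx in live}
    return moves, step

def encode(data_bits, forbidden):
    moves, step = build_safe_moves(forbidden)
    if "" not in moves:
        raise ValueError("no arbitrarily long string avoids the forbidden set")
    ctx, out, forced_run = "", [], 0
    it = iter(data_bits)
    pending = next(it, None)
    while pending is not None:
        allowed = moves[ctx]
        if len(allowed) == 2:
            bit, pending, forced_run = pending, next(it, None), 0   # data bit
        else:
            bit = allowed[0]                 # forced stuffing bit, carries no data
            forced_run += 1
            if forced_run > len(moves):
                raise ValueError("stuck in a region with no free choices")
        out.append(bit)
        ctx = step(ctx, bit)
    return "".join(out)

def decode(code_bits, forbidden):
    moves, step = build_safe_moves(forbidden)
    ctx, out = "", []
    for bit in code_bits:
        if len(moves[ctx]) == 2:             # the sender had a free choice here
            out.append(bit)
        ctx = step(ctx, bit)
    return "".join(out)

# Hypothetical forbidden set, for illustration only.
forbidden = {"111", "0000"}
message = "1111000001"
coded = encode(message, forbidden)
assert decode(coded, forbidden) == message
assert not any(f in coded for f in forbidden)
```

Note that the code depends on the exact forbidden set: if the database grows, the same data encodes differently, so sender and receiver would presumably have to agree on which version of the database a given message was encoded against. The sketch also makes no attempt at efficiency, and it does not frame the message length.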
participants (1)
-
Henry Baker