[math-fun] Testing N people for viruses with log(N) tests
For some reason there's still an extreme shortage of covid-19 tests. If only everyone could be tested, preferably at least once a week, most of social isolation could end. Negative people could associate with negatives. Positive people could associate with positives. People who were formerly positive but are now negative could associate with everyone, especially if they test positive for antibodies. Fortunately, it's not necessary to have N tests to test N people unless the expected infection rate is close to 50%. If the expected infection rate is, say, one in a hundred, samples can be taken from 128 people, and each sample thoroughly stirred then divided into 7 equal parts. The first part from all 128 would be mixed together, thoroughly stirred, then tested with one test. If the test is negative, then all 128 people are negative, and we're done. If that test is positive, the second part from 64 of the 128 would be mixed together and tested, and the second part from the other 64 of the 128 would be mixed together and tested, requiring two more tests. Probably one of the tests would be negative, in which case those 64 would be negative. The process would then continue with the third, fourth, etc., parts, subdividing it further and further to find the probably just one or two people in the set who test positive. The identical approach could be used with the test for antibodies. The sets of 128 should be chosen so that no two family members or coworkers are in the same set, to avoid sets with a high likelihood of correlations. If some sets have half the people in them infected, that will result in a lot more tests being needed, more than would be compensated for by a similar number other sets having nobody in them infected. I'll leave it to others to calculate what the expected number of tests needed would be, but it's obviously O(log2(N)) where the expected infection rate is 1/N. We're accustomed to thinking in terms of terabytes and petabytes. But it's worth remembering that fractional bits of information are meaningful too. For instance if just one person in the world is expected to test positive for something (e.g. for being the one who left DNA at a specific crime scene), learning which person that is is only 33 bits of information. That means that testing any one person provides just 4 nanobits of information, hence that relatively few tests which provide one bit of information need to be used. Of course the above assumes that the tests are reliable. Testing in this way amplifies any unreliability in either direction. However, this can be compensated for by multiple tests in ways that are much more efficient than simply testing everyone multiple times. Just as unreliable data transmissions can be made more reliable in ways more efficient than simply repeating the transmission multiple times. Parity bits and CRC checks can reduce the error rate of a data transmission to as low as is desired (albeit never quite to zero). I wonder if the reason why doctors aren't doing testing in the way I suggest is because of specialization. Doctors need to talk to experts on data transmission and encoding, such as the inventors of the JT65 mode which made it possible for hams with battery-operated radios and hand-held antennas to communicate by bouncing signals of the moon. With larger antennas, this has been done with a transmitter of just 3 milliwatts.
The proposed Fast Fourier Blood Test sounds promising, especially in a test-limited scenario, but the other major assumption is that blood can be mixed without inducing false negatives. I don't know enough about blood chemistry to say for sure what would happen. What if person A has antibodies and person B has the virus? Can the antibodies kill the virus in the mixed blood solution? In any case, studies would need to be done to support the idea that mixing blood in a chemical solution is essentially the same thing as mixing inert glass beads in a Bernoulli urn.

—Brad
Additionally, what you choose to test in each "round" is contingent on the results of the previous rounds. Doesn't the current test take one to two days to run? Could you keep your samples stable over the one to two weeks that it would take to complete the entire batch?
The assumption here is that either one test can test an arbitrarily large quantity of blood, or that a test is equally accurate if the concentration of the virus in the blood is reduced by a factor of 100 or more. These assumptions, while they make for entertaining math problems, seem unlikely to be realistic.

Andy
This is the "non-adaptive combinatorial group testing" problem, which this mailing list discussed a bunch as the "rats and poisoned wine bottles problem" in 2012. Veit Elser and I eventually gave some talks about joint work, though nothing relevant to today's real-world situation.

Historically, the first mention of the problem was another case of finding sick people without enough tests: testing all WWII draftees for syphilis.

There are some good books on what's known, by Du & Hwang: "Combinatorial Group Testing and Its Applications" (1999) and "Pooling Designs and Nonadaptive Group Testing" (2006). My copies are locked away in my office, which I can't get to.

--Michael
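For reference, a sketch of the simplest non-adaptive design, covering only the special case of at most one positive sample and a perfectly reliable test: give each of n samples a binary index, run one pool per bit position plus one master pool, all in parallel, and the pattern of positive pools spells out the index of the positive sample. With two or more positives the bitwise OR of indices becomes ambiguous, which is exactly what the more elaborate designs in the group-testing literature are for.

from math import ceil, log2

def design_pools(n):
    """Non-adaptive pools for at most one positive among n samples."""
    bits = max(1, ceil(log2(n)))
    pools = [[i for i in range(n) if (i >> j) & 1] for j in range(bits)]
    pools.append(list(range(n)))      # master pool: is anyone positive at all?
    return pools

def decode(results):
    """results[j] = True iff pool j was positive; returns the index or None."""
    if not results[-1]:               # master pool negative: nobody is positive
        return None
    return sum(1 << j for j, hit in enumerate(results[:-1]) if hit)

n, culprit = 128, 77                  # example: sample 77 is the positive one
pools = design_pools(n)               # 8 pools for 128 samples
results = [culprit in pool for pool in pools]
assert decode(results) == culprit
print(len(pools), "parallel tests identify 1 positive among", n)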
Excellent idea, but acquiring & mixing the samples may be just as expensive as the test itself.

Better to sample and test *sewage* at strategic branching locations in the sewer system, where the "mixing" has already taken place. (Yes, fecal matter does contain the virus.)

Testing sewage has already been proposed, but for some reason, it still hasn't been done yet.

---

BTW, to avoid the problem of one test depending upon another, one can also do "oblivious" testing, using *sorting networks* to get an O((log n)^2) algorithm.
This has actually occurred to a lot of people. Yaniv Erlich (a molecular biologist) said on March 13 (https://twitter.com/erlichya/status/1238522385952899074):

"Guys, this was the main topic of my PhD (group testing) and it is a terrible idea. The current shortage of tests is due to kiting (swabs, vials, etc), not qPCR machines. So pools don't solve this problem."

He was quote-tweeting Eran Segal (@segal_eran, March 12):

"Is anyone doing COVID-19 testing on pools of samples from many people? If <0.1% are infected, most tests of 100 pooled samples will be negative. Only positive pools need further tests. Accuracy per test is lower but this allows rapid, cheap, large-scale testing. Thoughts?"
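Segal's numbers are easy to sanity-check with the simplest one-round (Dorfman-style) scheme, under the same idealizations as before (independent infections, perfect test at any dilution): pool k samples, and retest all k individually only if the pool is positive.

def tests_per_person(p, k):
    """Expected tests per person for one round of Dorfman pooling:
    one pool test, plus k individual retests whenever the pool is positive."""
    pool_positive = 1 - (1 - p) ** k
    return (1 + k * pool_positive) / k

for prevalence, pool_size in [(0.001, 100), (0.01, 100), (0.01, 10)]:
    print(f"p={prevalence:.1%}, k={pool_size}: "
          f"{tests_per_person(prevalence, pool_size):.3f} tests/person")
# At 0.1% prevalence with pools of 100 this is about 0.105 tests per
# person -- roughly the 10x saving Segal describes -- but at 1% the same
# pool size is already much less effective (about 0.64 tests per person).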
We (Ginkgo) are scaling up to doing 1M tests/day, and will likely rely on some amount of pooling to achieve this. Yaniv is correct that a major (perhaps the major) challenge is high-throughput sample collection, not analysis. The book suggestion on non-sequential pooling approaches looks as if it could be helpful, though frankly most of the practical approaches were pretty obvious.
Henry Baker wrote: "Testing sewage has already been proposed, but for some reason, it still hasn't been done yet."

It is being done: https://www.nature.com/articles/d41586-020-00973-x

Our local paper reported on how this will be done statewide, noting the obvious benefits if the project should "bear fruit." The local orchards are having trouble getting workers, surely.

Hilarie
In case no one has mentioned this, it seems that people have experimented with this approach. From non-technical newspaper accounts:

https://www.jpost.com/health-science/acceleration-in-multiple-coronavirus-te...
https://www.timesofisrael.com/to-ease-global-virus-test-bottleneck-israeli-s...

It sounds like they have verified that they can detect a single positive result in a pool of 64 samples. However, as someone else pointed out here, the shortage may be for swabs, kits, etc. rather than the tests themselves, so even if the test works, it may not relieve the real bottleneck.
participants (10)
- Andy Latto
- Brad Klee
- Henry Baker
- Hilarie Orman
- Keith F. Lynch
- Michael Greenwald
- Michael Kleber
- Tom Knight
- Victor Miller
- William R Somsky