Re: [math-fun] nucleotide tetrahedron

18 Jan 2005

      Hi Jim.  That tetrahedron certainly would have caught my eye 
as well. It occurs to me that my current job would be arbitrarily
easier if the genomes we were assembling looked like the
output of rolling that die 3 million times :-).

David Kephart already mentioned the basic idea that what we
really care about is strings over the alphabet {A,C,G,T}, and
that the fundamental operation on this set is the self-inverse
"reverse complement" action, the product of the double 
transposition (AT)(CG) with reversal of the entire string.
This is the operation that really embodies Watson-Crick
base pairing, because (again as David already said), DNA
strands are oriented and the two strands in the double-helix
are opposite to one another, so "reverse complement" takes
one strand of your chromosome to the other.
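
For concreteness, here is a minimal Python sketch of that operation. The names COMPLEMENT and reverse_complement are mine, chosen for illustration, and the sketch assumes uppercase strings over {A,C,G,T} only (no ambiguity codes, no RNA).

```python
# Complement is the double transposition (AT)(CG) on the alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Complement each base, then reverse the entire string."""
    return strand.translate(COMPLEMENT)[::-1]

strand = "GATTACA"
other = reverse_complement(strand)
print(other)                                # TGTAATC
print(reverse_complement(other) == strand)  # True: the action is self-inverse
```

Complementing and reversing commute, so the order of the two steps doesn't matter.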

It's worth noting that the Watson-Crick pairing is inherently
meaningful in the DNA or RNA polymer context -- that is, as
part of a string, not a single base -- since the fact that these
guys pair with one another is a result of the precise alignment
and separation enforced by the helical backbone of the polymer.
In other words, the base pairing rules follow from the reverse-
complement operation, not the other way around!
...
> I'd be surprised if there were some biologically relevant
> action of the 2-element group that leaves T and G alone
> but acts like Sym({A,C}) on the other two elements (to
> give just one example of a group-action that probably
> isn't biologically meaningful).  But, like most of you,
> I like to be surprised, which is why I'm asking.

Well, A is certainly distinguished from the other three bases:
it's the same A as in ATP and ADP, the molecules that store
and release just about all energy used in the cell...
...
> Also, do any linear representations (as opposed to permutation
> representations) of Sym({A,C,G,T}) play a role in genomics?

Here's something that scores on both counts, but it requires about
three paragraphs of background.

One beautiful instance of biologically meaningful combinatorics
is somatic hypermutation in the immune system.  This answers
the fundamental question "How can we fight off so many pathogens?"
-- I mean, we only have about 20,000 genes, and yet our ability to
recognize a pathogen that the immune system had seen 20 years
earlier ("acquired immune response") shows that we have targeted 
cells that respond to easily a million different molecular invaders.  
How can that be?

The basic but shocking answer is that the genes that code for the 
"recognition" part of your immune cells are deliberately modified.
In the parent cells that give rise to B-cells and T-cells there is a
gene region "V", which contains many different subsegments,
V1,V2,...,V70.  But in the daughter cells, some deliberate mutation
happens, and all but a randomly-selected one of the V portions are 
snipped out.  This happens similarly in four other regions lying in two
genes, all independently, and in the end the daughter cell has
somewhere between 10^12 and 10^16 different possible gene pairs,
each of which codes for a distinct receptor.

Let me point out explicitly that the same effect could *not* be achieved 
by, for example, random splicing of the RNA after it's copied from the DNA.
This could make a random protein, but it wouldn't be *reproducible*.  The 
recombined B-cell, if it turns out to be useful in fighting an invader, can now
reproduce and yield an army of cells tuned to that specific pathogen.

Okay, enough background, now for the group action:

It seems that even this amount of variation isn't enough, and there's
another mechanism to increase diversity among your B-cells and T-cells.
There's a protein called cytidine deaminase that deliberately causes C->T 
mutations in a region of one of the recombined genes I described above.
(It only acts rarely -- but about a million times more commonly than random
mutation in an active gene.)  So that linear action with one off-diagonal entry
plays an important immunological role.
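
One way to read "linear action with one off-diagonal entry", sketched in Python: take the 4-dimensional space with basis A,C,G,T and let a C become a T with some probability p. The value of p, the names c_to_t_matrix and apply_matrix, and the choice to act on a vector of base frequencies are all my assumptions for illustration, not measured biology.

```python
BASES = "ACGT"

def c_to_t_matrix(p: float) -> list[list[float]]:
    """Identity matrix, except that C goes to T with (hypothetical) probability p."""
    n = len(BASES)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c, t = BASES.index("C"), BASES.index("T")
    m[c][c] = 1.0 - p  # a C stays a C
    m[t][c] = p        # the single off-diagonal entry: C -> T
    return m

def apply_matrix(m: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product, with v holding frequencies in the order A, C, G, T."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

freqs = [0.25, 0.25, 0.25, 0.25]
print(apply_matrix(c_to_t_matrix(0.1), freqs))
```

Each column sums to 1, so total frequency is preserved; only the C and T coordinates change.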

How's that?

--Michael Kleber

-- 
It is very dark and after 2000. If you continue you are likely to be
eaten by a bleen.