Hi Jim. That tetrahedron certainly would have caught my eye as well. It occurs to me that my current job would be arbitrarily easier if the genomes we were assembling looked like the output of rolling that die 3 million times :-). David Kephart already mentioned the basic idea that what we really care about is strings over the alphabet {A,C,G,T}, and that the fundamental operation on this set is the self-inverse "reverse complement" action, the product of the double transposition (AT)(CG) with reversal of the entire string. This is the operation that really embodies Watson-Crick base pairing, because (again, as David already said) DNA strands are oriented and the two strands in the double helix are opposite to one another, so "reverse complement" takes one strand of your chromosome to the other. It's worth noting that the Watson-Crick pairing is inherently meaningful in the DNA or RNA polymer context -- that is, as part of a string, not a single base -- since the fact that these guys pair with one another is a result of the precise alignment and separation enforced by the helical backbone of the polymer. In other words, the base pairing rules follow from the reverse-complement operation, not the other way around!
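For concreteness, here is a minimal Python sketch of the reverse-complement action described above. The names COMPLEMENT and reverse_complement are my own, chosen for illustration, and the sketch assumes plain uppercase strings over {A,C,G,T}.

```python
# Complement is the double transposition (A T)(C G) on the alphabet.
# Characters outside {A, C, G, T} pass through unchanged (an assumption
# of this sketch, not a biological convention).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complement every base, then reverse the whole string."""
    return strand.translate(COMPLEMENT)[::-1]


strand = "GATTACA"
other = reverse_complement(strand)
print(other)  # TGTAATC

# The action is self-inverse: applying it twice recovers the original strand.
assert reverse_complement(other) == strand
```

Note that a string like ACGT is its own reverse complement, so the action has fixed points among even-length strings.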
I'd be surprised if there were some biologically relevant action of the 2-element group that leaves T and G alone but acts like Sym({A,C}) on the other two elements (to give just one example of a group-action that probably isn't biologically meaningful). But, like most of you, I like to be surprised, which is why I'm asking.
Well, A is certainly distinguished from the other three bases: it's the same A as in ATP and ADP, the molecules that store and release just about all energy used in the cell...
Also, do any linear representations (as opposed to permutation representations) of Sym({A,C,G,T}) play a role in genomics?
Here's something that scores on both counts, but it requires about three paragraphs of background. One beautiful instance of biologically meaningful combinatorics is somatic hypermutation in the immune system. This answers the fundamental question "How can we fight off so many pathogens?" -- I mean, we only have about 20,000 genes, and yet our ability to recognize a pathogen that the immune system had seen 20 years earlier ("acquired immune response") shows that we have targeted cells that respond to easily a million different molecular invaders. How can that be? The basic but shocking answer is that the genes that code for the "recognition" part of your immune cells are deliberately modified. In the parent cells that give rise to B-cells and T-cells there is a gene region "V", which contains many different subsegments, V1,V2,...,V70. But in the daughter cells, some deliberate mutation happens, and all but a randomly-selected one of the V portions are snipped out. This happens similarly in four other regions lying in two genes, all independently, and in the end the daughter cell has somewhere between 10^12 and 10^16 different possible gene pairs, each of which codes for a distinct receptor. (Let me point out explicitly that the same effect could *not* be achieved by, for example, random splicing of the RNA after it's copied from the DNA. This could make a random protein, but it wouldn't be *reproducible*. The recombined B-cell, if it turns out to be useful in fighting an invader, can now reproduce and yield an army of cells tuned to that specific pathogen.) Okay, enough background, now for the group action: It seems that even this amount of variation isn't enough, and there's another mechanism to increase diversity among your B-cells and T-cells. There's a protein called cytidine deaminase that deliberately causes C->T mutations in a region of one of the recombined genes I described above.
(It only acts rarely -- but about a million times more commonly than random mutation in an active gene.) So that linear action with one off-diagonal entry plays an important immunological role. How's that?
--Michael Kleber
-- It is very dark and after 2000. If you continue you are likely to be eaten by a bleen.