[math-fun] nucleotide tetrahedron
While at the Joint Winter Meetings in Atlanta, I saw a rather curious object sitting on a table: a regular tetrahedron made of white cardboard, whose sides were labelled "A", "C", "G", and "T". (Does anyone know who created it, and why?)

This sighting prompted a question which is perhaps more suited to Michael Kleber than to any other person alive, since it concerns both polyhedral models (an old interest of his) and nucleotides (a newer interest of his), but since the answer may interest lots of people besides the two of us, I thought I'd ask it in this forum:

What are the biologically relevant group-actions on the set {A,C,G,T}?

Leaving aside the symmetric group action ("they're all nucleotides, aren't they?") and the trivial group action ("yeah, but they're DIFFERENT nucleotides"), there's the 4-element group action that stabilizes the set {A,G} and the set {C,T} ("sure, but the two purines are more like each other than they are like the two pyrimidines, and vice versa"), and there's the 8-element group action that stabilizes the partition {{A,T},{C,G}} ("okay, but what really matters is that A pairs with T and C pairs with G") and there's the 4-element group action that stabilizes the sets {A,T} and {C,G} ("fine, but don't forget that the two pairs behave differently vis-a-vis transcription of DNA into RNA (the whole uracil thing)"), and ...

I'd be surprised if there were some biologically relevant action of the 2-element group that leaves T and G alone but acts like Sym({A,C}) on the other two elements (to give just one example of a group-action that probably isn't biologically meaningful). But, like most of you, I like to be surprised, which is why I'm asking.

Also, do any linear representations (as opposed to permutation representations) of Sym({A,C,G,T}) play a role in genomics?

Jim Propp
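For readers who want to check the group orders quoted above (4, 8, and 4), here is a minimal brute-force sketch in Python. It is not part of the original message; the helper names (all_permutations, fixes_each_set, fixes_partition) are made up for illustration. It simply enumerates all 24 permutations of {A,C,G,T} and counts those that preserve each structure.

```python
from itertools import permutations

BASES = "ACGT"

def all_permutations():
    """Every bijection of {A,C,G,T}, stored as a dict base -> image."""
    return [dict(zip(BASES, image)) for image in permutations(BASES)]

def image_of_set(perm, block):
    """Image of a set of bases under the permutation."""
    return frozenset(perm[b] for b in block)

def fixes_each_set(perm, blocks):
    """True if perm maps every listed set onto itself."""
    return all(image_of_set(perm, blk) == frozenset(blk) for blk in blocks)

def fixes_partition(perm, blocks):
    """True if perm permutes the listed sets among themselves."""
    target = {frozenset(blk) for blk in blocks}
    return {image_of_set(perm, blk) for blk in blocks} == target

perms = all_permutations()
# Purines {A,G} and pyrimidines {C,T} each preserved as sets.
purine_pyrimidine = [p for p in perms if fixes_each_set(p, ["AG", "CT"])]
# Pairing partition {{A,T},{C,G}} preserved; the two pairs may be swapped.
pairing_partition = [p for p in perms if fixes_partition(p, ["AT", "CG"])]
# Each pair {A,T} and {C,G} preserved individually.
pairing_sets = [p for p in perms if fixes_each_set(p, ["AT", "CG"])]

print(len(perms), len(purine_pyrimidine), len(pairing_partition), len(pairing_sets))
```

The 8-element group is the dihedral group of the square whose diagonals are the two base pairs; the two 4-element groups are both Klein four-groups, each generated by two disjoint transpositions.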
Hi Jim. That tetrahedron certainly would have caught my eye as well. It occurs to me that my current job would be arbitrarily easier if the genomes we were assembling looked like the output of rolling that die 3 million times :-).

David Kephart already mentioned the basic idea that what we really care about is strings over the alphabet {A,C,G,T}, and that the fundamental operation on this set is the self-inverse "reverse complement" action, the product of the double transposition (AT)(CG) with reversal of the entire string. This is the operation that really embodies Watson-Crick base pairing, because (again as David already said), DNA strands are oriented and the two strands in the double-helix are opposite to one another, so "reverse complement" takes one strand of your chromosome to the other.

It's worth noting that the Watson-Crick pairing is inherently meaningful in the DNA or RNA polymer context -- that is, as part of a string, not a single base -- since the fact that these guys pair with one another is a result of the precise alignment and separation enforced by the helical backbone of the polymer. In other words, the base pairing rules follow from the reverse-complement operation, not the other way around!
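As a small illustration of the reverse-complement operation described above, here is a sketch in Python (the names COMPLEMENT and reverse_complement are chosen for this example and do not come from the thread). It applies the double transposition (AT)(CG) to each base and reverses the string, and checks that doing so twice returns the original strand.

```python
# The double transposition (AT)(CG) acting on single bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Complement every base and reverse the order of the string."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

example = "AACGT"
rc = reverse_complement(example)
print(rc)                      # ACGTT
print(reverse_complement(rc))  # AACGT, since the operation is self-inverse

# Some strings are fixed by the operation, e.g. this six-letter site.
print(reverse_complement("GAATTC"))  # GAATTC
```

Complementing and reversing commute, so the order in which the two steps are applied does not matter.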
I'd be surprised if there were some biologically relevant action of the 2-element group that leaves T and G alone but acts like Sym({A,C}) on the other two elements (to give just one example of a group-action that probably isn't biologically meaningful). But, like most of you, I like to be surprised, which is why I'm asking.
Well, A is certainly distinguished from the other three bases: it's the same A as in ATP and ADP, the molecules that store and release just about all energy used in the cell...
Also, do any linear representations (as opposed to permutation representations) of Sym({A,C,G,T}) play a role in genomics?
Here's something that scores on both counts, but it requires about three paragraphs of background.

One beautiful instance of biologically meaningful combinatorics is somatic hypermutation in the immune system. This answers the fundamental question "How can we fight off so many pathogens?" -- I mean, we only have about 20,000 genes, and yet our ability to recognize a pathogen that the immune system had seen 20 years earlier ("acquired immune response") shows that we have targeted cells that respond to easily a million different molecular invaders. How can that be?

The basic but shocking answer is that the genes that code for the "recognition" part of your immune cells are deliberately modified. In the parent cells that give rise to B-cells and T-cells there is a gene region "V", which contains many different subsegments, V1,V2,...,V70. But in the daughter cells, some deliberate mutation happens, and all but a randomly-selected one of the V portions are snipped out. This happens similarly in four other regions lying in two genes, all independently, and in the end the daughter cell has somewhere between 10^12 and 10^16 different possible gene pairs, each of which codes for a distinct receptor.

(Let me point out explicitly that the same effect could *not* be achieved by, for example, random splicing of the RNA after it's copied from the DNA. This could make a random protein, but it wouldn't be *reproducible*. The recombined B-cell, if it turns out to be useful in fighting an invader, can now reproduce and yield an army of cells tuned to that specific pathogen.)

Okay, enough background, now for the group action: It seems that even this amount of variation isn't enough, and there's another mechanism to increase diversity among your B-cells and T-cells. There's a protein called cytidine deaminase that deliberately causes C->T mutations in a region of one of the recombined genes I described above.
(It only acts rarely -- but about a million times more commonly than random mutation in an active gene.) So that linear action with one off-diagonal entry plays an important immunological role. How's that?

--Michael Kleber

--
It is very dark and after 2000. If you continue you are likely to be eaten by a bleen.
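One possible reading of "linear action with one off-diagonal entry", sketched in Python: treat base frequencies as a vector indexed by A, C, G, T, and model the C->T mutation as a matrix that is the identity except in the C column. The per-step probability p, the function names c_to_t_matrix and apply, and the value 0.25 are assumptions made for illustration; the message above does not specify a rate or a particular matrix.

```python
BASES = "ACGT"

def c_to_t_matrix(p):
    """4x4 column-stochastic matrix: identity, except C becomes T with probability p.

    p is a hypothetical per-step mutation probability, not a measured rate.
    """
    n = len(BASES)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c, t = BASES.index("C"), BASES.index("T")
    m[c][c] = 1.0 - p   # C stays C with probability 1 - p
    m[t][c] = p         # the single off-diagonal entry: C -> T
    return m

def apply(matrix, freqs):
    """Matrix-vector product: new base frequencies after one step."""
    return [sum(row[j] * freqs[j] for j in range(len(freqs))) for row in matrix]

m = c_to_t_matrix(0.25)
uniform = [0.25, 0.25, 0.25, 0.25]  # equal frequencies of A, C, G, T
result = apply(m, uniform)
print(dict(zip(BASES, result)))
```

Unlike the permutation actions discussed earlier in the thread, this map is not a bijection on bases: it moves weight from C to T and never back, so it belongs to the matrix picture rather than to any subgroup of Sym({A,C,G,T}).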
Date: Sun, 16 Jan 2005 14:49:35 -0600 (CST) From: James Propp <propp@math.wisc.edu>
While at the Joint Winter Meetings in Atlanta, I saw a rather curious object sitting on a table: a regular tetrahedron made of white cardboard, whose sides were labelled "A", "C", "G", and "T". (Does anyone know who created it, and why?)
A combination of bioinformatics with Dungeons & Dragons...? :-) More seriously, lots of sequence alignment algorithms start from a null hypothesis that parts of the sequence are randomly generated, and compute a p-value from there. You could imagine illustrating this to a freshman class by flipping the tetrahedron die a few times. -- Steve Rowley <sgr@alum.mit.edu> http://alum.mit.edu/www/sgr/ ICQ: 52-377-390