If M is the minimal length of the genome of a "lifeform capable of independent existence", then it strikes me as unlikely that there would be many that would 'work'. (At least if you count properly -- two different ways of coding the same AA shouldn't count as different in this context. Otherwise you introduce a fairly predictable exponential factor, something like (64/20)^(n/3) on average. In any case the remaining entropy is quite high.) I'd expect that almost any change would render it useless -- maybe some transpositions would be possible, but probably not many. (At that level of compression, probably many sections are being reused in strange ways.) Of course even if you could break it up into 30 sections (average length just 100bp!) and transpose them, you'd only gain 107 bits (log2(30!) is about 107.7) toward the 4322 (1000 codons at log2(20) bits each), leaving you with much too much entropy.

For genomes of length M + 100, say, you can get lots of viable lifeforms by adding noncoding sections, but you don't have that option with length M.

In any case I don't think that a single 3000bp strand of DNA could reasonably form by chance, let alone N/4^3000 of them. I suspect abiogenesis was much more subtle.

Charles Greathouse
Analyst/Programmer
Case Western Reserve University

On Thu, May 7, 2015 at 2:46 PM, Warren D Smith <warren.wds@gmail.com> wrote:
If the simplest lifeform capable of independent existence had, say, 3000 base pair long DNA --actually I think the least the "minimal genome project" has been able to come up with is more like 100 times that -- then you might say "the chance of that is 4^(-3000) which is about 10^(-1806), which is so small that even if every atom in the observable universe were trying a new 3000-long DNA sequence every femtosecond, life almost certainly would still never have come into existence anywhere ever... therefore, life is a miracle and Earth is likely the only place in the universe that has any."
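As a rough check of the arithmetic in that naive argument, here is a small sketch. The 10^80 atoms and the 13.8 billion year age of the universe are round-number assumptions added here, not figures from the post; only the 3000 bp length and the one-trial-per-femtosecond rate come from the text.

```python
import math

GENOME_LENGTH = 3000  # base pairs, as in the text

# log10 of the chance of hitting one specific sequence, 4^(-3000)
log10_p = -GENOME_LENGTH * math.log10(4)

# Assumed round numbers (not from the text):
ATOMS = 1e80                       # atoms in the observable universe
AGE_SECONDS = 13.8e9 * 3.156e7     # age of the universe, in seconds
TRIALS_PER_SECOND = 1e15           # one trial per femtosecond, per atom

# log10 of the total number of sequences tried under those assumptions
log10_trials = (math.log10(ATOMS)
                + math.log10(AGE_SECONDS)
                + math.log10(TRIALS_PER_SECOND))

# log10 of the expected number of hits on the one target sequence
log10_expected_hits = log10_trials + log10_p

print(f"log10(4^-3000)       = {log10_p:.1f}")
print(f"log10(total trials)  = {log10_trials:.1f}")
print(f"log10(expected hits) = {log10_expected_hits:.1f}")
```

Working in log10 throughout avoids floating-point underflow, since 4^(-3000) is far below the smallest representable double.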
However, that calculation was wrong because more than one of those 4^3000 sequences probably works to produce a viable lifeform. In fact the number that work is probably also enormous. If the number were N then the viability probability is more like P=N/4^3000, and it is that chance P that really needs to be used to assess the miraculousness.
OK, that brings me to my point. We can do an experiment to approximately measure P. Start with some near-minimal bacterial genome with, say, G base pairs. Randomly mutate K of its base pairs. There are binomial(G,K)*3^K possible mutated genomes obtainable in this way, and we are taking a uniform random sample among them. Then count how many of the resulting bacteria remain viable, versus how many are rendered unviable.
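The sampling step could be sketched in silico as follows. This is only an illustration: the function names are made up here, and is_viable is a hypothetical stand-in for the actual lab viability assay, which is the hard part of the experiment.

```python
import math
import random

BASES = "ACGT"

def count_k_mutants(G, K):
    """Number of genomes differing from the original in exactly K positions."""
    return math.comb(G, K) * 3 ** K

def random_k_mutant(genome, K, rng=random):
    """Return a uniformly random genome at Hamming distance exactly K.

    Choosing K distinct positions uniformly, then one of the 3 other
    bases uniformly at each, hits every one of the binomial(G,K)*3^K
    mutants with equal probability.
    """
    positions = rng.sample(range(len(genome)), K)
    mutated = list(genome)
    for pos in positions:
        alternatives = [b for b in BASES if b != mutated[pos]]
        mutated[pos] = rng.choice(alternatives)
    return "".join(mutated)

def estimate_F(genome, K, is_viable, trials, rng=random):
    """Monte Carlo estimate of F(K): fraction of K-mutants judged viable.

    is_viable is a hypothetical callable standing in for the lab assay.
    """
    viable = sum(bool(is_viable(random_k_mutant(genome, K, rng)))
                 for _ in range(trials))
    return viable / trials
```

Passing a seeded random.Random instance as rng makes a run reproducible.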
The result of such an experiment is a function F(K) estimating the chance that mutating K of the base pairs still yields a viable lifeform. We will know, to good accuracy, the values of F(1), F(2), F(3), etc. for some set of K's. We then want to EXTRAPOLATE this function to determine the value of F(G-1), which is, essentially, equal to P, the life-viability chance we were seeking.
To perform this extrapolation we need to obtain an empirical formula that fits the data F(1), F(2), F(3), etc. that we have. I doubt this extrapolation will be very difficult. In fact I might a priori suspect that F(K) = K^Q * C^K, for suitable fitting constants Q and C, will work decently.
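One simple way to fit that form, sketched here as an assumption rather than a prescribed method: taking logs gives ln F(K) = Q*ln(K) + K*ln(C), which is linear in Q and ln(C), so ordinary least squares applies. The function names are illustrative, and all measured F values must be positive.

```python
import math

def fit_power_exponential(ks, fs):
    """Least-squares fit of ln F = Q*ln(K) + K*ln(C); returns (Q, C).

    ks: mutation counts K (each >= 1); fs: measured F(K), all > 0.
    Solves the 2x2 normal equations directly (no intercept term).
    """
    x1 = [math.log(k) for k in ks]   # regressor multiplying Q
    x2 = [float(k) for k in ks]      # regressor multiplying ln(C)
    y = [math.log(f) for f in fs]

    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("need at least two distinct K values")

    q = (s1y * s22 - s12 * s2y) / det
    ln_c = (s11 * s2y - s12 * s1y) / det
    return q, math.exp(ln_c)

def log10_F(k, q, c):
    """log10 of the fitted F(k); stays in logs so F(G-1) cannot underflow."""
    return (q * math.log(k) + k * math.log(c)) / math.log(10)
```

The extrapolated estimate of P would then be read off as log10_F(G - 1, q, c). Fitting in log space weights relative rather than absolute errors, which is a design choice, not something the post specifies.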
Thinking some more, it might be possible to do an even better job than that. We can do a different kind of experiment to attempt to estimate F(K+J)/F(K), by starting with a viable K-mutant, and generating J-mutants of it. We might be able to reach very large K values this way and thus build a long chain of such ratio-estimates.
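Combining such a chain is just a telescoping product, F(K_n) = F(K_0) * [F(K_1)/F(K_0)] * ... * [F(K_n)/F(K_n-1)], best accumulated in logs. A minimal sketch, with an illustrative function name, assuming each measured ratio really does approximate F(K+J)/F(K):

```python
import math

def log10_F_from_chain(log10_F_start, ratio_estimates):
    """Accumulate log10 F along a chain of ratio estimates.

    log10_F_start: log10 of the directly measured F at the chain's first K.
    ratio_estimates: estimated F(K+J)/F(K) for each successive step.
    """
    total = log10_F_start
    for r in ratio_estimates:
        if r <= 0:
            # No viable mutants seen at this step: the chain cannot continue.
            raise ValueError("ratio estimates must be positive")
        total += math.log10(r)
    return total
```

Sampling errors in the individual ratios add up in log space, so a long chain needs correspondingly tight estimates at each step.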
_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun