[math-fun] How miraculous was the origin of life? Experiment suggested to answer this.
If the simplest lifeform capable of independent existence had, say, 3000-base-pair-long DNA -- actually I think the least the "minimal genome project" has been able to come up with is more like 100 times that -- then you might say "the chance of that is 4^(-3000) which is about 10^(-1806), which is so small that even if every atom in the observable universe were trying a new 3000-long DNA sequence every femtosecond, life almost certainly would still never have come into existence anywhere ever... therefore, life is a miracle and Earth is likely the only place in the universe that has any."

However, that calculation was wrong because more than one of those 4^3000 sequences probably works to produce a viable lifeform. In fact the number that work is probably also enormous. If the number were N then the viability probability is more like P=N/4^3000, and it is that chance P that really needs to be used to assess the miraculousness.

OK, that brings me to my point. We can do an experiment to approximately measure P. Start with some near-minimal bacterial genome which has, say, G base pairs. Randomly mutate K of its base pairs. There are binomial(G,K)*3^K possible mutated genomes obtainable in this way. We are taking a uniform random sample among them. When you do this, count how many of the resulting bacteria remain viable, versus how many are rendered unviable.

The result of such an experiment is a function F(K) estimating the chance that mutating K of the base pairs still yields a viable lifeform. We will know, to good accuracy, the values of F(1), F(2), F(3), etc. for some set of K's. We then want to EXTRAPOLATE this function to determine the value of F(G-1), which is, essentially, equal to P, the life-viability chance we were seeking.

To perform this extrapolation we need to obtain an empirical formula that fits the data F(1), F(2), F(3), etc. that we have. I doubt this extrapolation will be very difficult. In fact I might a priori suspect that F(K) = K^Q * C^K, for suitable fitting constants Q and C, will work decently.

Thinking some more, it might be possible to do an even better job than that. We can do a different kind of experiment to attempt to estimate F(K+J)/F(K), by starting with a viable K-mutant, and generating J-mutants of it. We might be able to reach very large K values this way and thus build a long chain of such ratio-estimates.
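The fitting step described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the viability rule (a mutant survives iff no site in a hypothetical "essential" set is hit), the genome size, and all function names are invented here and are not part of the proposal. The fit is an ordinary least-squares fit of ln F = Q*ln K + K*ln C. One caveat worth keeping in mind: a uniformly random genome differs from the reference at about 3G/4 positions, not G-1, so the sketch reports the extrapolated value at both.

```python
import math
import random


def estimate_viability(genome_len, essential, k, trials, rng):
    """Estimate F(k): fraction of random k-site mutants that stay viable.

    Toy assumption (not from the thread): a mutant is viable iff none of
    the mutated positions lies in the hypothetical `essential` set.
    """
    viable = 0
    for _ in range(trials):
        mutated = rng.sample(range(genome_len), k)  # k distinct positions
        if not any(pos in essential for pos in mutated):
            viable += 1
    return viable / trials


def fit_power_exponential(ks, fs):
    """Least-squares fit of ln F = Q*ln K + K*ln C; returns (Q, C)."""
    pts = [(k, f) for k, f in zip(ks, fs) if f > 0]  # ln(0) is undefined
    s11 = sum(math.log(k) ** 2 for k, _ in pts)
    s12 = sum(math.log(k) * k for k, _ in pts)
    s22 = sum(k * k for k, _ in pts)
    b1 = sum(math.log(k) * math.log(f) for k, f in pts)
    b2 = sum(k * math.log(f) for k, f in pts)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct K values with F > 0")
    q = (b1 * s22 - b2 * s12) / det
    ln_c = (s11 * b2 - s12 * b1) / det
    return q, math.exp(ln_c)


def log10_f(k, q, c):
    """log10 of the fitted F(k) = k^q * c^k, safe for astronomically small F."""
    return q * math.log10(k) + k * math.log10(c)


if __name__ == "__main__":
    rng = random.Random(0)
    G = 1000                     # toy genome length (assumption)
    essential = set(range(100))  # toy essential sites (assumption)
    ks = list(range(1, 11))
    fs = [estimate_viability(G, essential, k, 2000, rng) for k in ks]
    q, c = fit_power_exponential(ks, fs)
    print("Q =", q, "C =", c)
    print("log10 F(3G/4) =", log10_f(3 * G // 4, q, c))
    print("log10 F(G-1)  =", log10_f(G - 1, q, c))
```

Working in log10 avoids floating-point underflow at large K. Note that in this toy model the true F(G-1) is exactly zero, so the example also shows how far an extrapolation from small K can drift from the truth.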
Something similar could also be done in artificial universes like Conway's "life." As is well known by now, ConwayLife allows the existence of Turing machines and also self-reproducing "universal assemblers," i.e. life. Indeed, people have actually constructed Universal Turing machines in ConwayLife in full detail; I am not sure whether anybody ever constructed the universal assembler, though. There are plenty of ConwayLife experts here who may know, and who could say what is the smallest known construction?

Anyhow: one could ask what is the PROBABILITY that a random set of bits in an NxN square happens to be "alive" or "a Turing machine" or whatever. Obviously if it were a 1000x1000 square the naive probability estimate would be 2^(-1000000) for the configuration being a particular Turing machine... but a more sensible estimate would ask: "if that construction were altered by mutating K bits, what is the chance it still remains in working order?" etc.

I would naively expect that most cells in any such construction are 0s, and the construction would still function if almost any 0-cell were changed to 1, because isolated 1s die out in a single generation. Therefore, the "information content" of a typical human-designed ConwayLife machine fitting in an HxW rectangle is really not at all well described by "H*W bits" and it is probably better described by lg( binomial(H*W, L) ) bits, where L is the number of live cells, but this again is a considerable overestimate. Perhaps something like 4*L is a better rough estimate.

http://www.conwaylife.com/wiki/Universal_turing_machine claims that Paul Rendell in 2010 constructed a ConwayLife UTM with 252192 live cells fitting in a 12699x12652 rectangle. So it would still seem based on this that life has to be considered a "miracle" because 252192 is so damn large.

Another experimental question in ConwayLife you probably already know the answer to: What is the chance that a random configuration swiftly becomes "boring" i.e. trivial to predict the whole future of?

--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)
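The three information-content estimates mentioned above (H*W bits, lg(binomial(H*W, L)) bits, and 4*L bits) can be compared directly for the quoted UTM figures. The sketch below is only a numerical check; the function and variable names are made up for illustration, and the binomial is evaluated through math.lgamma because the exact integer is far too large to be convenient.

```python
import math


def log2_binomial(n, k):
    """log2 of binomial(n, k), computed via log-gamma to avoid huge integers."""
    ln_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_value / math.log(2)


# Figures quoted in the message for Rendell's 2010 UTM.
SIDE_A, SIDE_B = 12699, 12652
LIVE_CELLS = 252192

area_bits = SIDE_A * SIDE_B                          # naive "H*W bits"
binomial_bits = log2_binomial(area_bits, LIVE_CELLS)  # lg(binomial(H*W, L))
four_l_bits = 4 * LIVE_CELLS                          # the rough "4*L" guess

if __name__ == "__main__":
    print("H*W bits:             ", area_bits)
    print("lg binomial(H*W, L):  ", round(binomial_bits))
    print("4*L bits:             ", four_l_bits)
```

The binomial estimate comes out on the order of a few million bits, well below the bounding-box area and somewhat above 4*L, which agrees with the ordering suggested in the message.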
Warren Smith wrote:
Another experimental question in ConwayLife you probably already know the answer to: What is the chance that a random configuration swiftly becomes "boring" i.e. trivial to predict the whole future of?
Okay, here's the current status of the distributed search. Out of 78 * 10^9 random 16-by-16 soups, 921421 infinite-growth patterns have occurred. These have all been simple linear growth, and all other patterns have stabilised as a disjoint union of non-interacting oscillators and spaceships.

http://catagolue.appspot.com/statistics

Results are coming in at roughly 2.5 * 10^9 soups per day from people running the search script in Golly. There are roughly 120 CPUs running this script continuously.

It's less exciting to watch than the election, but occasionally interesting things appear. I think that the closest thing to emergence observed so far is the appearance of Bill Gosper's period-46 twin-bees shuttles. We've had 2 cis- and 1 trans- shuttle appear so far:

http://catagolue.appspot.com/census/b3s23/C1/xp46

Sincerely,

Adam P. Goucher
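For readers who want to poke at random soups without Golly, here is a minimal sketch of the idea in plain Python. It is emphatically not the search script referred to above: the helper names are invented, the grid is an unbounded set of live cells, and the only "boring" test is an exact repeat of the whole state, so it recognises still lifes and oscillators but gives up (returns None) whenever a glider or other spaceship escapes. For scale, the figures quoted above work out to roughly 1.2 infinite-growth soups per 100,000.

```python
import random
from collections import Counter


def step(cells):
    """One generation of Conway's Life (B3/S23) on an unbounded grid.

    `cells` is a set of (x, y) coordinates of live cells.
    """
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}


def random_soup(size=16, density=0.5, seed=None):
    """Random size-by-size soup; each cell is alive with probability `density`."""
    rng = random.Random(seed)
    return {(x, y) for x in range(size) for y in range(size)
            if rng.random() < density}


def run_until_repeat(cells, max_gens=1000):
    """Return (first_generation, period) of the first exact repeat, else None.

    Escaping spaceships never repeat exactly, so None does not mean
    the soup is interesting; it only means this crude test gave up.
    """
    seen = {}
    for gen in range(max_gens + 1):
        key = frozenset(cells)
        if key in seen:
            return seen[key], gen - seen[key]
        seen[key] = gen
        cells = step(cells)
    return None


if __name__ == "__main__":
    for seed in range(3):
        print(seed, run_until_repeat(random_soup(seed=seed)))
```

The same step function also illustrates the earlier remark that an isolated live cell dies in one generation: adding a lone far-away cell to a still life leaves the next generation unchanged.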
If M is the minimal length of the genome of a "lifeform capable of independent existence", then it strikes me as unlikely that there would be many that would 'work'. (At least if you count properly -- two different ways of coding the same AA shouldn't count as different in this context. Otherwise you introduce a fairly predictable exponential factor, something like (64/20)^(n/3) on average. In any case the remaining entropy is quite high.) I'd expect that almost any change would render it useless -- maybe some transpositions would be possible, but probably not many. (At that level of compression, probably many sections are being reused in strange ways.) Of course even if you could break it up into 30 sections (average length just 100bp!) and transpose them you'd only gain 107 bits toward the 4322, leaving you with much too much entropy.

For genomes of length M + 100, say, you can get lots of viable lifeforms by adding noncoding sections, but you don't have that option with length M.

In any case I don't think that a single 3000bp strand of DNA could reasonably form by chance, let alone N/4^3000 of them. I suspect abiogenesis was much more subtle.

Charles Greathouse
Analyst/Programmer
Case Western Reserve University

On Thu, May 7, 2015 at 2:46 PM, Warren D Smith <warren.wds@gmail.com> wrote:
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com https://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
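The entropy bookkeeping in Charles Greathouse's reply (107 bits gained toward 4322) and the 10^(-1806) figure from the original post can be reproduced with a few lines of arithmetic. The sketch below assumes, as the reply appears to, 2 bits per base pair for a 3000 bp genome, a synonymous-coding factor of (64/20) per codon, and 30! orderings of 30 sections; the variable names are illustrative only.

```python
import math

GENOME_BP = 3000   # genome length used in the thread
SECTIONS = 30      # number of transposable sections in the reply

# Raw information: log2(4) = 2 bits per base pair.
raw_bits = GENOME_BP * math.log2(4)

# Synonymous codons: the reply's factor (64/20)^(n/3), expressed in bits.
synonym_bits = (GENOME_BP / 3) * math.log2(64 / 20)

# Entropy left after discounting synonymous codings.
remaining_bits = raw_bits - synonym_bits

# Bits gained if the 30 sections may appear in any order: log2(30!).
transposition_bits = math.log2(math.factorial(SECTIONS))

# The original post's naive probability 4^(-3000) as a power of ten.
log10_naive = -GENOME_BP * math.log10(4)

if __name__ == "__main__":
    print("raw bits:          ", raw_bits)
    print("synonym bits:      ", synonym_bits)
    print("remaining bits:    ", remaining_bits)
    print("transposition bits:", transposition_bits)
    print("log10 of 4^-3000:  ", log10_naive)
```

Under these assumptions the remaining entropy rounds to 4322 bits and the transposition gain to just under 108 bits, matching the numbers in the reply.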
I have long believed this kind of argument about abiogenesis to be barking up the wrong tree. We are asking how likely it is for a self-reproducing object to arise spontaneously out of a universe originally consisting of non-self-reproducing objects. The phrasing of the question conceals an assumption that I think is quite false.

The assumption is that we have some definite criterion for distinguishing self-reproducing objects from non-self-reproducing ones -- that is, that we can definitely say whether a given sequence of events is or is not an example of self-reproduction. I say we can't, and that there is a gray area, a big one, one that may provide a shallow enough slope that climbing it from definitely-inorganic to definitely-organic is not so very implausible.

The central question is, how similar does a "child" have to be to its "parent" for the scare-quoted words to acquire their familiar meanings? Clearly we do not insist on anything close to identicalness, or no organism that we call an organism would qualify. Even in species without sexual reproduction, where the child is a clone of the parent, mutations that change the genome are frequent, and even in the absence of mutation, there are always minor incidental differences in anatomy due to differing circumstances in the course of development. Identicalness sets the bar way too high.

Without belaboring the point too much, I think it's plausible to claim that good self-reproduction evolved from bad self-reproduction, and bad self-reproduction evolved from some haphazard process that we might not even dignify with reproductive nomenclature.

On Thu, May 7, 2015 at 4:37 PM, Charles Greathouse <charles.greathouse@case.edu> wrote:
Speaking of evolution (both natural and artificial), whatever happened to Tom Ray's Tierra? Back in the 90s, he had already found some interesting things, and I was hoping that he would find more and more eerily life-like phenomena as he scaled up the project. Last I heard (a decade ago), he was going to kick things up a notch by exploiting the parallelism of the web. But I never hear anyone talk about Tierra anymore, so I'm guessing the project hit some sort of ceiling.

Anyone out there have any facts or opinions or arrant speculation?

Jim Propp

On Thu, May 7, 2015 at 5:03 PM, Allan Wechsler <acwacw@gmail.com> wrote:
You need to do the analysis in terms of some molecule that can actually replicate itself. DNA requires a polymerase and a helicase to replicate, and it requires mRNA and tRNA for metabolism to grow a cell. There are RNA replicators:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3943892/

But the simplest self-replicator discovered is a polypeptide.

http://www.nature.com/nature/journal/v382/n6591/abs/382525a0.html

Of course replication is environment dependent.

Brent

On 5/7/2015 11:46 AM, Warren D Smith wrote:
participants (6)
- Adam P. Goucher
- Allan Wechsler
- Charles Greathouse
- James Propp
- meekerdb
- Warren D Smith