Annoying creationists

Kleinman said:
Paul’s interpretation that Rcapacity represents the inability of the weight matrix to find binding sites is nonsense.
Sorry, but that's your interpretation of what I said. What I said was that the binding sites do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome. No matter how many times I say this, you don't seem to get it. Imagine a megabase genome and a binding site length of 3 bases. Do you think 6 bits is enough to uniquely identify specific sites on that genome?
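A rough sketch of the arithmetic behind this claim (my own illustration, not code from ev or Evj), assuming 2 bits of capacity per base and about log2(genome size) bits needed to single out one position:

// Hypothetical illustration only: bits a short site can carry vs. bits needed
// to pick one position out of a megabase genome.
public class SiteBits {
    public static void main(String[] args) {
        int genomeSize = 1_000_000;  // a megabase genome, as in the example above
        int siteWidth = 3;           // a 3-base binding site

        double bitsAvailable = 2.0 * siteWidth;                    // 2 bits per base -> 6 bits
        double bitsNeeded = Math.log(genomeSize) / Math.log(2.0);  // roughly 20 bits to single out one position

        System.out.printf("available: %.1f bits, needed: about %.1f bits%n",
                bitsAvailable, bitsNeeded);
    }
}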

The weight matrix is simply finding more binding sites in the non-binding site region of the genome and that region dominates the selection process so that binding sites are no longer evolving in the binding site region when using Dr Schneider’s selection scheme.
The sites are evolving, along with the rest of the genome. They just can't evolve a unique identification code.

Unnamed has overcome this effect by weighting the selection process to the binding sites in the binding site region and neglecting the effect of errors in the non-binding site region.
As far as I know, Unnamed has run no experiments where Rcapacity could be a problem. Furthermore, he is not neglecting the effect of spurious bindings. If you think he is, please present your case.

~~ Paul
 
Annoying Creationists

Kleinman said:
Paul’s interpretation that Rcapacity represents the inability of the weight matrix to find binding sites is nonsense.
Paul said:
Sorry, but that's your interpretation of what I said. What I said was that the binding sites do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome. No matter how many times I say this, you don't seem to get it. Imagine a megabase genome and a binding site length of 3 bases. Do you think 6 bits is enough to uniquely identify specific sites on that genome?
And your restatement is still nonsense. What is happening with Dr Schneider’s selection process as the genome length is being increased is that there are more potential locations in the non-binding site region where binding sites can potentially be identified and cause mistakes. Once the genome reaches a certain length, these mistakes in the non-binding site region dominate the selection process. It has nothing to do with the weighting process to distinguish them uniquely from all other positions on the chromosome. Why don’t you test your concept of Rcapacity with Unnamed’s selection process? Since Unnamed’s selection process is designed to almost completely ignore mistakes in the non-binding site region, see whether you can converge genome lengths longer than would be possible based on your concept of Rcapacity.
Kleinman said:
The weight matrix is simply finding more binding sites in the non-binding site region of the genome and that region dominates the selection process so that binding sites are no longer evolving in the binding site region when using Dr Schneider’s selection scheme.
Paul said:
The sites are evolving, along with the rest of the genome. They just can't evolve a unique identification code.
Now you are saying that the entire genome is evolving??????? The only thing you can say is evolving in the non-binding site region of the genome is a lack of binding sites, otherwise the non-binding site region remains random.
Kleinman said:
Unnamed has overcome this effect by weighting the selection process to the binding sites in the binding site region and neglecting the effect of errors in the non-binding site region.
Paul said:
As far as I know, Unnamed has run no experiments where Rcapacity could be a problem. Furthermore, he is not neglecting the effect of spurious bindings. If you think he is, please present your case.
You took Dr Schneider’s model of random point mutations and natural selection as reality before you did any study of the model. You are now taking Unnamed’s selection process as valid. You must really enjoy retracting your statements.

It is simple enough to test your concept of Rcapacity with Unnamed’s selection process. Simply take Dr Schneider’s original case and start doubling the genome length and see where convergence fails. If you don’t want to run the series, post Unnamed’s version of ev on the net and let us run the series.
 
Kleinman said:
And your restatement is still nonsense. What is happening with Dr Schneider’s selection process as the genome length is being increased is that there are more potential locations in the non-binding site region where binding sites can potentially be identified and cause mistakes. Once the genome reaches a certain length, these mistakes in the non-binding site region dominate the selection process. It has nothing to do with the weighting process to distinguish them uniquely from all other positions on the chromosome.
Could you explain mathematically what you mean by "dominate the selection process"? Have you watched an Evj run with an Rcapacity problem? For example, if you run this model:

population 32
genome size 4096
binding sites 16
weight width 3
site width 4

You'll have Rcapacity = Rfrequency = 8 bits. You'll never get past 16 mistakes. And yet, miraculously, the mean age at 30,000 generations is about 3,600. In what sense are spurious bindings dominating?
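For reference, here is a small sketch of how those two quantities can be computed from the model parameters. It assumes Rfrequency = log2(genome size / number of binding sites) and Rcapacity = 2 bits per base of site width, which reproduces the values quoted here and in the tables below; the method names are mine, not Evj's.

public class RBits {
    static double rFrequency(int genomeSize, int bindingSites) {
        // bits needed to specify the binding sites among all genome positions
        return Math.log((double) genomeSize / bindingSites) / Math.log(2.0);
    }
    static double rCapacity(int siteWidth) {
        // maximum information a site of this width can carry: 2 bits per base
        return 2.0 * siteWidth;
    }
    public static void main(String[] args) {
        // the model above: genome size 4096, 16 binding sites, site width 4
        System.out.printf("Rfrequency = %.2f bits, Rcapacity = %.2f bits%n",
                rFrequency(4096, 16), rCapacity(4));  // both come out to 8.00
    }
}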

Since Unnamed’s selection process is designed to almost completely ignore mistakes in the non-binding site region, see whether you can converge genome lengths longer than would be possible based on your concept of Rcapacity.
Ah, now it's "almost completely" ignoring mistakes.

Unnamed: Let's try it. Run your original series of experiments, but use a weight width and binding site width of 4 and 5.

Now you are saying that the entire genome is evolving??????? The only thing you can say is evolving in the non-binding site region of the genome is a lack of binding sites, otherwise the non-binding site region remains random.
You answered your own question.

You took Dr Schneider’s model of random point mutations and natural selection as reality before you did any study of the model. You are now taking Unnamed’s selection process as valid. You must really enjoy retracting your statements.
I might agree with your second sentence if I knew what you meant by "valid."

~~ Paul
 
I'm probably misunderstanding way too much about the present argument, but allow me to chatter a bit, and then kindly correct my thinking if it's off base.

It seems to me that "unnamed"'s selection method is a more accurate description of actual evolutionary processes than Dr. Schneider's original algorithm was.

The original version of ev was intended to show how evolution occurs in the binding site region of the genome. However, Schneider's version counted mutations in the non-binding portion as mistakes, and thereby over-emphasized the deleterious effects of mutations that would actually have no substantive effect on a new creature. The non-binding site portion of the genome is "junk" -- it doesn't contribute anything to the organism.

In the real world, doubtless, at some point, a sufficient number of mutations to a junk portion of a genome may suddenly create a substantive effect on the organism, and I think it would be fair to try to quantify the moment when this occurs and the probability of the occurrence being deleterious vs. beneficial.

Once accomplished, this would give us a better idea of the number of generations required to perfect a creature, because it would undoubtedly slow the evolutionary process somewhat, on the assumption that mutations are empirically known to be more frequently deleterious than beneficial.

However, it also seems to me that this activity occurring in the non-binding site portion of the genome could, at some point, be the cause of an entirely new function in the target organism. That is, an aggregation of otherwise irrelevant mutations which occur in the junk portion of a germ cell of an organism over time could suddenly start providing a selective advantage by pure accident. Perhaps this is a cause of what is so frequently argued about as being a "macro-evolutionary" event.
 
Annoying Creationists

Kleinman said:
And your restatement is still nonsense. What is happening with Dr Schneider’s selection process as the genome length is being increased is that there are more potential locations in the non-binding site region where binding sites can potentially be identified and cause mistakes. Once the genome reaches a certain length, these mistakes in the non-binding site region dominate the selection process. It has nothing to do with the weighting process to distinguish them uniquely from all other positions on the chromosome.
Paul said:
Could you explain mathematically what you mean by "dominate the selection process"? Have you watched an Evj run with an Rcapacity problem? For example, if you run this model:

population 32
genome size 4096
binding sites 16
weight width 3
site width 4

You'll have Rcapacity = Rfrequency = 8 bits. You'll never get past 16 mistakes. And yet, miraculously, the mean age at 30,000 generations is about 3,600. In what sense are spurious bindings dominating?
What I mean mathematically by “dominate the selection process” is as follows:

Total mistakes = mistakes in binding site region + mistakes in non-binding site region

With a short genome, the mistakes in the non-binding site region will be a small number. As you lengthen the genome, the number of possible sites for mistakes increases in the non-binding site region (where the majority of the mutations are occurring), and your selection process is stuck trying to minimize the number of mistakes in the non-binding site region rather than selecting for the occasional good mutation that might occur in the binding site region. If you want to get a quantitative measure of this effect, define a couple of variables, BindingMistakes and NonBindingMistakes, for your creatures, track them, and see which is affecting selection more.
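A minimal sketch of the bookkeeping being proposed here, assuming we already know, for one creature, which positions bind (score at or above the threshold) and which positions are the designated binding sites. The variable names follow the BindingMistakes/NonBindingMistakes suggestion above; everything else is hypothetical rather than Evj code.

public class MistakeTally {
    // bound[p]: true where the weight matrix scores at or above the threshold.
    // isSite[p]: true at the designated binding site positions.
    static int[] tally(boolean[] bound, boolean[] isSite) {
        int bindingMistakes = 0;     // missed binding sites (binding site region)
        int nonBindingMistakes = 0;  // spurious bindings (non-binding site region)
        for (int p = 0; p < bound.length; p++) {
            if (isSite[p] && !bound[p]) bindingMistakes++;
            if (!isSite[p] && bound[p]) nonBindingMistakes++;
        }
        return new int[] { bindingMistakes, nonBindingMistakes };
    }
    public static void main(String[] args) {
        boolean[] bound  = { false, true, false, true, false };
        boolean[] isSite = { true, false, false, true, false };
        int[] t = tally(bound, isSite);
        // Total mistakes = BindingMistakes + NonBindingMistakes
        System.out.println("BindingMistakes=" + t[0] + " NonBindingMistakes=" + t[1]
                + " Total=" + (t[0] + t[1]));
    }
}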
Kleinman said:
Since Unnamed’s selection process is designed to almost completely ignore mistakes in the non-binding site region, see whether you can converge genome lengths longer than would be possible based on your concept of Rcapacity.
Paul said:
Ah, now it's "almost completely" ignoring mistakes.

Unnamed: Let's try it. Run your original series of experiments, but use a weight width and binding site width of 4 and 5.
The reason I say “almost completely” rather than “completely” is that valuation[p] does have a small nonzero value in the non-binding site region, and this is reflected in the small increases in generations for convergence as you increase the genome length. If you compare valuation[p] in the binding site region with the non-binding site region, it will have a much larger magnitude in the binding site region because this is where you have the best matches with the weight matrix.

This situation is different from the case where you set your variable gene=1 and nongene=0 and use mistakes to do selection. Here you have completely uncoupled the selection process in the binding site region from mutations in the non-binding site region. In this situation, you are “completely” ignoring mistakes in the non-binding site region of the genome, and the generations for convergence are not affected by genome length (for a mutation rate fixed to a number of bases).
Kleinman said:
Now you are saying that the entire genome is evolving??????? The only thing you can say is evolving in the non-binding site region of the genome is a lack of binding sites, otherwise the non-binding site region remains random.
Paul said:
You answered your own question.
Do you understand the answer?
Kleinman said:
You took Dr Schneider’s model of random point mutations and natural selection as reality before you did any study of the model. You are now taking Unnamed’s selection process as valid. You must really enjoy retracting your statements.
Paul said:
I might agree with your second sentence if I knew what you meant by "valid."
Here are a few synonyms for the word valid: suitable, applicable, legitimate, appropriate. Perhaps if you gave us a physical interpretation of Unnamed’s selection process you might understand why I say this process is invalid. More specifically, explain this conditional statement:
if ((valuation[p] = siteValuation(p, width)) >= threshold)
What will be the value of valuation[p] in the binding site region vs the value in the nonbinding site region?
 
Kleinman said:
With a short genome, the mistakes in the non-binding site region will be a small number. As you lengthen the genome, the number of possible sites for mistakes increases in the non-binding site region (where the majority of the mutations are occurring), and your selection process is stuck trying to minimize the number of mistakes in the non-binding site region rather than selecting for the occasional good mutation that might occur in the binding site region.
Sigh.

population 64
genome size 512
binding sites 16
weight width 2
site width 3

Rcapacity = 6, Rfrequency = 5. Converges in 9,152 generations.

increase genome size to 1024

Rcapacity = 6, Rfrequency = 6. Never converges. Are you arguing that the increase from 512 to 1024 bases is enough for the spurious bindings to swamp selection?

How about if we fix the genome size and just vary the widths?

population 64
genome size 1536
binding sites 2

weight width, site width, Rcapacity, Rfrequency, generations

6, 7, 14, 9.58, 8991
5, 6, 12, 9.58, 37198
4, 5, 10, 9.58, 42550
3, 4, 8, 9.58, never converges

The difference in size of the gene + binding sites between the final two cases is 34 bases. You're saying an increase of 34 out of 1536 bases in the junk region is enough to swamp selection?

Do you understand the answer?
Yes.

Perhaps if you gave us a physical interpretation of Unnamed’s selection process you might understand why I say this process is invalid. More specifically, explain this conditional statement:
if ((valuation[p] = siteValuation(p, width)) >= threshold)
What will be the value of valuation[p] in the binding site region vs the value in the nonbinding site region?
When there is no binding, the valuation is less than the threshold. When there is a binding, the valuation is greater than the threshold. Which case are you talking about?

~~ Paul
 
Annoying Creationists

Kleinman said:
With a short genome, the mistakes in the non-binding site region will be a small number. As you lengthen the genome, the number of possible sites for mistakes increases in the non-binding site region (where the majority of the mutations are occurring), and your selection process is stuck trying to minimize the number of mistakes in the non-binding site region rather than selecting for the occasional good mutation that might occur in the binding site region.
Paul said:
Sigh.

population 64
genome size 512
binding sites 16
weight width 2
site width 3

Rcapacity = 6, Rfrequency = 5. Converges in 9,152 generations.

increase genome size to 1024

Rcapacity = 6, Rfrequency = 6. Never converges. Are you arguing that the increase from 512 to 1024 bases is enough for the spurious bindings to swamp selection?

How about if we fix the genome size and just vary the widths?

population 64
genome size 1536
binding sites 2

weight width, site width, Rcapacity, Rfrequency, generations

6, 7, 14, 9.58, 8991
5, 6, 12, 9.58, 37198
4, 5, 10, 9.58, 42550
3, 4, 8, 9.58, never converges

The difference in size of the gene + binding sites between the final two cases is 34 bases. You're saying an increase of 34 out of 1536 bases in the junk region is enough to swamp selection?
In answer to your questions, yes. If you track the errors in the binding site region and non-binding site region, you will see this. Once the errors in the non-binding site region are greater than those in the binding site region, these errors control the selection process. The problem here is not that the weight matrix cannot find matches; the weight matrix is finding matches on the non-binding site portion of the genome. These non-binding site errors are selecting out creatures no matter how far along the binding sites have evolved in the binding site region.
Kleinman said:
Perhaps if you gave us a physical interpretation of Unnamed’s selection process you might understand why I say this process is invalid. More specifically, explain this conditional statement:
if ((valuation[p] = siteValuation(p, width)) >= threshold)
What will be the value of valuation[p] in the binding site region vs the value in the nonbinding site region?
Paul said:
When there is no binding, the valuation is less than the threshold. When there is a binding, the valuation is greater than the threshold. Which case are you talking about?
Both: valuation[p] will have values much greater than 1 in the binding site region and rarely if ever have a value that large in the non-binding site region.
 
Kleinman said:
In answer to your questions, yes. If you track the errors in the binding site region and non-binding site region, you will see this. Once the errors in the non-binding site region are greater than those in the binding site region, these errors control the selection process. The problem here is not that the weight matrix cannot find matches; the weight matrix is finding matches on the non-binding site portion of the genome. These non-binding site errors are selecting out creatures no matter how far along the binding sites have evolved in the binding site region.
So everything goes along just fine and then suddenly, with an increase of about 2% in the amount of junk DNA, the spurious bindings in the junk DNA swamp selection completely. And this happens right when Rfrequency crosses Rcapacity. All righty then.

And it even happens with a mere 1% increase in the junk DNA, right on cue:

population 64
genome size 4096
binding sites 2

weight width, site width, Rcapacity, Rfrequency, generations

6, 7, 14, 11, 164160
5, 6, 12, 11, 386780
4, 5, 10, 11, never converges


Both: valuation[p] will have values much greater than 1 in the binding site region and rarely if ever have a value that large in the non-binding site region.
I don't think you understand how the valuations work.

~~ Paul
 
Ah, but Kleinman does have a point. Perfect creatures can certainly evolve with Rsequence significantly lower than Rfrequency. That means that the Rcapacity problem could be forestalled if the selection process was finer-grained. I bet Unnamed's selection process doesn't run into Rcapacity problems as quickly.

Hey, I can test this, too. I'll run the last case in my previous post with a missing binding mistake count of 10 instead of 1 (click Advanced on the New dialog). That gives more importance to missing bindings than to spurious bindings. Let's see if it converges ...

Yes! It converges in 59,700 generations.

~~ Paul
 
Annoying Creationists

Paul looking for the fountain of smart in Beleth’s quote said:
Intelligent Design has no answers. It can only make itself look palatable by making evolution look less palatable. It lives in a cardboard refrigerator box and throws rocks through the windows of evolution's unfinished mansion. ---Beleth
I’m still waiting for an evolutionarian to explain what the components of the DNA replicase system were doing before this system evolved. I would call this a good example of irreducible complexity. The only reason the theory of evolution had any palatability was that mathematics had been ignored. Don’t blame me that your own evolutionarian-written, peer-reviewed, and published mathematical model of random point mutation and natural selection makes your theory look unpalatable. Your unfinished mansion does not meet code; it needs to be torn down.
Kleinman said:
In answer to your questions, yes. If you track the errors in the binding site region and non-binding site region, you will see this. Once the errors in the non-binding site region are greater than those in the binding site region, these errors control the selection process. The problem here is not that the weight matrix cannot find matches; the weight matrix is finding matches on the non-binding site portion of the genome. These non-binding site errors are selecting out creatures no matter how far along the binding sites have evolved in the binding site region.
Paul said:
So everything goes along just fine and then suddenly, with an increase of about 2% in the amount of junk DNA, the spurious bindings in the junk DNA swamp selection completely. And this happens right when Rfrequency crosses Rcapacity. All righty then.

And it even happens with a mere 1% increase in the junk DNA, right on cue:

population 64
genome size 4096
binding sites 2

weight width, site width, Rcapacity, Rfrequency, generations

6, 7, 14, 11, 164160
5, 6, 12, 11, 386780
4, 5, 10, 11, never converges
What you call “goes along just fine” is showing increasing generations for convergence as you decrease the binding site width. You can have more erroneous sites in the non-binding site region with a smaller site width. When you said the following:
Paul said:
What I said was that the binding sites do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome
If this statement is true, once the genome exceeds a given length, then the weight matrix should not be able to uniquely identify erroneous sites in the non-binding site region. So with a mere 1% increase in the junk DNA, you suddenly do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome?
Kleinman said:
Both: valuation[p] will have values much greater than 1 in the binding site region and rarely if ever have a value that large in the non-binding site region.
Paul said:
I don't think you understand how the valuations work.
If valuation[p] is not the value generated based on the match between the weight matrix and a particular locus on the genome, then why don’t you explain to us how this value is generated?
Paul said:
Ah, but Kleinman does have a point. Perfect creatures can certainly evolve with Rsequence significantly lower than Rfrequency. That means that the Rcapacity problem could be forestalled if the selection process was finer-grained. I bet Unnamed's selection process doesn't run into Rcapacity problems as quickly.

Hey, I can test this, too. I'll run the last case in my previous post with a missing binding mistake count of 10 instead of 1 (click Advanced on the New dialog). That gives more importance to missing bindings than to spurious bindings. Let's see if it converges ...

Yes! It converges in 59,700 generations.
Paul, you just made my point. Not only is Unnamed’s selection process finer grained, it markedly reduces the effect of errors in the non-binding site region. If you reduce the effect of harmful mutations in the non-binding site region, you not only get faster convergence, you reduce the effect of increasing genome length. This concept you are using has no basis in reality but don’t let that interfere with your theory.
 
Kleinman said:
If this statement is true, once the genome exceeds a given length, then the weight matrix should not be able to uniquely identify erroneous sites in the non-binding site region. So with a mere 1% increase in the junk DNA, you suddenly do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome?
What does it mean to uniquely identify erroneous sites? If there is insufficient capacity to uniquely identify the binding sites, then not only will the binding sites fail to match, but spurious bindings will occur. The weight matrix will be unable to distinguish them.

I think if you could formalize your claim about what is happening, we would find that we are saying the same thing.

If valuation[p] is not the value generated based on the match between the weight matrix and a particular locus on the genome, then why don’t you explain to us how this value is generated?
The valuation is simply the value obtained when the weight matrix is applied to a particular position on the chromosome. In order to be considered a match, it must be greater than or equal to the threshold. A missing binding site will be below the threshold. A spurious binding will be above the threshold.
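A sketch of that rule, with the valuation written as a dot product of the weight matrix with the bases under it, as described elsewhere in the thread. The method name echoes siteValuation from the quoted snippet, but this signature, the weights, the base encoding, and the numbers are invented for illustration; only the greater-than-or-equal-to-threshold test comes from the discussion above.

public class ValuationSketch {
    // weights[i][b]: weight for base b (encoded 0..3 for A, C, G, T) at position i of the matrix.
    static int siteValuation(int[] genome, int start, int[][] weights) {
        int v = 0;
        for (int i = 0; i < weights.length; i++) {
            v += weights[i][genome[start + i]];  // sum the weight of the base at each matrix position
        }
        return v;
    }
    public static void main(String[] args) {
        int[][] weights = { { 3, -1, -1, -1 }, { -1, 3, -1, -1 }, { -1, -1, 3, -1 } };  // favors A, C, G
        int[] genome = { 0, 1, 2, 3, 3 };  // A C G T T
        int threshold = 5;
        int v = siteValuation(genome, 0, weights);
        // valuation 9 >= threshold 5, so position 0 would count as a binding;
        // a lower-scoring position would fall below the threshold and not bind.
        System.out.println("valuation=" + v + " binds=" + (v >= threshold));
    }
}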

Paul, you just made my point. Not only is Unnamed’s selection process finer grained, it markedly reduces the effect of errors in the non-binding site region. If you reduce the effect of harmful mutations in the non-binding site region, you not only get faster convergence, you reduce the effect of increasing genome length. This concept you are using has no basis in reality but don’t let that interfere with your theory.
You still haven't explained why you think Unnamed's method reduces the effect of harmful mutations. Let me think about it. A spurious binding will be above the threshold, and its distance from the threshold added to the sort value. A missing binding site will be below the threshold, and its distance from the threshold added. You must be saying that the spurious binding distances will be smaller than the missing site distances. That might be true at the beginning of the simulation, when the missing site valuations are perhaps far from the threshold, but as the simulation progresses the missing site valuations creep toward the threshold. It is the fine-grained creep that gets the job done faster.
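A sketch of the sort value as described in this paragraph: matched binding sites and unbound junk contribute nothing, and each mistake contributes its distance from the threshold. The array and method names are mine, not Unnamed's, and I am assuming a lower sort value ranks a creature better.

public class SortValueSketch {
    // valuation[p]: weight-matrix score at each position; isSite[p]: true at the binding site positions.
    static int sortValue(int[] valuation, boolean[] isSite, int threshold) {
        int s = 0;
        for (int p = 0; p < valuation.length; p++) {
            boolean bound = valuation[p] >= threshold;
            if (isSite[p] && !bound) s += threshold - valuation[p];  // missed site: distance below threshold
            if (!isSite[p] && bound) s += valuation[p] - threshold;  // spurious binding: distance above threshold
        }
        return s;  // 0 means every binding site binds and nothing else does
    }
    public static void main(String[] args) {
        int[] valuation  = { 2, 9, 7, 1 };
        boolean[] isSite = { true, false, true, false };
        System.out.println(sortValue(valuation, isSite, 6));  // (6 - 2) + (9 - 6) = 7
    }
}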

It appears that finer-grained selection overcomes the Rcapacity problem to some degree, finding solutions at the low end of the Rsequence range. Let's find out what the Rsequence range is: stay tuned.

I have no idea how this relates to the real world. You'll remember that I guessed that the Rcapacity problem isn't something that matters in real life.

~~ Paul
 
I just watched a simulation in slow motion to see how the worst creature behaves. Because of selective sweep, the worst creature has the same number of mistakes as the best creature, all due to missing binding sites. Then, all of a sudden, the worst creature's mistake count jumps by a significant amount. I have 2 binding sites and a missed binding site point count of 10, so the baseline mistake count is 20. Here are some mistake counts of the worst creature when it jumps from 20:

279, 81, 365, 75, 215, 89, 2263

So it's not that a spurious site or two shows up. It's clearly a mutation in the gene (weight matrix or threshold) that completely destroys the creature. So why does the finer-grained selection help? This requires some investigation.

~~ Paul

Edited to add: This is an incomplete story. Spurious sites do show up occasionally.
 
I ran the standard model about 50 times and got an Rsequence range of 3.03 to 4.98. Then I ran the standard model with missed site mistake points of 10 and got an Rsequence range of 1.55 to 4.14.

What's going on? Does the finer-grained selection simply allow it to find the low Rsequence values that would normally occur occasionally, or is it more interesting than that?

And is there a correlation between the number of generations and the Rsequence value?

~~ Paul
 
Aha! I think it's easier for Ev to evolve away a spurious binding than it is for it to evolve a match at a binding site. So if you favor fewer missing binding sites over fewer spurious binding sites, then things converge faster. That's what setting the missed site mistake points to 10 does. Kleinman is correct, although I'm still not sure about his explanation.

Rcapacity is a separate issue. I think the finer-grained selection allows Ev to find the solutions with lower Rsequence values. But Rcapacity still sets limits. I'll see if I can find them.

Both the missing site mistake points and Rcapacity issues may be nothing more than artifacts of Ev's particular model.

~~ Paul
 
Annoying Creationists

Kleinman said:
If this statement is true, once the genome exceeds a given length, then the weight matrix should not be able to uniquely identify erroneous sites in the non-binding site region. So with a mere 1% increase in the junk DNA, you suddenly do not have sufficient information capacity for the weighting process to distinguish them uniquely from all the other positions on the chromosome?
Paul said:
What does it mean to uniquely identify erroneous sites? If there is insufficient capacity to uniquely identify the binding sites, then not only will the binding sites fail to match, but spurious bindings will occur. The weight matrix will be unable to distinguish them.

I think if you could formalize your claim about what is happening, we would find that we are saying the same thing.
“Uniquely identifying” sites is your terminology. If I understand Dr Schneider’s selection process, if the dot product of the weight matrix with values assigned to bases in a site equals or exceeds the threshold value, a site has been identified. This arithmetic should not be affected by the length of the genome.

I have formalized my claim about what is happening but I will repeat it again. Start with the following equation:

Total mistakes = mistakes in binding site region – mistakes in non-binding site region

The mistakes in the binding site region can be at most equal to the total number of sites, gamma. The mistakes in the non-binding site region can vary and increase as the length of the genome increases. Dr Schneider has designed a selection process that requires that sites be maximized on one portion of the genome and minimized on the other portion. The minimization process becomes more difficult as the genome is lengthened until a genome length is reached where it is very unlikely for minimization to be achieved. It has nothing to do with the weight matrix being able to “uniquely” identify a binding site. The weight matrix is always able to identify a match when it exists. The problem is that errors in the non-binding site region are causing evolved creatures (in the binding site region) to be selected out.

The question that should be asked is whether this selection process has any relationship to reality. I contend that it does, to some extent. What this is analogous to is a creature with a completely evolved genome except for a single fatal mutation.
Kleinman said:
If valuation[p] is not the value generated based on the match between the weight matrix and a particular locus on the genome, then why don’t you explain to us how this value is generated?
Paul said:
The valuation is simply the value obtained when the weight matrix is applied to a particular position on the chromosome. In order to be considered a match, it must be greater than or equal to the threshold. A missing binding site will be below the threshold. A spurious binding will be above the threshold.
This is the way I understand the meaning of valuation[p]. For a weight matrix five bases wide, you obtain a ten bit number which can have the integer values between -512…+511. For larger weight matrices, valuation[p] can have values much larger than this when a good match is achieved. This is why Unnamed’s selection process using valuation[p] biases selection to the binding site region where large values of valuation[p] are obtained.
Kleinman said:
Paul, you just made my point. Not only is Unnamed’s selection process finer grained, it markedly reduces the effect of errors in the non-binding site region. If you reduce the effect of harmful mutations in the non-binding site region, you not only get faster convergence, you reduce the effect of increasing genome length. This concept you are using has no basis in reality but don’t let that interfere with your theory.
Paul said:
You still haven't explained why you think Unnamed's method reduces the effect of harmful mutations. Let me think about it. A spurious binding will be above the threshold, and its distance from the threshold added to the sort value. A missing binding site will be below the threshold, and its distance from the threshold added. You must be saying that the spurious binding distances will be smaller than the missing site distances. That might be true at the beginning of the simulation, when the missing site valuations are perhaps far from the threshold, but as the simulation progresses the missing site valuations creep toward the threshold. It is the fine-grained creep that gets the job done faster.

It appears that finer-grained selection overcomes the Rcapacity problem to some degree, finding solutions at the low end of the Rsequence range. Let's find out what the Rsequence range is: stay tuned.

I have no idea how this relates to the real world. You'll remember that I guessed that the Rcapacity problem isn't something that matters in real life.
As said above, valuation[p] has a much larger value for a good match in the binding site region than for a not-so-good match in the non-binding site region. Using mistakes to do selection gives equal weight to errors in either the binding site or non-binding site regions.
Paul said:
I just watched a simulation in slow motion to see how the worst creature behaves. Because of selective sweep, the worst creature has the same number of mistakes as the best creature, all due to missing binding sites. Then, all of a sudden, the worst creature's mistake count jumps by a significant amount. I have 2 binding sites and a missed binding site point count of 10, so the baseline mistake count is 20. Here are some mistake counts of the worst creature when it jumps from 20:

279, 81, 365, 75, 215, 89, 2263

So it's not that a spurious site or two shows up. It's clearly a mutation in the gene (weight matrix or threshold) that completely destroys the creature. So why does the finer-grained selection help? This requires some investigation.
How does Dr Schneider evolve the weight matrix?
 
Kleinman said:
“Uniquely identifying” sites is your terminology. If I understand Dr Schneider’s selection process, if the dot product of the weight matrix with values assigned to bases in a site equals or exceeds the threshold value, a site has been identified. This arithmetic should not be affected by the length of the genome.
Correct.

I have formalized my claim about what is happening but I will repeat it again. Start with the following equation:

Total mistakes = mistakes in binding site region - mistakes in non-binding site region
That's not the correct equation, but let's continue ...

The mistakes in the binding site region can be at most equal to the total number of sites, gamma. The mistakes in the non-binding site region can vary and increase as the length of the genome increases.
Your second sentence is too simplistic. The number of potential spurious bindings increases as the genome size increases.

Dr Schneider has designed a selection process that requires that sites be maximized on one portion of the genome and minimized on the other portion. The minimization process becomes more difficult as the genome is lengthened until a genome length is reached where it is very unlikely for minimization to be achieved.
This is also too simplistic. It certainly becomes more difficult to eliminate the spurious bindings, but this depends on the selection method, mutation rate, population, and so forth.

It has nothing to do with the weight matrix being able to “uniquely” identify a binding site. The weight matrix is always able to identify a match when it exists. The problem is that errors in the non-binding site region are causing evolved creatures (in the binding site region) to be selected out.
By definition, if a match exists, the weight matrix can identify it. The problem is that it becomes almost impossible for matches to evolve at the binding sites and not at other sites, when a unique pattern of bases cannot exist at the binding sites. If a sequence logo cannot form, then evolution gets nowhere.

Your final sentence is correct, but what is causing the errors in the junk DNA? Lots of things. One of those things is the Rcapacity problem.

This is the way I understand the meaning of valuation[p]. For a weight matrix five bases wide, you obtain a ten bit number which can have the integer values between -512…+511. For larger weight matrices, valuation[p] can have values much larger than this when a good match is achieved. This is why Unnamed’s selection process using valuation[p] biases selection to the binding site region where large values of valuation[p] are obtained.
That wasn't clear from your discussion, especially when you said "... valuation[p] will have values much greater than 1 in the binding site region ...". Your final sentence here still doesn't make sense. The valuation of a matched binding site does not contribute to the sort value.

As said above, valuation[p] has a much larger value for a good match in the binding site region than for a not-so-good match in the non-binding site region.
But the valuation of a matched binding site is irrelevant to the sort value.

How does Dr Schneider evolve the weight matrix?
The weight matrix, threshold, binding sites, and junk all evolve in concert by random point mutation.

~~ Paul
 
Annoying Creationists

Kleinman said:
I have formalized my claim about what is happening but I will repeat it again. Start with the following equation:

Total mistakes = mistakes in binding site region - mistakes in non-binding site region
Paul said:
That's not the correct equation, but let's continue ...
Sorry, a typo; it should be:
Total mistakes = mistakes in binding site region + mistakes in non-binding site region
Kleinman said:
The mistakes in the binding site region can be at most equal to the total number of sites, gamma. The mistakes in the non-binding site region can vary and increase as the length of the genome increases.
Paul said:
Your second sentence is too simplistic. The number of potential spurious bindings increases as the genome size increases.
You are splitting hairs using the word “potential”. What do you think the probability is that the number of spurious bindings decreases or stays the same as the genome size increases?
Kleinman said:
Dr Schneider has designed a selection process that requires that sites be maximized on one portion of the genome and minimized on the other portion. The minimization process becomes more difficult as the genome is lengthened until a genome length is reached where it is very unlikely for minimization to be achieved.
Paul said:
This is also too simplistic. It certainly becomes more difficult to eliminate the spurious bindings, but this depends on the selection method, mutation rate, population, and so forth.
Really? Does your concept of Rcapacity depend on mutation rate and population when using Dr Schneider’s selection method? Any selection method that diminishes the effects of harmful mutations in the non-binding site region will converge more quickly. If you completely ignore harmful mutations in the non-binding region, you uncouple the evolution of the binding sites in the binding site region from the length of the genome. There just isn’t any basis in reality for such a selection method.
Kleinman said:
It has nothing to do with the weight matrix being able to “uniquely” identify a binding site. The weight matrix is always able to identify a match when it exists. The problem is that errors in the non-binding site region are causing evolved creatures (in the binding site region) to be selected out.
Paul said:
By definition, if a match exists, the weight matrix can identify it. The problem is that it becomes almost impossible for matches to evolve at the binding sites and not at other sites, when a unique pattern of bases cannot exist at the binding sites. If a sequence logo cannot form, then evolution gets nowhere.

Your final sentence is correct, but what is causing the errors in the junk DNA? Lots of things. One of those things is the Rcapacity problem.
You cannot get a sequence logo to form when selection is being driven by errors in the non-binding site region.

Why don’t you enumerate the things which cause errors in the non-binding site region?
Kleinman said:
This is the way I understand the meaning of valuation[p]. For a weight matrix five bases wide, you obtain a ten bit number which can have the integer values between -512…+511. For larger weight matrices, valuation[p] can have values much larger than this when a good match is achieved. This is why Unnamed’s selection process using valuation[p] biases selection to the binding site region where large values of valuation[p] are obtained.
Paul said:
That wasn't clear from your discussion, especially when you said "... valuation[p] will have values much greater than 1 in the binding site region ...". Your final sentence here still doesn't make sense. The valuation of a matched binding site does not contribute to the sort value.
The valuation of a matched binding site does not contribute to the sort value, but a close-to-threshold yet failed match in the binding site region will have a large value of valuation[p]. If gamma=16 and you start with 16 mistakes in the binding site region, you get 16*valuation[p] in the binding site region that outweighs the small number of mistakes in the non-binding site region. Track valuation[p] in the binding site region and non-binding site region and see exactly what you are sorting on.
Kleinman said:
As said above, valuation[p] has much larger value for a good match in the binding site region than a not so good match in the non-binding site region.
Paul said:
But the valuation of a matched binding site is irrelevant to the sort value.
Track valuation[p] in the binding site region and non-binding site region and get the data which tells exactly what you are sorting with.
Kleinman said:
How does Dr Schneider evolve the weight matrix?
Paul said:
The weight matrix, threshold, binding sites, and junk all evolve in concert by random point mutation.
What is the mutation rate assigned to the weight matrix? How does the threshold evolve?

All this discussion of the selection process is simply an academic exercise, since neither the selection process as formulated by Unnamed nor, for that matter, Dr Schneider’s selection method is seen in reality. Natural selection can only determine if a particular mutation is immediately useful or harmful. If the mutation is neutral, there is no selection for or against that mutation, and you are left with the probability problem without selection to evolve your genes de novo.

One of the most ardent supporters of your theory said it quite well:
Richard Dawkins said:
Life isn't like that. Evolution has no long term goal.


Have I co-opted another evolutionarian idea?
 
Kleinman said:
You are splitting hairs using the word “potential”. What do you think the probability is that the number of spurious bindings decreases or stays the same as the genome size increases?
I suspect it increases, but in a quite complex manner. We see creatures with ages of 6000 generations.

Really? Does your concept of Rcapacity depend on mutation rate and population when using Dr Schneider’s selection method? Any selection method that diminishes the effects of harmful mutations in the non-binding site region will converge more quickly. If you completely ignore harmful mutations in the non-binding region, you uncouple the evolution of the binding sites in the binding site region from the length of the genome. There just isn’t any basis in reality for such a selection method.
In the real world, a spurious binding in pure junk DNA would be harmless. There appears to be a basis for selection of weak and strong bindings. You can't simply dismiss it, although I agree that we don't know how the various Ev models relate to reality.

The valuation of a matched binding site does not contribute to the sort value, but a close-to-threshold yet failed match in the binding site region will have a large value of valuation[p].
And a small distance from the threshold, which is what contributes to the sort value.

If gamma=16 and you start with 16 mistakes in the binding site region, you get 16*valuation[p] in the binding site region that outweighs the small number of mistakes in the non-binding site region.
It is not the valuation that is added to the sort value, but the distance between the valuation and the threshold. These distances will start out large and then diminish as the binding site valuations creep toward the threshold. They will certainly swamp a single spurious binding, but so does the current selection method. They will not swamp a swarm of spurious bindings, such as might arise from a mutation in the gene.

But I agree with you that Unnamed's selection method is weighing missing bindings more heavily than spurious bindings, as do my experiments with the missing binding mistake points set to 10. See my post #1594. This speeds up evolution by favoring a creature with, say, 2 missing bindings and 5 spurious bindings over a creature with 3 missing bindings and 0 spurious bindings. Do you think something like that can't occur in nature?
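The arithmetic behind that example, assuming the mistake count is simply missedSitePoints per missing binding plus one per spurious binding (a sketch of the weighting, not Evj's exact bookkeeping):

public class MistakePoints {
    static int mistakes(int missingBindings, int spuriousBindings, int missedSitePoints) {
        return missingBindings * missedSitePoints + spuriousBindings;  // weighted mistake count
    }
    public static void main(String[] args) {
        int missedSitePoints = 10;
        // With the weighting at 10, 2 missing + 5 spurious scores lower (better) than 3 missing + 0 spurious.
        System.out.println(mistakes(2, 5, missedSitePoints));  // 25
        System.out.println(mistakes(3, 0, missedSitePoints));  // 30
    }
}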

What is the mutation rate assigned to the weight matrix? How does the threshold evolve?
The mutation rate is constant across the entire chromosome. Mutations are randomly applied to the entire chromosome.
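A simplified illustration of what that means (not Evj's actual mutation routine): each mutation picks a position uniformly at random, so the weight-matrix gene, threshold, binding sites, and junk are all equally exposed.

import java.util.Random;

public class PointMutationSketch {
    // chromosome: bases encoded 0..3; mutations: how many point mutations to apply.
    static void mutate(int[] chromosome, int mutations, Random rng) {
        for (int m = 0; m < mutations; m++) {
            int p = rng.nextInt(chromosome.length);  // uniformly random position on the whole chromosome
            chromosome[p] = rng.nextInt(4);          // new base (may occasionally equal the old one)
        }
    }
    public static void main(String[] args) {
        int[] chromosome = new int[256];
        mutate(chromosome, 1, new Random(42));
        System.out.println("applied one point mutation somewhere in the 256-base chromosome");
    }
}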

All this discussion of the selection process is simply an academic exercise, since neither the selection process as formulated by Unnamed nor, for that matter, Dr Schneider’s selection method is seen in reality.
Then you have lost your mathematical argument against evolution.

~~ Paul
 