Missing genetic information refutes neo-Darwinism

I also noticed how wuschel magically changed my "20 000 genes" into "20 000 characters".
I didn't. The string of 20.000 characters refers to the set of 20.000+ properties (of humans) and their respective values you proposed.

I chose 20.000 instead of 20.000+ in your favor, because the more complexity you assert, the more computing power you would need to brute force the proof.
 
The question then seems to be: "how may information theory be applied to biology, and what are the strongest conclusions that can be reached by doing so?"
Hmm... unless I misunderstand your question, this does not differ an awful lot from: "how may mathematics be applied to physics, and what are the strongest conclusions that can be reached by doing so?"

Well, and: Shouldn't physics restrict itself to studying the Estimated Airspeed Velocity of an Unladen Swallow? ;-)
 
These are called "models" and you wouldn't believe they are used in science a lot.
Models in science are defined as simplified descriptions of reality. That means that they'll need to have something to do with reality. In biology, there is no such thing as a clearly defined or definable "description language".

Surprise: They do use K-complexity in biology as well!
Not for anything I am talking about though, so it is not relevant to my claims.

You assert complexity WRT "more than 20 000 properties".
No, I am only disputing the simplistic but common view that for every property there must be a gene coding for it, which causes people (such as the OP) to erroneously conclude that there is genetic information missing, and that "neo-Darwinism" has no explanation for where this information comes from.

Whether all the information about the properties (of which there may be billions or infinite depending on how you count) of a human body can be compressed to the same number of bytes as the genome or even the entire DNA is entirely irrelevant, as we know that the DNA is not a compressed description of the phenotype anyway. Still, it would highly surprise me if it were possible.

It just invalidates your argument.
More likely it means that you misunderstand it.

I chose 20.000 instead of 20.000+ in your favor, because the more complexity you assert, the more computing power you would need to brute force the proof.
I fail to see why I would need to brute force the proof anyway, any more than I would need to brute force the proof that lottery draws are randomised and as a result will be rather difficult to compress. If I can see with my own eyes that a string is not generated by some rigid algorithmic process, I simply do not care whether there is an algorithmic process that can produce the same string.

Not unless it can predict the next winning lottery numbers...
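
The claim that draw-like data is "rather difficult to compress" can be tried out directly. What follows is only a minimal sketch, assuming Python's standard `zlib` module as the compressor and a seeded pseudo-random byte stream as a stand-in for lottery draws; the names `compression_ratio`, `draw_like` and `patterned` are made up for illustration and come from nobody's post.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at its highest level."""
    return len(zlib.compress(data, 9)) / len(data)


# Assumption: a seeded generator stands in for lottery draws so the run is repeatable.
rng = random.Random(12345)
draw_like = bytes(rng.getrandbits(8) for _ in range(10_000))

# Output of a rigid, obviously algorithmic rule, for comparison.
patterned = b"0123456789" * 1_000

print("draw-like:", compression_ratio(draw_like))
print("patterned:", compression_ratio(patterned))
```

The draw-like bytes should come out at roughly their original size while the patterned bytes shrink to a small fraction. Note that `draw_like` is in fact produced by a short program, so a short description of it does exist even though zlib finds nothing to exploit: failing to compress a string with one tool is not the same as proving it incompressible.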
 
Why on earth should biologists restrict their interests to such things?
Please note that I used an "if" there; not a "should" -- though, if pressed, I might try to argue that "biologists" (in the most general sense) start with an interest in understanding organisms (and populations of organisms, and interactions between organisms, and between populations of organisms), their interests in things like cells and genes being primarily means to those ends. Then again, I might not.

My point is that where a certain feature appears to us obvious as a property of a whole organism, it's easy to conclude that we're observing an explicit feature of a genome; i.e., that the genome contains "information" regarding that feature. The OP argues that rejecting this view forces the conclusion that the information must reside somewhere else.

Your point about costs is well taken, but it's part of the same approach that views a feature like "cornering" as something explicitly "coded for". What I see as the error is the attempt to apply the term "information" to the genetic sources of such a feature, with the expectation that it (here, "information about cornering") can then be quantified. It is the assumption that because the sources of that feature (at the level of genes) can be treated mathematically as "information", their consequences (at the level of phenotype) must reside with those mathematical relationships. It's like assuming that the "meaning" of a phrase resides with the relationships between the letters.

There are other alternatives besides the ones the OP lists. Describing some features as "emergent" is one. Gould's "spandrels" is another. Noting the influence of environmental factors is yet another. And Earthborn argues for the importance of the chemical state of the cell in which the processes of transcription, translation, and protein synthesis take place (which leads to consideration of the fact that individual cells have individual histories, and these histories influence the cell's state at a given point as well).

What these all have in common is that they are not concerned with information at the level of the gene, but with meaning at the level of phenotype-interacting-with-environment; with what happens when genetic "information" is taken out of the conveniently sterile mathematical framework we placed it in by taking an information theoretical approach ...and the results put to work.

In information theory, meaning always resides somewhere else.
 
I fail to see why I would need to brute force the proof anyway, any more than I would need to brute force the proof that lottery draws are randomised and as a result will be rather difficult to compress.
I hate to repeat myself, but even as far as the lottery draws, the only way to prove that a given data set of size N cannot be compressed is by trying out all possible data sets of sizes up to N. This is commonly referred to as "brute force".

What this means is that even if you "know" the source is "random", you cannot prove that the resulting data cannot be compressed.
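
The counting behind this kind of argument can be written down in a few lines. This is a minimal sketch, assuming binary strings whose candidate descriptions are themselves binary strings; the function names are hypothetical. It shows the standard pigeonhole bound only: there are too few short descriptions to cover every string of length n, but the count does not say which particular strings are left uncovered.

```python
def descriptions_shorter_than(n: int) -> int:
    """Number of binary strings of length 0 .. n-1, i.e. candidate shorter descriptions."""
    return 2**n - 1


def max_fraction_compressible(n: int, k: int) -> float:
    """Upper bound on the fraction of n-bit strings that can have a
    description at least k bits shorter (length <= n - k)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    candidates = 2 ** (n - k + 1) - 1  # binary strings of length 0 .. n-k
    return candidates / 2**n


# Fewer descriptions than strings: at least one 8-bit string has no shorter one.
print(descriptions_shorter_than(8), "descriptions for", 2**8, "strings")

# Less than half of all 8-bit strings can be shortened by 2 or more bits.
print(max_fraction_compressible(8, 2))
```

The bound holds for every n, which is why most long strings cannot be compressed by much, while for any one given string the count alone settles nothing.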

If I can see with my own eyes that a string is not generated by some rigid algorithmic process, I simply do not care whether there is an algorithmic process that can produce the same string.

This would be argument from ignorance, at which stage I would like to rest my case.
 
No, I am only disputing the simplistic but common view that for every property there must be a gene coding for it, which causes people (such as the OP) to erroneously conclude that there is genetic information missing.

I need to add that we are on the same side as far as this. It is just that I reckon you are cutting them too much slack as far as not forcing them to express their "concerns" AFA "Information" and stuff within the context of a rigid scientific framework actually addressing "Information" and stuff.

Rather, you would get dragged down to the level of "If it doesn't look compressible to me, it most likely isn't!"

Did it ever occur to you that if indeed the decision whether a certain string is compressible or not requires an amount of computing power that grows exponentially with the size of the string - well - that the amount of computing power required to make that decision for sufficiently long strings may (in your case just slightly) exceed the computing power of a human brain?
 
It is just that I reckon you are cutting them too much slack as far as not forcing them to express their "concerns" AFA "Information" and stuff within the context of a rigid scientific framework actually addressing "Information" and stuff.
I'm not going to drag a rigid scientific framework to a place where it doesn't belong.

"If it doesn't look compressible to me, it most likely isn't!"
More like: "If a string has not been made by compressing a dataset, it makes no difference whether or not the dataset can be compressed to the string. It is irrelevant!"

the amount of computing power required to make that decision for sufficiently long strings may (in your case just slightly) exceed the computing power of a human brain?
There are lots of things that exceed the computing power of a human brain. I don't care, why should I?

It should also be mentioned that the human brain works rather differently from a formalised mathematical construct like a Turing Machine. It is rather bad at "brute forcing" its way through all possibilities of compression algorithms. But it is rather good at deciding whether it is worth trying.

What this means is that even if you "know" the source is "random", you cannot prove that the resulting data cannot be compressed.
But I can prove that trying to prove it one way or another serves no purpose at all. It certainly won't give us any insight in how the lottery machine works.

I cannot prove that there are no two-headed purple and green striped cows living on Pluto, but I still have a pretty good idea that searching for them is likely to be a waste of effort. Similarly, I can try measuring myself down to the atomic level and then try to find an algorithm that can compress all that information to the 750MB capable of being stored in human DNA. It is not that I can prove that this is impossible; it is just that I can be very sure that it is a useless thing to do and certainly won't bring us any closer to an understanding of genetics. Even more unlikely than the existence of such an algorithm is that it would produce a 750MB or so (maybe as little as 513MB) string that in any way resembles human DNA. That's because DNA isn't the product of a compression algorithm to begin with.
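
The 750MB figure is consistent with a simple back-of-the-envelope calculation. A minimal sketch, assuming roughly 3 billion base pairs in one copy of the human genome and 2 bits per base (four possible bases); both numbers and the function name are assumptions for illustration, not taken from the post.

```python
import math

BASES = "ACGT"  # the four DNA bases


def genome_size_bytes(base_pairs: int, alphabet_size: int = len(BASES)) -> int:
    """Bytes needed to store `base_pairs` symbols at a fixed number of bits per base."""
    bits_per_base = math.ceil(math.log2(alphabet_size))  # 2 bits for 4 bases
    return base_pairs * bits_per_base // 8


# Assumption: about 3 billion base pairs in one copy of the genome.
print(genome_size_bytes(3_000_000_000))  # 750000000 bytes, i.e. 750 MB
```

This counts raw storage capacity only; it says nothing about how much of that sequence codes for anything.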

This would be argument from ignorance, at which stage I would like to rest my case.
An argument from ignorance would be "I can't imagine it, therefore it cannot be true." My argument is "I seriously doubt it, and I can't imagine why it matters."

You certainly don't explain why it makes any difference whether information about my phenotype is compressible or not. I'm probably (maybe the universe is deterministic after all) not the result of an algorithm, so why should I care whether I am describable as one?
 
I'm not going to drag a rigid scientific framework to a place where it doesn't belong.

You wrote (WRT genes):

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

This is a syllogism of the form: A, therefore NOT B.

"A" being a claim towards the human phenotype exceeding a certain complexity.

If it wasn't the framework of Information Theory that you implied as the scientific foundation of your claim, one has to wonder what other scientific framework - addressing issues of information, complexity... etc. - was?

More like: "If a string has not been made by compressing a dataset, it makes no difference whether or not the dataset can be compressed to the string. It is irrelevant!"

O.k., I'll try to guess what you could have meant with the above statement: I assume you mean that if you know that a string has been generated by a presumably random process, you also know that it isn't compressible i.e. complex

Apart from not being entirely true because you cannot prove randomness either - and even then only provable for half of the strings but not provable for any one particular string - this would be a syllogism of the form:

NOT B, therefore A

i.e. "There are environmental factors during development, therefore the phenotype is not exclusively described by the genes, therefore the phenotype is more complex than the genome"

NOT B, therefore A

So far, so good. But then you go ahead and use the conclusion "A" that you have drawn from the premises "NOT B", when you write that:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

IOW, you use "A", that you derived from "NOT B" in order to prove "NOT B"

This is what is commonly referred to as circular logic.

It should also be mentioned that the human brain works rather differently from a formalised mathematical construct like a Turing Machine. It is rather bad at "brute forcing" its way through all possibilities of compression algorithms. But it is rather good at deciding whether it is worth trying.

I'll leave that claim - essentially implying that the operation of the human brain for some fundamental reason cannot be simulated in a TM - for others to have a go at. You are very much claiming to be able to prove a negative.

I cannot prove that there are no two-headed purple and green striped cows living on Pluto,

You've chosen the wrong analogy, last but not least because we have very good reason to assume that the property data of the human phenotype is compressible.

Similarly I can try measuring myself down to atomic level and then try to find an algorithm that can compress all that information to the 750MB capable of being stored in human DNA. It is not that I can prove that this is impossible,

Yet you use this unproven assertion as a premise when you write:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

This is what I originally was going to point out to you.

You certainly don't explain why it makes any difference whether information about my phenotype is compressible or not.

Well, it does make a difference as far as the validity of your statement:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

...which my objection originally has been about.
 
we have very good reason to assume that the property data of the human phenotype is compressible.
"Property data" sounds like the same thing as "meaning information". In information theory, this would be gibberish.

Wife: "I'm leaving you"
Husband: "Who is he?"

Twenty-five characters, including spaces. We know instantly that the pronoun "he" refers to the man with whom the husband assumes his wife has been having an affair. Is the missing referent identified by expanding compressed data?
 
"Property data" sounds like the same thing as "meaning information". In information theory, this would be gibberish.

First, Earthborn would likely object to your claim that information theory applies to the subject matter at all.

If the phenotype can be exhaustively described by a finite set of properties and their respective possible values, then, within this framework and for any given phenotype, there is a finite-size data set derived from these values. I called that data set "property data" for the sake of simplicity and to accommodate the language used by the opponent (Earthborn), since doing so is not likely to cause harm. From within the context of my argument, I thought that this should be obvious.

Glad you asked, though!

Wife: "I'm leaving you"
Husband: "Who is he?"

Twenty-five characters, including spaces. We know instantly that the pronoun "he" refers to the man with whom the husband assumes his wife has been having an affair.
Actually we don't, rather we guess. Could be the name of the divorce lawyer he is asking for. Could be anyone. And in languages that assign gender to things it even could be anything.

Is the missing referent identified by expanding compressed data?

There is nothing identified whatsoever, but rather it is guessed. And how this relates to the issue at hand in any conceivable way, I would ask you to clarify!

Before possibly jumping the gun and calling "gibberish" on this one myself, that is.
 
If the phenotype can be exhaustively described by a finite set of properties and their respective possible values,

It can't. The phenotype is continuous, while any finite set of finitely-describable properties is of course discrete.

That alone makes it relatively easy to show that this line of inquiry is not likely to produce a fruitful result.
 
Earthborn would likely object to your claim that information theory applies to the subject matter at all.
Earthborn and I often disagree. On this, however, I believe we are so far in perfect agreement. Information theory has legitimate applications, some of which may produce results biologists can use. But (like thermodynamics) what is a useful scientific framework when used responsibly leads to confusion and flawed conclusions when applied inappropriately, as here. Both have suffered horribly at the hands of creationists and their ilk.

If the phenotype can be exhaustively described by a finite set of properties and their respective possible values...
And if (as Earthborn, drkitten and I all agree) it can't?

And in languages that assign gender to things it even could be anything.
Precisely. It could be anything. That's exactly my point. The "meaning" is not a property of the message content, but what it becomes within a given interpretive framework. I cannot accept your excuse for the use of the phrase "property data"; the overall thrust of your arguments makes it clear that you view "properties" as something which resides with "data". I'm suggesting that this is the same as mistaking "information" (as it is defined in information theory) for "meaning" (which is most decidedly outside the bounds of that framework). If you do not understand the importance of the distinction, any attempt to invoke information theory to support your position can only lead to error.

And how this relates to the issue at hand in any conceivable way, I would ask you to clarify!
That you don't understand the relevance to the issue at hand further confirms that you don't even see the distinction. Looks like you aren't quite ready to invoke information theory responsibly.
 
Earthborn and I often disagree. On this, however, I believe we are so far in perfect agreement. Information theory has legitimate applications, some of which may produce results biologists can use. But (like thermodynamics) what is a useful scientific framework when used responsibly leads to confusion and flawed conclusions when applied inappropriately, as here. Both have suffered horribly at the hands of creationists and their ilk.

Such as by invoking "Argument from complexity"

Such as this one:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

How is that not argument from complexity?

And this and only this was the statement I was originally objecting to.

Since you state that you "agree with him" on this matter, I could as well ask you how stating that (WRT genes):

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

is NOT a misapplication of information theory. How is it not? We keep in mind that I objected to exactly this particular statement.

And, if information theory was not the scientific framework the above assertion was based upon, which was, then? Mind explaining?

Precisely. It could be anything. That's exactly my point. The "meaning" is not a property of the message content, but what it becomes within a given interpretive framework.

Never have I brought up anything even remotely resembling the concept of "meaning". You're constructing one huge straw person right here!

I cannot accept your excuse for the use of the phrase "property data"; the overall thrust of your arguments makes it clear that you view "properties" as something which resides with "data".

Could you please stick to what I actually wrote rather than making attempts at mind reading? The reason the word "properties" is in here at all is because Earthborn used it in its statement that I objected to. Simple as that.

I'm suggesting that this is the same as mistaking "information" (as it is defined in information theory) with "meaning" (which is most decidedly outside the bounds of that framework).

The only thing you're proving right now is that mind reading most likely does not work. You'd be hard pressed to produce anything I wrote which would justify your accusation.

I call BS on what essentially is an argument from complexity and you accuse me of misappropriating information theory? This is rich indeed!
 
Well, it does make a difference as far as the validity of your statement:
No, it doesn't.

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.
...which my objection originally has been about.
Sure, whatever. Let me reformulate a bit:
Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties and we know that the information about those properties is never compressed so the genes cannot be an intricate description of a human.


Yet you use this unproven assertion as a premise when you write:
Notice the difference between "750MB" and "20 000 genes".

O.k., I'll try to guess what you could have meant with the above statement: I assume you mean that if you know that a string has been generated by a presumably random process, you also know that it isn't compressible i.e. complex
I thought I was abundantly clear on what I said, but I'll reformulate it once again:
If you know that a string has been generated by a presumably random process, you know that it is irrelevant whether or not it is compressible i.e. complex.

You do know what "irrelevant" means, don't you?

I'll leave that claim - essentially implying that the operation of the human brain for some fundamental reason cannot be simulated in a TM - for others to have a go at.
It is of course a bit of a philosophical issue. If the Universe is entirely deterministic and everything in it is nicely quantised, then obviously the human brain could be simulated in a Turing Machine, because everything can. It will just be crazily difficult, and for a meaningful simulation of a human brain, you'll also need to simulate a body in which it lives and a complex environment around that body. But if there is even the slightest bit of randomness in the Universe, then it cannot be done because a Turing Machine cannot generate true randomness.

Either way it makes absolutely no difference to the biology involved, and is a matter best left to philosophers and theoretical physicists.

You've chosen the wrong analogy, last but not least because we have very good reason to assume that the property data of the human phenotype is compressible.
Really? I've done a wee bit of biology once on a blue monday, and I figured the existence of two headed purple and green striped cows on Pluto is the more likely of the two. Please tell us what "very good reason" we have to assume that the property data of the human phenotype is compressible, and further explain why anyone should care whether or not it is.
 
Sure, whatever. Let me reformulate a bit:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties and we know that the information about those properties is never compressed so the genes cannot be an intricate description of a human.

I can only hope that it is clear to you that you're well on your way to constructing a tautology here. It comes down to:

We know that the generation of the phenotype does not exclusively depend on the genotype...

therefore

We know that the generation of the phenotype does not exclusively depend on the genotype...

I thought I was abundantly clear on what I said, but I'll reformulate it once again:

If you know that a string has been generated by a presumably random process, you know that it is irrelevant whether or not it is compressible i.e. complex.

This is not what you originally said. Originally you posted a syllogism arguing that from the mere fact that - in your words - "humans have more than 20.000 properties" supposedly follows that these cannot possibly be encoded in 20.000 genes.

For reference and convenience, I shall once more cite your original statement:

Earthborn said:
And in humans there are only about 20 000 of them, humans have more than 20 000 properties so the genes cannot be an intricate description of a human.

Could you possibly agree, that, to the feeble minded at least, this could come across as an argument from complexity and that rephrasing it would inevitably make it tautological?

After all, we're on the same side of the fence, but this was something I thought I couldn't let you get away with ;-)
 
How is that not argument from complexity?
"Humans have more than 20 000 properties" is not a statement about the complexity of a data set. And, no matter how emphatically you may insist that viewing it as such is the only possibility, it is not an application of information theory.

if information theory was not the scientific framework the above assertion was based upon, which was, then?
I would not regard it as derived by application of any rigid scientific framework at all, but by simple observation. As such, it relies heavily on what is intuitively obvious, and is therefore quite possibly flawed. Either way, Kolmogorov Complexity has damn-all to do with it.

Never have I brought up anything even remotely resembling the concept of "meaning".
I noticed that, and find it significant. I'm arguing that in biological organisms, "genotype" is analogous with "information", and "phenotype" (or "properties") is analogous with "meaning". From that perspective, any attempt to quantify "properties" in terms of "information" is doomed from the start. "Meaning" is not "expanded information". "Phenotype" is not "expanded genotype". That you jumped in immediately with Kolmogorov Complexity evidences a lack of this insight, and your continued focus on complexity and compressibility despite efforts by several posters to disabuse you of the notion that it is relevant suggests that your flawed assumptions are both deeply embedded and completely transparent to you. Happens to the best of us. Pointing that sort of thing out is what we do here, and if you insist on calling it a form of mind reading, then so be it.

I would also like to see you expand on this: "we have very good reason to assume that the property data of the human phenotype is compressible" using words you would have chosen if not accommodating an opponent.
 
For reference and convenience, I shall once more cite your original statement:
I know what I said. Quoting the same thing a million times doesn't make you appear more convincing.

Could you possibly agree, that, to the feeble minded at least, this could come across as an argument from complexity
Well, yes. You have effectively convinced me that that's what it appears to the feeble minded. :oldroll:

and that rephrasing it would inevitably make it tautological?
Not at all.

After all, we're on the same side of the fence
It doesn't look like it to me. In fact I agree more with the OP than with you. I think wogoga has effectively argued that much of the information about the phenotype has not been encoded in the genome and its origin must be searched for elsewhere. Wogoga has proposed 3 possibilities where it might come from (none of which I can definitely rule out, unlike the "all the information is in the genome, but it is compressed" hypothesis), but has also implicitly acknowledged that there may be others, of which I consider the environment as likely the most influential.

The only thing on which I disagree with wogoga is the claim that the "all the information (compressed or not) is in the genome" hypothesis is typical of "neo-Darwinism". Not even the champions of the Gene Centered View of Evolution -- a view I think is extremely limiting -- seem to believe in it.

Originally you posted a syllogism arguing that from the mere fact that - in your words - "humans have more than 20.000 properties" supposedly follows that these cannot possibly be encoded in 20.000 genes.
Note that I said "genes". Whether all human properties can be encoded in the same number of bits as those 20 000 genes is therefore irrelevant. What matters is whether it can be encoded in that many genes. And it can't.
 
The open/closed-system confusion is pointless in this context.

Confusion is pointless in almost any context, surely?

The system closes for an organism when it dies. After which its entropy increases pretty damn' fast, mostly due to other living organisms scavenging it for the energy and nutrients they need to ingest. Living organism : open system. Dead organism : closed system. Very simple and unconfusing.



Panpsychism takes the fact seriously that enzymes do not conform to the laws of thermodynamics and Brownian motion ...

They do so conform. Their actions are generally powered by the positive entropy of ATP-ADP (or ADP-AMP) degradation. Some enzymes simply accelerate a natural (positive entropy) degradation. Whatever, the outcome of the action is more entropy.

ATP is created from the positive entropy of sugar degeneration. Sugars are created by photosynthesis. At the centre of things the Sun liberates positive entropy at a prodigious rate.

I'm painting a simple (but, I hope, essentially accurate) picture, in an effort to avoid confusion.
 
It can't. The phenotype is continuous, while any finite set of finitely-describable properties is of course discrete.

That alone makes it relatively easy to show that this line of inquiry is not likely to produce a fruitful result.

I've been lurking for years and thus, I certainly know what a "CFLarsen" is. I hereby pull one:

Got any evidence?

If you do find any evidence of the phenotype being continuous I'd suggest you publish it quickly so nobody else beats you to that Nobel prize. After all, quantum mechanics is consistent with reality as a whole being discrete and so any evidence of a piece of reality being truly continuous would refute (or at least significantly revise) quantum mechanics.
 
