
Program for testing incongruences

Andrew J. Roger aroger at ac.dal.ca
Mon Apr 29 05:31:38 EST 1996


Thomas K. Dibenedetto wrote:
> 
> Guy Hoelzer (hoelzer at unr.edu) wrote:
> : > tdib said:
> : > the two phylogenies are the same or they are not. Why would you need to
> : > prove anything in order to combine the two data-sets?
> 
> : The means or variances of any two data sets can
> : differ due to sampling error or because they actually represent different
> : populations (i.e., different phylogenies).  Phylogenetic data are no
> : different.  They, too, are samples of taxa, of the extant variation within
> : those taxa, of characters, etc.  Furthermore, the extant taxa and variation
> : is a sample of historic taxa and variation.  Therefore, when you have two
> : data sets, even when they contain samples from identical sets of taxa, they
> : may differ either due to sampling error or because they are actually
> : samples of different phylogenies (e.g., sequences from two different genes
> : in the same set of individuals may have different phylogenetic histories
> : (see Pamilo & Nei 1988)).
> 
> So far I'm with you.
> 
> : Therefore, it is important to know whether the
> : data sets really contain conflicting phylogenetic signals prior to
> : combining the data.
> 
> This is where I start wondering.... Why the "therefore"? What difference
> does it make WHY the phylogenies are different at this point in the analysis?
> 
> : It has been argued that combining data sets with conflicting phylogenies,
> : caused by the use of different characters, is still a useful way to get at
> : the phylogenetic relationships of the whole taxa, rather than just of the
> : set of characters in a particular data set.  The basic idea is that the
> : areas of conflict will become noise and the remaining phylogenetic signal
> : will better represent the organismal relationships.
> 
> I would agree with this. It seems to be one of the more basic concepts in
> systematics.
> 
> : Others argue that it is inappropriate to combine statistically
> : different data sets.
> 
> Why?

Have you heard of lateral gene transfer? In case you haven't, it refers
to cases where an organism picks up a gene from an organism in a
completely different phylogenetic lineage. Suppose you were trying to
work out the phylogenetic position of an organism using all of the genes
available from it and from related organisms, and one of those genes had
been picked up from another lineage (while the other taxa in the dataset
had not acquired it). You would then be including a datum which is
positively misleading about the phylogenetic relationships of this
organism, at least if you are interested in the relationships that result
from vertical descent rather than from lateral transfer of one of its
genes.

The need to keep data like these separate seems so obvious to me
that I cannot conceive why you might object to it. Can you enlighten me?

> 
> : In this
> : case, one should examine the set of distinct trees available for the taxa
> : under study.  The differences among them might be informative
> 
> of what?

Lateral gene transfer.

> 
> : and the similarities are likely to indicate real patterns in the
> : history of the whole taxa.
> 
> The similarities would certainly show up in a combined analysis, no?
> 
> : BTW, subsets of what is collected as a single
> : data set can also contain significantly distinct phylogenetic signals;
> : so, the question of combining data sets is identical to the question of
> : searching for conflicting signals within any one data set.
> 
> But would those who advocate _not_ combining data-sets actually break up a
> single data set simply because partitions of it may support different
> topologies? If not, then why would it be justified to "combine"
> conflicting data in an originally singular data-set, but not to combine
> two conflicting data-sets? If they do break up originally singular
> data-sets, where does it stop? What stops the deconstruction before it
> reaches single contradictory characters?

If we are dealing at the level of molecular data then one must try to
think of what patterns of conflicting data will be indicative of true
chimaerism as opposed to "noisy" data. For instance, if a protein were
chimaeric you might expect that blocks of consecutive codons would
be the units which are transferable between genes. So if you were to find
that a particular block of consecutive characters in your matrix gave
a consistent phylogenetic signal that differed significantly from that of
the rest of the protein, then you would have evidence for chimaerism
at this level. What you do not expect from gene chimaeras is to
find conflicting characters randomly distributed across the dataset.

Thus it is clear in principle and in practice in this case how you
would distinguish between noisy data and real gene chimaerism.
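
To make the block-versus-scattered distinction concrete, here is a
minimal sketch in Python. It is only an illustration, not a method from
any existing package. It assumes you have already scored each character
(column) of the matrix as conflicting or not conflicting with the tree
favoured by the rest of the gene; how that scoring is done is left open.
The names longest_run and clustering_p_value, and the example data, are
hypothetical. The idea is a simple permutation test: if the conflicting
characters sit in one consecutive block, a random reordering of the
characters should almost never produce a run that long.

```python
import random


def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def clustering_p_value(flags, n_perm=2000, seed=0):
    """Permutation test for clustering of conflicting characters.

    flags[i] is True if character i conflicts with the reference tree
    (assumed to be scored beforehand). Returns the fraction of random
    reorderings whose longest run is at least as long as the observed one.
    """
    rng = random.Random(seed)
    observed = longest_run(flags)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 100 characters in alignment order.
block = [40 <= i < 52 for i in range(100)]      # 12 conflicts in one block
scattered = [i % 8 == 0 for i in range(100)]    # conflicts spread evenly

p_block = clustering_p_value(block)
p_scattered = clustering_p_value(scattered)
```

A small p_block would point towards a transferred block (chimaerism),
whereas a large p_scattered is what you would expect from ordinary noise.
A real analysis would need a proper test of whether the block's signal
differs significantly from the rest; this sketch only addresses whether
the conflicts are positionally clustered.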

> 
> : To state my answer to the question posed above (Why would you need to prove
> : anything in order to combine the two data-sets?) more directly; you don't
> : need to prove anything, but you might be missing out on interesting and
> : important information.
> 
> I don't see the relevance of knowing whether the conflicts are caused by
> sampling error or by different character trees until you arrive at a final
> assessment of the overall phylogeny, something I have a hard time
> imagining emerging from anything but a total evidence analysis of all the
> relevant taxa (IOW, just go ahead and combine them, add in all other
> evidence, and _then_ see how things can be interpreted).
> --

This would be fine if you could be certain that a significant portion
of your dataset were derived from vertical descent only. But it is
possible that your total dataset may contain a large portion of
data which are positively conflicting. The result may be a phylogeny
which makes no sense at all because it is derived from multiple
datasets which had different histories.

The success of total evidence depends on the anomalous phylogenies
being in the minority...and that the true phylogeny will "come out
in the wash".  I wouldn't be comfortable with this assumption.

Cheers
Andrew J. Roger



