
Some thoughts on what to do

Joe Felsenstein joe at GENETICS.WASHINGTON.EDU
Tue Feb 5 07:18:07 EST 1991

John Gillespie wrote (relative to what to do next in molecular phylogeny):

> Hear, hear!! We have known since '71 (Ohta and Kimura) that rates of
> substitution vary.  We also know that the frequencies of the four nucleotides
> vary through time.  It is hard to imagine a characterization of the
> substitution process that is farther from those assumed by most
> tree-construction algorithms.

Well, I can imagine LOTS of models that are even farther!  Seriously, though,
(1) variation of rate of evolution with time (lack of clockness) is definitely
   allowed in most methods of inferring phylogenies (Distance methods, ML,
   parsimony, invariants/evolutionary-parsimony),
(2) variation of frequencies of nucleotides is not allowed in most programs but
 (a) if one is willing to accept the admittedly questionable independence
   of different sites, resampling methods such as the bootstrap allow one to
   investigate the empirical variability of inferences made with imperfect
   models,
 (b) check out Barry and Hartigan's 1987 paper in Statistical Science, which
   puts forward (among others) a model where the transition probability matrix
   varies arbitrarily from branch to branch and they can do maximum likelihood
   for it (in fact, it's easier than my ML).  This would allow varying
   base frequencies in different parts of the tree.
 (c) we've got to do _something_, so we do what we know how.  If John uses his
   considerable powers to formulate a model that is more realistic and
   continues to be computationally tractable, we will all be quite interested
   in it.  A better model would have some specification of the distribution of
   possible equilibrium base frequencies and how quickly they can change as
   one moves along the tree,
(3) Variation of nucleotide composition is real but I think a much more serious
   departure from reality in the models used for ML and distance methods is
   the assumption of equal rates of substitution at all sites.  I have some
   ways one can
   specify unequal rates in my current ML programs and am working on ways
   the method can infer them instead of you having to specify rates.
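
To make the resampling idea in (2a) concrete, here is a minimal sketch in
Python. It is not taken from any existing phylogeny program: the function
names, the toy alignment, and the use of a simple proportion-of-differences
distance are all assumptions chosen for illustration. Whole alignment columns
(sites) are drawn with replacement, which is exactly where the independence
of sites is being assumed, and the spread of the statistic across replicates
gives an empirical picture of its variability.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement.

    `alignment` maps a taxon name to its aligned sequence (hypothetical format).
    Every taxon gets the same resampled columns, so each site stays intact.
    """
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_distances(alignment, name_a, name_b, replicates=200, seed=0):
    """Distance between two taxa in each bootstrap replicate."""
    rng = random.Random(seed)
    out = []
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        out.append(p_distance(rep[name_a], rep[name_b]))
    return out


# Toy data, invented purely for this example.
toy = {"A": "ACGTACGTAC", "B": "ACGTTCGTAA", "C": "ACGAACGTCC"}
dists = bootstrap_distances(toy, "A", "B")
print(min(dists), max(dists))
```

In a real analysis the statistic computed on each replicate would be the
inferred tree (or the presence of a particular group in it) rather than a
single pairwise distance; the resampling step would be the same.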

Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
 Internet:         joe at genetics.washington.edu     (IP No.
 Bitnet/EARN:      felsenst at uwavm
 UUCP:             ... uw-beaver!evolution.genetics!joe

More information about the Mol-evol mailing list
