Invariable sites question

Joe Felsenstein joe at evolution.genetics.washington.edu
Wed Nov 27 02:01:42 EST 1996


In article <Pine.A32.3.95.961126174949.20393A-100000 at wap18.zi.biologie.uni-muenchen.de>,
Korbinian Strimmer  <strimmer at zi.biologie.uni-muenchen.de> wrote:
>I haven't seen PAUP* so far (it's not out yet, is it?)  but removing sites
>'mathematically' should be simply incorporating f (or whatever other
>parameter) in the likelihood function so that all pairwise distances
>and all branch lengths on trees can account for invariable sites. 

As I have just said in a posting here, this can be done with the
Hidden Markov Model features of the current release of PHYLIP's DNAML.
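
To make the idea of putting f into the likelihood concrete, here is a
minimal sketch of the per-site likelihood as a two-class mixture.  It is
not DNAML's code.  The names site_likelihood, f_inv, base_freqs and
lik_variable are hypothetical, and the likelihood of the site under the
variable class is assumed to have been computed elsewhere (on the tree,
under whatever substitution model is in use).  The sketch also assumes
the same base frequencies apply to the invariable class.

```python
def site_likelihood(pattern, f_inv, base_freqs, lik_variable):
    """Likelihood of one alignment column under an invariable-sites mixture.

    pattern      -- bases observed at this site, one per sequence, e.g. "AAAA"
    f_inv        -- assumed fraction of invariable sites (0 <= f_inv <= 1)
    base_freqs   -- dict of base frequencies used for the invariable class
    lik_variable -- likelihood of this site given that it is variable,
                    computed elsewhere on the tree (an input, not derived here)
    """
    # An invariable site can only produce a column in which every base is the same.
    is_constant = len(set(pattern)) == 1
    lik_invariable = base_freqs[pattern[0]] if is_constant else 0.0
    # Mix the two classes, weighted by the fraction of invariable sites.
    return f_inv * lik_invariable + (1.0 - f_inv) * lik_variable


if __name__ == "__main__":
    freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(site_likelihood("AAAA", 0.2, freqs, 0.1))
    print(site_likelihood("AACA", 0.2, freqs, 0.01))
```

A column that shows any difference gets no contribution from the
invariable class, so only constant columns are affected by the choice
of frequencies for that class.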


>Whether one is using PAUP* or DNAML or whatever to calculate the ML
>distance

In PHYLIP you get distances with DNADIST, not with DNAML.  The model is
the same as for DNAML.


>I think one other question seems to be important:  How
>are the base frequencies estimated?  In theory, you have to differentiate
>between the base frequencies for the variable positions (= stationary
>frequencies of the underlying Markov model) and the base frequencies
>of the invariable sites (= probability to see a given  pattern
>- say AAAAAAAA - at an invariable site).  It seems to me that now
>simply the average frequencies over the complete data set are used
>for both sorts of frequencies though they have a completely different
>meaning (this is done in Hasegawa et al. papers, in one of the Adachi
>et al. papers, in some Churchill et al. papers, etc.).  So, what really
>interests me is how this is accounted for in PAUP* (I think it is not
>considered in DNAML, is it?)

Not sure what it is that you say is not being considered.  DNAML takes
as the base frequencies (default ones -- the user can put in their own
values if they want, too) the average base frequencies over the sequences.
Of course this weights different sequences as if they were independent,
which they aren't.  Optimally one would instead estimate them by
maximum likelihood.  I think PAUP* will be able to do that.  But the
results will, I think, rarely be noticeably better that way.
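
As an illustration of the averaging described above, here is a minimal
sketch that pools base counts over all sequences.  It is not DNAML's
code.  The function name average_base_frequencies is hypothetical, and
the sketch assumes that gaps and ambiguity codes are simply skipped.
For aligned sequences of equal length without gaps, pooling the counts
gives the same result as averaging the per-sequence frequencies.

```python
from collections import Counter


def average_base_frequencies(sequences):
    """Empirical frequencies of A, C, G, T pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Count only unambiguous bases; gaps and ambiguity codes are ignored.
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    print(average_base_frequencies(["AACG", "ACGT"]))
```

Every sequence contributes equally here, which is the sense in which
the sequences are treated as if they were independent.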

Or are you saying that invariable sites have different base composition
than variable sites?

-- 
Joe Felsenstein         joe at genetics.washington.edu     (IP No. 128.95.12.41)
 Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA



More information about the Mol-evol mailing list