In article <4gfh7q$jmc at mserv1.dl.ac.uk>,
Nicolas Chalwatzis, MIPS <CHALWATZIS at MIPS1.dnet.mips.biochem.mpg.de> wrote:
>- the program DNADIST of the PHYLIP package has the option to calculate
> sequence distances based on a gamma-distribution (Jin and Nei 1990).
> However, the program requires the "Coefficient of variation of substitution
> rate among sites (must be positive)" - What's the best way to evaluate
> this coefficient for a given dataset (sequence alignment) ?
Mostly that is up to the user, who is supposed to have some idea as to
how variable rates of evolution are from site to site. The Coefficient of
Variation is the ratio of the standard deviation to the mean, so if rates
varied by about a factor of two (say, from 1.0 to 2.0, with a mean of 1.5),
a good value would be about (2.0-1.5)/1.5 = 1/3.
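
To make the arithmetic concrete, here is a rough Python sketch (not part of
PHYLIP; all function names are made up for illustration). It computes the
Coefficient of Variation from a guessed set of relative rates and converts it
to the shape parameter of a gamma distribution, using the standard identity
CV = 1/sqrt(alpha). The last function is the Jukes-Cantor form of the
gamma-corrected distance, which is an assumption made here for simplicity and
is not necessarily the substitution model DNADIST uses.

```python
import statistics


def coefficient_of_variation(rates):
    """Standard deviation of the rates divided by their mean."""
    mean = statistics.fmean(rates)
    if mean <= 0:
        raise ValueError("mean rate must be positive")
    # Population standard deviation: the rates are treated as the
    # whole (guessed) distribution, not as a sample from it.
    return statistics.pstdev(rates) / mean


def gamma_shape_from_cv(cv):
    """Shape parameter alpha of a gamma distribution with this CV."""
    if cv <= 0:
        raise ValueError("coefficient of variation must be positive")
    return 1.0 / (cv * cv)


def jc_gamma_distance(p, cv):
    """Jukes-Cantor distance with gamma-distributed rates (assumed model).

    p is the observed proportion of sites that differ between two sequences.
    """
    alpha = gamma_shape_from_cv(cv)
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0:
        raise ValueError("sequences too divergent for this correction")
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    # Half the sites at rate 1.0 and half at rate 2.0.
    cv = coefficient_of_variation([1.0, 2.0])
    print(cv, gamma_shape_from_cv(cv), jc_gamma_distance(0.1, cv))
```

With two equally frequent rates of 1.0 and 2.0 the first function gives
exactly the 1/3 mentioned above. A larger CV (more rate variation) gives a
larger corrected distance for the same observed proportion of differences.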
In likelihood methods (such as Ziheng Yang's methods that use gamma-distributed
rates in his PAML package, or my Hidden Markov Model approach in DNAML)
one can change the rate variation parameters until the overall likelihood of
the tree is maximized. With distance methods, I am not as sure how to do
this. Perhaps one could try various values until the goodness of fit of the
tree (say, the sum of squares) is optimized, but doing this may have biases in
it.
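
The trial-and-error idea for distance methods could be sketched as a simple
grid search. This is only an illustration: the `score` argument is a
hypothetical function that you would have to write yourself, for example one
that reruns the distance calculation with a given coefficient, fits a tree,
and returns its sum of squares. Nothing here calls PHYLIP, and the caution
about possible bias applies to whatever value comes out.

```python
def best_cv_by_grid(score, candidates):
    """Return (cv, score) for the candidate CV with the lowest score.

    score      -- hypothetical callable: CV -> goodness of fit
                  (smaller is better, e.g. a sum of squares)
    candidates -- iterable of positive CV values to try
    """
    best_cv = None
    best_score = float("inf")
    for cv in candidates:
        if cv <= 0:
            raise ValueError("coefficient of variation must be positive")
        s = score(cv)
        if s < best_score:
            best_cv, best_score = cv, s
    return best_cv, best_score


if __name__ == "__main__":
    # Toy stand-in for a real fit: pretend the fit is best at CV = 0.5.
    def toy_score(cv):
        return (cv - 0.5) ** 2

    print(best_cv_by_grid(toy_score, [0.1, 0.25, 0.5, 1.0, 2.0]))
```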
--
Joe Felsenstein joe at genetics.washington.edu (IP No. 128.95.12.41)
Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA