phylip package question

joe at removethispart.gs.washington.edu
Wed Oct 2 12:47:30 EST 2002

In article <and3qk$pm7$1 at mercury.hgmp.mrc.ac.uk>,
Chris Hoffman  <choffman at lucas.cis.temple.edu> wrote:
>I have a question regarding the DNADIST program from Phylip Package.
>I run SEQBOOT with my seqs and get my new data sets produced using
>bootstrap and so far so good. but when i use these new data sets to run
>DNADIST, the program can't run it because it finds one or more sequences
>that are supposedly too different to allow the computation to proceed.
>I tried all the methods available in the program and all give similar
>btw:  I haven't found any similar msgs running DNAPARS or DNAML

This occurs for a reason inherent to bootstrapping.  When some of your
sequences are fairly distant from each other, a bootstrap replicate
can make two of them so different that their distance would be infinite.
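
To illustrate the effect, here is a minimal Python sketch of
bootstrapping the sites of one pair of aligned sequences and recording
the fraction of differing sites in each replicate.  The function names
and the toy sequences are hypothetical and are not taken from SEQBOOT;
the sketch only shows that the observed fraction varies from replicate
to replicate around the original value.

```python
import random

def fraction_different(seq_a, seq_b):
    """Fraction of aligned sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def bootstrap_fractions(seq_a, seq_b, replicates, rng):
    """Resample sites with replacement; return the fraction differing per replicate."""
    n = len(seq_a)
    fractions = []
    for _ in range(replicates):
        # Draw n site indices with replacement, as a bootstrap of columns does.
        sites = [rng.randrange(n) for _ in range(n)]
        fractions.append(sum(seq_a[i] != seq_b[i] for i in sites) / n)
    return fractions

# Toy pair (made up for illustration): 20 sites, 14 of which differ.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "CATGCATGCATGCAGTACGT"

rng = random.Random(0)  # fixed seed so the run is repeatable
fractions = bootstrap_fractions(seq_a, seq_b, 1000, rng)
print("original fraction:", fraction_different(seq_a, seq_b))
print("largest fraction among replicates:", max(fractions))
```

With short or very divergent alignments, some replicates will
typically land well above the original fraction.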

For example, when you use a Jukes-Cantor distance, any two sequences
that are 75% or more different will have an infinite distance.  Thus
when your original sequences are (say) 70% different, bootstrapping
can occasionally make those sequences 76% different.
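
The 75% threshold follows from the standard Jukes-Cantor formula,
d = -(3/4) ln(1 - 4p/3), where p is the fraction of sites that differ:
the argument of the logarithm reaches zero at p = 0.75.  A minimal
sketch in Python (the function name is hypothetical, and this is not
DNADIST's own code):

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor distance for a fraction p of differing sites.

    Returns math.inf when p >= 0.75, where the formula has no finite value.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(jukes_cantor_distance(0.70))  # finite, roughly 2.03
print(jukes_cantor_distance(0.76))  # infinite
```

Note how steeply the distance rises near the threshold: a 70%
difference already corresponds to about two substitutions per site.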

What should the distance program do in such a case?  I chose in PHYLIP
to make it complain and stop.  Other people's programs are sometimes
set to assign a large number (say 10) as the distance.  Both of these
policies have disadvantages.  The first denies you the ability to use
that replicate; the second puts in somewhat fictional information.
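
The two policies can be sketched side by side.  This is an
illustration only: the function, the exception class, and the `policy`
and `cap` parameters are all hypothetical names, and neither PHYLIP nor
any other program is claimed to be implemented this way.

```python
import math

class InfiniteDistanceError(ValueError):
    """Raised when a pair of sequences is too different for a finite distance."""

def jc_distance_with_policy(p, policy="stop", cap=10.0):
    """Jukes-Cantor distance with an explicit policy for p >= 0.75.

    policy="stop": complain and stop (raise an error).
    policy="cap":  substitute an arbitrary large distance (somewhat fictional).
    """
    if p < 0.75:
        return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    if policy == "stop":
        raise InfiniteDistanceError(
            f"sequences are {p:.0%} different; distance is infinite")
    if policy == "cap":
        return cap
    raise ValueError(f"unknown policy: {policy!r}")

print(jc_distance_with_policy(0.76, policy="cap"))  # the substituted value
try:
    jc_distance_with_policy(0.76, policy="stop")
except InfiniteDistanceError as err:
    print("stopped:", err)
```

With "stop" the whole replicate is lost; with "cap" the replicate is
kept, but the tree is then built partly from a made-up number.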

Parsimony and likelihood don't have this problem, though likelihood
could put a species on the end of a very long branch.  In Dnaml, I
just let the branch length get fairly long; at some point the
program has iterated it enough and leaves it at that length.

Joe Felsenstein         joe at removethispart.gs.washington.edu
 Department of Genome Sciences, University of Washington,
 Box 357730, Seattle, WA 98195-7730 USA

