Michael A. Lonetto lonetto at CGL.UCSF.EDU
Wed May 15 18:13:33 EST 1996

At 10:51 AM 5/15/96, Ignacio Marin wrote:
>I am looking for programs to perform multiple alignments of protein
>sequences plus build phylogenetic trees from these alignments in the
>I have found plenty of places where CLUSTALW can be used, but I have
>been unable to find any place to plot the trees from the outputs of the
>program (Is there any Phylip site??).



>Moreover, I would like to know your opinion about how good these types
>of programs are for building trees.

How good the trees are depends on how you run the programs and how good the
alignment is.  The real question is how much you care about getting the
"right" tree, or about knowing how good your best tree is.  The PHYLIP
documentation contains a number of references as well as descriptions of a
number of caveats.  Of particular concern are:

1)  How are "missing" characters treated?  Is a 3 amino acid gap one event
or three?
2)  How sure are you of the gap placements?  If you're not very sure then
perhaps you should limit the analysis to segments of the alignment you have
more confidence in.  Note that CLUSTAL and other "pairwise" aligners do NOT
optimize the overall alignment, but usually get the more conserved segments
pretty close to right.  This point is especially important if you are
treating each gap as an independent event (usually a mistake, but often the
easiest thing to do).
3)  How many trees are you willing to run?  For a small number of sequences
it is possible to examine every tree and thus you are guaranteed to find
the shortest one.  However, the number of possible trees grows faster than
exponentially with the number of species: for n species there are
3 x 5 x ... x (2n-5) unrooted bifurcating trees.  This number gets BIG very
fast.

Phylogeny programs generally deal with this by means of heuristics, which
drastically cut the computation time by decreasing the number of trees
examined.  However, these methods do not guarantee that the shortest tree
will be found.  This in turn is usually addressed by running the program
multiple times with different sequence input orders.
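
To give a feel for how fast the search space grows, here is a minimal Python
sketch.  It assumes unrooted, strictly bifurcating trees with labeled tips;
the function name is made up for illustration and is not part of PHYLIP or
any other package.

```python
def num_unrooted_trees(n_taxa: int) -> int:
    """Count distinct unrooted, strictly bifurcating trees for n_taxa labeled tips."""
    if n_taxa < 3:
        return 1  # one or two tips can only be joined in one way
    count = 1  # exactly one tree for three tips
    # An unrooted tree with k-1 tips has 2(k-1)-3 = 2k-5 branches, and the
    # k-th tip can be attached to any one of them.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(f"{n:3d} species: {num_unrooted_trees(n)} possible trees")
```

Ten species already give 2,027,025 trees, which is why exhaustive search is
only practical for very small data sets.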

4)  After all this the shortest tree found can still be misleading.  It's a
good idea to test how robust the tree is by running a bootstrap analysis,
which resamples the characters (alignment columns) at random, with
replacement, for each run, so that some columns are left out and others are
counted more than once.  This tells you which branches are solid and which
could go either way.

5)  Finally, a tree is much more believable if it is independent of the
algorithm used to determine it.  For instance, if you get the same tree
using both parsimony and maximum likelihood you are less likely to be
misled.  If the best trees from the two methods differ, they will tend to
differ at points where the data does not distinguish well between
alternative phylogenies.  This is good to know.
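
The resampling step behind the bootstrap in point 4 can be sketched roughly
as follows.  This is only an illustration in Python: the toy sequences, the
dictionary layout, and the function name are all assumptions made up for the
example, and each replicate would still have to be fed to a tree-building
program to see how often each branch reappears.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by sampling alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns; repeats are allowed,
    # so some columns appear several times and others not at all.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every sequence so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy aligned sequences (invented for illustration; all the same length).
alignment = {
    "seqA": "MKT-LLV",
    "seqB": "MKTALLV",
    "seqC": "MRT-LIV",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]

if __name__ == "__main__":
    for name, seq in replicates[0].items():
        print(name, seq)
```

A branch that shows up in nearly all of the trees built from such replicates
is well supported; one that shows up in only about half of them could go
either way.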

This is probably more than you want to deal with, no?  Good luck,


Michael A. Lonetto   lonetto at cgl.ucsf.edu
UCSF Dept. of Stomatology, 513 Parnassus Ave
San Francisco, CA 94143-0512

More information about the Bio-soft mailing list
