David H. A. Fitch fitch at ACF2.NYU.EDU
Tue Jun 4 17:18:28 EST 1996

Oliver Hobert writes...

>Hey -
>I'm doing multiple sequence alignments with families of sequences from the
>worm genome project + cross-species homologs. Using GCG I get nice
>alignment but the phylogenetic trees I get with the "distances" and
>"growtree" command give very unreliable results (e.g. cross-species
>homologues appear in completely divergent branches). Does anybody know how
>to either get rid of this problem and/or whether there are any good other
>programs to have phylogenetic trees constructed ?
>Thanks a lot for a response,


There could be a whole range of "problems".

First and foremost is the possibility that you are reconstructing gene
trees as opposed to species trees.  That is, you may be reconstructing the
divergences of genes after they arose from an ancient duplication--such
duplications could have occurred long before the species diverged.  It is
often difficult, just by sampling genes from gene banks, to resolve the
differences between paralogous and orthologous comparisons--you want
strictly orthologous comparisons if your objective is to reconstruct
species phylogenies.  (By the way, what is your objective, anyway?)

Second, you are probably using distances without correcting for
superimposed substitutions (multiple changes at the same site).  If so,
the uncorrected distances will tend to greatly underestimate
the evolutionary divergences.  If the divergences are great, you will end
up with groupings that don't mean anything because there will be too much
evolutionary "noise" (homoplasy--i.e., convergences, parallelisms or
reversals).
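
To illustrate what a correction for superimposed substitutions does, here
is a minimal Python sketch.  It assumes the Jukes-Cantor model for
nucleotide sequences; that model and the function names p_distance and
jukes_cantor are assumptions chosen for illustration, not something
specified above, and protein data would need a different correction.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected distance: proportion of aligned sites that differ.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site).

    Assumes equal base frequencies and equal rates for all changes.
    The correction is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    for p in (0.05, 0.20, 0.40, 0.60):
        print(f"observed {p:.2f} -> corrected {jukes_cantor(p):.3f}")
```

The gap between the observed and the corrected value widens as the
observed difference grows, which is why uncorrected distances are most
misleading for the most divergent sequences.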

So, assuming your objective is to reconstruct species phylogenies, I would
concentrate on just those genes that accumulate changes at a slow rate (or
on protein sequences that accumulate amino acid replacements at a fairly
slow rate), making sure that the comparisons could be shown to be
orthologous.  Then, I would analyze the data cladistically as well as
phenetically (after appropriate correction for superimpositions), obtaining
good statistical measures for the amount of support for particular clades
(e.g., by bootstrapping or statistical tree comparisons).
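
The bootstrapping idea can be sketched in Python as follows.  This is a
simplified illustration under stated assumptions: it resamples alignment
columns with replacement and, instead of rebuilding a full tree on each
replicate (which a real analysis would do), it only asks how often two
chosen sequences remain the closest pair by uncorrected distance.  The
function names and the toy alignment are hypothetical.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def pair_support(alignment, a, b, replicates=200, seed=0):
    """Fraction of replicates in which a and b are strictly the closest pair.

    A crude stand-in for clade support, not a tree-based bootstrap value.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = bootstrap_columns(alignment, rng)
        d_ab = p_distance(rep[a], rep[b])
        others = [p_distance(rep[x], rep[y])
                  for x in rep for y in rep
                  if x < y and {x, y} != {a, b}]
        if all(d_ab < d for d in others):
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    toy = {
        "worm":  "ACGTACGTACGTACGTACGT",
        "fly":   "ACGTACGAACGTACGTACCT",
        "mouse": "TCGAACTTACGAACGGACGT",
        "human": "TCGAACTTACGAACGGTCGT",
    }
    print(pair_support(toy, "mouse", "human"))
```

A grouping that appears in only a small fraction of replicates should not
be trusted, however tidy it looks in the single tree built from the
original alignment.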

A good place to begin learning about phylogenetic reconstruction is a book
by David Hillis and Craig Moritz called "Molecular Systematics" (pub.
Sinauer--a 2nd edition has just come out with the same title).  There are
also several good web sites (e.g.,

Good luck!

