In: <9206180454.AA08297 at genbank.bio.net> Jochen Kleinschmidt
(kleinschmidt at mcclb0.med.nyu.edu) notes:
> Harald Steen from the University of Oslo asked about
> programs that build phylogenetic trees from DNA sequence
> data.
> Two good packages I know of that can construct
> phylogenetic trees from nucleotide sequence data are
> PHYLIP and PAUP. There are other programs that can do it
> from protein data, e.g. ClustalV, TreeAlign, the
> dendrogram one gets from the PileUp program in the GCG
> package, etc., etc. PHYLIP v.3.4 (Phylogeny Inference
It should be pointed out that the dendrogram from the PileUp
program of the GCG package should *_not_* be used to obtain
phylogenetic trees; the following is an excerpt from the GCG
documentation for PileUp:
PILEUP
CONSIDERATIONS
Because a rigorous optimal alignment of even a small
number of short sequences would be intractable,
PILEUP uses an approach that may not produce the
most optimal multiple sequence alignment. (See
the ALGORITHM topic above for a description of this
approach.)
CLUSTERING
... The dendrogram is not a phylogenetic
reconstruction, although the vertical branch lengths
are proportional to the similarity between the
sequences. Its purpose is to represent the
clustering order used to create the final alignment.
This order is the only information from the dendrogram
used by PILEUP.
Since the method used in PileUp is a clustering method
(similar to UPGMA), it makes certain assumptions regarding
the way that similarity is distributed throughout the
dendrogram. Many of these assumptions rely on a strict
molecular clock (if you believe in that, fine; just so you
know). If your data depart from the assumptions of a
molecular clock, the branching order of the dendrogram can
be wrong. Other
methods are available that do not rely so strongly on a
molecular clock (see below).
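For readers who have not met UPGMA: it repeatedly joins the two
closest clusters and sets the distance from the merged cluster to
every other cluster to the size-weighted average of its members'
distances. The Python below is a minimal sketch of that generic
method, not GCG's PileUp code. The distances are invented for
illustration: the first table comes from a hypothetical clock-like
tree ((A,B),(C,D)); the second from the same topology with unequal
rates (branch lengths A=1, B=8, C=1, D=7, internal branch 1).

```python
from itertools import combinations


def table(rows):
    """Build a pairwise-distance dict from (taxon, taxon, distance) rows."""
    return {frozenset((x, y)): v for x, y, v in rows}


def upgma(dist):
    """Cluster taxa by UPGMA.

    dist maps frozenset({x, y}) -> distance for every pair of taxa.
    Returns the clustering order as a Newick-style string (no branch lengths).
    """
    # Every taxon starts as its own cluster of size 1.
    size = {t: 1 for pair in dist for t in pair}
    d = dict(dist)
    while len(size) > 1:
        # Join the closest pair of clusters (ties go to the first pair found).
        a, b = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        new = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical clock-like data: every lineage has evolved at the same rate.
clock = table([("A", "B", 4), ("C", "D", 6), ("A", "C", 10),
               ("A", "D", 10), ("B", "C", 10), ("B", "D", 10)])

# Hypothetical data from ((A,B),(C,D)) with fast lineages B and D.
no_clock = table([("A", "B", 9), ("A", "C", 3), ("A", "D", 9),
                  ("B", "C", 10), ("B", "D", 16), ("C", "D", 8)])

if __name__ == "__main__":
    print(upgma(clock))     # ((A,B),(C,D))
    print(upgma(no_clock))  # (((A,C),D),B)
```

The second table is exactly additive on the true tree, yet averaging
pulls the two short branches (A and C) together, so the dendrogram
groups A with C rather than with B.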
> Package) is a set of some 20-30 programs written over the
> years by Joe Felsenstein of the Univ. of Washington. You
> can get it by anonymous ftp from ftp.bio.indiana.edu
> (directory molbio/evolve; IP address 129.79.224.25) as
> well as from evolution.genetics.washington.edu. These
> programs were written in Pascal and come compiled in
> versions for VAX/VMS, Unix, MS-DOS etc. I have installed
> the package on three systems, VAX-4000, SGI Iris, and a
> 25 MHz 386 MS-DOS machine. The main program I've been
> using from this package (protpars = protein parsimony
> analysis) runs slow on the VAX and on the Iris and
I would very much prefer PROTPARS, for example, for
phylogenetic estimation from protein sequence data; it
estimates the minimum number of nucleotide substitutions
implied by the amino acid data. PAUP also has a protein
data option.
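Parsimony programs such as PROTPARS and PAUP score a candidate tree
by the smallest number of changes it needs to explain the data. A
minimal sketch of that counting step in Python, using Fitch's
algorithm for unordered nucleotide characters on a fixed, rooted
binary tree. This is a simplification: PROTPARS works on amino acid
sequences and takes the genetic code into account, which this does
not attempt, and the four-site alignment and both trees are made up
for illustration.

```python
def fitch(tree, states):
    """Minimum number of changes at one site on a rooted binary tree.

    tree is a leaf name (str) or a 2-tuple (left, right) of subtrees;
    states maps each leaf name to its observed character state.
    """
    def walk(node):
        # Returns (candidate state set, changes counted below this node).
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        common = left_set & right_set
        if common:
            # The two children can share a state: no extra change here.
            return common, left_cost + right_cost
        # No shared state: one substitution on one of the two branches.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def tree_length(tree, seqs):
    """Sum the Fitch counts over all aligned sites (equal site weights)."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})
               for i in range(n_sites))


# Hypothetical aligned sequences, four sites each.
seqs = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}

if __name__ == "__main__":
    print(tree_length((("A", "B"), ("C", "D")), seqs))  # 4
    print(tree_length((("A", "C"), ("B", "D")), seqs))  # 6
```

A real parsimony analysis repeats this count over many candidate
topologies and keeps the shortest; on these made-up data the tree
pairing A with B needs 4 changes and the one pairing A with C needs 6.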
> ultraslow on the PC; in fact, I would say that the PC
> version cannot be used for anything more than half a
> dozen sequences of 300 AA's each. So these programs are
> rather slow, but they do a great deal of analysis and are
IMHO, the quality of results from phylogenetic analysis is
directly related to the time you invest. You spend months
of research time generating the data, and you expect to get
the phylogenetic answer in an hour or so? Of course, faster
computation is innately better 8-)=
> well documented. There is probably nothing else
> available that is as full-featured and well-rounded as
> this package is. Felsenstein has also written a very
> good review article on this whole field
> (Annu. Rev. Genetics 22:521-565, 1988).
In my opinion, this and Swofford and Olsen (see below)
should be required reading for _anyone_ who would like to
publish phylogenetic analyses.
> I had also heard very good things about PAUP
> (Phylogenetic Analysis Using Parsimony), written by David
> Swofford of the Illinois Natural History Survey. So I
> ordered it (U.S.$ 50) and just received it last week
> (after a two-week wait). The most up-to-date and more
> or less finished version available right now is for the
> Macintosh, even though the manual is still an incomplete
> draft. Swofford is also working on a version for the
> IBM-PC but that one isn't finished yet. There also seem
> to be command-line driven versions available for Unix
> systems and mainframes but those may be obsolete and
> unsupported by now. The point is, right now you can only
> get the Mac version.
> My impression, from going through the 180 page
> (incomplete) manual and seeing the program run on a
> colleague's Mac, is that PAUP is an excellent program,
> well worth the few bucks you spend on it. The manual
Some Japanese company (Hitachi?) is selling an integrated
software package for about $1K; in my opinion (again) PAUP
is well worth 4 or 5 times the $50 (and here I'm talking
about _my_ money, not my Uncle's). Also, for those of us in
the UNIX world, look at the GDE package, which includes the
PHYLIP package as well as a couple of other phylogenetic
methods.
> alone is a mini course in molecular phylogeny and on the
> different methods used for inferring phylogenetic
> relationships, many of which are supported by the program.
A more complete primer on applying phylogenetic methods is
D.L. Swofford and G.J. Olsen, Phylogeny reconstruction, pp.
411-501 in D.M. Hillis and C. Moritz (eds.), _Molecular
Systematics_, Sinauer, Sunderland, MA, 1990 (as I said above,
"required reading").
> It seems that PAUP offers more algorithms for doing this
> kind of analysis than PHYLIP, and it is much faster and
Sorry: PHYLIP provides more algorithms (at least in the
technical usage of this word); PAUP (as implemented on the
Macintosh) is much easier to use. In addition, PAUP is more
useful for exploring your data.
> easier to use. I don't think it runs on any garden
> variety Mac; you have to have one of the more souped-up
> Macs (obviously I'm not a Mac user). The program is
> distributed on a single 3.5 in. floppy diskette.
PAUP _will_ run on _any_ garden variety Mac (you may need
to update to System 6; it runs fine under System 7). Of
course, the beefier your Mac, the quicker it will run. For
large data sets (>> the 6 x 300 mentioned above) you will
want more memory.
> To get PAUP, you have to send a check drawn on a U.S.
> bank or an International Money Order for U.S.$ 50 (or
> for U.S.$ 60 for orders from abroad) made out to
> "University of Illinois/INHS" to Mary Lou Williamson,
> Illinois Natural History Survey, 607 East Peabody Drive,
> Champaign, Illinois 61820. The phone number is
> 217-333-6846. If you need more information, call that number and
> they will fax you an order form and a form letter
> explaining the current status of the program. Since
> you're in Norway, you can save yourself an overseas call;
> e-mail me a message giving a fax number, and I will fax
> you a copy of the form letter. When you order, they ask
> you to include a statement that you agree to receiving an
> incomplete draft version of the manual, with the
> understanding that the final manual will be shipped
> whenever it is completed.
> If you have any questions, send me a message. I will be
> away from 6-21 to 6-26 though.
> Jochen Kleinschmidt
> NYU Medical Center
> New York, NY 10016
Jerry Learn
-------------------------------------------------------------------------
| Dept. of Botany & Plant Sciences | LEARN at UCRVMS.BITNET |
| University of California | learn at moe.ucr.edu |
| Riverside, CA 92521 | (714) 787-3543 |
| USA | FAX: (714) 787-4437 |
-------------------------------------------------------------------------