In response to your letter received Tue, Oct 5, 1993 at 4:17 PM
there are plenty of ibm based programs. in fact i believe that paup was
originally written for the ibm. at any rate if you have gopher you can access
the software library at iubio in indiana. they have quite a bit of such
software.
larry goldstein
* * * * *
Message-Id: <9310052220.AA29114 at net.bio.net>
To: info-gcg at net.bio.net
From: jenkins at uk.ac.nibsc.aidsun
Subject: Phylogeny Software
Date: 5 Oct 93 08:46:07 GMT
Sender: list-admin at daresbury.ac.uk
Precedence: first-class
Original-To: info-gcg at uk.ac.daresbury
----- Begin Included Message -----
I've had some indirect experience with the PAUP program
(version 3.1) in making parsimony trees (viral).
I believe they are all Mac-based. I would like to try
to obtain a similar program for IBM-PC/ MS-DOS.
I would even be willing to pay for it ! :-(
Is there such a thing, or will I have to get a Mac
just for this one application?
Thanks in advance.
Brian
----- End Included Message -----
There is a summary of phylogeny software written by Joe Felsenstein. Here it is.
SOME AVAILABLE PHYLOGENY PROGRAMS
The following material (except item 0) is from the PHYLIP version 3.5
documentation. I post it because it may be a useful compilation. Here are
some of the phylogeny packages that I know about. Some of them are available
over Internet from ftp server machines. If you are on Internet you should
familiarize yourself with ftp and with them (see entries 6 and 7 below for
more information).
Table of Contents
 0. PHYLIP       8. fastDNAml    16. COMPROB
 1. PAUP         9. VOSTORG      17. MARKOV
 2. BIOSYS-1    10. MEGA         18. PHYSYS
 3. MacClade    11. Evomony      19. SINCAIDEN
 4. Hennig86    12. COMPONENT    20. MUST
 5. ClaDOS      13. TurboTree    21. GDE
 6. TreeAlign   14. Molevol      22. TreeTool
 7. Clustal     15. CLINCH
0. PHYLIP is a free package of programs for inferring
phylogenies,
including programs to carry out parsimony, compatibility, distance
matrix,
invariants ("evolutionary parsimony") and likelihood methods on a variety
of
different kinds of data. It is available in the recently-released
versions
3.5c and 3.5p as C or Pascal source code and documentation, and in four
forms
of executables: (i) for 386 and 486 systems under PCDOS, (ii) for 386 and
486
systems under Windows, (iii) for non-386 and non-486 PCDOS systems, and
(iv)
for Macintosh systems. The C source code will also compile easily on
most
workstations and mainframes that have a C compiler. PHYLIP has
been
distributed by me since 1980, with over 2000 registered installations.
New
features include programs to compute protein sequence distances,
to
interactively modify a phylogeny, and to compute likelihoods in
coalescent
models from samples of genealogies. Most programs in the C version no
longer
have arbitrary limits on the numbers of sites or of species. Many other
new
features have been added as well, such as new models for variation
of
evolutionary rates among sites in the DNA likelihood programs.
PHYLIP is available by anonymous ftp
from
evolution.genetics.washington.edu (IP number 128.95.12.41) in
directory
pub/phylip. Users who cannot get it this way can also send enough
formatted
diskettes, which will be returned with the particular form of the package
and
its documentation written on them. Contact me (preferably by electronic
mail)
for details of the diskette distribution or further information about anonymous
ftp distribution. The latest version of PHYLIP is version 3.51 which
fixes
some bugs present in 3.5.
1. David Swofford of the Laboratory of Molecular Systematics,
National
Museum of Natural History, Smithsonian Institution, Washington, D.C. has
written
PAUP (Phylogenetic Analysis Using Parsimony). It can be ordered from
the
Center for Biodiversity, Illinois Natural History Survey, 607 East
Peabody
Drive, Champaign, Illinois 61820, U.S.A.
Since December, 1985, Swofford has been distributing precompiled
executable object-code versions of PAUP for the IBM PC and other MSDOS
systems.
As of this writing (February, 1993) he has released version 3 (PAUP/Mac)
for
the Macintosh, and later hopes to release version 3 for PCDOS systems
and
ultimately for mainframes. The cost was $50, which will increase to $100
soon.
Orders received for the Mac version will be filled but the final
printed
documentation will arrive later, as it is not completed yet.
PAUP 3.0 is probably the most sophisticated parsimony program. It
allows
multistate characters, user-defined weights on individual state
transitions,
Wagner, Camin-Sokal and Dollo parsimony methods, bootstrap
confidence
intervals, and finding all most parsimonious trees by branch-and-bound.
It
also has provision for computing Lake's linear phylogenetic invariants.
PAUP
is (a great) many times faster than the parsimony programs in PHYLIP.
2. Swofford also distributes an older package of programs,
BIOSYS-1,
including some phylogeny estimation programs, for use with gene frequency
data,
with particular attention to distance methods. BIOSYS-1 is distributed on
an
IBM PC-formatted floppy disk. Included are precompiled versions for the IBM
PC
and source code for uploading to IBM, VAX/VMS, Unix, Prime and CDC
mainframes
and minicomputers. The price is $25.00, from the same address as
PAUP.
BIOSYS-2 is under development, but it is too early to anticipate a
completion
date.
3. If you have a Macintosh computer and any interest in
discrete-state
parsimony methods (including DNA and protein parsimony), you should
definitely
get MacClade. It was written by Wayne Maddison and David Maddison of
the
University of Arizona. All distribution is by Sinauer Associates,
Sunderland
Massachusetts 01375, USA. Their phone number is: (413) 665 3722, FAX:
(413)
665 7292. A disk with program, help file, and example data files, plus
book
(which has about 100 pages of intro to phylogenetic theory, and 250 pages
of
program instructions), is $75 U.S. ($40 for the book alone). Site
licenses
are also available. An earlier and less capable Version 2 (which for
example
cannot read nucleic acid sequences and has fewer features for
discrete
characters) is also available by anonymous ftp from the EMBL, Indiana
and
Houston molecular biology software servers. Their addresses are given
below
under the descriptions of TreeAlign and ClustalV. MacClade 2.1 will be
found
among their Mac software, as a squeezed and then binhexed file.
MacClade enables you to use the mouse-window interface to specify
and
rearrange phylogenies by hand, and watch the number of character steps and
the
distribution of states of a given character on the tree change as you do
so.
MacClade is positively addictive and will give you a much better feel for
the
tree and your data. It's the closest thing to a phylogeny video game that
I
have seen. It has been influential in spurring the inclusion of
interaction
and graphics into other phylogeny programs. (I have tried to supply
this
functionality in PHYLIP by incorporating the programs MOVE, DOLMOVE,
and
DNAMOVE, which act somewhat like MacClade). MacClade does not have
a
sophisticated search algorithm to find best trees: it largely relies on you
to
do it by hand (which is surprisingly effective), with only a
local
rearrangement algorithm available to improve on that tree.
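The "number of character steps" that changes as a tree is rearranged can be
illustrated with a small sketch. What follows is not MacClade's code: it is a
minimal Python version of the standard Fitch count for one unordered
character, assuming a rooted binary tree written as nested 2-tuples of taxon
names. The function name fitch_steps and the example taxa are made up for
illustration.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each taxon name to its observed state
    """
    def visit(node):
        if isinstance(node, str):                 # a tip: its own state, no steps
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:                                # children agree: no extra step
            return common, left_steps + right_steps
        # children disagree: take the union and count one step
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


# One binary character scored on four hypothetical taxa.
character = {"A": 0, "B": 0, "C": 1, "D": 1}
print(fitch_steps((("A", "B"), ("C", "D")), character))   # prints 1
print(fitch_steps((("A", "C"), ("B", "D")), character))   # prints 2
```

Moving a branch by hand amounts to changing the nesting of the tuples and
recomputing the count; summing the count over all characters gives the length
of the tree.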
4. J. S. Farris has produced Hennig86, a fast parsimony program
including
branch-and-bound search for most parsimonious trees and interactive
tree
rearrangement. Although complete benchmarks have not been published it is
said
to be faster than Swofford's PAUP; both are a great many times faster than
the
parsimony programs in PHYLIP. The program is distributed in executable
object
code only and costs $50, plus $5 mailing costs ($10 outside of the
U.S.).
The user's name should be stated, as copies are personalized as a
copy-
protection measure. It is distributed by Arnold Kluge, Amphibians
and
Reptiles, Museum of Zoology, University of Michigan, Ann Arbor,
Michigan
48109-1079, U.S.A. It runs on PC-compatible microcomputers with at least
512K
of RAM and needs no math coprocessor or graphics monitor. It can handle up
to
180 taxa and 999 characters. An 80386 version, Hennig386, is currently
being
tested but no release date has yet been announced.
5. ClaDOS, an interactive program which allows rearrangement of trees
and
their evaluation, mapping of characters into them, and more, is available
for
PCDOS systems from Kevin Nixon, L. H. Bailey Hortorium, Cornell University,
467
Mann Library, Ithaca, New York 14853. I have been unable to get
information
on its cost or method of distribution.
6. Jotun Hein, (Institute of Genetics and Ecology, University of
Aarhus,
8000 Aarhus C, Denmark) has produced TreeAlign, a multiple sequence
alignment
program that builds trees as it aligns DNA or protein sequences. It uses
a
combination of distance matrix and approximate parsimony methods.
TreeAlign
uses too much memory for it to run on PC's (DOS or Mac systems) but is
really
designed for a workstation or mainframe. It is available by anonymous ftp
at
the Indiana, Houston, and EMBL molecular biology software distribution
sites.
Their network addresses are respectively:
ftp.bio.indiana.edu,
ftp.bchs.uh.edu, and ftp.embl-heidelberg.de. In the Indiana archive one
must
enter directory molbio/align, in the Houston archive it is in
directory
pub/gene-server in the directories unix and vms, and on the EMBL archive it
is
in pub/software/unix and pub/software/vax. If you are on Internet and
use
molecular data it is important that you learn to use anonymous ftp and
become
familiar with these ftp servers.
7. Another multisequence alignment program that estimates trees as
it
aligns multiple sequences is ClustalV. An older version in PCDOS
executable
form was distributed previously (see below for information on how to
get
executables for PC or Mac for the current version). Currently it
is
distributed as C source code by its author, Desmond Higgins. Clustal
was
originally developed at Trinity College, Dublin, Ireland, but version V
was
done at Higgins's current address, the European Molecular Biology
Laboratory,
Heidelberg, Germany. Clustal V successfully compiles and runs on VAX/VMS
C,
Apple Macintosh Think C, MSDOS Turbo C, Decstation ULTRIX C, and
Sun
workstations with GNU C. It is a complete rewrite and upgrade of the
Clustal
package which was described by Higgins and Sharp (1989).
New features include the ability to detect and read different input formats
(NBRF/PIR, Fasta, EMBL/Swissprot); align old alignments; produce phylogenetic
trees after alignment (Neighbor Joining trees with a bootstrap option); write
different alignment formats (Clustal, NBRF/PIR, GCG, PHYLIP); and a full
command line interface.
The program is available by anonymous ftp at the Indiana, Houston,
and
EMBL molecular biology distribution sites. Their network addresses
are
respectively: ftp.bio.indiana.edu, ftp.bchs.uh.edu, and ftp.embl-heidelberg.de.
In the Indiana archive one must enter directory molbio/align, in the
Houston
archive it is in directory pub/gene-server in all of the four directories
dos,
Mac, unix, and vms, and on the EMBL archive it is in pub/software/unix
or
pub/software/vax. If you are on Internet and use molecular data it
is
important that you learn to use anonymous ftp and become familiar with one
or
more of these ftp servers.
If you do not have any access to Internet, you could alternatively
start
by sending e-mail to Des Higgins at:
higgins at EMBL-Heidelberg.DE (Internet)
If you do not have access to e-mail, send a formatted PC or MAC
diskette
(PLEASE state which) to:
Des Higgins
European Molecular Biology Laboratory
Postfach 10.2209
Meyerhofstrasse 1
6900 Heidelberg
Germany
He will return the diskette with the source code and documentation. He
can
also include an executable image for PC's or MAC.
8. Gary Olsen, of the Department of Microbiology, University of
Illinois,
has developed a speeded-up version of my program DNAML coded in C, which
he
calls "fastDNAml". It achieves a number of economies and also is organized
so
that it can be run on parallel processors -- he and his co-workers
have
constructed trees of very large size on a high-speed parallel processor.
The
program can be compiled using the "p4" portable parallel processing
toolkit.
It can also be run in ordinary serial mode on workstations where it is
faster
than DNAML. The C program is available by anonymous ftp from the
Ribosomal
Database Project at info.mcs.anl.gov in directory pub/RDP/programs/fastDNAml.
9. Andrey A. Zharkikh, Andrey Rzhetsky, and their co-workers in
the
Institute of Cytology and Genetics, Siberian Branch of the Russian Academy
of
Sciences, Novosibirsk, Russia, Ex-USSR, have produced VOSTORG, a package
of
programs for alignment (both manual and automatic) and inferring phylogenies
by
distance methods and parsimony for molecular sequences. It runs on IBM
PC-
compatibles and includes some rather fancy graphics. The authors are
currently
in the U.S., not in Siberia, and their program is sold for about $250 by
Exeter
Software, 100 North Country Road, Setauket, NY 11733, USA. Their
telephone
number is 1-800-842-5892; Fax (516)751-3435. The programs are described in
a
paper by Zharkikh et al. (1991).
10. MEGA (Molecular Evolutionary Genetic Analysis) is due to be
released
at the beginning of 1993 by Sudhir Kumar, Koichiro Tamura, and Masatoshi Nei
of
the Institute of Molecular Evolutionary Genetics, 328 Mueller Lab,
Pennsylvania
State University, University Park, Pennsylvania 16802, U.S.A. It will be
an
executable program for PCDOS machines, and will be menu-driven with
context-
sensitive help. It will analyze data from DNA, RNA and protein sequences,
and
distance matrices produced from other kinds of data as well. It will
include
the Neighbor-Joining distance matrix method, a branch and
bound
parsimony method, and bootstrapping. It will also plot trees on many kinds
of
printers. The program will be provided free of charge if you send one 1.2
Mb
5.25-inch or 1.44 Mb 3.5-inch floppy diskette, and will be sent as soon as
it
is available. Inquiries can also be made by mail to M. Nei at the
above
address or by electronic mail to nxm2 at psuvm (Bitnet) or
nxm2 at psuvm.psu.edu
(Internet).
11. James Lake will soon distribute "Evomony", a program for using
the
"evolutionary parsimony" (invariants) method for inferring phylogenies from
DNA
or RNA sequences. It runs on 286 and 386 PCDOS systems with at least
500k
bytes of memory. Lake intends to distribute a PCDOS version by April 1,
1993
(his choice of date, not mine!), with a Macintosh version to follow in
1994.
Both will be distributed free to scientists in this field. Exact
procedures
for ordering Evomony have not yet been announced. Lake's address is
Department
of Biology, University of California, Los Angeles, California 90024.
12. Rod Page has written COMPONENT, a program for PCDOS systems
for
comparing cladograms for use in phylogeny and biogeography studies. It has
far
more features for biogeographic studies (such as comparing species and
area
cladograms) than any other package. It runs on PCDOS 286 or 386 systems
under
Windows 3.0 or higher. It will be released in the very near future. Its
cost
will be "in the $50-$75 range", and it can be ordered from Rod Page at
the
Department of Botany, Natural History Museum, Cromwell Road, London SW7
5BD,
U.K. His phone and fax numbers are respectively (071)-938 9068 and 9260,
and
his e-mail address is R.Page at natural-history-museum.imperial.ac.uk
or
rdp at nhm.ic.ac.uk.
13. David Penny (Department of Botany and Zoology, Massey
University,
Palmerston North, New Zealand) has been offering for free distribution
several
PCDOS programs, one a fast parsimony program, TurboTree. There are also
two
others, Hadtree which computes expected frequencies of all
possible
distributions of nucleotides among species, and Great Deluge, an
approximate
search for the most parsimonious tree by a quasi-random method. He tells
me
that funding exigencies are such that he may soon have to start charging
for
these. His electronic mail address is dpenny at massey.ac.nz.
14. Walter Fitch (Department of Ecology and Evolutionary
Biology,
University of California, Irvine, California 92717, U.S.A.) has a
package
"Molevol" available free (on receipt of an appropriate number of
PCDOS
formatted floppy disks) with about 20 FORTRAN programs for not only
estimating
trees by parsimony and distance methods but doing various other
manipulations
of data that might be needed such as format interconversions and searching
for
homology and secondary structure. They are available as FORTRAN source
and/or
as PCDOS executables. The FORTRAN programs will also run on Sun
workstations
(and probably others too, I would suspect). His electronic mail address
is
wfitch at daedalus.bio.uci.edu.
15. Kent Fiala, now of SAS Institute, has written a compatibility
(clique)
program, based on an earlier program written by Kent and George
Estabrook.
Christopher Meacham has put the latest version of CLINCH (6.2), with
Kent's
permission, as a self-extracting DOS archive on Jim Beach's TAXACOM
fileserver,
huh.harvard.edu, for anonymous FTP. The self-extracting archive
is
"CLINCH62.EXE" in directory /pub/software/clinch. This should be FTPed as
a
binary file. CLINCH62.EXE is about 150 kb. When you run it, it will expand
to
14 files requiring about 280 kb. The executable program is
CLINCH.EXE.
Readme, documentation, sample input and output, and FORTRAN source code
are
included. PC-CLINCH is probably the most sophisticated compatibility
analysis
program. The Taxacom server, by the way, also has other material related
to
botanical systematics, including flora information.
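For readers who script their transfers, the anonymous ftp steps described for
CLINCH (log in anonymously, change to the directory, fetch the archive as a
binary file) can be sketched with Python's standard ftplib module. This is
only a sketch: the function name fetch_binary and its ftp_factory parameter
are illustrative assumptions, and the host and path in the commented usage
line are simply the ones quoted above, which may no longer answer.

```python
from ftplib import FTP


def fetch_binary(host, directory, filename, dest_path, ftp_factory=FTP):
    """Download one file by anonymous FTP in binary (image) mode.

    ftp_factory is a hypothetical hook so that a stand-in class can be
    supplied instead of a real network connection.
    """
    ftp = ftp_factory(host)
    try:
        ftp.login()                     # no arguments means anonymous login
        ftp.cwd(directory)
        with open(dest_path, "wb") as out:
            # retrbinary transfers the bytes unchanged, which is what a
            # self-extracting .EXE archive needs
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.quit()
    return dest_path


# Usage with the names given in the text (not run here):
# fetch_binary("huh.harvard.edu", "/pub/software/clinch",
#              "CLINCH62.EXE", "CLINCH62.EXE")
```

Fetching the archive in text (ASCII) mode instead may alter line-ending bytes
and leave an executable that will not run, which is why the binary transfer
matters.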
16. Christopher Meacham (Department of Integrative Biology, University
of
California, Berkeley, California 94720, U.S.A.) produces COMPROB, a
Pascal
program to compute probabilities that characters would be compatible at
random,
thus telling us which clique is "most surprising". It is available
for
anonymous ftp as a PCDOS executable from the Taxacom server
(huh.harvard.edu)
in directory pub/mip.
17. The program MARKOV computes a distance measure between pairs
of
nucleotide sequences. It also constructs phylogenies from these and
summarizes
the 4x4 substitution matrices between the pairs of species. It uses a
more
general model of substitution than used in PHYLIP, the Stationary Markov
Model
described in the paper by Saccone et al. in Methods in Enzymology volume
183,
pages 570-583, 1990. Bootstrapping is used to analyze the statistical error
of
the results. Output files from CLUSTAL and PILEUP, as well as some
other
formats, can be used for input, and analysis can be confined to certain
codon
positions in coding sequences. The program is written in FORTRAN and runs
on
VMS systems. It was produced by Dr. Graziano Pesole and Professor
Cecilia
Saccone at the University of Bari, Italy, and is available (for free?) from
Dr.
Cecilia Lanave at CSMME-CNR, Dipartimento di Biochimica e Biologia
Molecolare,
Universita` di Bari, via Orabona 4, 70126 Bari, Italy. Her phone number
is
39-80-243305, her fax number is 39-80-243317, and her e-mail address
is
lanave at vaxba0.ba.it or mvx36 at ibacsata.it.
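To make the idea of a 4x4 substitution matrix between a pair of species
concrete, here is a minimal Python sketch that tabulates one from two aligned
nucleotide sequences. It is not MARKOV's code (MARKOV is a FORTRAN program)
and it does not compute the Stationary Markov Model distance; it only shows
the counting step. The function name and the toy sequences are assumptions
made for illustration.

```python
def substitution_counts(seq_a, seq_b, alphabet="ACGT"):
    """Tabulate aligned site pairs: counts[i][j] is the number of sites with
    alphabet[i] in seq_a and alphabet[j] in seq_b."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    index = {base: i for i, base in enumerate(alphabet)}
    counts = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in index and b in index:      # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    return counts


# Two made-up aligned sequences differing at one site (A in the first, G in
# the second).
for row in substitution_counts("ACGTAC", "ACGTGC"):
    print(row)
# prints:
# [1, 0, 1, 0]
# [0, 2, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
```

Confining the tabulation to one codon position can be imitated by slicing
both sequences first, for example seq[2::3] for third positions when the
coding sequence starts at the first site.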
18. J. S. Farris and Mary Mickevich earlier released a package
of
phylogeny programs, PHYSYS, which, at about $5,000, was extremely expensive
(in
my opinion, which is certainly a biased one). I am not sure whether,
from
whom, or under what conditions it is still available.
19. Fujitsu Ltd. ("a $21 billion global leader in advanced
computer,
telecommunications, and electronic devices") sells for $28,000 US a Fujitsu
S
family workstation complete with a program, SINCAIDEN, which
allows
"experimental researchers, even those unfamiliar with such analyses,
[to]
easily create phylogenetic trees in their own laboratories." The program
also
allows searches of the major nucleic acid sequence and protein databases
(the
ad I saw does not make it clear whether these databases are provided with
the
workstation). The methods available are UPGMA, neighbor-joining,
Farris's
(Distance Wagner) and the modified Farris distance matrix methods.
The
workstation is SPARC compatible and runs SunOS. The SINCAIDEN program
was
developed by the group at the National Institute of Genetics, Japan under
Dr.
Takashi Gojobori. Fujitsu Ltd. may be contacted at 21-8, Nishi-Shinbashi
3-
chome, Minato-ku, Tokyo 105, Japan (phone 81-3-3437-5111 ext. 2831, fax
81-3-
5472-4354), or in the U.S. at Fujitsu America Inc., 3055 Orchard Drive,
San
Jose, California 95134-2017 (phone 1-408-432-1300 ext. 5168, fax
1-408-434-
1045).
20. MUST, a package of sequence management programs, is distributed on
a
shareware basis by Herve Phillippe, Laboratoire de Biologie Cellulaire
(URA
CNRS 1134 D), Batiment 444, Universite de Paris-Sud, 91405 Orsay cedex,
France.
His e-mail address is: adoutte at frciti51 on Bitnet/EARN. His phone and
fax
numbers are respectively 33.1.69.41.64.81 and 33.1.69.41.21.30. MUST
is
available on a shareware basis ($100 registration fee if you do not
send
diskettes) and runs on PCDOS systems using PCDOS version 3 or later. It
is
intended as complementary to existing phylogeny and alignment programs and
can
produce output files in the formats of PHYLIP, PAUP, Hennig86, and CLUSTAL.
It
contains a variety of sequence input, editing, checking, and storage
functions,
as well as a sequence editor and a phylogeny plotter. It also allows
further
analyses of the results from these phylogeny programs.
21. Steve Smith, formerly of the Harvard Genome Laboratory, has
written
an X-Windows interactive sequence editor, GDE (Genetic Data Environment)
which
allows the user to edit sequences and align them by hand, and to select
subsets
of sites and sequences and call a variety of analysis programs
including
ClustalV and many of the PHYLIP 3.4 programs. The GDE 2.0 system will run
on
many workstations that have the X windowing system. It also includes
the
TreeTool tree-plotting program (see below). GDE 2.0 is free and is
available
for anonymous ftp transfer at either golgi.harvard.edu in directory
pub/GDE2.0 or ftp.bio.indiana.edu in directory molbio/unix/GDE.
22. Mike Maciukenas, at the Department of Microbiology of the
University
of Illinois, has written a wonderful X-windows based interactive
tree-plotting
program called TreeTool. It takes as input a PHYLIP tree file, with
branch
lengths if they are provided, displays the tree in either rooted or
unrooted
form on any X-windows screen, and allows the user to modify the form of
the
tree and the placement of nodes and labels. When the tree is in final form
the
user can have it written to a Postscript file and/or printed to a
Postscript-
compatible printer. TreeTool is free as a C program for X windows and
is
available for anonymous ftp from ftp.bio.indiana.edu in
directory
molbio/unix/GDE. It is also included in the GDE 2.0 sequence
analysis
environment mentioned above.
-----
Joe Felsenstein, Dept. of Genetics, Univ. of Washington, Seattle, WA 98195
--> Internet: joe at genetics.washington.edu (IP No. 128.95.12.41)
Bitnet/EARN: felsenst at uwavm