Reviews of Seq Analysis Progs (longish)

mangalam at SALK-SC2.SDSC.EDU mangalam at SALK-SC2.SDSC.EDU
Tue Sep 8 18:18:28 EST 1992

Harry Mangalam                                   Vox:(619) 453-4100, x250
Dept of Biocomputing                                   Fax:(619) 552-1546
The Salk Institute                             mangalam at salk-sc2.sdsc.edu
10010 N Torrey Pines Rd                        mangalam at salk-sgi.sdsc.edu
La Jolla CA 92037                                    mangalam at salk.bitnet

Greetings, Netlandos,

   In response to the recent spate of advisories, queries, and warnings
about sequence analysis programs (SAPs), I thought I'd muddy the metaphoric
waters further by throwing my own $.02 into the ring.

   I work in a mostly Mac environment, hence the emphasis on Mac
software.  A few PC packages are included mostly because they are useful
enough to warrant putting up with the hassle of transferring and modifying
files.  I have recently begun to work in an X window situation and
therefore a few X Window programs are also included.  Notably absent from
this review is Steve Smith's Genetic Data Environment (GDE), but I will try
to get to it Real Soon Now.  It is, for those who don't know, an
extensible X Window app for molecular biology whose server program
(client, in the bizarro world of X) runs on Sun SPARC hardware (and now
on SGI, thanks to Anthony Persechini).  But enough of GDE for now.

   If you're looking for a program that will do everything from restriction
maps to multiple alignments, will make only a polite dent in your wallet,
is well-debugged, takes advantage of the latest network services, co-exists
peacefully with the rest of your applications and is easy to use, you will
have to search beyond the narrow confines of this planet.  
   Many of the packages mentioned below are commercially produced and thus
have to make a profit. Because of the restricted market for sales (as
opposed to a general purpose package like a spreadsheet or word processor),
they must be relatively expensive and many address the all-too-common
practice of software piracy by requiring the presence of a hardware lock. 
Those that are freeware obviously cannot support the level of debugging and
support of a commercial program; however, some are of surprisingly high
quality.

   The following notes are not meant as an exhaustive, objective review. 
They are my opinions (and a few measurements) based on having used the
programs or the demo versions and are obviously biased by my approach to
various problems - what I may dismiss without a thought may be a critical
determinant for someone else.  And certainly, don't take my word for it -
I've (fitfully) tried to include the email addresses of people who have
taken opposing views.  Also, while I have made a reasonable attempt to be
accurate, features, updates, and prices change so often in this field that,
like many reviews, this one is sliding into obsolescence as you read it.
All prices are approximate (and sometimes negotiable, especially at the end
of a fiscal period).
   Consider this a work in progress - as time allows, I plan to post
reviews of other packages not covered here and increase the detail of the
reviews, but for now a quick overview will have to do.  I invite comments,
corrections, and flames and will certainly post explanations, expansions,
apologies, and retractions if they are warranted.

Additional Sources of Information on SAPs:
  You can also search the archived biosci postings for additional
information by WAIS and gopher.  One gopher path is: 

Title: BioSci-Bionet-News.src
Host: fly.bio.indiana.edu
Path: waissrc:/Other-Bio-Gophers-Etc/Wide-Area-Info-Servers/
Type: Query

   There are additional reviews on sequence analysis software, notably by
Peter Markiewicz, available from the Bio-archives.  His review (titled
pm-macinmolbio.txt on the Indiana archives) has a very good introduction
and covers more ground than this review.  I highly recommend it.
   Dan Jacobson (danj at welchgate.welch.jhu.edu) recently posted a more
extensive review of public domain primer/oligo analysis programs, including
chunks from their documentation.  You can access this via the gopher
mentioned above.

   The views expressed below are my own. I have not received any
considerations, monetary or otherwise, from any of the entities mentioned
here.  I have acted as an uncompensated beta tester for the "Sequencher"
program and for a module of DNASTAR, and a friend (Lisa Caballero) wrote much
of the guts of IBI's AssemblyLign (a competitor to Sequencher, incidentally).

   In most cases of freeware, I have included the author's email address;
these should be used sparingly - you should first try the appropriate
archive, read the included documentation, and only as a last resort or to
report a bug, contact the author - let them keep working to keep bringing
us these programs.  
   At the end of the text is a table that compares a (very) few of the
execution times for some of the programs that do approximately the same

Opening Diatribe:
   Crippled demos are a lousy idea - DNASTAR, whatever you might think
about their corporate leadership or programs, has implemented the correct
introductory strategy - a 60 day free trial of the full,
everything-enabled, working program; after 60 days, the program
irreversibly suicides.  It is a rare demo that gives you a good feel for
the program when you can't save your work, or print, or import your own
data.  There are some exceptions (see the blurb for Gene Construction Kit
below), but in general, crippled demos are not worth the floppies they rode
in on.   Rather, see if you can get the company to give you a 30 or 60 day
trial period.

High Quality Freeware/Shareware:

The Don Gilbert Collection:
{I nominate Don Gilbert for the BioGNUdos Prize (apologies to Dan Jacobson
for the nested pun), awarded annually to the author of the most useful free
software for the Biological Sciences.} 
   Just about everything I have ever tried by Don Gilbert
(gilbertd at sunflower.bio.indiana.edu - also keeper of the Indiana Archives)
has been exceptionally useful. This includes:

- DottyPlot, a diagonal comparison program that plots identities or
similarities between 2 easily input sequences.  You can magnify the area of
interest and save the output for further evaluation or as a PICT file for
inclusion into a graphic.  DNA Strider 1.2 now provides a very similar
comparison, but to my surprise, DottyPlot is faster by quite a bit.  Using
proteins of 789 aa (M11969) and 1127 aa (M69238), I measured the
following times on a Mac IIci:

                DottyPlot       DNA Strider 1.2
Window: 15        7.5"              27"
Window:  9        9.5"              32"
Window:  7       13.5"              18"
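
   For readers who have not met a windowed diagonal comparison before, here
is a minimal sketch of the general technique in Python.  It is not
DottyPlot's or DNA Strider's actual code (neither program's algorithm is
described here); the function names dot_plot and render and the min_matches
threshold are my own illustrative assumptions.  For each pair of start
positions it counts identical residues inside a window running along the
diagonal and records a dot when the count reaches the threshold.

```python
def dot_plot(seq_a, seq_b, window=7, min_matches=5):
    """Return the set of (i, j) window start positions (i in seq_a,
    j in seq_b) where at least min_matches of the `window` aligned
    residues are identical.  Brute force: O(len_a * len_b * window)."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identities along the diagonal starting at (i, j).
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the hits as a crude text plot: one row per seq_a position,
    one column per seq_b position, '*' for a hit and '.' otherwise."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy sequences (made up for illustration): a tandem repeat compared
    # with itself shows the main diagonal plus off-diagonal repeat hits.
    a = "ACGTACGT"
    hits = dot_plot(a, a, window=4, min_matches=4)
    print(render(a, a, hits))
```

   Setting min_matches equal to window plots identities only; lowering it
tolerates mismatches inside the window.  Scoring similarities rather than
identities would need a substitution table in place of the equality test,
which this sketch leaves out.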

- READSEQ, a sequence format converter that is unsurpassed in flexibility
and portability. 

- GopherApp, a Mac version of the U of Minn gopher protocol that allows you
to attach the computer resources of the Internet to your Mac as sort of an
extended hard disk.  Not surprisingly, DG's BioGopher hole at Indiana is
one of the best, with his gopher access to Genbank beating most of the
CD-ROMs on our local network ("but", he whined, "when are you going to
implement Boolean searches?").  This is one of those programs that is so
useful that it is worth buying a Mac (and ethernet card) for.

   [Another aside - assuming you are within reach of an ethernet backbone,
the single most cost-effective piece of equipment you can buy for your Mac
or PC is an ethernet card.  For ~$200, you get (almost) instantaneous
access to terabytes of helpfully sorted information, reasonably supported
software, BBSs, e-mail, etc.]

- loopDloop, a visual RNA secondary structure editor, sort of like Canvas
specifically for RNA. It takes as input the output from the Zuker RNA
folding programs and helps you turn it into quite pretty figures.

- SeqApp, an Internet-aware, extensible, multiple sequence editor and
analysis package. This is what sequence analysis packages of the future
will look like if they want to sell.  It is still in 'alpha' testing, and
as such, is still rough around the edges, but it is definitely the shape of
things to come. From within this program, you can send and receive mail via
a POP mailer, send off sequences for FASTA, BLAST, GRAIL, and GeneID
searches, retrieve Genbank sequences, initiate gopher sessions, and
interconvert sequence formats, as well as run a number of the usual
sequence analysis functions. And if the function you want is not included,
you can add your own (clustalv, a multiple sequence alignment program, is
included as an example).  There is also an almost-hypertextual help system
and possibly the most responsible gripe reporter available: instant mail to
the author from within the program.
   A warning - because of its neonatal state, it's not yet ready for those
who need to be spoon-fed.  As DG says himself - "expect it to fail in many
ways."  However, if you are reasonably Ma
