IUBio

Multiple alignment of protein sequences

John Edward Hill hill at mcclb0.med.nyu.edu
Thu Feb 6 08:37:59 EST 1992


In article <1992Feb5.202309.22818 at coe.montana.edu>,
umbjs at cs.montana.edu (Jean Starkey) writes:

> 
> 	I would like to know the programs that will
> allow me to align primary sequences of polypeptides of
> about 400 amino acids long, to see if there is any 
> significant homology.  
<deleted>

In addition to PILEUP in version 7 of GCG, as described by Reinhard
Doelz, and MACAW as described by Donald A. Lehn in another previous
message, CLUSTALV is available from the EMBL server and probably other
molecular biology software servers.  We have it running under VAX/VMS,
but I think it is also available for MS-DOS and/or UNIX.

PILEUP is based on a paper by Feng and Doolittle; CLUSTALV is based on
a paper by Higgins and Sharp describing CLUSTAL, the parent of
CLUSTALV.  The two programs give similar answers, but no algorithm has
yet been developed to do a rigorous (i.e., theoretically robust)
multiple alignment if gaps are allowed.  As the experts in the field
will tell you, you may have to spend as much time deriving the best
multiple alignment as you did in generating the data (assuming at least 
one of the sequences came out of your blood, sweat, and tears :-}).  
So, whatever program(s) you use, plan to spend time looking at the
alignments and adjusting them based on your own knowledge of these
proteins.  And as Reinhard indicated, you should keep in mind the old
axiom: Garbage in, garbage out.  The computer WILL give you an
alignment, but you have to decide how relevant it is.
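To make the "gaps allowed" point concrete: PILEUP and CLUSTALV both build a multiple alignment progressively, repeatedly aligning pairs of sequences (or groups of already-aligned sequences) with gap penalties. The sketch below shows only that gapped pairwise step, as a Needleman-Wunsch-style dynamic program in Python. The flat scores (MATCH, MISMATCH, GAP) are assumptions standing in for a real amino-acid substitution matrix, and the function name global_align is hypothetical. This is an illustration, not code from either program.

```python
# A minimal sketch of gapped global pairwise alignment (Needleman-Wunsch style).
# ASSUMPTION: flat toy scores replace a real amino-acid substitution matrix.
MATCH, MISMATCH, GAP = 2, -1, -2


def score(a: str, b: str) -> int:
    """Score one aligned residue pair."""
    return MATCH if a == b else MISMATCH


def global_align(s: str, t: str) -> tuple[int, str, str]:
    """Return (best score, aligned s, aligned t), with '-' marking gaps."""
    n, m = len(s), len(t)
    # dp[i][j] = best score for aligning s[:i] with t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # pair two residues
                dp[i - 1][j] + GAP,  # residue of s opposite a gap
                dp[i][j - 1] + GAP,  # residue of t opposite a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))
```

For example, global_align("ACDE", "ACE") returns (4, "ACDE", "AC-E"), placing the gap opposite D. Changing GAP or the scores can move gaps around, a small-scale version of why program output needs the hand inspection described above.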

One important difference between PILEUP/CLUSTALV and MACAW: MACAW
rests on a more theoretical basis and does not allow gaps. 
Thus, if you have proteins with domains of similarity that don't
require gaps within a domain, MACAW is very useful.  For a multiple
alignment of the entire sequences, however, you will likely be taking
the domains that are found by MACAW and "pasting" them together with
gaps made up of the rest of the sequence.  Because of its theoretical
foundation, MACAW will also give you some statistics about the
alignments it finds.  It is also a very elegant piece of user-friendly
programming, thanks to Greg Schuler's efforts.  (It requires Windows 3.0.)
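Because MACAW does not allow gaps, any block of similarity between two sequences lies on a single diagonal of the sequence-versus-sequence comparison. The Python sketch below illustrates that gap-free idea for two sequences: it scans every diagonal and keeps the highest-scoring contiguous run of residue pairs (a maximum-subarray scan). The toy scores and the name best_ungapped_block are assumptions; this is not MACAW's actual search method, and it computes no statistics.

```python
# A minimal sketch of gap-free local "block" search between two sequences.
# Each diagonal (fixed offset, no gaps) is scanned for its best contiguous run
# of aligned residue pairs using a maximum-subarray (Kadane) scan.
# ASSUMPTION: toy match/mismatch scores replace a real substitution matrix.
MATCH, MISMATCH = 2, -1


def best_ungapped_block(s: str, t: str) -> tuple[int, int, int, int]:
    """Return (score, start_in_s, start_in_t, length) of the best gap-free block."""
    best = (0, 0, 0, 0)
    # offset = index in t minus index in s; each offset is one diagonal.
    for offset in range(-(len(s) - 1), len(t)):
        i = max(0, -offset)  # first position in s that lies on this diagonal
        run_score, run_start = 0, i
        while i < len(s) and i + offset < len(t):
            pair = MATCH if s[i] == t[i + offset] else MISMATCH
            if run_score <= 0:
                # A non-positive running score can only hurt; restart the block here.
                run_score, run_start = pair, i
            else:
                run_score += pair
            if run_score > best[0]:
                best = (run_score, run_start, run_start + offset, i - run_start + 1)
            i += 1
    return best
```

For example, best_ungapped_block("XXKLMNXX", "AKLMNB") returns (8, 2, 1, 4), the shared KLMN run. Forbidding gaps is what reduces the search to a one-dimensional scan per diagonal; a full-length alignment would then need the found blocks joined with gapped stretches, as described above.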

Other programs are out there, but these are the ones with which I've
had the most experience.  If you need the actual references for these
programs, I can send them.

John
___________________________________________________________________________
John Edward Hill, Ph.D.            |  Department of Cell Biology             
Internet: HILL at MCCLB0.MED.NYU.EDU  |  New York University Medical Center
  EARN/Bitnet: HILL at NYUMED.BITNET  |  550 Fifth Avenue                  
212-263-7135    FAX: 212-263-8139  |  New York, New York  10016-6402
___________________________________________________________________________



More information about the Proteins mailing list
