Massive Multiple Sequence Alignment tools?

Win Hide winhide at icon.co.za
Sat May 11 09:12:53 EST 1996

Sean Eddy wrote:
> In article <4me5tb$1vi at swen.emba.uvm.edu> brianf at med.uvm.edu (Brian Foley) writes:
>   >Thank you very much for ideas on tools such as AMPS.
>   >In addition to performing an alignment on sequences, once
>   >I have all of the sequences ready for alignment, I was
>   >hoping to find a tool that would use the information obtained
>   >in a BLAST or FASTA run to help me obtain the sequences and
>   >clip out the region with high similarity to my query sequence.
> You might check out hidden Markov model software. Two packages are
> publicly distributed that I know of: SAM from UC Santa Cruz
> (http://www.cse.ucsc.edu/research/compbio/sam.html) and HMMER from
> myself at Washington University
> (http://genome.wustl.edu/eddy/hmmer.html).
> HMM multiple alignment algorithms are O(N) instead of O(N^2) in the
> number of sequences, so they are much more efficient for huge sequence
> sets. They also (in my hands) tend to be more accurate than other
> popular methods for large sequence sets (though for more reasonable
> numbers of sequences (10-50) I still prefer Clustal).  We've aligned
> sets as large as 2000+ sequences. Your 6000 would pose no problem.
> HMMs can also allow you to align to a previous smaller multiple
> alignment. You can carefully hand-craft an alignment of a
> representative set of sequences, then align the rest of your 6000
> relative to that.
> You can use an HMM built from your alignment to search for matches in
> other sequences. HMMER includes four different search algorithms: one
> for complete global alignment; one for Smith/Waterman local alignment;
> one for finding complete matches to the HMM in longer sequences (say,
> if you're trying to find several complete copies of immunoglobulin
> domains in a neural cell adhesion molecule sequence), and one for
> finding multiple non-overlapping Smith/Waterman local alignments.
> I agree, that this massive alignment situation can be overwhelming.
I have not had the pleasure of handling Sean's HMMER (*YET*), so I can
speak only from my own experience with SAM. The SAM package is
parallelized, and I believe arrangements can be made for large projects.

See http://www-hgc.lbl.gov/inf/maspar.html

Resources are actively available for massive alignment projects using
simulated annealing and HMMs, although the hardware can impose memory
constraints. There is an overlap between the methods here: 10-300
sequences can be handled using TIGR-MSA, and anything in excess of that
via HMMs. Accounts on a DOE-funded MasPar are available at the Berkeley
Labs. A good manual for SAM is also available from the URL Sean has
listed.

We are tackling similar projects from South Africa to LBL and have been 
able to get quite a lot done, so it is "possible" to attempt these large 
problems with a realistic outcome in sight.
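
To put the O(N) versus O(N^2) point quoted above into rough numbers, here
is a minimal sketch in Python. It only counts operations: the number of
distinct sequence pairs an all-against-all method would have to compare,
against one alignment per sequence to a fixed model. The function names
are made up for illustration, and the sketch ignores the cost of each
individual comparison as well as the fact that HMM training repeats the
pass over the sequences for some number of iterations, so treat it as an
indication of growth and not as a timing estimate for any real package.

```python
# Rough operation counts: all-pairs comparison versus one alignment per
# sequence against a fixed model. Per-operation cost is ignored; only the
# growth with the number of sequences is shown.
# Both function names are hypothetical, chosen for this illustration.

def pairwise_comparisons(n: int) -> int:
    """Number of distinct sequence pairs among n sequences: n*(n-1)/2."""
    return n * (n - 1) // 2


def model_alignments(n: int) -> int:
    """Each sequence is aligned once to a fixed model, so n alignments."""
    return n


if __name__ == "__main__":
    # 50 and 300 are the set sizes mentioned in the thread for Clustal and
    # TIGR-MSA; 2000 and 6000 are the large sets under discussion.
    for n in (50, 300, 2000, 6000):
        print(f"{n:>5} sequences: {pairwise_comparisons(n):>10} pairs, "
              f"{model_alignments(n):>5} alignments to a model")
```

For Brian's 6000 sequences that comes to 17997000 pairs against 6000
alignments per pass, which is why the model-based route stays practical at
that size.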

