Sean Eddy wrote:
>
> In article <4me5tb$1vi at swen.emba.uvm.edu> brianf at med.uvm.edu (Brian Foley) writes:
> >Thank you very much for ideas on tools such as AMPS.
> >In addition to performing an alignment on sequences, once
> >I have all of the sequences ready for alignment, I was
> >hoping to find a tool that would use the information obtained
> >in a BLAST or FASTA run to help me obtain the sequences and
> >clip out the region with high similarity to my query sequence.
>
> You might check out hidden Markov model software. Two packages are
> publicly distributed that I know of: SAM from UC Santa Cruz
> (http://www.cse.ucsc.edu/research/compbio/sam.html) and HMMER from
> myself at Washington University
>
> HMM multiple alignment algorithms are O(N) instead of O(N^2) in the
> number of sequences, so they are much more efficient for huge sequence
> sets. They also (in my hands) tend to be more accurate than other
> popular methods for large sequence sets (though for more reasonable
> numbers of sequences (10-50) I still prefer Clustal). We've aligned
> sets as large as 2000+ sequences. Your 6000 would pose no problem.
>
> HMMs can also allow you to align to a previous smaller multiple
> alignment. You can carefully hand-craft an alignment of a
> representative set of sequences, then align the rest of your 6000
> relative to that.
>
> You can use an HMM built from your alignment to search for matches in
> other sequences. HMMER includes four different search algorithms: one
> for complete global alignment; one for Smith/Waterman local alignment;
> one for finding complete matches to the HMM in longer sequences (say,
> if you're trying to find several complete copies of immunoglobulin
> domains in a neural cell adhesion molecule sequence), and one for
> finding multiple non-overlapping Smith/Waterman local alignments.
I agree that this massive alignment situation can be overwhelming.
I have not had the pleasure of handling Sean's HMMER (*YET*) and so speak
only from current experience. The SAM package is parallelized, and
arrangements for large projects can be made, I believe.
Resources are actively available for massive alignment projects using
simulated annealing and HMMs, although the hardware can impose memory
constraints. There is an overlap here: 10-300 sequences can be handled
using TIGR-MSA, and anything in excess of that via HMMs. Accounts for
use on a MasPar funded by DOE are available at the Berkeley Labs. A
good manual for SAM is also available from the URL Sean has listed.
We are tackling similar projects from South Africa to LBL and have been
able to get quite a lot done, so it is "possible" to attempt these large
problems with a realistic outcome in sight.
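
To put Sean's O(N) versus O(N^2) remark in rough numbers, here is a small
Python sketch. It is only back-of-the-envelope arithmetic and does not use
SAM or HMMER. It assumes that the O(N^2) cost comes from comparing every
pair of sequences once, and that the O(N) cost is one alignment of each
sequence to an already-built model. Both function names are made up for
illustration.

```python
def pairwise_comparisons(n):
    """Number of distinct sequence pairs among n sequences (n choose 2).

    Assumption: stands in for the O(N^2) all-against-all comparison step.
    """
    return n * (n - 1) // 2


def model_alignments(n):
    """Number of sequence-to-model alignments for n sequences.

    Assumption: stands in for the O(N) step of aligning each sequence
    once to an existing HMM.
    """
    return n


if __name__ == "__main__":
    # Set sizes mentioned in this thread: 50, 300, 2000 and 6000 sequences.
    for n in (50, 300, 2000, 6000):
        pairs = pairwise_comparisons(n)
        aligned = model_alignments(n)
        print(f"N={n:5d}  all pairs={pairs:10d}  to model={aligned:5d}")
```

For Brian's 6000 sequences this gives roughly 18 million pairs against 6000
alignments to the model. It ignores sequence length and the cost of
training the model, so treat it as a feel for the scaling only.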