>> This is because there are no really good multiple sequence alignment
>> algorithms that can tackle more than a few dozen sequences at once without
>> an explosion of time and memory requirements. (Stoye et al. have some nice
>> publications on this.)
> Actually there are some decent multiple sequence alignment algorithms
> that don't explode. MUSCLE does fairly well up to a few thousand
> sequences and HMM-based methods (though not quite as good at multiple
> sequence alignments) are linear in the number of sequences and do
> fairly well up to tens of thousands of sequences.
Supposedly, the new version of MAFFT can handle more than 50,000
sequences by employing PartTree, a very fast method for constructing a
guide tree.
See
http://bioinformatics.oxfordjournals.org/cgi/content/full/23/3/372
and
http://align.bmr.kyushu-u.ac.jp/mafft/software/
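
In case it helps, here is a rough sketch of how one might drive such a run from Python. It assumes a MAFFT build that is on your PATH and accepts the --parttree option described on the MAFFT page; the function names and file names are made up for illustration, and I have not tuned any other settings.

```python
import shutil
import subprocess
from pathlib import Path


def parttree_command(fasta_path, mafft_bin="mafft"):
    """Build the argument list for a PartTree-based MAFFT run.

    The helper name and defaults are hypothetical; only the
    --parttree option itself comes from MAFFT.
    """
    # --parttree asks MAFFT to build its guide tree with PartTree
    return [mafft_bin, "--parttree", str(fasta_path)]


def align_with_parttree(fasta_path, out_path, mafft_bin="mafft"):
    """Run MAFFT and save the alignment it prints on stdout to out_path."""
    # Fail early with a clear message if MAFFT is not installed
    if shutil.which(mafft_bin) is None:
        raise FileNotFoundError(f"{mafft_bin!r} not found on PATH")
    with open(out_path, "w") as out:
        subprocess.run(parttree_command(fasta_path, mafft_bin),
                       stdout=out, check=True)
    return Path(out_path)
```

Something like align_with_parttree("seqs.fasta", "seqs.aln.fasta") would then write the alignment to the second file; both file names are placeholders.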
Andreas
--
Andreas Wilm <andreas.wilm{at}ucd.ie>
Postdoctoral Fellow
Higgins Laboratory, UCD Conway Institute
UCD, Dublin 4, Ireland
http://bioinf.ucd.ie/