Massive Multiple Sequence Alignment tools?

Geoff Barton gjb at bioch.ox.ac.uk
Tue Apr 30 08:23:18 EST 1996

higgins at ebi.ac.uk wrote:
> In article <31849227.794B at bioch.ox.ac.uk>, Geoff Barton <gjb at bioch.ox.ac.uk> writes:
> > Brian Foley wrote:
> >
> >>         I am looking for an automated (or even semi-automated)
> >> method for generating a multiple sequence alignment of something
> >> like 6,000 sequences, all of them > 80% identical to one another.
> >>         I wish to align the envelope gene (or portions thereof)
> >> which have been sequenced from the Human Immunodeficiency Virus
> >> type 1 or types 1 and 2.  A BLAST search against the nr dataset
> >> provided by NCBI reveals that there are several thousand HIV
> >> env sequences in the database today.
> >
> > Aligning this number of sequences shouldn't be a problem for most
> > multiple alignment programs since the sequences are very similar to
> > each other.  However, you may need to redimension
> > arrays to cope with 6,000 sequences.
> >
> > My alignment program AMPS ought to cope with this in
> > "single order" mode  (see our WWW and ftp site below for on-line
> > manual and download instructions), though I have never tried to align
> > this many sequences!
> >
> > Geoff.
> >
> Yes, Geoff's program (AMPS) is worth a try for this.  Clustal can do it in
> principle if you redimension it but is likely to take a LONG time and I am sure
> you will blow it up somewhere if you try.   The time will get used up just
> comparing all the sequences to each other in order to make a guide tree.
> 6000 sequences compared all against all will require about 18,000,000
> comparisons (6000 x 5999 / 2) and then you will need a few days of CPU
> time to make the tree.

A good point, Des.  I should have stressed that you shouldn't bother
with the pairwise comparison step in AMPS.  Just skip straight to single
order multiple alignment.  If your sequences are in the same order as
they were returned by a BLAST database scan (most similar first), then
the order will be close to ideal anyway.
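
To put rough numbers on why skipping the pairwise step matters: comparing
N sequences all against all takes N(N-1)/2 comparisons, whereas adding
sequences one at a time in a fixed order takes about N-1 alignment steps.
The sketch below (Python) is only an illustration of that arithmetic. The
function names are made up for this example and are not part of AMPS or
Clustal, and the N-1 figure assumes one alignment step per added sequence.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct unordered pairs among n sequences: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def single_order_steps(n: int) -> int:
    """Alignment steps when sequences are added one at a time in a fixed
    order (assumption: one step per sequence after the first)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (100, 1000, 6000):
        print(f"{n:>5} sequences: "
              f"{pairwise_comparisons(n):>10,} pairwise comparisons, "
              f"{single_order_steps(n):>5,} single-order steps")
```

For 6,000 sequences this gives 17,997,000 pairwise comparisons against
5,999 single-order steps.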

> I hope you have a good idea what you want to do with the sequences when
> they are aligned :-).  It will take a long time to print out if you wish
> to eye-ball it (or you will need a VERY small font on a VERY big monitor).

ALSCRIPT might help with displaying and/or printing and AMAS should
help with the analysis, but please don't send the alignment to the
AMAS web server, at least not until we upgrade the machine that this
runs on :-))

