In article <31849227.794B at bioch.ox.ac.uk>, Geoff Barton <gjb at bioch.ox.ac.uk> writes:
> Brian Foley wrote:
>> I am looking for an automated (or even semi-automated)
>> method for generating a multiple sequence alignment of something
>> like 6,000 sequences, all of them > 80% identical to one another.
>> I wish to align the envelope gene (or portions there-of)
>> which have been sequenced from the Human Immunodeficiency Virus
>> type 1 or types 1 and 2. A BLAST search against the nr dataset
>> provided by NCBI reveals that there are several thousand HIV
>> env sequences in the database today.
> Aligning this number of sequences shouldn't be a problem for most
> multiple alignment programs since the sequences are very similar to
> each other. However, you may need to redimension
> arrays to cope with 6,000 sequences.
> My alignment program AMPS ought to cope with this in
> "single order" mode (see our WWW and ftp site below for on-line
> manual and download instructions), though I have never tried to align
> this many sequences!
Yes, Geoff's program (AMPS) is worth a try for this. Clustal can do it in
principle if you redimension it but is likely to take a LONG time and I am sure
you will blow it up somewhere if you try. The time will get used up just
comparing all the sequences to each other in order to make a guide tree.
Comparing 6000 sequences all against all requires about 18,000,000 pairwise
comparisons (6000 x 5999 / 2), and then you will need a few days of CPU time
to make the tree.
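
For anyone who wants to check the arithmetic: an all-against-all comparison of
n sequences needs n(n-1)/2 pairwise alignments, which for 6000 sequences comes
to just under 18 million. Below is a minimal Python sketch of that count. The
function names and the per-comparison time are my own illustrative assumptions,
not anything measured from Clustal or AMPS.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_hours(n: int, seconds_per_comparison: float) -> float:
    """Rough wall-clock estimate for the all-against-all stage only.

    seconds_per_comparison is a hypothetical figure supplied by the caller;
    it depends entirely on sequence length, program and hardware.
    """
    return pairwise_comparisons(n) * seconds_per_comparison / 3600.0


if __name__ == "__main__":
    n = 6000
    print(pairwise_comparisons(n))  # 17997000
    # Assumed 0.01 s per comparison, purely for illustration.
    print(round(estimated_hours(n, 0.01), 1), "hours")
```

The count grows quadratically, so halving the number of sequences cuts the
comparison stage to roughly a quarter.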
I hope you have a good idea what you want to do with the sequences when
they are aligned :-). It will take a long time to print out if you wish
to eye-ball it (or you will need a VERY small font on a VERY big monitor).
> Geoffrey J. Barton, Laboratory of Molecular Biophysics, University of Oxford,
> Rex Richards Building, South Parks Road, Oxford OX1 3QU, U.K.
> mailto:gjb at bioch.ox.ac.uk, Tel: +44 1865 275368, Fax: +44 1865 510454,