Massive Multiple Sequence Alignment tools?

Ewan Birney birney at molbiol.ox.ac.uk
Sat Apr 27 05:56:15 EST 1996

>         I am looking for an automated (or even semi-automated)
> method for generating a multiple sequence alignment of something
> like 6,000 sequences, all of them > 80% identical to one another.
>         I wish to align the envelope gene (or portions thereof)
> which have been sequenced from the Human Immunodeficiency Virus
> type 1 or types 1 and 2.  A BLAST search against the nr dataset
> provided by NCBI reveals that there are several thousand HIV
> env sequences in the database today.
>         If I cannot find a tool already suitable for this, I'd
> like advice on building a program (perhaps using ASN.1 code from
> the NCBI Software Developers Toolkit) that will build a massive
> multiple sequence alignment, given a query sequence (I plan to
> use a "consensus sequence" from an alignment of 50 HIV env genes
> from diverse subtypes) and the GenBank/EMBL database.
>         My first thought is to use a tool such as FASTA to
> obtain information about each sequence from GenBank (Is
> it highly similar to HIV env?  If so, what region of it
> aligns with what region of my query?) and then use that information
> as a starting point for the multiple sequence alignment.
>         Any thoughts or help will be greatly appreciated.

The method I would suggest is a hidden Markov model (HMM). You could
build an HMM from a representative subset of the sequences and then
align the full set to that HMM to produce the larger alignment. The
two main HMM packages are:

HMMer - http://genome.wustl.edu/eddy/hmm.html

SAM - http://www.cse.ucsc.edu/research/compbio/sam.html

Given the high level of similarity, there should be no problem using
these tools.
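
To illustrate why aligning every sequence to one common model scales to
thousands of sequences, here is a minimal Python sketch. It is not an
HMM and not how HMMer or SAM work internally: it substitutes a plain
reference string for the model and uses Needleman-Wunsch global
alignment. The function names, the scoring values and the toy sequences
are all assumptions made up for illustration. Each sequence is aligned
to the reference independently and projected onto the reference's
columns, so the cost grows linearly with the number of sequences.

```python
def align_to_reference(ref, seq, match=1, mismatch=-1, gap=-2):
    """Globally align seq to ref (Needleman-Wunsch, linear gap cost) and
    return seq projected onto ref's columns: one character per reference
    position, '-' where seq has a deletion. Residues that are insertions
    relative to ref are dropped (a simplification)."""
    n, m = len(ref), len(seq)

    def sub(i, j):
        # Substitution score for ref[i-1] against seq[j-1].
        return match if ref[i - 1] == seq[j - 1] else mismatch

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # deletion in seq
                score[i][j - 1] + gap,            # insertion in seq
            )

    # Trace back from the corner, emitting one symbol per ref position.
    row = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row.append(seq[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row.append("-")
            i -= 1
        else:
            j -= 1  # insertion relative to ref: not shown in the output
    return "".join(reversed(row))


def build_alignment(ref, seqs):
    """Align each sequence to ref separately; every row has len(ref) columns."""
    return [align_to_reference(ref, s) for s in seqs]


# Toy data (hypothetical, not real HIV env sequence).
REF = "ACGTACGT"
SEQS = ["ACGTACGT", "ACGACGT", "ACGTTCGT", "ACGTTACGT"]

if __name__ == "__main__":
    for row in build_alignment(REF, SEQS):
        print(row)
```

The quadratic matrix in pure Python is only practical for short toy
inputs. For real data the HMM packages above do the equivalent step
with a position-specific model rather than a fixed string, and they
can also keep the insertions that this sketch discards.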

