Massive Multiple Sequence Alignment tools?

Brian Foley brianf at med.uvm.edu
Fri May 3 18:46:51 EST 1996

Thank you very much for ideas on tools such as AMPS.
In addition to a program that aligns the sequences once I
have them all in hand, I was hoping to find a tool that would
use the information obtained in a BLAST or FASTA run to help me
retrieve the sequences and clip out the region with high
similarity to my query sequence.
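
As a rough illustration of the clipping step, here is a minimal Python sketch. It assumes a hypothetical `Hit` record holding 1-based, inclusive subject coordinates already parsed from a BLAST or FASTA report (the record and the names `clip_hit` and `reverse_complement` are made up for this example, not taken from either program). A hit on the minus strand is reported with start greater than end, so the clipped region is reverse-complemented back into the query's orientation.

```python
from dataclasses import dataclass

# Hypothetical hit record: these fields are an assumption about what a parsed
# BLAST or FASTA report would provide, not either program's actual output format.
@dataclass
class Hit:
    accession: str
    subject_start: int  # 1-based, inclusive
    subject_end: int    # less than subject_start when the match is on the minus strand

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def clip_hit(sequence: str, hit: Hit) -> str:
    """Return the part of `sequence` covered by `hit`, in the query's orientation."""
    lo, hi = sorted((hit.subject_start, hit.subject_end))
    region = sequence[lo - 1:hi]  # 1-based inclusive coordinates -> Python slice
    if hit.subject_start > hit.subject_end:
        region = reverse_complement(region)
    return region
```

For example, `clip_hit("AAACCCGGGTTT", Hit("X", 4, 9))` returns `"CCCGGG"`, while a minus-strand hit `Hit("X", 6, 1)` returns `"GGGTTT"`.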

There are some 6,000 HIV sequences in the GenBank database,
and it is easy to obtain the accession numbers for all
6,000.  The problem is that some are complete genomes, some
contain only the 5' half of the region I want to align
(the envelope gene), some contain only the 3' half, and
some are only fragments from the middle.
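
Sorting the entries into these four groups could be automated from each hit's coordinates on the query. The following is a minimal sketch, assuming 1-based query coordinates for both the hit and the target region (env), and assuming the hit overlaps that region. The function name and the `slack` tolerance of 10 bases are illustrative choices, not values from any existing tool.

```python
def classify_coverage(query_start: int, query_end: int,
                      region_start: int, region_end: int,
                      slack: int = 10) -> str:
    """Classify a sequence by how much of the target region its hit covers.

    All coordinates are 1-based positions on the query; `slack` (an assumed,
    arbitrary tolerance) allows a few unmatched bases at each end of the region.
    """
    covers_5prime = query_start <= region_start + slack
    covers_3prime = query_end >= region_end - slack
    if covers_5prime and covers_3prime:
        return "complete"
    if covers_5prime:
        return "5' part"
    if covers_3prime:
        return "3' part"
    return "internal fragment"
```

With a hypothetical env region spanning query positions 1 to 2600, a hit covering 5 to 1200 would be labelled `"5' part"` and one covering 800 to 1500 `"internal fragment"`.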

I'd like to automate the process of retrieving the GenBank
entry, and either clipping the env gene out of complete
genomes, or padding the ends of incomplete sequences until
all of the sequences are in approximate alignment.
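
One way to do the clipping and padding in a single step is to cut out the matched subject segment and then pad it with gap characters for the query positions the hit does not reach. The sketch below assumes a single ungapped, plus-strand hit (so the matched segment has the same length as the query span it covers) and 1-based inclusive coordinates. `place_in_query_frame` is a hypothetical name.

```python
def place_in_query_frame(subject: str, q_start: int, q_end: int,
                         s_start: int, s_end: int, query_len: int,
                         gap: str = "-") -> str:
    """Clip the matched segment of `subject` and pad it to span query positions 1..query_len.

    Assumes an ungapped, plus-strand hit with 1-based inclusive coordinates.
    """
    segment = subject[s_start - 1:s_end]   # drops flanking genome sequence (the clipping step)
    left = gap * (q_start - 1)             # query positions before the match
    right = gap * (query_len - q_end)      # query positions after the match
    return left + segment + right
```

Every placed sequence then has the query's length, which gives the approximate alignment described above. A gapped hit would need its indels applied before this step, so the result is a starting point for a real alignment rather than a replacement for one.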

The results of a BLAST search give me almost all the
information I would need to write a script to do this.
However, BLAST does not allow gaps, and it often gives me
several fragmentary alignments for a single sequence
paired to my query.  I suspect that FASTA will do
better, but I have not gotten FASTA running locally,
and I have not found a FASTA server that will return
the best 7,000 matches to me (most FASTA servers are
limited to fewer than 100 "hits" via e-mail).
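
The fragmentary ungapped alignments (HSPs) that BLAST reports for one sequence could be joined into a single span by a script. The sketch below uses a simple greedy heuristic, assuming plus-strand HSPs with 1-based coordinates: it sorts them by query start and keeps only those that lie downstream of the previously kept HSP on both the query and the subject. The `HSP` tuple and `chain_hsps` are hypothetical names, and this is not BLAST's own method.

```python
from typing import List, NamedTuple, Tuple

class HSP(NamedTuple):
    q_start: int  # 1-based, inclusive, plus strand assumed
    q_end: int
    s_start: int
    s_end: int

def chain_hsps(hsps: List[HSP]) -> Tuple[HSP, List[HSP]]:
    """Greedily chain HSPs that are in the same order on query and subject.

    Returns the overall span covered by the chain and the HSPs that were kept.
    """
    if not hsps:
        raise ValueError("no HSPs to chain")
    ordered = sorted(hsps, key=lambda h: h.q_start)
    chain = [ordered[0]]
    for h in ordered[1:]:
        last = chain[-1]
        # Keep an HSP only if it lies downstream of the last kept one on both sequences.
        if h.q_start > last.q_end and h.s_start > last.s_end:
            chain.append(h)
    span = HSP(chain[0].q_start, chain[-1].q_end, chain[0].s_start, chain[-1].s_end)
    return span, chain
```

Because the chain is built greedily from the leftmost HSP, a spurious HSP that happens to lie downstream could still be kept. Choosing the chain with the best total score, for example by dynamic programming over the HSPs, would be more robust.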

I do have some tools for dealing with the final product alignment.
I have already aligned over 1,000 HIV-1 env V3 region sequences.
I have tools to count the frequency of occurrence of each base
(or amino acid in a protein alignment) at each position.  This
shows me the relative variability at each position in the
gene or protein.
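
A per-position count of this kind can be sketched in a few lines of Python. The version below computes the relative frequency of each residue in every column of an alignment of equal-length strings, ignoring `-` gaps. It then summarizes each column with Shannon entropy, one common variability measure (an assumed choice here, not necessarily what the tools mentioned above report): 0 bits means the column is invariant, and higher values mean it is more variable.

```python
from collections import Counter
from math import log2
from typing import Dict, List

def column_frequencies(alignment: List[str], gap: str = "-") -> List[Dict[str, float]]:
    """Relative frequency of each residue in each alignment column, gaps excluded."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] != gap)
        total = sum(counts.values())
        freqs.append({res: n / total for res, n in counts.items()} if total else {})
    return freqs

def column_entropy(freq: Dict[str, float]) -> float:
    """Shannon entropy in bits: 0 for an invariant column, higher for variable ones."""
    return -sum(p * log2(p) for p in freq.values())
```

For the toy alignment `["ACGT", "ACGA", "ACTA", "ACTT"]`, the first column has entropy 0 and the third column, split evenly between G and T, has entropy 1 bit.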

Two opposing forces are at work on the evolution of the HIV-1
env gene.  Conservative forces are selecting for envelope proteins
that bind well to the CD4 receptors on
host cells.  Diversifying forces are selecting for envelope
proteins which can evade the host's immune system.  Forgive
the unscientific use of "forces" there.

We need to develop vaccines based on conserved regions.  The
variable regions will diverge too quickly.  By the time we
have an HIV-1 subtype B vaccine generated, tested and approved,
subtype B will have moved on.  We have some good information
about which regions of the envelope protein are immunogenic.
Are any of the immunogenic regions conserved?
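
Given per-position variability scores like those sketched for the frequency counts, one way to approach this question is to average the scores over each immunogenic region and rank the regions from most to least conserved. In the sketch below, the region labels and coordinates are hypothetical placeholders (1-based, inclusive alignment columns), and `region_variability` is an illustrative name.

```python
from typing import Dict, List, Tuple

def region_variability(entropy: List[float],
                       regions: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Mean per-column entropy for each named region, most conserved first.

    `regions` maps a (hypothetical) label to 1-based inclusive alignment columns.
    """
    scores = []
    for name, (start, end) in regions.items():
        window = entropy[start - 1:end]
        if not window:
            raise ValueError(f"region {name} lies outside the alignment")
        scores.append((name, sum(window) / len(window)))
    return sorted(scores, key=lambda item: item[1])
```

A low mean score only says that a region is conserved in the sequences sampled. Whether it is also a useful vaccine target is a separate, experimental question.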

*  Brian Foley               *  btf at t10.lanl.gov                   *
*  T-10, MS-K710, LANL       *  http://hiv-web.lanl.gov            *
*  Los Alamos, NM 87545 USA  *  http://hiv-web.lanl.gov/~btf       *

More information about the Bio-soft mailing list
