I have found the answer to my problem and, although
it is not simple, it is potentially very powerful. I have
not run the analysis yet, so I can't provide many details,
but I hope to follow up in a couple of weeks with more
information. If anyone else is interested, feel free to
e-mail me directly.
The solution is to use hidden Markov model (HMM) methods
such as SAM http://www.cse.ucsc.edu/research/compbio/sam.html
and HMMER http://genome.wustl.edu/eddy/hmm.html
to generate a "model" which is somewhat like a "consensus
sequence" but retains information about the percent of
sequences which contain a particular base (or amino acid)
at each site. The model is a full matrix of data about
an alignment, rather than just a single consensus sequence.
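
To make the "matrix rather than consensus" idea concrete, here is a
minimal Python sketch. It is not how SAM or HMMER work internally: a
real profile HMM also models insertions, deletions and transition
probabilities. The function names and the toy alignment are invented
for illustration only.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return one dict per alignment column mapping residue -> fraction.

    Simplified illustration only: a position-specific frequency matrix,
    not a full profile HMM (no insert/delete states, no transitions).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):  # iterate over columns, not rows
        counts = Counter(column)
        total = len(column)
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def consensus(profile):
    """Collapse the matrix to a single most-common-residue string."""
    return "".join(max(col, key=col.get) for col in profile)


# Hypothetical toy alignment of four DNA sequences.
toy_alignment = ["ACGT", "ACGA", "ATGT", "ACGT"]
profile = column_frequencies(toy_alignment)
```

The consensus string throws away the fact that one sequence in four
has a T at the second site; the per-column matrix keeps it.
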
The "model" can then be used to query the database, and
I have been told that I should be able to adjust that
query to give me output that contains the information
needed to automate a full multiple sequence alignment of all
the sequences in the database that match my model to some
cut-off value I specify.
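
As a rough sketch of what "match my model to some cut-off value" could
mean, the Python below scores ungapped sequences against a small
frequency matrix and keeps those at or above a threshold. This is an
assumption-laden toy, not the scoring that SAM or HMMER perform (they
align each sequence to the model with dynamic programming and allow
gaps). The profile, the uniform background of 0.25, the crude
unnormalized pseudocount, the database and the cut-off of 3.0 are all
made up for illustration.

```python
import math


def log_odds_score(seq, profile, background=0.25, pseudo=0.01):
    """Sum of per-site log2(model probability / background probability).

    Assumes seq is already the same length as the profile (no gaps).
    The pseudocount is a crude, unnormalized guard against log(0).
    """
    if len(seq) != len(profile):
        raise ValueError("sequence and profile lengths differ")
    score = 0.0
    for residue, column in zip(seq, profile):
        p = column.get(residue, 0.0) + pseudo
        score += math.log2(p / background)
    return score


def hits_above_cutoff(database, profile, cutoff):
    """Return names of database sequences scoring at or above the cutoff."""
    return [name for name, seq in database.items()
            if log_odds_score(seq, profile) >= cutoff]


# Hypothetical model: per-site residue fractions from some alignment.
toy_profile = [
    {"A": 1.0},
    {"C": 0.75, "T": 0.25},
    {"G": 1.0},
    {"T": 0.75, "A": 0.25},
]
# Hypothetical database of equal-length sequences.
toy_database = {"seq1": "ACGT", "seq2": "TTTT", "seq3": "ATGA"}

hits = hits_above_cutoff(toy_database, toy_profile, cutoff=3.0)
```

Raising the cut-off keeps only sequences close to the model; lowering
it pulls in more distant matches, which is the knob described above.
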
I'll let you know how it works.
--
********************************************************************
* Brian Foley * btf at t10.lanl.gov *
* T-10, MS-K710, LANL * http://hiv-web.lanl.gov *
* Los Alamos, NM 87545 USA * http://www.uvm.edu/~bfoley *
********************************************************************