I have an experimental compute service called SRS-Fasta which
will eventually, I think, do what you want. If you can settle
for a DNA query against a DNA library, you can use it now for
similarity searches against subsets of GenBank.
There are some bugs in the code that I must find before I can
offer the form of search you want to do: an amino acid query
against all translations of GenBank (TFasta).
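
As a rough illustration of what "all translations" means here, the sketch below translates a DNA sequence in all six reading frames and looks for exact occurrences of a short peptide. It is not the TFasta code and does no similarity scoring; it assumes the standard genetic code, and the function names (six_frames, find_peptide) are made up for this example.

```python
# Illustrative sketch only: exact peptide matching over six-frame
# translations. Not TFasta, and not part of SRS-Fasta.

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return {frame: protein} for frames +1..+3 and -1..-3."""
    frames = {}
    rc = reverse_complement(dna)
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def find_peptide(dna, peptide):
    """List (frame, amino-acid offset) for every exact occurrence."""
    hits = []
    for frame, protein in six_frames(dna).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(peptide, start + 1)
    return hits
```

For a 6 - 10 aa peptide an exact scan like this will miss anything with a mismatch, which is the part a real TFasta search would handle.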
This SRS-Fasta service is found at
http://iubio.bio.indiana.edu:81/srsfasta/
And see here for the SRS keyword query system at this site:
http://iubio.bio.indiana.edu:81/srs/srsc
To use it properly, you need to learn how to compose a query
using SRS (Etzold's Sequence Retrieval System). To find the subset
of sequences that includes just
'complete sequences of human herpesviruses'
I used this SRS query:
Key words: complete & human & herpesvirus*
Selected from the field: Definition
With this, the software finds some 34 entries in the current GenBank
release. It is best to try SRS directly first to make sure your query
is composed to get the subset you want, then use SRS-Fasta to do the
similarity search.
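
The key words and field above end up as a single bracketed selection string (it is echoed back as the "Subset selection query" in the SRS-Fasta output). A minimal sketch of how that string is put together, assuming the [library-Field:word&word] layout seen in that output; the helper name srs_query is hypothetical, and whether the web forms accept the string typed in this form is also an assumption.

```python
def srs_query(library, field, keywords):
    """Build a selection string like [genbank-Definition:a&b&c*].

    The layout is inferred from the sample SRS-Fasta output, not from
    SRS documentation, so treat it as an assumption.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    return "[{}-{}:{}]".format(library, field, "&".join(keywords))


# Key words: complete & human & herpesvirus*, field: Definition
print(srs_query("genbank", "Definition", ["complete", "human", "herpesvirus*"]))
```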
- Don
Here is an example output of SRS-Fasta:
GenBank Subset Search
Subset selection query:
[genbank-Definition:complete&human&herpesvirus*]
Sequence name:
test
/tmp/gbsub28567.seq : 18 nt
>test: 18 nt
vs library
searching /b4/srs/tmp/srslib28567 12 library
302051 residues in 34 sequences
statistics extrapolated from 36 to 36 sequences
results sorted and z-values calculated from opt score
36 scores better than 1 saved, ktup: 6, fact: 6
DNA matrix, gap penalties: -16,-4
joining threshold: 45, optimization threshold: 30, width: 16
scan time: 0:00:00
The best scores are: initn init1 opt z-sc E(36)
GENBANK:HHV6AGNM Human herpesvirus-6 (HHV-6) U1 45 45 45 61.1 0.63
GENBANK:HH6STURPRO Human herpesvirus 6 structural 35 35 52 83.8 0.7
GENBANK:HHU13194 Human herpesvirus 6 replicatio 45 45 45 63.1 1.4
GENBANK:HHV6AGNM Human herpesvirus-6 (HHV-6) U1 40 40 41 51.4 1.9
GENBANK:HHV6AGNM Human herpesvirus-6 (HHV-6) U1102, (59968 nt)
initn: 45 init1: 45 opt: 45 z-score: 61.1 E(): 0.63
100.0% identity in 9 nt overlap
10
test GGCGGAGATGAGGACGAC
X:::::::X
GENBAN TTTATAACGCGTTGTATAAAACCCCTTTGTATGAGGACGGAATTGTTCCGTGTATCGTGT
115790 115800 115810 115820 115830 115840
GENBAN GTGTGGGTTCGCCCACGCAGAGTAATGCTTTGGTGACTTCATTTAATCCGCTGACTCAAA
115850 115860 115870 115880 115890 115900
[... more matches ...]
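
If you want to sift the "best scores" lines with a script rather than by eye, something like the following may do. It is a sketch that assumes each score line ends with the five columns shown in the header (initn, init1, opt, z-sc, E) and begins with a LIBRARY:ENTRY name, as in the sample above; the names Hit and parse_score_line are invented for the example, and other FASTA versions may lay the table out differently.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    entry: str
    description: str
    initn: int
    init1: int
    opt: int
    z_score: float
    e_value: float


def parse_score_line(line):
    """Return a Hit for a 'best scores' line, or None for any other line."""
    parts = line.split()
    # Need the entry name plus the five trailing numeric columns.
    if len(parts) < 6 or ":" not in parts[0]:
        return None
    try:
        initn, init1, opt = (int(x) for x in parts[-5:-2])
        z_score, e_value = float(parts[-2]), float(parts[-1])
    except ValueError:
        return None  # e.g. the alignment header line ending in "(59968 nt)"
    description = " ".join(parts[1:-5])
    return Hit(parts[0], description, initn, init1, opt, z_score, e_value)


sample = [
    "The best scores are: initn init1 opt z-sc E(36)",
    "GENBANK:HHV6AGNM Human herpesvirus-6 (HHV-6) U1 45 45 45 61.1 0.63",
    "GENBANK:HH6STURPRO Human herpesvirus 6 structural 35 35 52 83.8 0.7",
    "GENBANK:HHV6AGNM Human herpesvirus-6 (HHV-6) U1102, (59968 nt)",
]
hits = [h for h in map(parse_score_line, sample) if h is not None]
best = min(hits, key=lambda h: h.e_value)
print(best.entry, best.e_value)
```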
In article <45trki$j44 at newshost.rpms.ac.uk>,
Dr Mick Jones <mjones at rpms.ac.uk> wrote:
..
>What I want to do is search all 6 possible orfs of specific sequences
>with a very short peptide sequence (6 - 10 aa in length). The specific
>sequences are the complete sequences of human herpesviruses, each one is
>>150,000 bp in size.
>If I send to a server to do a BLAST/FASTA search of the EMBL or GenBank
>databases I will probably have to sift through lots of irrelevant
>matches.
--
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd at bio.indiana.edu