
BLAST/FASTA search of a specific sequence

Don Gilbert gilbertd at sunflower.bio.indiana.edu
Mon Oct 16 18:23:01 EST 1995


I have an experimental compute service, called SRS-Fasta,
which I think will eventually do what you want.  If you can
settle for a DNA query against a DNA library, you can use it
now for similarity searches against subsets of GenBank.
There are some bugs in the code that I must find before I can
offer the form of search you want to do: an amino acid query against
all translations of GenBank (TFasta).
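
To make the idea of searching a peptide against all six translations of
a DNA sequence concrete, here is a minimal Python sketch.  It is not how
TFasta works internally: it only looks for exact matches of the peptide,
with no scoring or gaps, and it assumes the standard genetic code.  The
function names are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (an assumption: viral sequences here are taken to use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(dna, peptide):
    """List (strand, frame, protein_offset) for exact peptide matches."""
    dna = dna.upper()
    peptide = peptide.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, offset + 1, pos))
                pos = protein.find(peptide, pos + 1)
    return hits
```

For example, six_frame_hits("GGCGGAGATGAGGACGAC", "GGDEDD") returns
[("+", 1, 0)], since frame 1 of the forward strand translates to GGDEDD.
A real search of this kind would need a scoring matrix and statistics,
which is what a TFasta-style program provides.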

This SRS-Fasta service is found at
  http://iubio.bio.indiana.edu:81/srsfasta/
And see here for the SRS keyword query system at this site:
  http://iubio.bio.indiana.edu:81/srs/srsc

To use it properly, you need to learn how to compose a query
using SRS (Etzold's Sequence Retrieval System).  To find the subset
of sequences that includes just
  'complete sequences of human herpesviruses'
I used this SRS query:
  Key words: complete & human & herpesvirus*
  Selected from the field: Definition

With this, the software finds some 34 entries in the current GenBank
release. It is best to try SRS directly first to make sure your query 
is composed to get the subset you want, then use SRS-Fasta to do the 
similarity search. 
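
If you want to build such a subset query in a script, the following
Python sketch assembles the bracketed form that SRS-Fasta reports in its
output, e.g. [genbank-Definition:complete&human&herpesvirus*].  The
function name is hypothetical, and it only covers the simple case of
terms joined with & (AND) in a single field; the SRS query language
itself has more to it than this.

```python
def build_srs_query(library, field, terms):
    """Join search terms with '&' into a '[library-Field:terms]' string.

    Illustrative helper only; it handles AND-ed terms in one field.
    """
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one search term is required")
    return "[{}-{}:{}]".format(library, field, "&".join(cleaned))


query = build_srs_query(
    "genbank", "Definition", ["complete", "human", "herpesvirus*"]
)
print(query)
```

This prints [genbank-Definition:complete&human&herpesvirus*], the same
string shown as the subset selection query in the example output.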

- Don


Here is an example output of SRS-Fasta:
GenBank Subset Search

Subset selection query:
  [genbank-Definition:complete&human&herpesvirus*]
Sequence name:
  test

 /tmp/gbsub28567.seq :   18 nt
 >test: 18 nt
 vs  library
 searching /b4/srs/tmp/srslib28567 12 library

 302051 residues in    34 sequences
 statistics extrapolated from 36 to 36 sequences
 results sorted and z-values calculated from opt score
   36 scores better than 1 saved, ktup: 6, fact: 6
 DNA matrix, gap penalties: -16,-4
 joining threshold: 45, optimization threshold: 30, width: 16
  scan time:  0:00:00


The best scores are:                             initn init1 opt  z-sc E(36)
GENBANK:HHV6AGNM   Human herpesvirus-6 (HHV-6) U1   45  45   45 61.1   0.63
GENBANK:HH6STURPRO Human herpesvirus 6 structural   35  35   52 83.8    0.7
GENBANK:HHU13194   Human herpesvirus 6 replicatio   45  45   45 63.1    1.4
GENBANK:HHV6AGNM   Human herpesvirus-6 (HHV-6) U1   40  40   41 51.4    1.9


GENBANK:HHV6AGNM   Human herpesvirus-6 (HHV-6) U1102,  (59968 nt)
initn:   45  init1:   45  opt:   45 z-score: 61.1 E():   0.63
 100.0% identity in 9 nt overlap

                                      10                           
test                          GGCGGAGATGAGGACGAC                   
                                     X:::::::X                     
GENBAN TTTATAACGCGTTGTATAAAACCCCTTTGTATGAGGACGGAATTGTTCCGTGTATCGTGT
          115790    115800    115810    115820    115830    115840 

GENBAN GTGTGGGTTCGCCCACGCAGAGTAATGCTTTGGTGACTTCATTTAATCCGCTGACTCAAA
          115850    115860    115870    115880    115890    115900 


[... more matches ...]
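
If you need to sift through many such reports, the "best scores" table
can be read by a script.  The Python sketch below assumes the layout
shown above: an identifier starting with GENBANK:, a truncated
description, then the five columns initn, init1, opt, z-sc and E().
The function name and the dictionary keys are my own choices, and
other FASTA versions may lay the table out differently.

```python
def parse_best_scores(lines):
    """Parse 'best scores' rows into dictionaries, skipping other lines."""
    rows = []
    for line in lines:
        parts = line.split()
        # A score row has an id, at least one description word, 5 numbers.
        if len(parts) < 7 or not parts[0].startswith("GENBANK:"):
            continue
        try:
            initn, init1, opt = (int(x) for x in parts[-5:-2])
            z_score = float(parts[-2])
            e_value = float(parts[-1])
        except ValueError:
            # Not a score row (e.g. the header of an alignment section).
            continue
        rows.append({
            "id": parts[0],
            "description": " ".join(parts[1:-5]),
            "initn": initn,
            "init1": init1,
            "opt": opt,
            "z_score": z_score,
            "e_value": e_value,
        })
    return rows
```

Splitting from the right-hand side keeps the parser independent of how
many words the truncated description happens to contain.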


In article <45trki$j44 at newshost.rpms.ac.uk>,
Dr Mick Jones  <mjones at rpms.ac.uk> wrote:
..
>What I want to do is search all 6 possible orfs of specific sequences 
>with a very short peptide sequence (6 - 10 aa in length).  The specific 
>sequences are the complete sequences of human herpesviruses, each one is 
>>150,000 bp in size.
>
>If I send to a server to do a BLAST/FASTA search of the EMBL or GenBank 
>databases I will probably have to sift through lots of irrelevant 
>matches.
-- 
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd at bio.indiana.edu



