
Wordsearch anyone?

Reinhard Doelz doelz at comp.bioz.unibas.ch
Mon Apr 25 13:05:23 EST 1994


Peter Rice (rice at embl-heidelberg.de) wrote:
: I wonder, does anyone use WORDSEARCH any more for full database searches?

: I just had a complaint from a user that it crashed with integer overflow
: in the SEARCHHIST routine. It turns out that the count of diagonals
: searched had overflowed (2*10^9 maximum integer value reached). A quick
: back-of-the-envelope calculation shows that for a 6000bp search sequence
: against the full GenEmbl database the overflow is expected.

: Strange that I have had no other reports here. I assume nobody searches
: GenEmbl with it these days.

I just tried it with a couple of sequences that I use, and screened
the log files, but found little: people don't seem to use it much and
prefer FASTA and my BLAST GCG interface instead (43 WORDSEARCH runs last
month vs. 195 FASTA runs and 330 BLAST runs).

The following data are from an AXP/VMS 4000 system, using today's GenEmbl.

100 bp: 
         6-mers found:   147,220,371
 Diagonals with words:     7,135,117
      Total diagonals:   432,325,206
   Sequences searched:       190,890
             CPU time:      09:26.35

200 bp: 
         6-mers found:   325,500,612
 Diagonals with words:    14,586,361
      Total diagonals:   470,503,206
   Sequences searched:       190,890
             CPU time:      09:49.31

500 bp:
         6-mers found:   862,481,772
 Diagonals with words:    31,163,722
      Total diagonals:   585,037,206
   Sequences searched:       190,890
             CPU time:      11:05.25


1 KB: 
         6-mers found: 2,000,000,000
 Diagonals with words:   102,979,266
      Total diagonals:   824,029,236
   Sequences searched:       190,890
             CPU time:      13:15.32
               
6 KB:
         6-mers found: 2,000,000,000
 Diagonals with words:   452,715,951
      Total diagonals: -1,610,140,090
   Sequences searched:       190,890
             CPU time:      30:18.15

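Unless this is an output error or something similar, I would think that
the number of 6-mers is itself already capped: it reads exactly
2,000,000,000 for both the 1 KB and the 6 KB query, while the total
diagonal count has gone negative at 6 KB.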
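
The negative diagonal total can be checked against the smaller runs. What
follows is only a sketch in Python, not anything taken from WORDSEARCH: it
assumes the counter is a signed 32-bit integer, that "6 KB" means exactly
6000 bp, and that the total grows linearly with query length. The names
wrap_int32 and predicted_diagonals are made up for this illustration.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the "2*10^9 maximum" mentioned above


def wrap_int32(n):
    """Value of n as stored in a signed 32-bit two's-complement integer."""
    return (n + 2**31) % 2**32 - 2**31


# "Total diagonals" as reported above, keyed by query length in bp.
reported = {100: 432_325_206, 200: 470_503_206, 500: 585_037_206}

# Fit a straight line through the 100 bp and 200 bp runs (assumed linear).
per_bp = (reported[200] - reported[100]) // (200 - 100)
base = reported[100] - 100 * per_bp


def predicted_diagonals(query_len):
    """Predicted true diagonal count for a query of query_len bp."""
    return base + per_bp * query_len


# The 500 bp run was not used for the fit, so it serves as a check.
assert predicted_diagonals(500) == reported[500]

true_6kb = predicted_diagonals(6000)
print("per bp:          ", per_bp)
print("predicted, 6 KB: ", true_6kb)
print("exceeds maximum: ", true_6kb > INT32_MAX)
print("as 32-bit signed:", wrap_int32(true_6kb))
```

Under these assumptions the slope comes out at 381,780 diagonals per query
base, which is twice the 190,890 sequences searched, and the predicted 6 KB
total of 2,684,827,206 wraps to the -1,610,140,090 shown above. The 1 KB
figure does not fall exactly on this line, so that query was presumably not
exactly 1000 bp long.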

Regards
Reinhard 


-- 
  +---------------------------+-------------------------------------------+
  |    Dr. Reinhard Doelz     | Tel. x41 61 2672247    Fax x41 61 2672078 |
  |      Biocomputing         | electronic Mail       doelz at urz.unibas.ch |
  |Biozentrum der Universitaet+-------------------------------------------+


