Peter Rice (rice at embl-heidelberg.de) wrote:
: I wonder, does anyone use WORDSEARCH any more for full database searches?
: I just had a complaint from a user that it crashed with integer overflow
: in the SEARCHHIST routine. It turns out that the count of diagonals
: searched had overflowed (2*10^9 maximum integer value reached). A quick
: back-of-the-envelope calculation shows that for a 6000bp search sequence
: against the full GenEmbl database the overflow is expected.
: Strange that I have had no other reports here. I assume nobody searches
: GenEmbl with it these days.
I just tried it with a couple of sequences that I use, and screened the
log files, but found little: people rarely run it and instead use fasta
and my BLAST GCG interface (43 wordsearch runs last month, vs. 195 fasta
and 330 blast runs).
The following data are from an AXP/VMS 4000 system, using today's GenEmbl.
100 bp:
6-mers found: 147,220,371
Diagonals with words: 7,135,117
Total diagonals: 432,325,206
Sequences searched: 190,890
CPU time: 09:26.35
200 bp:
6-mers found: 325,500,612
Diagonals with words: 14,586,361
Total diagonals: 470,503,206
Sequences searched: 190,890
CPU time: 09:49.31
500 bp:
6-mers found: 862,481,772
Diagonals with words: 31,163,722
Total diagonals: 585,037,206
Sequences searched: 190,890
CPU time: 11:05.25
1 kb:
6-mers found: 2,000,000,000
Diagonals with words: 102,979,266
Total diagonals: 824,029,236
Sequences searched: 190,890
CPU time: 13:15.32
6 kb:
6-mers found: 2,000,000,000
Diagonals with words: 452,715,951
Total diagonals: -1,610,140,090
Sequences searched: 190,890
CPU time: 30:18.15
Unless this is an output or other error, it looks as if the count of
6-mers is already capped: it reads exactly 2,000,000,000 in the two
longest runs, before the total diagonal count goes negative.
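
As a cross-check on the negative "Total diagonals" value, here is a minimal
Python sketch of signed 32-bit wraparound. The linear model in it is an
assumption inferred only from the figures above (each extra query base
adding two diagonals per database sequence), not something taken from the
WORDSEARCH source, and the names wrap_int32 and modelled_diagonals are
hypothetical. Under that assumption it reproduces the totals reported for
the 100, 200 and 500 bp runs, and a 6000 bp query gives 2,684,827,206
diagonals, which wraps to the reported -1,610,140,090. The 1 KB run is left
out because its reported total does not fit this model for a query of
exactly 1000 bp.

```python
def wrap_int32(n: int) -> int:
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n


N_SEQS = 190_890           # "Sequences searched" in every run above
TOTAL_100BP = 432_325_206  # "Total diagonals" reported for the 100 bp run


def modelled_diagonals(query_len: int) -> int:
    # ASSUMPTION (inferred from the reported numbers, not from the
    # program source): every extra query base adds 2 diagonals per
    # database sequence.
    return TOTAL_100BP + 2 * N_SEQS * (query_len - 100)


for length in (100, 200, 500, 6000):
    true_count = modelled_diagonals(length)
    # Print the modelled count and what a 32-bit signed counter would show.
    print(length, true_count, wrap_int32(true_count))
```

If the model holds, the counter passes 2^31 - 1 = 2,147,483,647 somewhere
between the 1 KB and 6 KB runs, which would agree with Peter's estimate
that an overflow is expected for a 6000 bp query.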
Regards
Reinhard
--
+---------------------------+-------------------------------------------+
| Dr. Reinhard Doelz | Tel. x41 61 2672247 Fax x41 61 2672078 |
| Biocomputing | electronic Mail doelz at urz.unibas.ch |
|Biozentrum der Universitaet+-------------------------------------------+