
GCG 9.0 FASTA and EPD - core dump!

Steve Thompson stevet at wsu.edu
Mon Sep 22 12:13:03 EST 1997


Hello Everybody -

In Article <342652A3.3F54 at icrf.icnet.uk>, Aengus Stewart
<aengus.stewart at icrf.icnet.uk> wrote (in part):

>I am getting a core dump - floating point exception when running 9.0
>FASTA against the Eukaryotic Promoter Database (EPD) from Phillip . . .

We too maintain EPD just for this sort of search, and I hadn't tested it
against our GCG version 9.0 FastA until I saw Aengus' note.  As Aengus
suggests, the previous version, 8.1, worked just fine with EPD.  However, the
Z-score normalization routine (which is new to GCG's version of FastA for
9.0) does fail with EPD in version 9.0.  On our system, SGI IRIX 6.2, it
does NOT produce a core dump, but the results aren't very pretty.  I'll
include some of the output to give you all a flavor of what happens:

!!SEQUENCE_LIST 1.0

(Nucleotide) FASTA of: promoter.seq  from: 1 to: 513  September 22, 1997 09:50

. . .

 TO: epd:*  Sequences:      1,285  Symbols:    771,000  Word Size: 2

 Databases searched:
   epd, Release 48.0, Released on 0Oct1996, Formatted on 0Jan1997

 Searching with both strands of the query.
 Scoring matrix: GenRunData:fastadna.cmp
 Constant pamfactor used
 Gap creation penalty: 16      Gap extension penalty: 4

. . . 

 Results sorted and z-values calculated from opt score
 1501 scores saved that exceeded 2147483647
 1265 optimizations performed
 Joining threshold: 71, optimization threshold: 56, opt. width: 16

The best scores are:                    init1 initn   opt    z-sc E(0)..

EPD:EP030025    Begin: 327  End:  342  Strand: -
! E30025 Mm c-abl 6.5 kb E1; range -4...   44    44    44 nan0x7fffffff       0
EPD:EP030025    Begin: 505  End:  516
! E30025 Mm c-abl 6.5 kb E1; range -4...   42    42    42 nan0x7fffffff       0
EPD:EP030003    Begin: 241  End:  262  Strand: -
! E30003 Hs c-N-ras; range -499 to 10...   56    56    56 nan0x7fffffff       0
EPD:EP030003    Begin: 319  End:  370
! E30003 Hs c-N-ras; range -499 to 10...   72    72    73 nan0x7fffffff       0

. . . 

EPD:EP016062    Begin: 454  End:  509
! E16062 Rn c-myc P2+; range -499 to ...   55    55    55 nan0x7fffffff       0
EPD:EP014067    Begin: 231  End:  256  Strand: -
! E14067 Mm c-myc P2+; range -499 to ...   48    48    49 nan0x7fffffff       0
EPD:EP014067    Begin: 239  End:  271
! E14067 Mm c-myc P2+; range -499 to ...   75    75    75 nan0x7fffffff       0
EPD:EP011146    Begin: 546  End:  561  Strand: -
! E11146 Hs c-myc P1; range -499 to 1...   44    44    44 nan0x7fffffff       0
EPD:EP011146    Begin: 423  End:  437
! E11146 Hs c-myc P1; range -499 to 1...   56    56    57 nan0x7fffffff       0
EPD:EP016061    Begin: 9  End:  45  Strand: -
! E16061 Rn c-myc P1; range -499 to 1...   59    59    65 nan0x7fffffff       0
EPD:EP016061    Begin: 94  End:  145
! E16061 Rn c-myc P1; range -499 to 1...   63    63    68 nan0x7fffffff       0
\\End of List

promoter.seq /rev
EPD:EP030025

ID   EP030025   standard; DNA; EPD;   600 BP.
AC   E30025;
DE   Mm c-abl 6.5 kb  E1; range  -499 to   100.
CC   Source: Eukaryotic Promoter Database / Release 48
CC   Mm c-abl 6.5 kb  E1 :+M  ROD:MMABLC1B   1+     665; 30025.
CC . . .


SCORES      Init1:    44  Initn:    44  Opt:    44 z-score: nan0x7fffffff E(): 0
  75.0% identity in 16 bp overlap

                199       189       179       169       159       149
promoter.seq AAAATTTTCCAACTTAAAATTAAATATATAAAAATATATTTTTAAATCAATATCTAACTT
                                           ||||  ||  ||||||
EP030025     GCGCTTCCTCATCTCTCACCTTGAGCTCAGAAAAGCTACCTTTAAAAGGTCGTGCGGAGC
              300       310       320       330       340       350


As you can see, the search is probably valid, but the statistics are pretty
hard to interpret as it's darn difficult for Pearson's Expectation Function
to calculate a probability from a z-score of nan0x7fffffff  :^)
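
For what it's worth, here is a minimal Python sketch of one way a z-score
can come out as NaN.  It assumes the normalization regresses opt score
against the log of library sequence length, the way Pearson's FASTA
describes its z-scores; whether GCG's 9.0 code does exactly that is an
assumption on my part, and the function names are made up for
illustration.  The header above reports 1,285 sequences and 771,000
symbols, which is exactly 600 per entry, so every length would be
identical and the regression slope becomes 0/0.  This is a guess at the
mechanism, not a diagnosis.

```python
import math

def ieee_div(num, den):
    """Divide like IEEE-754 hardware with traps disabled: 0/0 -> NaN, x/0 -> +/-inf."""
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan
        return math.copysign(math.inf, num)
    return num / den

def length_regressed_z(scores, lengths):
    """Hypothetical z-scores: residuals of score vs ln(length), scaled by their std."""
    n = len(scores)
    xs = [math.log(length) for length in lengths]
    mean_x = math.fsum(xs) / n
    mean_y = math.fsum(scores) / n
    # Least-squares slope = Sxy / Sxx; Sxx is zero when all lengths are equal.
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = ieee_div(sxy, sxx)
    intercept = mean_y - slope * mean_x
    resid = [y - (intercept + slope * x) for x, y in zip(xs, scores)]
    std = math.sqrt(sum(r * r for r in resid) / n)
    # Once the slope is NaN, every residual and every z-score is NaN too.
    return [ieee_div(r, std) for r in resid]

# Four opt scores from the listing above, all against 600 bp entries.
print(length_regressed_z([44, 42, 56, 73], [600, 600, 600, 600]))
```

Plain Python raises ZeroDivisionError on 0.0/0.0 instead of returning NaN,
which is why the sketch uses its own ieee_div helper.  The same split
between "trap and die" and "carry on with NaN" would be consistent with a
floating point exception on one system and nan z-scores on another, though
again that is speculation.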

Don't know what the problem is; maybe it's all the N's, as Aengus suggests.
But this line is certainly suspect: "1501 scores saved that exceeded
2147483647."  2147483647 is WAY too big for an "opt" score - something screwy
IS going on.
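
A quick check on that number: 2147483647 is 2^31 - 1, the largest value a
signed 32-bit integer can hold, and in hex it is the same 0x7fffffff that
trails "nan" in the z-score column.  So it looks more like an unset or
overflowed value than a real score cutoff, although that is only an
inference from the output.  The Python below just verifies the arithmetic
and flags such lines in a saved report; the function name and the pattern
are my own, not anything from GCG.

```python
import re

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer, 0x7fffffff

# Matches the NaN z-score field or a bare INT32_MAX anywhere in a line.
SUSPECT = re.compile(r"nan0x7fffffff|\b%d\b" % INT32_MAX)

def suspicious_lines(report_text):
    """Return the lines of a FastA report that contain a NaN z-score or INT32_MAX."""
    return [line for line in report_text.splitlines() if SUSPECT.search(line)]

print(INT32_MAX, hex(INT32_MAX))
```

Running suspicious_lines() over the output above would pick up the
"scores saved that exceeded" line and every hit line in the list.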

                                            Cheers - Steve

                                Steven M. Thompson
              Consultant in Molecular Genetics and Sequence Analysis
        Visualization, Analysis & Design in the Molecular Sciences (VADMS)
             Washington State University, Pullman, WA 99164-4660, USA
                  AT&Tnet:  (509) 335-3179  FAX:  (509) 335-9688
                    INTERnet:  thompson at ribozyme.vadms.wsu.edu



