
Formatting Problems: Genbank --> GCG

Malin Masreliez masrelia at ava.bcc.orst.edu
Tue Apr 19 18:49:56 EST 1994


Hello -

Has anyone had problems reformatting GenBank release 82.0
for GCG?  When I ran the genbanktogcg command, I got a series
of errors that prevented the database from being formatted
properly.  Here are the error messages:

---
.
.   (some correctly formatted subsections...)
.

    Input Entries: 968
   Output Entries: 968
 Excluded Entries: 0
     Total Length: 1,414,274
           Errors: 0

 gb_pl                .................................................................................................................................................................
 Error in DBIndex. Cannot read entry "YSCRNAHEL".

.
.   (some correctly formatted subsections...)
.
    
    Input Entries: 3,603
   Output Entries: 3,603
 Excluded Entries: 0
     Total Length: 2,176,197
           Errors: 0

 gb_ro                .............................................................................................................................................................................................................
 Error in DBIndex. Cannot read entry "U01914".


    Input Entries: 20,581
   Output Entries: 20,581
 Excluded Entries: 0
     Total Length: 22,836,624
           Errors: 0

 gb_sy                .................
 *** Reference: "YEP13" and Sequence: "YEP213" DO NOT MATCH! ***
 *** Reference: "YEP213" and Sequence: "YEP353" DO NOT MATCH! ***
 *** Reference: "YEP353" and Sequence: "YRP7" DO NOT MATCH! ***
 Error in DBIndex. Cannot read entry "YRP7".


    Input Entries: 1,717
   Output Entries: 1,717
 Excluded Entries: 0
     Total Length: 2,572,139
           Errors: 0

 gb_un                ..............
 Error in DBIndex. Cannot read entry "SYNPCH75".


    Input Entries: 1,490
   Output Entries: 1,490
 Excluded Entries: 0
     Total Length: 1,391,910
           Errors: 0

 gb_vi                ...............................
 Error in DBIndex. Cannot read entry "GPCDNPO".


    Input Entries: 3,151
   Output Entries: 3,151
 Excluded Entries: 1
     Total Length: 4,883,586
           Errors: 0

 GenBankToGCG complete:

        Libraries: 13
    Input Entries: 150,613
   Output Entries: 150,613
 Excluded Entries: 1
     Total Length: 157,605,316
           Errors: 0
              CPU: 7:20:12.56

---
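
For a log as long as this one, it can help to pull out just the failing
divisions. The summaries above all report "Errors: 0", so the failures
only appear as message lines. The following is a minimal sketch in Python,
assuming the genbanktogcg output has been captured to a text file and that
the messages look exactly like the ones quoted above. The function name
find_failures is hypothetical and is not part of GCG.

```python
import re

# A division progress line, e.g. " gb_pl      ........" (the dots may be absent).
DIVISION_RE = re.compile(r"^\s*(gb_\w+)\s+\.*\s*$")
# A DBIndex failure names the entry that could not be read.
DBINDEX_RE = re.compile(r'Error in DBIndex\. Cannot read entry "([^"]+)"')
# A reference/sequence mismatch names both entries.
MISMATCH_RE = re.compile(
    r'Reference: "([^"]+)" and Sequence: "([^"]+)" DO NOT MATCH'
)


def find_failures(log_text):
    """Group error messages by the division line that precedes them.

    Returns a dict mapping a division name (or None if an error appears
    before any division line) to a dict with two lists:
    "unreadable" (entry names) and "mismatches" (reference, sequence) pairs.
    """
    failures = {}
    current = None  # most recently seen division name
    for line in log_text.splitlines():
        match = DIVISION_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = DBINDEX_RE.search(line)
        if match:
            record = failures.setdefault(
                current, {"unreadable": [], "mismatches": []}
            )
            record["unreadable"].append(match.group(1))
            continue
        match = MISMATCH_RE.search(line)
        if match:
            record = failures.setdefault(
                current, {"unreadable": [], "mismatches": []}
            )
            record["mismatches"].append((match.group(1), match.group(2)))
    return failures
```

Calling find_failures on the contents of the saved log returns one record
per affected division, which gives a short list of entries to look up in
the flat files.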

Then, when my script reached the seqcat stage, none of the
subsections that had errors were processed:

---

.
.   (some correctly formatted subsections...)
.

 gb_pl                
 gb_pr                
 gb_ro                
 gb_st                
 gb_sy                
 gb_un                
 gb_vi                

 SEQCAT complete:

   Categories: 13
      Entries: 71,948
 Total Length: 66,289,524
          CPU: 06:03.16


 ACCESSIONNUMBERS complete

               Input files: 13
               Input lines: 3,518,239
         Accession numbers: 93,207

               Output file: "/usr/local/data/gcgdata/gcggenbank/genbank.exclude"
---

What I need to know is how to get the database formatted
properly. If anyone knows of a fix, or if the GenBank flat files
have been corrected, please (!) let me know.

Thanks in advance,


Malin Masreliez
--------------------------------------------------------------------------------
Malin Masreliez
masrelia at ava.bcc.orst.edu		"It works better if you plug it in."
Center for Gene Research			-Sattinger's Law
Oregon State University
________________________________________________________________________________




