Hello -
Has anyone had problems reformatting GenBank release 82.0
for GCG? When I ran the genbanktogcg command, I got a series
of errors that prevented the database from being formatted
properly. Here are the error messages:
---
.
. (some correctly formatted subsections...)
.
Input Entries: 968
Output Entries: 968
Excluded Entries: 0
Total Length: 1,414,274
Errors: 0
gb_pl .................................................................................................................................................................
Error in DBIndex. Cannot read entry "YSCRNAHEL".
.
. (some correctly formatted subsections...)
.
Input Entries: 3,603
Output Entries: 3,603
Excluded Entries: 0
Total Length: 2,176,197
Errors: 0
gb_ro .............................................................................................................................................................................................................
Error in DBIndex. Cannot read entry "U01914".
Input Entries: 20,581
Output Entries: 20,581
Excluded Entries: 0
Total Length: 22,836,624
Errors: 0
gb_sy .................
*** Reference: "YEP13" and Sequence: "YEP213" DO NOT MATCH! ***
*** Reference: "YEP213" and Sequence: "YEP353" DO NOT MATCH! ***
*** Reference: "YEP353" and Sequence: "YRP7" DO NOT MATCH! ***
Error in DBIndex. Cannot read entry "YRP7".
Input Entries: 1,717
Output Entries: 1,717
Excluded Entries: 0
Total Length: 2,572,139
Errors: 0
gb_un ..............
Error in DBIndex. Cannot read entry "SYNPCH75".
Input Entries: 1,490
Output Entries: 1,490
Excluded Entries: 0
Total Length: 1,391,910
Errors: 0
gb_vi ...............................
Error in DBIndex. Cannot read entry "GPCDNPO".
Input Entries: 3,151
Output Entries: 3,151
Excluded Entries: 1
Total Length: 4,883,586
Errors: 0
GenBankToGCG complete:
Libraries: 13
Input Entries: 150,613
Output Entries: 150,613
Excluded Entries: 1
Total Length: 157,605,316
Errors: 0
CPU: 7:20:12.56
---
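(For anyone trying to reproduce this: a quick way to see which divisions were hit is to scan the GenBankToGCG log for the two kinds of failure lines shown above. The sketch below is only a sketch. It assumes the log looks exactly like the excerpt, with a division header such as "gb_ro ....." followed by any error lines for that division. The function name failing_divisions is my own and is not part of GCG.)

```python
import re
from collections import defaultdict

# Division header lines in the log: a name like "gb_ro" followed by progress dots.
DIVISION_RE = re.compile(r"^(gb_\w+)\s+\.+")
# The two failure messages seen in the GenBankToGCG output above.
DBINDEX_RE = re.compile(r'Error in DBIndex\. Cannot read entry "([^"]+)"')
MISMATCH_RE = re.compile(r'Reference: "([^"]+)" and Sequence: "([^"]+)" DO NOT MATCH')


def failing_divisions(log_text):
    """Map each division name to the problem messages logged after its header."""
    problems = defaultdict(list)
    current = None  # most recent division header seen
    for line in log_text.splitlines():
        header = DIVISION_RE.match(line)
        if header:
            current = header.group(1)
            continue
        if current is None:
            continue
        bad_entry = DBINDEX_RE.search(line)
        if bad_entry:
            problems[current].append(f"unreadable entry {bad_entry.group(1)}")
            continue
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            problems[current].append(
                f"mismatch {mismatch.group(1)} / {mismatch.group(2)}"
            )
    return dict(problems)


if __name__ == "__main__":
    sample = (
        'gb_sy .................\n'
        '*** Reference: "YEP13" and Sequence: "YEP213" DO NOT MATCH! ***\n'
        'Error in DBIndex. Cannot read entry "YRP7".\n'
        'Input Entries: 1,717\n'
        'gb_un ..............\n'
        'Error in DBIndex. Cannot read entry "SYNPCH75".\n'
    )
    for division, messages in failing_divisions(sample).items():
        print(division, messages)
```

Run against the full log, this should list gb_pl, gb_ro, gb_sy, gb_un and gb_vi, which are the same divisions that later came up short at the seqcat stage.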
Then, when my script got to the seqcat stage, none of the
divisions with errors were processed:
---
.
. (some correctly formatted subsections...)
.
gb_pl
gb_pr
gb_ro
gb_st
gb_sy
gb_un
gb_vi
SEQCAT complete:
Categories: 13
Entries: 71,948
Total Length: 66,289,524
CPU: 06:03.16
ACCESSIONNUMBERS complete
Input files: 13
Input lines: 3,518,239
Accession numbers: 93,207
Output file: "/usr/local/data/gcgdata/gcggenbank/genbank.exclude"
---
What I need to know is how to get the database formatted
properly. If anyone knows of a fix, or knows whether the GenBank
flat files have been corrected, please (!) let me know.
Thanks in advance,
Malin Masreliez
--------------------------------------------------------------------------------
Malin Masreliez
masrelia at ava.bcc.orst.edu "It works better if you plug it in."
Center for Gene Research -Sattinger's Law
Oregon State University
________________________________________________________________________________