[Genbank-bb] GenBank Update Problem : 0912 : Incorrect files between 04:30am and 10:08am

Cavanaugh, Mark (NIH/NLM/NCBI) [E] via genbankb@net.bio.net (by cavanaug from ncbi.nlm.nih.gov)
Fri Sep 12 10:35:21 EST 2008

Dear GenBank Users,

Processing for the GenBank Incremental Update (GIU) and for GenBank WGS
data products was moved to new hardware on Thursday, September 11. 

Unfortunately, some configuration files that were used during previous
tests of the new hardware were *not* updated with the files from the
production system.

This led to the creation of unnecessarily large GIU files on September 12
(nc0912), containing records that date back to (at least) August 10th.

The affected 0912 GIU files had these timestamps and sizes:

-rw-r--r--   1 gbupdate gbproces 30892117 Sep 12 04:44
-rw-r--r--   1 gbupdate gbproces 533951491 Sep 12 04:31 nc0912.flat.gz
-rw-r--r--   1 gbupdate gbproces 317692100 Sep 12 04:37 nc0912.fsa_nt.gz
-rw-r--r--   1 gbupdate gbproces 34925179 Sep 12 04:10 nc0912.fsa.gz
-rw-r--r--   1 gbupdate gbproces 60205075 Sep 12 04:10 nc0912.gnp.gz
-rw-r--r--   1 gbupdate gbproces 92824768 Sep 12 04:14 nc0912.qscore.gz

-rw-r--r--   1 gbupdate gbproces 33507308 Sep 12 04:44
-rw-r--r--   1 gbupdate gbproces 423510078 Sep 12 04:16 nc0912.aso.gz

Note that the uncompressed size of nc0912.flat.gz is over 2.5 GB :

         compressed        uncompressed  ratio uncompressed_name
          533951491          2652522424  79.9% nc0912.flat


This problem was discovered on the morning of September 12. The
GIU files were removed, a new GIU run was started, and this yielded
corrected 0912 update products at about 10:00am :

-rw-r--r--   1 gbupdate gbproces 55850950 Sep 12 10:08 nc0912.flat.gz
-rw-r--r--   1 gbupdate gbproces 36800819 Sep 12 10:08 nc0912.fsa_nt.gz
-rw-r--r--   1 gbupdate gbproces 1919440 Sep 12 10:08 nc0912.fsa.gz
-rw-r--r--   1 gbupdate gbproces 3169378 Sep 12 10:08 nc0912.gnp.gz
-rw-r--r--   1 gbupdate gbproces 2254192 Sep 12 10:08 nc0912.qscore.gz

-rw-r--r--   1 gbupdate gbproces 38159306 Sep 12 10:08 nc0912.aso.gz

Note that the uncompressed size of the corrected nc0912.flat.gz GIU
is roughly a tenth of the size of the incorrect version:

         compressed        uncompressed  ratio uncompressed_name
           55850950           252578593  77.9% nc0912.flat
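
The ratio column in the two listings can be reproduced from the byte
counts. The sketch below assumes the ratio is the percentage of space
saved, (1 - compressed/uncompressed), which matches both rows; the
function name gzip_ratio is only illustrative.

```python
def gzip_ratio(compressed, uncompressed):
    """Percentage of space saved by compression (assumed definition)."""
    return 100.0 * (1 - compressed / uncompressed)

# (compressed, uncompressed) byte counts copied from the listings above
incorrect = (533951491, 2652522424)
corrected = (55850950, 252578593)

print(round(gzip_ratio(*incorrect), 1))        # 79.9
print(round(gzip_ratio(*corrected), 1))        # 77.9

# Uncompressed size of the corrected file relative to the incorrect one
print(round(corrected[1] / incorrect[1], 3))   # 0.095
```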

Note also that there are no CON-division GIU products for 0912.


The invalid 0912 GIU products were available via FTP for approximately
six hours. If you transferred them between 4:00am ET and 10:08am ET,
please check their sizes to see if you need to obtain new, corrected,
smaller versions of the files.
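
For sites that mirror the update files, the size check can be scripted.
The following is a minimal sketch, not an NCBI tool: the function name
files_to_refetch and the directory argument are hypothetical, and the
expected sizes are the compressed sizes of the corrected files listed
earlier in this message.

```python
import os

# Compressed sizes in bytes of the corrected 0912 files, copied from
# the 10:08 listing in this message.
CORRECTED_SIZES = {
    "nc0912.flat.gz": 55850950,
    "nc0912.fsa_nt.gz": 36800819,
    "nc0912.fsa.gz": 1919440,
    "nc0912.gnp.gz": 3169378,
    "nc0912.qscore.gz": 2254192,
    "nc0912.aso.gz": 38159306,
}

def files_to_refetch(directory):
    """Return names of local 0912 files whose size differs from the
    corrected size. Files that are not present locally are skipped."""
    stale = []
    for name, expected in sorted(CORRECTED_SIZES.items()):
        path = os.path.join(directory, name)
        if os.path.exists(path) and os.path.getsize(path) != expected:
            stale.append(name)
    return stale
```

Calling files_to_refetch on the local mirror directory returns the
names of any files that should be transferred again; an empty list
means every 0912 file present already has the corrected size.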


Fortunately, the effect on our WGS project files was minimal:
the data files for a single project, CABB, were unnecessarily refreshed.


Our apologies for the inconvenience that this error has caused.

Mark Cavanaugh
