Greetings GenBank Users,
As described in the announcement of the availability of GenBank
Release 193.0, we are providing files that catalog the contents
of a release.
The genbank/catalog directory at the NCBI FTP site now contains
these files:
gb193.catalog.est.txt.gz
gb193.catalog.gss.txt.gz
gb193.catalog.other.txt.gz
gb193.gene_list.gss.txt.gz
gb193.gene_list.other.txt.gz
gb193.pmid_list.est.txt.gz
gb193.pmid_list.gss.txt.gz
gb193.pmid_list.other.txt.gz
The format and content of these files are described in Section 1.3.4
of the GenBank 193.0 release notes (gbrel.txt).
Note that there is no gene_list file for EST, because EST records
at the NCBI are not annotated with anything other than source
features.
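The file names in the list above appear to follow a regular pattern,
gb<release>.<kind>.<group>.txt.gz, with the EST gene_list file left
out. The sketch below generates the expected names for a release,
which can be handy when scripting downloads. The pattern is inferred
from this announcement rather than taken from a published
specification, and the names KINDS, GROUPS and catalog_file_names are
hypothetical.

```python
from __future__ import annotations

# Hypothetical constants: the file "kinds" and record groups seen in the
# gb193 listing. The naming pattern is inferred, not a documented spec.
KINDS = ("catalog", "gene_list", "pmid_list")
GROUPS = ("est", "gss", "other")


def catalog_file_names(release: int) -> list[str]:
    """Return the expected catalog file names for a GenBank release."""
    names = []
    for kind in KINDS:
        for group in GROUPS:
            # EST records carry only source features, so there is no
            # gene_list file for EST.
            if kind == "gene_list" and group == "est":
                continue
            names.append(f"gb{release}.{kind}.{group}.txt.gz")
    return names


if __name__ == "__main__":
    for name in catalog_file_names(193):
        print(name)
```

For release 193 this yields the same eight names, in the same order,
as the directory listing above.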
There is one known issue involving the Division-Code field
of the catalog: finished sequence records that originated
in clone-based high-throughput genome sequencing (HTG) projects
have a division code of "HTG", even though those records may
have moved to another division (for example, PRI) upon
completion. We're considering a change that would let this
column contain multiple values, to reflect the fact that a
sequence can be categorized in multiple ways, for example
"HTG,PRI" or "GSS,ENV".
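If the Division-Code column does become multi-valued, any script that
compares the field to a single code (e.g. field == "PRI") would stop
matching records such as "HTG,PRI". A minimal sketch of a more tolerant
check follows. It assumes a comma separator, as in the examples above,
although the final format may differ. The function names are
hypothetical.

```python
from __future__ import annotations


def parse_division_codes(field: str) -> list[str]:
    """Split a Division-Code value into individual codes.

    Works for the current single-value form ("HTG") and for the
    comma-separated form under consideration ("HTG,PRI").
    """
    # Strip whitespace around each code and drop empty pieces.
    return [code.strip() for code in field.split(",") if code.strip()]


def has_division(field: str, division: str) -> bool:
    """Return True if the given division code appears in the field."""
    return division in parse_division_codes(field)


if __name__ == "__main__":
    print(parse_division_codes("HTG"))      # ['HTG']
    print(parse_division_codes("HTG,PRI"))  # ['HTG', 'PRI']
    print(has_division("GSS,ENV", "ENV"))   # True
```

Because the field is always treated as a list, the same code handles
both the single-value and the multi-value forms.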
So these products are clearly still in flux. Now would be a
good time to pass along any suggestions you might have for the
content and structure of the catalog and related files.
Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS