[Genbank-bb] Re-Send : PROJECT linetype to be replaced by DBLINK

Cavanaugh, Mark (NIH/NLM/NCBI) [E] via genbankb@net.bio.net (by cavanaug from ncbi.nlm.nih.gov)
Fri Sep 26 16:03:42 EST 2008


[This listserv seems to impose a fairly short line-wrap for
 text messages, which made my previous post difficult to
 read. Hence this re-send, with shorter line lengths, where
 possible.]

Greetings GenBank Users,

The PROJECT linetype allows a sequence record to be linked to
information about the sequencing project that generated the data
which ultimately resulted in the record's submission to the
International Nucleotide Sequence Database (INSD; see
http://www.insdc.org).

This complete bacterial GenBank record illustrates the use of
the PROJECT line:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471

When viewed on the web in NCBI's Entrez:Nucleotide, the record's
project identifier (28471) links to an entry in the Genome Project
Database (GPDB):

http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&uid=28471

where information about the sequencing center, the bacterium, and
other GenBank records (e.g., plasmids) associated with the sequencing
project can be found.
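
As an illustration only, the link described above could be rebuilt from
a PROJECT line with a few lines of Python. This is a minimal sketch, not
NCBI code: the function names parse_project_line and gpdb_url are
hypothetical, and the sketch assumes the value starts at column 13, as
in the example record, and that the Entrez URL pattern shown above stays
valid.

```python
from urllib.parse import urlencode


def parse_project_line(line):
    """Return the numeric ID from a 'PROJECT     GenomeProject:NNN' line, or None."""
    if not line.startswith("PROJECT"):
        return None
    # Assumption: the keyword occupies columns 1-12, the value follows.
    value = line[12:].strip()
    prefix, _, ident = value.partition(":")
    if prefix != "GenomeProject" or not ident.isdigit():
        return None
    return int(ident)


def gpdb_url(project_id):
    """Build the Genome Project Database URL quoted in this message."""
    query = urlencode({
        "db": "genomeprj",
        "cmd": "Retrieve",
        "dopt": "Overview",
        "uid": project_id,
    })
    return "http://www.ncbi.nlm.nih.gov/sites/entrez?" + query


project_id = parse_project_line("PROJECT     GenomeProject:28471")
print(gpdb_url(project_id))
```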

Since the introduction of PROJECT, the scope of the "Genome" Project
Database has expanded to include projects that are not necessarily
targeted at the sequencing of a complete genome.

In addition, there can be other resources which underlie an INSD
sequence record, such as the Trace Assembly Archive at the NCBI:

 
http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&f=tree&m=main&s=tree

Because of the expanded scope of the GPDB, and because we
anticipate a need to link to more resources than just the GPDB,
the PROJECT linetype is going to be replaced by a new linetype:

   DBLINK

Further details about this change, and its timetable, follow.

Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Modifications to linetypes can be disruptive, so the switch to
DBLINK will occur in several stages.

Starting in October 2008, links to the NCBI Trace Assembly Archive
will be supported via a line of text in the COMMENT section of
sequence records.

Here is a mock-up, based on CP000964, to illustrate this change:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
....
COMMENT     Trace Assembly Archive:123456
            The source for the DNA and/or cells is:  Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of Florida,
            P.O. Box 110700, Gainesville, FL 32611-0700, ewt from ufl.edu.

Note: Use of the Trace Assembly Archive is still in its early
stages, so only a few records are expected to have these links in
the short term.
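
For parsers that want to pick up the interim COMMENT-based link, a
sketch along the following lines may be enough. It assumes the link
appears exactly as in the mock-up, on the COMMENT keyword line itself
and in the form "Trace Assembly Archive:" followed by digits; the
function name is hypothetical and the real layout may differ.

```python
import re

# Assumption: the link is the first line of the COMMENT block, as in the mock-up.
TRACE_RE = re.compile(r"^COMMENT\s+Trace Assembly Archive:(\d+)\s*$")


def trace_assembly_id_from_comment(record_text):
    """Return the Trace Assembly Archive ID found in COMMENT, or None."""
    for line in record_text.splitlines():
        match = TRACE_RE.match(line)
        if match:
            return int(match.group(1))
    return None


mockup = "\n".join([
    "PROJECT     GenomeProject:28471",
    "COMMENT     Trace Assembly Archive:123456",
    "            The source for the DNA and/or cells is:  Professor Eric W.",
])
print(trace_assembly_id_from_comment(mockup))
```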

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The new DBLINK linetype will be introduced as of GenBank Release
170.0 (February 15, 2009).

The Genome Project ID and the Trace Assembly Archive ID will be
presented via DBLINK, and the existing PROJECT line will continue
to be displayed:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
DBLINK      Project:28471
            Trace Assembly Archive:123456
....
COMMENT     The source for the DNA and/or cells is:  Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of Florida,
            P.O. Box 110700, Gainesville, FL 32611-0700, ewt from ufl.edu.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

PROJECT and DBLINK will co-exist for one GenBank release, until
Release 171.0 (April 15, 2009), at which point the PROJECT line
will be removed.

In its final state, our mock-up for CP000964 becomes:

LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
DBLINK      Project:28471
            Trace Assembly Archive:123456
....
COMMENT     The source for the DNA and/or cells is:  Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of Florida,
            P.O. Box 110700, Gainesville, FL 32611-0700, ewt from ufl.edu.

In summary:

   PROJECT -> DBLINK

   'GenomeProject' -> 'Project'

   Additional linkages, such as Trace Assembly, added to DBLINK
   as needed

   The PROJECT line will be removed as of April 15, 2009.
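
Flatfile parsers will see three record shapes during the transition:
PROJECT only, PROJECT plus DBLINK, and DBLINK only. The following
Python sketch shows one way a reader might handle all three. It is not
NCBI code: the function name read_links is hypothetical, and it assumes
that keywords occupy columns 1-12, that continuation lines are indented
with blanks, and that each line carries a single name:identifier pair,
as in the mock-ups. 'GenomeProject' is folded into 'Project' so that
the old and new linetypes yield the same result.

```python
def read_links(record_text):
    """Collect links from the PROJECT and DBLINK lines of one record.

    Returns a dict mapping a resource name to a list of identifier strings.
    """
    links = {}
    current = None  # keyword of the linetype currently being read
    for line in record_text.splitlines():
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword:
            current = keyword  # a new linetype starts here
        if current not in ("PROJECT", "DBLINK") or ":" not in value:
            continue
        name, _, ident = value.partition(":")
        name, ident = name.strip(), ident.strip()
        if name == "GenomeProject":
            name = "Project"  # old PROJECT spelling, new DBLINK spelling
        ids = links.setdefault(name, [])
        if ident not in ids:  # PROJECT and DBLINK may repeat the same ID
            ids.append(ident)
    return links


transitional = "\n".join([
    "ACCESSION   CP000964",
    "VERSION     CP000964.1  GI:206564770",
    "PROJECT     GenomeProject:28471",
    "DBLINK      Project:28471",
    "            Trace Assembly Archive:123456",
    "COMMENT     The source for the DNA and/or cells is:  Professor Eric W.",
])
print(read_links(transitional))
# {'Project': ['28471'], 'Trace Assembly Archive': ['123456']}
```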

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

For those who process sequence data in NCBI's ASN.1 format:

The underlying representation for (Genome) Project IDs will remain
unchanged; there will be no changes to the ASN.1 User-object that 
is used to store them:

    user {
      type
        str "GenomeProjectsDB" ,
      data {
        {
          label
            str "ProjectID" ,
          data
            int 28471 } ,
        {
          label
            str "ParentID" ,
          data
            int 0 } } } ,

However, to support linkages to other resources, like the Trace
Assembly Archive, a new "DBLink" User-object will be introduced:

    user {
      type
        str "DBLink" ,
      data {
        {
          label
            str "Trace Assembly Archive" ,
          data
            ints { 123456 } } } }

As new types of linkages are established, they will be added to
the DBLink User-object, and displayed via the DBLINK linetype in
the GenBank flatfile format. 
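
To make the relationship between the User-object and the flatfile
display concrete, here is a rough Python sketch that renders label/ID
pairs as DBLINK lines in the layout of the mock-ups above. The dict
stands in for the labeled fields of the User-object and is purely an
illustration; format_dblink is a hypothetical name, and printing one
identifier per line when a field holds several integers is an
assumption, since this message does not specify that case.

```python
def format_dblink(links):
    """Render {label: [ids]} as DBLINK flatfile lines.

    The keyword appears on the first line only; later lines are
    indented by 12 blanks, matching the mock-up records.
    """
    lines = []
    for label, ids in links.items():
        for ident in ids:
            keyword = "DBLINK" if not lines else ""
            lines.append(f"{keyword:<12}{label}:{ident}")
    return lines


# Hypothetical in-memory form of the Project ID plus the DBLink fields.
example = {"Project": [28471], "Trace Assembly Archive": [123456]}
for line in format_dblink(example):
    print(line)
```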

There is a possibility that the GenomeProjectsDB User-object
might someday be incorporated into the new DBLink User-object.
But at the moment, there are no firm plans to do so.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=




