Greetings GenBank Users,
The PROJECT linetype allows a sequence record to be linked to information
about the sequencing project that generated the data which ultimately
resulted in the record's submission to the International Nucleotide
Sequence Database (INSD: http://www.insdc.org).
This complete bacterial GenBank record illustrates the use of PROJECT:
LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
When viewed on the web in NCBI's Entrez:Nucleotide, the record's project
identifier (28471) links to an entry in the Genome Project Database
(GPDB):
http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&uid=28471
where information about the sequencing center, the bacterium, and other
GenBank records (e.g., plasmids) associated with the sequencing project
can be obtained.
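For scripts that read flatfiles directly, the PROJECT line can be turned into the GPDB link above with a few lines of Python. This is a minimal sketch, not an NCBI tool: it assumes a single GenomeProject:N value per PROJECT line, and the names project_id, PROJECT_RE and GPDB_URL are hypothetical.

```python
import re

# URL pattern taken from the GPDB link above; only the uid changes.
GPDB_URL = ("http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj"
            "&cmd=Retrieve&dopt=Overview&uid={}")

# Assumption: one "GenomeProject:N" value per PROJECT line.
PROJECT_RE = re.compile(r"^PROJECT\s+GenomeProject:(\d+)\s*$")


def project_id(flatfile_text):
    """Return the Genome Project ID from the PROJECT line, or None."""
    for line in flatfile_text.splitlines():
        match = PROJECT_RE.match(line)
        if match:
            return int(match.group(1))
    return None


record = (
    "ACCESSION   CP000964\n"
    "VERSION     CP000964.1  GI:206564770\n"
    "PROJECT     GenomeProject:28471\n"
)

pid = project_id(record)
print(pid)  # 28471
print(GPDB_URL.format(pid))
```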
Since the introduction of PROJECT, the scope of the "Genome" Project
Database has expanded to include projects that are not necessarily
targeted to the sequencing of a complete genome.
In addition, there can be other resources that underlie an INSD sequence
record, such as the Trace Assembly Archive at NCBI:
http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi?cmd=show&f=tree&m=main&s=tree
Because of the expanded scope of the GPDB, and because we anticipate a
need to link to more resources than just the GPDB, the PROJECT linetype
is going to be replaced by a new linetype:
    DBLINK
Further details about this change, and its timetable, follow.
Mark Cavanaugh
GenBank
NCBI/NLM/NIH/HHS
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=
Modifications to linetypes can be disruptive, so the switch to DBLINK
will occur in several stages.
Starting in October 2008, links to the NCBI Trace Assembly Archive will
be supported via a line of text in the COMMENT section of sequence
records. Here is a mock-up, based on CP000964, which illustrates this
change:
LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
....
COMMENT     Trace Assembly Archive:123456
            The source for the DNA and/or cells is: Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of
            Florida, P.O. Box 110700, Gainesville, FL 32611-0700, ewt from
            ufl.edu.
Note: Use of the Trace Assembly Archive is still in its early stages, so
only a few records are expected to have these links in the short term.
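During this interim stage, a script that wants the Trace Assembly Archive ID has to look for it inside COMMENT. Below is a minimal Python sketch, assuming the link occupies its own COMMENT line in the form shown in the mock-up (123456 is the mock-up's placeholder value) and that COMMENT continuation lines begin with 12 spaces. The names comment_lines and trace_assembly_ids are hypothetical.

```python
import re

# Assumption: the interim link sits on its own COMMENT line,
# e.g. "COMMENT     Trace Assembly Archive:123456".
TRACE_RE = re.compile(r"^Trace Assembly Archive:(\d+)$")


def comment_lines(flatfile_text):
    """Yield the text (columns 13 onward) of each line in the COMMENT block."""
    in_comment = False
    for line in flatfile_text.splitlines():
        if line.startswith("COMMENT"):
            in_comment = True
        elif not line.startswith(" " * 12):
            in_comment = False  # any other keyword ends the COMMENT block
        if in_comment:
            yield line[12:].strip()


def trace_assembly_ids(flatfile_text):
    """Return every Trace Assembly Archive ID found in the COMMENT block."""
    ids = []
    for text in comment_lines(flatfile_text):
        match = TRACE_RE.match(text)
        if match:
            ids.append(int(match.group(1)))
    return ids


record = (
    "PROJECT     GenomeProject:28471\n"
    "COMMENT     Trace Assembly Archive:123456\n"
    "            The source for the DNA and/or cells is: Professor Eric W.\n"
    "FEATURES             Location/Qualifiers\n"
)

print(trace_assembly_ids(record))  # [123456]
```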
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=
The new DBLINK linetype will be introduced as of GenBank Release 170.0
(February 15, 2009).
The Genome Project ID and the Trace Assembly Archive ID will be presented
via DBLINK, and the existing PROJECT line will continue to be displayed:
LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
PROJECT     GenomeProject:28471
DBLINK      Project:28471
            Trace Assembly Archive:123456
....
COMMENT     The source for the DNA and/or cells is: Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of
            Florida, P.O. Box 110700, Gainesville, FL 32611-0700, ewt from
            ufl.edu.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=
PROJECT and DBLINK will co-exist for one GenBank release, until Release
171.0 (April 15, 2009), at which point the PROJECT line will be removed.
In its final state, our mock-up for CP000964 becomes:
LOCUS       CP000964             5641239 bp    DNA     circular BCT 24-SEP-2008
DEFINITION  Klebsiella pneumoniae 342, complete genome.
ACCESSION   CP000964
VERSION     CP000964.1  GI:206564770
DBLINK      Project:28471
            Trace Assembly Archive:123456
....
COMMENT     The source for the DNA and/or cells is: Professor Eric W.
            Triplett, Chair, Department of Microbiology and Cell Science,
            Institute of Food and Agricultural Sciences, University of
            Florida, P.O. Box 110700, Gainesville, FL 32611-0700, ewt from
            ufl.edu.
In summary:
   PROJECT         -> DBLINK
   'GenomeProject' -> 'Project'
   Additional linkages, such as Trace Assembly, will be added to
   DBLINK as needed.
   The PROJECT line will be removed as of April 15, 2009.
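Because PROJECT and DBLINK overlap for one release, a flatfile parser can be written to accept all three stages shown above. The Python sketch below reads DBLINK entries, including continuation lines, and falls back to PROJECT when no DBLINK Project entry exists, applying the 'GenomeProject' -> 'Project' rename. It is illustrative only: parse_dblink is a hypothetical name, IDs are kept as strings, and comma-separated multiple IDs within one entry are an assumption not stated in this announcement.

```python
def parse_dblink(flatfile_text):
    """Return DBLINK cross-references as {resource name: [ids]}.

    Records written before DBLINK existed fall back to the PROJECT line,
    with 'GenomeProject' renamed to 'Project'.
    """
    links = {}
    project_ids = []
    keyword = None  # keyword (columns 1-12) of the block being read
    for line in flatfile_text.splitlines():
        if line[:12].strip():
            keyword = line[:12].strip()  # a new keyword starts a new block
        value = line[12:].strip()
        if not value:
            continue
        if keyword == "DBLINK":
            # Assumption: "Name:id" or "Name:id1,id2" on each line.
            name, _, ids = value.rpartition(":")
            links.setdefault(name, []).extend(i.strip() for i in ids.split(","))
        elif keyword == "PROJECT":
            for item in value.split():
                name, _, pid = item.partition(":")
                if name == "GenomeProject":
                    project_ids.append(pid)
    if "Project" not in links and project_ids:
        links["Project"] = project_ids
    return links


# The three stages described above, reduced to the relevant lines.
stage1 = "PROJECT     GenomeProject:28471\n"
stage2 = ("PROJECT     GenomeProject:28471\n"
          "DBLINK      Project:28471\n"
          "            Trace Assembly Archive:123456\n")
stage3 = ("DBLINK      Project:28471\n"
          "            Trace Assembly Archive:123456\n")

for record in (stage1, stage2, stage3):
    print(parse_dblink(record))
# {'Project': ['28471']}
# {'Project': ['28471'], 'Trace Assembly Archive': ['123456']}
# {'Project': ['28471'], 'Trace Assembly Archive': ['123456']}
```

Keying on columns 1-12 rather than on specific keywords means the parser keeps working unchanged as new linkage types are added to DBLINK.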
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=
For those who process sequence data in NCBI's ASN.1 format:
The underlying representation for (Genome) Project IDs will remain
unchanged; there will be no changes to the ASN.1 User-object that
is used to store them:
user {
  type
    str "GenomeProjectsDB" ,
  data {
    {
      label
        str "ProjectID" ,
      data
        int 28471 } ,
    {
      label
        str "ParentID" ,
      data
        int 0 } } } ,
However, to support linkages to other resources, like the Trace
Assembly Archive, a new "DBLink" User-object will be introduced:
user {
  type
    str "DBLink" ,
  data {
    {
      label
        str "Trace Assembly Archive" ,
      data
        ints { 123456 } } } }
As new types of linkages are established, they will be added to
the DBLink User-object, and displayed via the DBLINK linetype in
the GenBank flatfile format.
There is a possibility that the GenomeProjectsDB User-object
might someday be incorporated into the new DBLink User-object.
But at the moment, there are no firm plans to do so.
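For those generating flatfiles from ASN.1, the mapping from the two User-objects to the DBLINK linetype can be sketched in Python. The sketch assumes both User-objects have already been decoded into plain values (decoding ASN.1 itself is out of scope), lists Project first as in the mock-ups, and joins multiple ints with commas; that separator and the name format_dblink are assumptions for illustration, not NCBI's formatter.

```python
def format_dblink(project_id, dblink_fields):
    """Render DBLINK flatfile lines (hypothetical helper).

    project_id    -- ProjectID from the "GenomeProjectsDB" User-object, or None
    dblink_fields -- (label, ints) pairs from the "DBLink" User-object
    """
    entries = []
    if project_id is not None:
        entries.append(f"Project:{project_id}")
    for label, ids in dblink_fields:
        # Assumption: several ints for one label are comma-separated.
        entries.append(f"{label}:{','.join(str(i) for i in ids)}")
    lines = []
    for n, entry in enumerate(entries):
        # Keyword on the first line only; continuation lines are blank-padded
        # to the same 12-column keyword field.
        prefix = "DBLINK" if n == 0 else ""
        lines.append(f"{prefix:<12}{entry}")
    return "\n".join(lines)


print(format_dblink(28471, [("Trace Assembly Archive", [123456])]))
# DBLINK      Project:28471
#             Trace Assembly Archive:123456
```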
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=