> From: Namjin Chung <nc1 at acpub.duke.edu>
> This is purely out of curiosity: why there [will] have been new sequence
> release since yeast genome sequencing had been complete and released? I
> understand updating the existing sequences, but what about new
> This could be a stupid question, but it has haunted me very often.
It's actually a very good question, which many people are wondering
about, especially now, as you point out, that the sequence of
the genome is completed.
I think part of the misconception is that GenBank (and the other
nucleotide databases) only has one copy of everything. This is not
true. We act more like a repository of all sequences, even when a
sequence is already in the database.
And let me also add a few 'rules':
1) DDBJ/EMBL/GenBank only accept sequence data that was sequenced
by the group submitting the sequence. (i.e. you can't resubmit
something you constructed from what was in the database,
or on your computer)
2) DDBJ/EMBL/GenBank will *not* accept consensus sequences, or
sequences derived from computer analysis.
3) DDBJ/EMBL/GenBank will accept most other sequences, even if these
already exist in the database. We should be viewed as a
repository, as opposed to a curated set.
This being said, we do take every measure to ensure that the data
deposited in the databases are of the highest quality, that the
annotations are correct, and that the features validate (i.e. the CDS
starts and ends with the translation shown). As simple as these tasks
may be, they are quite time consuming and do take much of our human
(now for my personal view!) ================================
Along with the completion of the yeast genome, I think it will be of
extreme importance for the whole yeast community to work in concert
with the curated databases (SGD, YPD and MIPS) to ensure that the
'master' sequences, those derived from the genome sequencing effort,
are updated with the new gene names and product names once these are
known/discovered. It will be up to the sequencing labs to maintain and
update these records in a way for them to be useful to the yeast
community, as well (and this is very important) as all the other
communities (which are not using yeast specific tools, like SGD, YPD
and MIPS). As it turns out, SGD will be doing the updates for all the
'North American' sequences, and since there are only a few other
players (i.e. MIPS, the Sanger Centre, RIKEN in Japan), it should be very
easy to update the appropriate record as the work gets done. I would
suggest that update messages sent to YEAST-CURATOR at GENOME.STANFORD.EDU
will reach the correct destination in a timely fashion.
(end of my personal view) ==================================
Nonetheless, there are quite a few cases where it is quite normal to
see a new accession number. Here are a few examples:
o You sequenced gene 'x' and wrote a paper about it; the journal you
published in may require that you give them an accession number, as
proof that you did sequence the DNA you said you sequenced.
(this exception being mentioned, I am sure you will be able
to write a paper about a published accession number and simply
refer to that number without having to resubmit it)
o You sequenced gene 'x' which turns out to be different
from what is in the database, and you wrote a paper about this,
you would get a new accession number.
o You wrote and published an article last year, which just came out
this month ... it will have a new accession number coming out
o You did a population/mutation study, and want to submit all
the sequences that are part of your dataset; you would then get an
accession number for each sequence.
o You sequenced it, and found a new function, gene name, product etc.;
the simplest way to get it in is to get your own accession number. The
curators of the master sequences will harmonize your data with what is
present in the databases, but meanwhile you have your accession number with
what you think is the correct gene name.
Speaking of gene names, I would urge you all to register them with SGD
(info from YEAST-CURATOR at GENOME.STANFORD.EDU if you need it) as they are
maintaining the official registry.
Ok, that's it for now ... sorry it was so long!
regards to all,
| B.F. Francis Ouellette | tel: (301) 496-2477 ext 247 |
| GenBank | fax: (301) 435-2433 |
| | NCBI/NLM/NIH Building 38A |
|francis at ncbi.nlm.nih.gov | Bethesda, MD 20894, USA |