> On Wed, 26 Jun 1996, hazuka at cmgm.stanford.edu (hazuka) wrote:
> > i need to access a yeast ORF sequence but can't seem to do it by
> > retrieve. The access number for gi is U51921. When i access this i get a
> > long cosmid sequence that does not actually include the sequence i need.
> > if there is someone out there with knowledge of the yeast databases i
> > would appreciate some advice. thanks in advance. chris
> From ball at genome.stanford.edu Wed Jun 26 19:50:22 1996
> The reason you're having difficulty finding an ORF with the name U51921
> is that it is an incomplete name. There are 12 ORFs derived from the
> GenBank sequence YSCL9362, all of which start with the letters U51921.
> Each coding sequence is differentiated by a second number (for example,
> in SGD, U51921_1.cds is the first coding sequence in that larger GenBank
> sequence, U51921_2.cds is the second, and so on). If the ORF you're
> looking for has U51921 in its name, it must be contained within the
> larger YSCL9362 sequence.
Actually, the _1, _2, etc. nomenclature for identifying a given ORF is
pretty poor, because it loses its meaning when a CDS is added or
removed within a cosmid or other large piece of sequence (and let me
assure you, it happens).
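To illustrate why positional names are fragile, here is a minimal Python
sketch. The start coordinates and the positional_names helper are made up
for illustration; only the U51921_N.cds naming pattern comes from the
message quoted above.

```python
def positional_names(accession, cds_starts):
    """Name each CDS by its rank along the sequence, e.g. U51921_1.cds."""
    ordered = sorted(cds_starts)
    return {
        start: f"{accession}_{rank}.cds"
        for rank, start in enumerate(ordered, start=1)
    }

# Hypothetical start coordinates for three CDS on the cosmid.
before = positional_names("U51921", [1200, 5400, 9800])

# A newly annotated CDS at position 3000 shifts every name after it.
after = positional_names("U51921", [1200, 3000, 5400, 9800])

for start in (1200, 5400, 9800):
    print(start, before[start], "->", after[start])
```

The CDS at 5400 is called U51921_2.cds before the insertion and
U51921_3.cds after it, so a name recorded earlier now points at a
different coding sequence.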
Chris, you started with "U51921". That is an accession number for a
yeast cosmid from yeast chromosome XII:
LOCUS YSCL9362 29427 bp DNA PLN 21-MAR-1996
DEFINITION Saccharomyces cerevisiae chromosome XII cosmid 9362.
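If you want to pull the locus name, length and date out of a LOCUS line
like the one above, a rough Python sketch could look like this. It
assumes exactly the seven whitespace-separated fields shown here; real
flat files may carry additional fields, and the Locus type and
parse_locus function are names invented for this example.

```python
from typing import NamedTuple

class Locus(NamedTuple):
    name: str
    length_bp: int
    molecule: str
    division: str
    date: str  # kept as text, e.g. "21-MAR-1996"

def parse_locus(line: str) -> Locus:
    """Split a LOCUS line that has the seven-field layout shown above."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    _, name, length, _, molecule, division, date = fields
    return Locus(name, int(length), molecule, division, date)

locus = parse_locus("LOCUS YSCL9362 29427 bp DNA PLN 21-MAR-1996")
print(locus.name, locus.length_bp, locus.date)
```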
The NID (nucleotide identifier) is also known as a 'gi number'.
The gi number will stay the same unless the sequence changes
(even by a single nucleotide). The Accession Number will stay
the same, even if the sequence changes. If the submitter
of the record updates the reference, or a gene name on the record,
only the date on the LOCUS line changes. The gi stays the same,
and the accession number stays the same.
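The rules above can be modelled in a few lines of Python. This is only a
toy sketch of the behaviour described here, not how any database
implements it: the Record class, the update function, the gi values, the
sequences and the later dates are all hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import count

_next_gi = count(1000)  # hypothetical gi values, not real identifiers

@dataclass(frozen=True)
class Record:
    accession: str
    gi: int
    sequence: str
    gene_name: str
    locus_date: str

def update(record, date, sequence=None, gene_name=None):
    """Apply an update: the LOCUS date always changes, the gi changes
    only when the sequence does, and the accession never changes."""
    changes = {"locus_date": date}
    if gene_name is not None:
        changes["gene_name"] = gene_name
    if sequence is not None and sequence != record.sequence:
        changes["sequence"] = sequence
        changes["gi"] = next(_next_gi)  # new sequence, new gi
    return replace(record, **changes)

v1 = Record("U51921", next(_next_gi), "ACGT", "ASP3", "21-MAR-1996")
v2 = update(v1, "01-APR-1996", gene_name="ASP3-1")  # annotation only
v3 = update(v2, "01-MAY-1996", sequence="ACGA")     # one base changed

for v in (v1, v2, v3):
    print(v.accession, v.gi, v.locus_date)
```

The accession is identical in all three versions, the gi is shared by v1
and v2, and only v3 receives a new gi.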
What I just said for nucleotides (NID, aka gi) also holds for
protein sequences, where we talk of a PID or gi as well.
These are shown as:
/product="L-Asparaginase II (Swiss Prot. accession number
P11163). Note that this ORF is longer at the 3' end than
ASP3 in GenBank accession number J03926. This gene is
included in four 3.6 kb repeats present in this cosmid at
the junction with rDNA."
So the PID (protein identifier) here is shown as:
The 'g' in front of the PID is from GenBank. There are also PIDs which
start with 'e', which are issued by our colleagues at EMBL/EBI. When
our Japanese colleagues (DDBJ) use these PID/NID, you will see them
with 'd'. For more on these, you can see the collab WWW page at
There you will see (under db_xref) all the valid databases used for
PID is one, and so is SGD (Saccharomyces Genome Database, also see
example above) where specific CDS are linked (via the WWW) or
labelled (such as: /db_xref="SGD:L0000130") to the SGD records.
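As a small illustration of the prefix convention and the db_xref
qualifier, here is a hedged Python sketch. It assumes a PID is one
prefix letter followed by digits, which is my reading of the description
above rather than a formal specification; the PID values used are
invented, and pid_source and parse_db_xref are hypothetical helper names.

```python
# Prefix letters as described above: 'g', 'e' and 'd'.
PID_PREFIXES = {"g": "GenBank", "e": "EMBL/EBI", "d": "DDBJ"}

def pid_source(pid: str) -> str:
    """Return the issuing database for a PID such as 'g1234567'."""
    prefix, number = pid[:1], pid[1:]
    if prefix not in PID_PREFIXES or not number.isdigit():
        raise ValueError(f"not a recognised PID: {pid!r}")
    return PID_PREFIXES[prefix]

def parse_db_xref(qualifier: str):
    """Split a qualifier like /db_xref="SGD:L0000130" into (database, id)."""
    key, _, value = qualifier.partition("=")
    if key.strip() != "/db_xref":
        raise ValueError(f"not a db_xref qualifier: {qualifier!r}")
    database, sep, identifier = value.strip().strip('"').partition(":")
    if not sep:
        raise ValueError(f"missing 'database:id' value: {qualifier!r}")
    return database, identifier

print(pid_source("g1234567"))  # made-up PID
print(parse_db_xref('/db_xref="SGD:L0000130"'))
```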
So to get back to your original question:
> > i need to access a yeast ORF sequence but can't seem to do it by
> > retrieve. The access number for gi is U51921.
U51921 does not identify an ORF, but a large DNA sequence which contains
12 CDS (Coding Sequences). How did you come across U51921, and where/how
did you look for the protein sequence you wanted?
In Entrez (http://www3.ncbi.nlm.nih.gov/Entrez/) you will find
an easy and simple way to find all of these protein sequences
as well as graphical views of the proteins, DNA sequences or entire
this last item is only true for those chromosomes which have been
completed and released, and whose annotated DNA sequence is in
GenBank/EMBL/DDBJ. There are presently a few unfinished chromosomes at
We are patiently waiting for the fruit of their hard work.
All the best,
| B.F. Francis Ouellette
||francis at ncbi.nlm.nih.gov