
yeast sequence?

francis at NCBI.NLM.NIH.GOV
Thu Jun 27 00:26:55 EST 1996


> On Wed, 26 Jun 1996, hazuka at cmgm.stanford.edu (hazuka) wrote:
> 
> > i need to access a yeast ORF sequence but can't seem to do it by
> > retrieve.  The access number for gi is U51921.  When i access this i get a
> > long cosmid sequence that does not actually include the sequence i need. 
> > if there is someone out there with knowledge of the yeast databases i
> > would appreciate some advice.  thanks in advance.  chris


> From ball at genome.stanford.edu Wed Jun 26 19:50:22 1996
>
> The reason you're having difficulty finding an ORF with the name U51921 
> is that it is an incomplete name.  There are 12 ORFs derived from the 
> GenBank sequence YSCL9362, all of which start with the letters U51921.  
> Each coding sequence is differentiated by a second number (for example, 
> in SGD, 
> U51921_1.cds is the first coding sequence in that larger GenBank 
> sequence, U51921_2.cds is the second, and so on).  If the ORF you're 
> looking for has U51921 in its name, it must be contained within the 
> larger YSCL9362 sequence.


Actually, the _1, _2 etc. nomenclature to identify a given ORF is
pretty poor, because it loses its meaning when a CDS is added or
removed within a cosmid or large piece of sequence (and let me assure
you, it happens).
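
To illustrate the point, here is a minimal Python sketch of why position-based
names are fragile.  The function positional_names and the CDS labels (cdsA,
cdsB, ...) are hypothetical and only for illustration; only the accession
U51921 and the "_N.cds" naming pattern come from this thread.

```python
def positional_names(accession, cds_list):
    """Name each CDS by its 1-based position, in the style of U51921_1.cds."""
    return {f"{accession}_{i}.cds": cds
            for i, cds in enumerate(cds_list, start=1)}


# Hypothetical CDS labels, in order along the cosmid.
before = positional_names("U51921", ["cdsA", "cdsB", "cdsC"])

# A new CDS is annotated between the first and second ones.
after = positional_names("U51921", ["cdsA", "cdsNew", "cdsB", "cdsC"])

# The same name now points at a different coding sequence.
print(before["U51921_2.cds"])
print(after["U51921_2.cds"])
```

An identifier that is assigned to the CDS itself does not shift when a
neighbouring CDS is added or removed.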

Chris, you started with "U51921"; that's an accession number for a
yeast cosmid from yeast chromosome XII:

LOCUS       YSCL9362    29427 bp    DNA             PLN       21-MAR-1996
DEFINITION  Saccharomyces cerevisiae chromosome XII cosmid 9362.
ACCESSION   U51921
NID         g1234842

The NID (nucleotide identifier) is also known as a 'gi number'.  
The gi number will stay the same unless the sequence changes 
(even at a single nucleotide).  The Accession Number will stay 
the same, even if the sequence changes.  If the submitter
of the record updates the reference, or a gene name on the record, 
only the date on the LOCUS line changes.  The gi stays the same,
and the accession number stays the same.
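
These rules can be written down as a small Python sketch.  This is only a toy
model, not how GenBank is implemented: the Record class, the update function,
the short sequence, the later date and the replacement gi (9999999) are all
made up for illustration.  The accession, gi and LOCUS date of the starting
record are the ones shown above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    accession: str
    gi: int
    date: str
    sequence: str
    gene: str


def update(rec, new_gi, date, sequence=None, gene=None):
    """Apply an update to a record.

    new_gi is only used if the sequence actually changes; the accession
    is never touched.  The date is assumed to change on every update.
    """
    seq = rec.sequence if sequence is None else sequence
    gi = new_gi if seq != rec.sequence else rec.gi
    return replace(rec, gi=gi, date=date, sequence=seq,
                   gene=rec.gene if gene is None else gene)


# Placeholder sequence and gene name; real record is 29427 bp.
rec = Record("U51921", 1234842, "21-MAR-1996", "ACGT", "")

# Annotation-only update: date changes, gi and accession stay.
annotated = update(rec, new_gi=9999999, date="01-JUL-1996", gene="ASP3")

# Single-nucleotide change: gi changes, accession stays.
resequenced = update(rec, new_gi=9999999, date="01-JUL-1996", sequence="ACGA")
```

Here annotated keeps gi 1234842, while resequenced gets the new gi; both keep
accession U51921.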

So what I just said for the nucleotides (NID, aka gi) is also 
valid for the protein sequences, and here we talk of PID or gi
as well.  These are shown as:


     CDS             complement(13739..14827)
                     /gene="ASP3"
                     /note="L9632.8"
                     /codon_start=1
                     /db_xref="SGD:L0000130"
                     /product="L-Asparaginase II (Swiss Prot. accession number
                     P11163).  Note that this ORF is longer at the 3' end than
                     ASP3 in GenBank accession number J03926.  This gene is
                     included in four 3.6 kb repeats present in this cosmid at
                     the junction with rDNA."
                     /db_xref="PID:g1234850"
                     /translation="MRSLNTLLLSLFVAMSSGAPLLKIREEKNSSLPSIKIFGTGGTI
                     ASKGSTSATTAGYSVGLTVNDLIEAVPSLAEKANLDYLQVSNVGSNSLNYTHLIPLYH
                     GISEALASDDYAGAVVTHGTDTMEETAFFLDLTINSEKPVCIAGAMRPATATSADGPM
                     NLYQAVSIAASEKSLGRGTMITLNDRIASGFWTTKMNANSLDTFRADEQGYLGYFSND
                     DVEFYYPPVKPNGWQFFDISNLTDPSEIPEVIILYSYQGLNPELIVKAVKDLGAKGIV
                     LAGSGAGSWTATGSIVNEQLYEEYGIPIVHSRRTADGTVPPDDAPEYAIGSGYLNPQK
                     SRILLQLCLYSGYGMDQIRSVFSGVYGG"

So the PID (protein identifier) here is shown as:

                     /db_xref="PID:g1234850"

The 'g' in front of the PID is from GenBank.  There are also PIDs which
start with 'e', which are issued by our colleagues at EMBL/EBI.  When
our Japanese colleagues (DDBJ) use these PID/NID, you will see them
with 'd'.   For more on these, you can see the collab WWW page at
NCBI:

http://www.ncbi.nlm.nih.gov/collab/

There you will see (under db_xref) all the valid databases used for
db_xref.

PID is one, and so is SGD (Saccharomyces Genome Database, also see
example above), where specific CDS are linked (via the WWW) or
labelled (such as: /db_xref="SGD:L0000130") to the SGD records.
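
For anyone picking these qualifiers out of a flat file by script, a minimal
Python sketch follows.  The names ISSUERS, DB_XREF and parse_db_xref are
hypothetical, and the regular expression assumes the qualifier sits on one
line exactly as in the example above; the prefix letters are the ones
described in this message.

```python
import re

# Prefix letters on PIDs, as described above.
ISSUERS = {"g": "GenBank", "e": "EMBL/EBI", "d": "DDBJ"}

# Matches e.g. /db_xref="PID:g1234850" and captures database and identifier.
DB_XREF = re.compile(r'/db_xref="([^:"]+):([^"]+)"')


def parse_db_xref(line):
    """Return (database, identifier, issuer) for one /db_xref qualifier line.

    issuer is filled in only for PID cross-references; otherwise it is None.
    """
    m = DB_XREF.search(line)
    if m is None:
        raise ValueError(f"not a db_xref qualifier: {line!r}")
    database, identifier = m.groups()
    issuer = ISSUERS.get(identifier[0]) if database == "PID" else None
    return database, identifier, issuer


pid = parse_db_xref('                     /db_xref="PID:g1234850"')
sgd = parse_db_xref('                     /db_xref="SGD:L0000130"')
print(pid)
print(sgd)
```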

So to get back to your original question:  

> > i need to access a yeast ORF sequence but can't seem to do it by
> > retrieve.  The access number for gi is U51921. 

U51921 does not identify an ORF, but a large DNA sequence which contains
12 CDS (Coding Sequences).  How did you come across U51921, and where/how
did you look for the protein sequence you wanted?


In Entrez (http://www3.ncbi.nlm.nih.gov/Entrez/) you will find 
an easy and simple way to find all of these protein sequences 
as well as graphical views of the proteins, DNA sequences or entire
chromosomes ... 

This last item is only true for those chromosomes which have been
completed and released, and for which the annotated DNA sequence is in
GenBank/EMBL/DDBJ.  There are presently a few unfinished chromosomes at
MIPS.

We are patiently waiting for the fruit of their hard work.

All the best,

francis

--
| B.F. Francis Ouellette  
| GenBank
|
| francis at ncbi.nlm.nih.gov   


