Mon Feb 6 08:20:41 EST 1995

Dear Yeastnetters,

Version 3.0 of the Yeast Protein Database (YPD) is now available from 
the QUEST Protein Database Center at the Cold Spring Harbor Laboratory.

YPD is a spreadsheet containing 10 categories of information (see below)
for each of the S. cerevisiae proteins of known sequence.  These include
sequences from genomic sequencing projects and all yeast GenBank sequences
through Jan. 30, 1995.  There are currently 3512 entries in YPD.

The files can be retrieved by ftp from isis.cshl.org.  Look in directory
pub/yeast/YPD/version3.0.  Read the README and YPD.doc files first.  There
are files formatted for Excel on Macintosh, a tab-delimited file for
loading into any spreadsheet program, and a formatted text file suitable
for use in a word processor or text editor.  The spreadsheet format allows
one to search, sort, or select the yeast proteins based on any of the
data fields.

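Outside a spreadsheet program, the same search/sort/select steps can be done on the tab-delimited file with a few lines of code.  The following is a minimal sketch using Python's standard csv module.  The column names ("YPD name", "Major localization", "Molecular weight") and the three sample rows are hypothetical placeholders; check the header row of the actual file and YPD.doc for the real field names.

```python
import csv
import io


def load_table(fileobj):
    """Read a tab-delimited table whose first line is a header row."""
    return list(csv.DictReader(fileobj, delimiter="\t"))


def select(rows, column, value):
    """Keep only the rows whose `column` equals `value` exactly."""
    return [row for row in rows if row.get(column) == value]


def sort_numeric(rows, column, descending=False):
    """Sort rows by a numeric column; rows with a blank value go last."""
    filled = [r for r in rows if (r.get(column) or "").strip()]
    blank = [r for r in rows if not (r.get(column) or "").strip()]
    filled.sort(key=lambda r: float(r[column]), reverse=descending)
    return filled + blank


# Hypothetical three-row excerpt; real YPD column names and values will differ.
SAMPLE = (
    "YPD name\tMajor localization\tMolecular weight\n"
    "GENE_A\tNuclear\t52000\n"
    "GENE_B\tCytoplasmic\t31000\n"
    "GENE_C\tNuclear\t\n"
)

rows = load_table(io.StringIO(SAMPLE))
nuclear = select(rows, "Major localization", "Nuclear")
for row in sort_numeric(nuclear, "Molecular weight", descending=True):
    print(row["YPD name"], row["Molecular weight"] or "(blank)")
```

For the real file, replace io.StringIO(SAMPLE) with an open file handle, e.g. open("ypd.tab", newline=""), where "ypd.tab" stands in for whatever the tab-delimited file in the ftp directory is actually called.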
YPD can also be accessed online through the World Wide Web (WWW) server
maintained by Jerry Latter of the QUEST Protein Database Center.  At QUEST,
the YPD database is integrated with our 2D gel protein database for yeast. 
YPD data has also been integrated into the Stanford Saccharomyces Genome 
Database (SacchDB) maintained by Mike Cherry.  The WWW address for QUEST is 
http://siva.cshl.org, and for Stanford it is http://genome-www.stanford.edu. 

The 10 categories of data and the data columns within each category
are listed here. A more complete description of each field is found in the file
YPD.doc (ftp) or Release_notes (WWW).

   a. Gene Names
      YPD name
      ListA name
      Synonym list

   b. Calculated data
      Isoelectric point 
      Isoelectric point after adding 1 positive charge
      Isoelectric point after adding 1 negative charge
      Molecular weight
      Codon bias

   c. Accession numbers
      YEPD  (2D gel database)

   d. Subcellular localization and functional classification
      Major localization category
      Minor localization category
      Molecular environment
      Functional classification

   e. Post-translational modifications
      N- or O-linked glycosylation
      N-terminal modification (acetylation, myristoylation)
      C-terminal modification (farnesylation, geranylgeranylation, etc.)
      N-terminal precursor length

   f. Motifs
      Potential sites for phosphorylation by Cdc28 protein kinase
      Potential sites for phosphorylation by CKII protein kinase
      Potential sites for phosphorylation by PKA protein kinase
      Potential sites for N-linked glycosylation
      Potential transmembrane domains

   g. N- and C-terminal sequence fragments
      N-terminal sequence of precursor protein
      N-terminal sequence of mature protein
      C-terminal sequence of mature protein

   h. Length and amino acid composition
      Length of N-terminal precursor peptide 
         (including met removal if known)
      Length of mature protein in amino acids
      Number of residues for each of 20 amino acids
      Prediction of methionine removal
         (Disregard if length of N-terminal precursor peptide is >0).

   i. Protein name/description

   j. Reference numbers  
      (The reference list is provided in the file YPD_REFS).
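Some of the fields above (molecular weight, potential N-linked glycosylation sites) can be derived from the sequence alone.  The sketch below shows one common way to compute them: molecular weight as the sum of average amino-acid residue masses plus one water, and potential N-linked glycosylation sites as matches to the N-X-S/T sequon with X not proline.  These standard definitions and the mass table are assumptions for illustration; they are not necessarily the exact rules used to build YPD, which are described in YPD.doc.  The peptide in the usage lines is made up.

```python
import re

# Average residue masses (Da) of the 20 standard amino acids, i.e. each
# amino acid's average mass minus one water (commonly tabulated values).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified polypeptide.

    Raises KeyError for any letter outside the 20 standard amino acids.
    """
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


# N-X-S/T with X != P.  The zero-width lookahead lets overlapping
# sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(seq):
    """1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


example = "MNGSANPTNNTS"  # hypothetical peptide, not a real yeast protein
print(round(molecular_weight(example), 1))
print(n_glycosylation_sites(example))
```

A regular-expression scan like this only flags potential sites; whether a given Asn is actually glycosylated depends on topology and folding, which is why YPD lists these under "Motifs" rather than as known modifications.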

DATA SUMMARY (for 3512 proteins in Release 3.0)

   Includes: GenBank entries through Jan. 30, 1995
             SWISS-PROT entries through Jan. 25, 1995
             PIR-International entries from Release 42.
   2899 Sequences from systematic sequencing projects
   1871 Proteins characterized through genetics or biochemistry.  Most of
         these have meaningful mnemonic names.
    524 Proteins known only by homology to characterized proteins.  These
         proteins have descriptions such as "Protein with similarity to".
   1117 Proteins of unknown function.  Some of these contain known motifs
         but no extensive homology to known proteins.  These proteins have
         descriptions starting with "Protein of unknown function".

   a.  Of the 1871 proteins known from genetic or biochemical studies:
         401 (21.4%) Nuclear
         377 (20.1%) Cytoplasmic
         240 (12.8%) Mitochondrial
          82  (4.4%) Plasma membrane
          56  (3.0%) Endoplasmic reticulum
          51  (2.7%) Unspecified membrane
          45  (2.4%) Cytoskeletal
          34  (1.8%) Extracellular or cell wall
          28  (1.5%) Vacuolar
          23  (1.2%) Vesicles of secretory pathway
          22  (1.2%) Golgi
          15  (0.8%) Peroxisomal
         497 (26.6%) Unknown

         Note:  The unknown category contains many metabolic and 
            housekeeping proteins that are likely to be cytoplasmic, but 
            definitive studies on their localization are difficult to find.

         N-terminal modifications
             76    (4.1%) Known to be N-terminally acetylated
             91    (4.9%) Known to be N-terminally unmodified
              8    (0.4%) Known to be N-myristoylated
           1696   (90.6%) N-terminal status unknown

         C-terminal modifications
              9 (0.5%) Known to be farnesylated
             10 (0.5%) Known to be geranylgeranylated
              6 (0.3%) Known to have GPI anchors

         Other modifications and N-terminal processing
            127 (6.8%) Known to be phosphorylated
             46 (2.5%) Known to be N-glycosylated only
             11 (0.6%) Known to be O-glycosylated only
              4 (0.2%) Known to be N- and O-glycosylated 
            223 (11.9%) Known to have N-terminal precursor peptide 
            193 (10.3%) Known to have N-met removal only
             70  (3.7%) Known to have no precursor peptide and no N-met removal

   b.  Of the 2395 proteins known by genetics, biochemistry or homology:
         By molecular environment
            318 (13.3%) Integral membrane
            299 (12.5%) DNA-associated (not necessarily direct DNA-binding)
            130  (5.4%) Ribosomal
             83  (3.5%) Peripheral membrane
             80  (3.3%) RNA-associated
             39  (1.6%) Protein synthesis factors
             16  (0.7%) Actin cytoskeleton-associated
             13  (0.5%) Tubulin cytoskeleton-associated

         By functional category
            106  (4.4%) Transcription factors
             81  (3.4%) Protein kinases
             61  (2.5%) Enzymes of amino acid metabolism   
             43  (1.8%) GTPases
             31  (1.3%) Heat shock proteins
             30  (1.3%) tRNA synthetases
             27  (1.1%) Protein phosphatases
             24  (1.0%) Proteases other than proteasome subunits
             20  (0.8%) Conserved ATPase domain family (SEC18/PAS1/SUG1)
             16  (0.7%) Enzymes of glucose metabolism
             16  (0.7%) Serine-alanine-rich proteins (Srp1/Tip1p family)
             15  (0.6%) Cyclins
             14  (0.6%) Proteasome components
             13  (0.5%) ATP-binding cassette proteins
             10  (0.4%) Ubiquitin-conjugating enzymes
              9  (0.4%) GTPase-activating proteins
              8  (0.3%) Guanine nucleotide exchange factors
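The percentages in this summary use two different denominators: the figures under (a) are relative to the 1871 proteins characterized by genetics or biochemistry, while those under (b) are relative to the 2395 proteins known by genetics, biochemistry or homology (1871 + 524).  A small sketch that reproduces the arithmetic, with counts copied from the tables above:

```python
CHARACTERIZED = 1871       # proteins known from genetics or biochemistry
BY_HOMOLOGY = 524          # proteins known only by homology
UNKNOWN_FUNCTION = 1117    # proteins of unknown function
KNOWN = CHARACTERIZED + BY_HOMOLOGY  # denominator used in section (b)


def percent(count, total):
    """Percentage rounded to one decimal place, as in the summary tables."""
    return round(100.0 * count / total, 1)


# Section (a): localization of the characterized proteins.
localization = {
    "Nuclear": 401, "Cytoplasmic": 377, "Mitochondrial": 240,
    "Plasma membrane": 82, "Endoplasmic reticulum": 56,
    "Unspecified membrane": 51, "Cytoskeletal": 45,
    "Extracellular or cell wall": 34, "Vacuolar": 28,
    "Vesicles of secretory pathway": 23, "Golgi": 22,
    "Peroxisomal": 15, "Unknown": 497,
}

for name, count in localization.items():
    print(f"{count:5d} ({percent(count, CHARACTERIZED):4.1f}%) {name}")

# Section (b) uses the larger denominator, e.g. integral membrane proteins.
print(f"{318:5d} ({percent(318, KNOWN):4.1f}%) Integral membrane")
```

Because the localization categories in (a) are mutually exclusive, their counts sum to exactly 1871; the categories in (b) overlap and are incomplete, so they do not sum to 2395.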

Obviously, these counts by category do not necessarily reflect the true
distribution of yeast proteins among categories, because many proteins are
still uncharacterized or uncategorized.  YPD represents an extensive, but not
yet complete, review of the yeast literature.

Best of luck with YPD.  For help or information concerning the QUEST
on-line services, contact Gerald Latter (latter at cshl.org).  For feedback on
YPD, please contact me.  Corrections, comments, new data, etc. are always
welcome.


Jim Garrels

James I. Garrels                        Tel (508) 922-1643
QUEST Protein Database Center           FAX (508) 922-3971
Cold Spring Harbor Laboratory           Email jg at cshl.org
1 Bungtown Rd.
Cold Spring Harbor, NY 11724

