To: Yeast community
From: GeneQuiz Team
Re: This note explains how to access the results of a computer
sequence analysis of yeast protein sequences.
Reply to: GeneQuiz at embl-heidelberg.de
ANALYSIS OF PUBLICLY AVAILABLE YEAST PROTEIN SEQUENCES
A computer sequence analysis of more than 5000 yeast protein
sequences was performed March 4-7, 1996, using the most recent
public sequence databases. The automated analysis was run with the
GeneQuiz software, developed by current and former EMBL scientists,
on a 64-processor powerCHALLENGE array at the Silicon Graphics
Supercomputing Technology Center in Cortaillod, Switzerland. The
results of the analysis are available on the Internet and can be
viewed using WWW browsers such as Netscape or Mosaic at the
following Web sites.
Overview of the March 4-7 runs:
Direct access to the yeast results:
Access to GeneQuiz summaries of Mycoplasma genitalium and
Haemophilus influenzae as well as yeast:
... enjoy !
Georg Casari, Reinhard Schneider, Antoine de Daruvar, Chris Sander
13-March-96, EMBL Heidelberg-Cambridge
READ ON FOR MORE INFORMATION AND USER GUIDE TO THE GENEQUIZ SERVER
For proteins with an informative functional annotation already in
the public sequence databases, the GeneQuiz summary simply mirrors
the database functional assignments. For proteins of unknown
function, the analysis aims at the prediction of protein function
and 3D-structure by homology, as deduced from sequence similarity.
Homology information is labeled as 'clear', 'tentative' or only
'marginal', depending on the level of sequence similarity. The
corresponding functional assignments are those of the homologues in
the database and provide a hypothesis (or prediction) regarding the
function of the search protein. Note that with increasing
evolutionary distance the function of the search protein may differ
significantly from that of the homologues.
The results of this large scale sequence analysis are a distillation
of massive amounts of data. To answer your questions efficiently,
we provide the results through queries that let you select sets
of proteins of particular interest to you. Criteria
for selection can be a search string, a gene name, a chromosome
Examples of queries available through the WWW pages:
* What is the closest homologue and the number of homologues in
the databases for RA51_YEAST (chr5) ?
Answer: RA51_HUMAN, 24 homologues.
* Which proteins code for AMD genes ?
Answer: AMDM_YEAST (chr13), AMDY_YEAST (chr4).
* Which proteins on chromosome 3 have a homolog of known 3D
structure at the level of clear homology ?
Answer: KCC4_YEAST, LEU3_YEAST, CISZ_YEAST (3D model available),
* How many proteins on chromosome 14 have "KINASE" in the derived
annotation, i.e., are homologous to a kinase ?
Answer: 9, of which 3 are probable protein kinases (KNOS_YEAST,
* Which proteins located on chromosome 5 have a tentatively
predicted function ?
Answer: left as an exercise to the reader ...
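The kind of selection behind these example queries can be sketched in a
few lines of Python. This is only an illustration, not the GeneQuiz
implementation: the record layout (Summary), the field names and the
select() function are assumptions made for this sketch, and the small
table holds only the proteins named in the examples above, with the
fields the text does not state left empty.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Summary:
    """Hypothetical one-line summary of an analyzed protein."""
    protein: str                    # identifier, e.g. "RA51_YEAST"
    chromosome: int                 # chromosome number
    annotation: str = ""            # derived annotation (free text)
    homology: Optional[str] = None  # "clear", "tentative", "marginal"
    has_3d_homologue: bool = False  # homologue of known 3D structure?


def select(records: List[Summary],
           chromosome: Optional[int] = None,
           text: Optional[str] = None,
           homology: Optional[str] = None,
           with_3d: Optional[bool] = None) -> List[Summary]:
    """Return the records matching every criterion that is not None."""
    selected = []
    for r in records:
        if chromosome is not None and r.chromosome != chromosome:
            continue
        # Case-insensitive search in identifier and annotation.
        searchable = (r.protein + " " + r.annotation).upper()
        if text is not None and text.upper() not in searchable:
            continue
        if homology is not None and r.homology != homology:
            continue
        if with_3d is not None and r.has_3d_homologue != with_3d:
            continue
        selected.append(r)
    return selected


# Toy table built only from the proteins named in the examples above.
RECORDS = [
    Summary("RA51_YEAST", 5),
    Summary("AMDM_YEAST", 13),
    Summary("AMDY_YEAST", 4),
    Summary("KCC4_YEAST", 3, homology="clear", has_3d_homologue=True),
    Summary("LEU3_YEAST", 3, homology="clear", has_3d_homologue=True),
    Summary("CISZ_YEAST", 3, homology="clear", has_3d_homologue=True),
]

if __name__ == "__main__":
    # Search string "AMD".
    print([r.protein for r in select(RECORDS, text="AMD")])
    # Chromosome 3, clear homology, homologue of known 3D structure.
    print([r.protein for r in select(RECORDS, chromosome=3,
                                     homology="clear", with_3d=True)])
```

Criteria are combined with a logical AND, so leaving a criterion out
widens the selection. The real server works on the full result set and
may combine criteria differently.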
Note that some proteins may appear more than once in the lists you
obtain. This occurs when two databases contain the same protein in
slightly different form (example: swissprot:amdy_yeast is 99.8%
identical to trembl:sc8419_9). The set of sequences analyzed here
has 6613 sequences, corresponding to probably more than 5000 unique
proteins (the entire yeast genome is estimated to have about 6200
unique proteins). Although this set has been cleaned to remove
multiple entries (100% identity), the more subtle redundancies will
remain until the complete genome is released (announced for spring
1996) and the duplications in the genome can be reliably
distinguished from duplications in the databases. See
http://genecrunch.sgi.ch/nryeast.html for a more detailed
description of the sequence set analyzed. Note also that some
sequences have composition-biased regions marked as 'XXXX' in the
alignment reports, i.e., 'X' does not mean 'unknown' but 'removed
for purposes of improving search selectivity'.
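As a rough illustration of what removing multiple entries at 100%
identity means, the sketch below keeps the first entry for each distinct
sequence string. It is not the procedure actually used to build the
sequence set: the function name and the (name, sequence) input format
are assumptions, and the entries in the usage lines are made-up
placeholders.

```python
from typing import List, Tuple


def remove_exact_duplicates(
        entries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first entry for each distinct sequence.

    Only exact (100% identical) sequences are merged; case and
    whitespace differences are ignored when comparing.
    """
    seen = set()
    unique = []
    for name, sequence in entries:
        key = "".join(sequence.split()).upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, sequence))
    return unique


if __name__ == "__main__":
    # Made-up placeholder entries, not real database records.
    entries = [
        ("db1:p1", "MKTAYIAK"),
        ("db2:p1_copy", "mktayiak"),      # exact duplicate of db1:p1
        ("db2:p1_variant", "MKTAYIAR"),   # differs in one residue
    ]
    for name, _ in remove_exact_duplicates(entries):
        print(name)
```

A near-identical pair such as the 99.8% case mentioned above differs in
at least one residue, so a filter of this kind keeps both entries. This
is why such proteins can still appear more than once in the lists.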
The GeneQuiz yeast server is likely to evolve and expand in
functionality in response to your feedback and as the result of new
developments in the GeneQuiz methodology.
We hope that the information accessible from the GeneQuiz yeast
server will be useful in planning future experiments on gene
Let us know what you think.
GeneQuiz at embl-heidelberg.de
We are indebted to all scientists world-wide who have made
sequences and other experimental results publicly available,
especially all those involved in the yeast genome sequencing
projects, and to the staff of database centers such as EMBL-EBI,
GenBank, DDBJ, Swissprot, PIR, MIPS, YPD, and SGD.
GeneQuiz is a collaborative effort between scientists at
EMBL-Heidelberg and EMBL-EBI (the European Bioinformatics Institute
near Cambridge) and former EMBL scientists at CNB Madrid, MDC
Berlin, and SRI Menlo Park: Georg Casari, Antoine de Daruvar,
Reinhard Schneider, Michael Scharf, Peer Bork, Miguel Andrade,
Javier Tamames, Alfonso Valencia, Christos Ouzounis and Chris
Sander. Special thanks to: Thure Etzold, Burkhard Rost and Gerrit
The GeneCrunch team at the Silicon Graphics Supercomputing
Technology Center at Cortaillod, Switzerland included: Pam Bremer,
Michael Schlenkrich, Richard Mercille, Horst Vollhardt, Ron Larson,
Christophe Desperrier, Ove Hansen, Oliver Enzmann.
Thanks to all :-))