The following information was distributed by Jim Freeman:
>The BioMolecular Engineering Research Center has identified a large
>set of ORFs in S. cerevisiae which are likely to be artifactual.
> NOTE 1: There are 6275 ORFs originally identified by MIPS in the
>yeast genome that could theoretically encode proteins longer than 99
>amino acid residues (Goffeau et al., Science, 1996, Vol. 274, pages
>546-567). About 45% of them have been functionally annotated, either
>experimentally or by matching previously assigned proteins. The rest of
>the ORFs remain "hypothetical", and many of them are not real
>ORFs. Our statistics, based on the sequence length distribution alone,
>predict that over 400 of these hypothetical ORFs between 100 and 110
>amino acid residues long are not likely to code for proteins (Das et
>al., Nature, 1997, Vol. 385, pages 29-30).
>In addition, when two unannotated ORFs overlap, one should expect
>that one or the other is likely an artifact, although which one is
>generally indeterminate. However, when one or both members of the pair
>are between 100 and 110 residues long, there is a strong case that the
>ORF in that length range is an artifact.
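The overlap heuristic in the quoted note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BMERC's actual procedure: the `Orf` record, its fields, the 1-based inclusive coordinates and the toy names (`ORF_A` to `ORF_E`) are all hypothetical; only the 100-110 aa window comes from the note.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Orf:
    # Hypothetical record; coordinates are 1-based and inclusive.
    name: str
    chromosome: str
    start: int
    end: int
    length_aa: int
    annotated: bool


def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two ORFs share at least one base on the same chromosome."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end


def flag_overlap_artifacts(orfs, lo=100, hi=110):
    """Names of unannotated ORFs that overlap another unannotated ORF
    and are themselves lo..hi amino acids long (the note's 'strong case')."""
    unannotated = [o for o in orfs if not o.annotated]
    flagged = set()
    # Quadratic pairwise scan: fine for a sketch, not tuned for a whole genome.
    for a, b in combinations(unannotated, 2):
        if overlaps(a, b):
            flagged.update(o.name for o in (a, b) if lo <= o.length_aa <= hi)
    return flagged


orfs = [
    Orf("ORF_A", "X", 1000, 1329, 109, False),
    Orf("ORF_B", "X", 1200, 1899, 232, False),   # overlaps ORF_A
    Orf("ORF_C", "X", 5000, 5317, 105, False),   # in the window, but no overlap
    Orf("ORF_D", "XV", 100, 429, 109, False),
    Orf("ORF_E", "XV", 300, 2000, 566, True),    # annotated, so ORF_D is not paired
]
print(sorted(flag_overlap_artifacts(orfs)))  # ['ORF_A']
```

An overlapping pair in which neither member falls in the 100-110 aa window is deliberately not flagged, matching the note's point that the artifact is then generally indeterminate.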
The existence of questionable ORFs is well known, and a detailed, excellent
publication on the subject is: Termier M. and Kalogeropoulos A., Yeast 1996,
Vol. 12, pages 369-384.
The following facts were not taken into account:
1) The paper from Goffeau et al. states that 6275 ORFs were extracted, of
which 390 had been assigned as questionable ORFs. This leaves 5885
hypothetical proteins in yeast.
(Goffeau et al., Science, 1996, Vol. 274, page 546-567)
2) The genome overview at the MIPS WWW site is constantly updated and can be
accessed at: http://speedy.mips.biochem.mpg.de/mips/yeast/inventy.html
At the moment 6287 ORFs are extracted and 434 are annotated as questionable
ORFs according to the criteria provided by Termier and Kalogeropoulos. These
have to be subtracted, which leaves 5853 hypothetical proteins.
3) At MIPS a questionable ORF is defined by a combination of the following
attributes: a low CAI (codon adaptation index) value, partial overlap with a
longer or known ORF, and no similarity to other ORFs.
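The bookkeeping in points 1) to 3) can be sketched as follows. This is an illustration only: the post names the MIPS attributes but gives no numeric CAI threshold and does not say how the attributes are combined, so `CAI_CUTOFF` and the rule that all three must hold at once are assumptions, as are the `OrfRecord` type and the toy records. The subtraction uses the figures quoted above.

```python
from dataclasses import dataclass

# Hypothetical cut-off: "low CAI value" is named above, but no number is given.
CAI_CUTOFF = 0.1


@dataclass(frozen=True)
class OrfRecord:
    name: str
    cai: float                      # codon adaptation index, 0..1
    overlaps_longer_or_known: bool  # partial overlap with a longer or known ORF
    has_similarity: bool            # any similarity to other ORFs


def is_questionable(orf: OrfRecord, cai_cutoff: float = CAI_CUTOFF) -> bool:
    # Assumption: all three attributes must hold at once.
    return (orf.cai < cai_cutoff
            and orf.overlaps_longer_or_known
            and not orf.has_similarity)


def hypothetical_count(extracted: int, questionable: int) -> int:
    """Putative proteins left after discarding the questionable ORFs."""
    return extracted - questionable


print(hypothetical_count(6275, 390))  # 5885 (Goffeau et al., 1996)
print(hypothetical_count(6287, 434))  # 5853 (MIPS overview)

records = [
    OrfRecord("ORF_X", 0.07, True, False),  # hypothetical record
    OrfRecord("ORF_Y", 0.07, True, True),   # has a similarity, so it is kept
]
print([r.name for r in records if is_questionable(r)])  # ['ORF_X']
```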
Nevertheless, we have scrutinized the information provided by J. Freeman.
We inspected the list of hypothetical ORFs and found that most of the ORFs
mentioned had previously been assigned as questionable ORFs, with the
following 50 exceptions:
29 ORFs are annotated at MIPS as hypothetical proteins. These have been checked,
and for 22 of them we had to make corrections: 3 of the corrections are
located on chromosome X, 18 on chromosome XV (in fact our information for
chromosome XV was incomplete) and 1 on chromosome XVI:
YJL009W hypothetical protein (has to be annotated as questionable ORF)
YJL119C hypothetical protein (has to be annotated as questionable ORF)
YJL135W hypothetical protein (has to be annotated as questionable ORF)
YOL035C hypothetical protein (has to be annotated as questionable ORF)
YOL050C hypothetical protein (has to be annotated as questionable ORF)
YOL099C hypothetical protein (has to be annotated as questionable ORF)
YOL150C hypothetical protein (has to be annotated as questionable ORF)
YOR105W hypothetical protein (has to be annotated as questionable ORF)
YOR121C hypothetical protein (has to be annotated as questionable ORF)
YOR135C hypothetical protein (has to be annotated as questionable ORF)
YOR139C hypothetical protein (has to be annotated as questionable ORF)
YOR146W hypothetical protein (has to be annotated as questionable ORF)
YOR169C hypothetical protein (has to be annotated as questionable ORF)
YOR170W hypothetical protein (has to be annotated as questionable ORF)
YOR218C hypothetical protein (has to be annotated as questionable ORF)
YOR225W hypothetical protein (has to be annotated as questionable ORF)
YOR282W hypothetical protein (has to be annotated as questionable ORF)
YOR300W hypothetical protein (has to be annotated as questionable ORF)
YOR331C hypothetical protein (has to be annotated as questionable ORF)
YOR333C hypothetical protein (has to be annotated as questionable ORF)
YOR345C hypothetical protein (has to be annotated as questionable ORF)
YPL072W hypothetical protein (has to be annotated as questionable ORF)
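The per-chromosome split of the 22 corrections can be checked mechanically from the systematic names, assuming the standard S. cerevisiae naming scheme (Y, chromosome letter A-P for I-XVI, arm L/R, a three-digit number, strand W/C, optional -A style suffix). The helper `chromosome_of` and the regular expression are illustrative, not a MIPS tool.

```python
import re
from collections import Counter

# Chromosomes I-XVI are encoded by the letters A-P in systematic ORF names.
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

# Y + chromosome letter + arm (L/R) + 3-digit number + strand (W/C) + optional suffix.
NAME_RE = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-([A-Z]))?$", re.IGNORECASE)


def chromosome_of(name: str) -> str:
    """Roman-numeral chromosome encoded in a systematic ORF name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a systematic ORF name: {name}")
    return ROMAN[ord(m.group(1).upper()) - ord("A")]


# The corrected ORFs listed above.
CORRECTED = """
YJL009W YJL119C YJL135W YOL035C YOL050C YOL099C YOL150C YOR105W
YOR121C YOR135C YOR139C YOR146W YOR169C YOR170W YOR218C YOR225W
YOR282W YOR300W YOR331C YOR333C YOR345C YPL072W
""".split()

tally = Counter(chromosome_of(n) for n in CORRECTED)
print(len(CORRECTED), dict(tally))  # 22 {'X': 3, 'XV': 18, 'XVI': 1}
```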
The 7 remaining hypothetical proteins have not been changed, because they do
not meet our definition of a questionable ORF:
YAR030C hypothetical protein (113 aa)
YBL018C hypothetical protein (133 aa)
YGR290W hypothetical protein (147 aa, helix-loop-helix motif)
YHL005C hypothetical protein (130 aa)
YLR236C hypothetical protein (107 aa)
YMR151W hypothetical protein = YIM2
YNL303W hypothetical protein (115 aa)
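Setting these lengths against the 100-110 aa window from J. Freeman's note is a one-line filter. The sketch below uses only the lengths given in the list; YMR151W (= YIM2) is left out because no length is stated for it, and `in_window` is a hypothetical helper name.

```python
# Lengths (aa) of the retained hypothetical proteins, as listed above.
# YMR151W (= YIM2) is omitted because no length is given for it.
RETAINED = {
    "YAR030C": 113,
    "YBL018C": 133,
    "YGR290W": 147,
    "YHL005C": 130,
    "YLR236C": 107,
    "YNL303W": 115,
}


def in_window(lengths, lo=100, hi=110):
    """Names whose length lies in the inclusive lo..hi amino-acid window."""
    return sorted(name for name, aa in lengths.items() if lo <= aa <= hi)


print(in_window(RETAINED))  # ['YLR236C']
```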
For 16 ORFs we have found other classifications because of similarities to
other proteins; some of them are known Ty proteins:
YAL004W strong similarity to A.klebsiana glutamate dehydrogenase
YAR043c not extracted at MIPS because internal to other bigger ORF
YAR052c not extracted at MIPS because internal to other bigger ORF
YAR074c not extracted at MIPS because internal to other bigger ORF
YAR073W FUN63 strong similarity to IMP dehydrogenases
YAR075W strong similarity to IMP dehydrogenases
YBL112C strong similarity to subtelomeric encoded proteins
YCL013w part of BUD3; the sequence was corrected in September 1996
YCR013C weak similarity to M.leprae B1496_F1_41 protein
YDL228C similarity to A.klebsiana glutamate dehydrogenase
YER097W weak similarity to ribosomal S3 proteins
YFL002w-A TY2B protein
YFL065C strong similarity to subtelomeric encoded proteins
YGR181W similarity to YHR004c-a
YHR218W-A strong similarity to subtelomeric encoded proteins
YIL080W Ty3-2 orf C fragment
YIL082w-A TY3B protein
YIL175W putative pseudogene
YKL199C might be C-terminal part of YKL198c due to a frameshift error
YLL037W weak similarity to human platelet-activating factor receptor
YNL203C weak similarity to B.subtilis CDP-diacylglycerol--serine O-phosphatidyltransferase
YPR203W strong similarity to subtelomeric encoded proteins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* | E-mail: *
* Dr. Kaj Albermann | albermann at mips.embnet.org *
* Dr. Jean Hani | hani at mips.embnet.org *
* Dr. H.W. Mewes | mewes at mips.embnet.org *
* | *
* MIPS | Tel: *
* am Max-Planck-Institut fuer Biochemie | +49 89 8578 2659 *
* Am Klopferspitz 18a | *
* D-82152 Martinsried | Fax: *
* Germany | +49 89 8578 2655 *
*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*
* Internet connectivity to MIPS - WWW: WWW.MIPS.BIOCHEM.MPG.DE *
* - FTP: FTP.MIPS.EMBNET.ORG *
* - Email: username at MIPS.EMBNET.ORG *
*----------------------------------------------------------------------------*