robison1 at husc10.harvard.edu (Keith Robison)
>>As Monte Carlo simulation shows, about 24 of these ORF are expected
>>to be found by chance (in a random sequence of length 300,000 with
>>the same base frequencies as in yeast III chromosome: A=0.31, T=0.30,
>>G=0.19, and C=0.20).
>>Andrey.
> Curiosity: in the ChrIII paper, the claim was made that
>ORFs of >100 amino acids "have 0.2% probability of occurring by
>chance in S.cerevisiae DNA." Is this consistent with the above
>estimate?
>Reference given (I haven't looked it up yet):
> Sharp & Crowe. Yeast (1991) 7:657-678.
>Keith Robison
I haven't read the paper, but the Monte Carlo estimate
can be supported by the following simple argument:
The average ORF length is 64/3 = 21.3 codons (for equal base content,
since 3 of the 64 codons are stops)
The expected number of ORFs (of any size including zero length)
in 300,000 bases = 100,000 codons is 100,000/(64/3) = 4687.5
The probability of an ORF of size L is p(1-p)^L where p=3/64
The probability of an ORF of size L>=100 is (1-p)^100 = 0.008222163
The expected number of long ORFs (L>=100) is 4687.5*0.008222163 = 38.54
Counting the complementary strand as well doubles this, to about 77.
This is quite close to the Monte Carlo estimate, 24*2 = 48
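
The arithmetic above can be repeated with a few lines of Python. This
is only a sketch of the same geometric model (each codon is a stop
with probability p, one reading frame per strand); the function and
variable names are illustrative, not taken from any package.

```python
# Geometric model: every codon is independently a stop with probability p.
P_STOP = 3 / 64          # 3 stop codons out of 64 (equal base frequencies)
N_CODONS = 300_000 // 3  # codons in one reading frame of 300,000 bases
MIN_LEN = 100            # minimum ORF length, in codons


def expected_long_orfs(p_stop, n_codons, min_len):
    """Expected number of ORFs with at least min_len codons in one frame."""
    mean_len = 1 / p_stop                 # average ORF length, 64/3 codons
    n_orfs = n_codons / mean_len          # expected number of ORFs of any size
    p_long = (1 - p_stop) ** min_len      # P(ORF has >= min_len codons)
    return n_orfs * p_long


one_strand = expected_long_orfs(P_STOP, N_CODONS, MIN_LEN)
both_strands = 2 * one_strand             # add the complementary strand
print(one_strand, both_strands)
```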
The difference might be due to unequal usage of bases, boundary effects,
whether the stop codon is counted as part of the ORF, etc.
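
The effect of unequal base usage can be checked in the same model. The
sketch below assumes that bases are independent and that the stop
codons are TAA, TAG and TGA (the standard code), and takes the base
frequencies quoted above for chromosome III. Under these assumptions
the per-codon stop probability comes out higher than 3/64, which
lowers the expected number of long ORFs. The names are illustrative.

```python
# Base frequencies quoted for yeast chromosome III.
FREQ = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}
STOPS = ("TAA", "TAG", "TGA")   # standard stop codons (assumption)


def stop_probability(freq, stops=STOPS):
    """P(a random codon is a stop), assuming independent bases."""
    total = 0.0
    for codon in stops:
        prob = 1.0
        for base in codon:
            prob *= freq[base]
        total += prob
    return total


def expected_long_orfs(p_stop, n_codons=100_000, min_len=100):
    """Expected number of ORFs of >= min_len codons in one reading frame."""
    return n_codons * p_stop * (1 - p_stop) ** min_len


p_equal = 3 / 64
p_yeast = stop_probability(FREQ)
print("equal bases:", p_equal, expected_long_orfs(p_equal))
print("yeast bases:", p_yeast, expected_long_orfs(p_yeast))
```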
So the probability of finding ORFs this long is not as small
as it might seem at first glance.
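
For anyone who wants to repeat the Monte Carlo check, here is a minimal
sketch. It is not the program used for the estimate quoted above: it
assumes independent bases with the quoted frequencies, counts an ORF as
a run of non-stop codons in one reading frame, and does not require a
start codon. All names are illustrative.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}   # standard stop codons (assumption)
FREQ = {"A": 0.31, "T": 0.30, "G": 0.19, "C": 0.20}


def random_sequence(length, freqs, rng):
    """Random DNA string with independent bases drawn from freqs."""
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def count_long_orfs(seq, min_len=100, frame=0):
    """Count runs of >= min_len consecutive non-stop codons in one frame."""
    count = 0
    run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            if run >= min_len:
                count += 1
            run = 0
        else:
            run += 1
    if run >= min_len:          # run that reaches the end of the sequence
        count += 1
    return count


def simulate(n_trials=20, length=300_000, seed=0):
    """Average number of long ORFs per frame over n_trials random sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        total += count_long_orfs(random_sequence(length, FREQ, rng))
    return total / n_trials


if __name__ == "__main__":
    print(simulate())
```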
Andrey