thorsten burmester wrote:
> Dear all,
> I would like to have your comments on the following idea:
> One often reads in the literature speculations about possible
> relationships of proteins with only some 15 to 20% identity scores.
> Recently, I thought that a possible method to evaluate the
> significance of such low similarity scores would be to randomise the
> sequences of these proteins while keeping the relative amino acid
> composition. If one does this several times (with one or both of the
> sequences) and re-aligns the randomised sequences with the same gap
> creation and gap length weights, then, if the original alignment was
> significant, the new similarity/identity scores should be
> significantly lower. However, if the observed identity is just due to
> similar amino acid compositions, the scores should be similar.
> My questions:
> 1. Does this sound reasonable, and has anybody ever tried a similar
> approach before?
This Monte Carlo strategy for evaluating alignment scores is used
routinely in the GCG sequence alignment programs. Basically, the idea is
as you stated it. Once you make, say, 100 randomizations, you get a
normal distribution of scores for the randomized sequences, with a given
mean and standard deviation. In my group, we use the rule of thumb that
if the score of the original (non-randomized) alignment is >6 S.D. above
the mean random score, then there might be some biological significance.
This seems like a bit of a harsh rule, as it is common wisdom that 2-3
standard deviations are enough for statistical significance. However, it
was found empirically (Science, 1991, D. Eisenberg; I can't remember more
than that, but it should be enough for a Medline search) that 6 S.D. is a
good rule. Less than that is
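
For what it's worth, the shuffle-and-realign test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what GCG does internally: the function names `align_score` and `shuffled_z_score` are made up for this example, and the scoring scheme (match = 1, mismatch = 0, linear gap penalty = -1) is a toy stand-in for a real substitution matrix with separate gap creation and gap length weights.

```python
import random
import statistics


def align_score(a, b, match=1, mismatch=0, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    Toy scoring scheme, assumed for illustration only.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffled_z_score(seq_a, seq_b, n_shuffles=100, seed=0):
    """Compare the real alignment score with scores for shuffled copies of seq_b.

    Returns (observed, mean_random, sd_random, z), where z is the number of
    standard deviations by which the observed score exceeds the random mean.
    """
    rng = random.Random(seed)
    observed = align_score(seq_a, seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        residues = list(seq_b)
        rng.shuffle(residues)  # same composition, random order
        random_scores.append(align_score(seq_a, "".join(residues)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    # Guard against a degenerate case where all random scores are equal
    z = (observed - mean) / sd if sd > 0 else float("inf")
    return observed, mean, sd, z
```

Calling `shuffled_z_score(seq_a, seq_b)` and checking whether the returned `z` exceeds 6 applies the rule of thumb mentioned above. With real proteins one would swap `align_score` for a proper aligner and keep the same gap weights for the original and the shuffled runs.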
> 2. Do you know any program that can randomise an amino acid sequence
> as described above?
GCG GAP and BESTFIT do that.
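
If you just want the randomised sequences themselves, outside of GCG, a composition-preserving shuffle is easy to write by hand. The sketch below is a minimal example, not taken from any existing package; `shuffle_sequence` and `randomized_copies` are hypothetical names chosen here. Because it only permutes the residues, each output has exactly the same amino acid composition as the input.

```python
import random
from collections import Counter


def shuffle_sequence(seq, rng=random):
    """Return a random permutation of seq (composition is unchanged)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def randomized_copies(seq, n=100, seed=None):
    """Return n independently shuffled copies of seq."""
    rng = random.Random(seed)
    return [shuffle_sequence(seq, rng) for _ in range(n)]


def same_composition(seq_a, seq_b):
    """Check that two sequences contain the same residues in the same counts."""
    return Counter(seq_a) == Counter(seq_b)
```

Each copy can then be re-aligned against the other protein with the same gap weights as the original alignment.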
email: idoerg at cc.huji.ac.il
More info: finger idoerg at cc.huji.ac.il