"James McInerney" wrote in message
> I have a question about selection on genes in HIV (but probably anywhere).
> In some HIV genes there is often a great excess of replacement
> substitutions
> over silent substitutions. In the past we would say that this meant that
> there was a positive selection event involved. However, if there is no
> selective difference between substitutions that occur in synonymous and
> non-synonymous sites then we would see about three times as many
> substitutions that are replacement than silent.
> Am I correct?
Yes. There are counting methods to account for the uneven probability of
obtaining a synonymous substitution under a neutral model (no selection).
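To make the "about three times" figure concrete, here is a rough sketch of the site-counting step in the style of Nei and Gojobori. It assumes the standard genetic code, equal codon frequencies, and that changes to a stop codon count as nonsynonymous. The names CODE and syn_sites are made up for this illustration and do not come from any package.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Number of synonymous sites (0 to 3) in one sense codon.

    For each position, take the fraction of the three possible
    single-base changes that leave the amino acid unchanged.
    Assumption: a change to a stop codon counts as nonsynonymous.
    """
    aa = CODE[codon]
    total = 0.0
    for pos in range(3):
        syn = 0
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == aa:
                syn += 1
        total += syn / 3
    return total

# Average over the 61 sense codons, all weighted equally (an assumption;
# real genes have uneven codon usage).
sense = [c for c, aa in CODE.items() if aa != "*"]
mean_s = sum(syn_sites(c) for c in sense) / len(sense)
mean_n = 3 - mean_s
print(f"synonymous sites per codon:    {mean_s:.3f}")
print(f"nonsynonymous sites per codon: {mean_n:.3f}")
print(f"nonsynonymous : synonymous  =  {mean_n / mean_s:.2f}")
```

Dividing the observed synonymous and nonsynonymous differences by these site counts is what puts the two classes on a per-site footing, so that a ratio of 1 is the neutral expectation.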
> Does this mean that the value of 1, which is the arbitrary value chosen to
> distinguish between positive and negative selection for Ka:Ks ratio
> estimates might not be such a good cutoff point?
In particular, the counting method developed by Nei and Gojobori provides a
neutral model to work with. In that paper they use dS and dN, which are
equivalent in meaning to Ks and Ka. The value of 1 is reasonable given a
simple null model. There are variants of this counting method, such as ones
that factor in the transition/transversion bias, that would "change" the
null model. It would be appropriate to calculate the two values and then
conduct a z-test (or Fisher's exact test) to see whether the pattern of
substitution departs from what the null model predicts. Hopefully, a test
of significance can absorb some of the error inherent in treating a ratio
of exactly 1 as the cutoff. There are also other tests to detect selection.
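As a rough sketch of the z-test, the snippet below applies the Jukes-Cantor correction to the proportions of synonymous and nonsynonymous differences and uses the large-sample variance formula. The function names and the counts in the usage line are hypothetical, chosen only for illustration. In practice the variances are often estimated by bootstrapping codons instead.

```python
import math

def jc_distance(p):
    """Jukes-Cantor corrected distance from a proportion of differences p."""
    return -0.75 * math.log(1 - 4 * p / 3)

def jc_variance(p, sites):
    """Large-sample variance of the Jukes-Cantor distance."""
    return p * (1 - p) / ((1 - 4 * p / 3) ** 2 * sites)

def z_test(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dS, dN, Z, one-sided p-value for dN > dS).

    Assumes both proportions are well below 0.75 and that the
    normal approximation is acceptable for these counts.
    """
    ps = syn_diffs / syn_sites
    pn = nonsyn_diffs / nonsyn_sites
    ds, dn = jc_distance(ps), jc_distance(pn)
    se = math.sqrt(jc_variance(ps, syn_sites) + jc_variance(pn, nonsyn_sites))
    z = (dn - ds) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z), standard normal
    return ds, dn, z, p_value

# Hypothetical counts: 5 differences over 100 synonymous sites,
# 60 differences over 300 nonsynonymous sites.
ds, dn, z, p = z_test(5, 100, 60, 300)
print(f"dS = {ds:.4f}  dN = {dn:.4f}  Z = {z:.2f}  one-sided p = {p:.4g}")
```

A one-sided test is used here because the question is specifically whether dN exceeds dS. Testing for purifying selection would look at the other tail.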
In the case of RNA viruses, the mutation rate is higher than in other
organisms. So if directional selection is acting upon a protein, there is
only a short window in which to take a snapshot of the event. Over the time
frames expected here, synonymous substitutions will accumulate until they
overwhelm the nonsynonymous ones. So failure to detect positive (or
negative) selection may simply be due to too much divergence among the
compared sequences.
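One way to see why divergence hurts: under a Jukes-Cantor model (an assumption made here for simplicity), the observed proportion of differing sites levels off as the true number of substitutions per site grows. Fast-evolving synonymous sites reach that plateau first, after which dS can no longer be estimated reliably. A small sketch, with observed_p being a name made up for this example:

```python
import math

def observed_p(d):
    """Expected proportion of differing sites under Jukes-Cantor,
    given a true distance of d substitutions per site."""
    return 0.75 * (1 - math.exp(-4 * d / 3))

# The observed proportion creeps toward 0.75 and stops carrying information.
for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"true d = {d:.1f}   observed p = {observed_p(d):.3f}")
```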
Bob
---