In article <6352FR831.6FBV101996 at manifesto.Nihonkai.jp>,
Ivan Zimogorov <ivan at manifesto.nihonkai.jp> wrote:
>Well I came up with a novel way to test heterogeneity of gene frequencies
>between samples of haploid individuals from several populations (sounds
>like Karl Pearson, so what? :-)). The question is how to compare other
>methods (like X^2, Fisher's exact, conditional X^2 etc) with it. I
>simulated genetic drift by the coalescence process, then sampled from
>diverged populations, repeated that process many times to get empirical
>powers for the different degrees of divergence. Therefore, under the
>neutral scenario I have an idea how things behave.
As you seem to be aware, heterogeneity chi-squares are testing whether the gene
frequencies are different at all, and that difference could be due to genetic
>Now, how do people implement selection? Motoo Kimura's treatment of
>random intensity selection when there is no dominance, gives U-shaped
>allele frequency distribution. Is it theoretically valid to generate
>steady-state distribution of allele frequencies for each population from
>a Beta(a<1, b<1) density to approximate Kimura's results?
Actually, Kimura's 1954 work was not correct in claiming that there is an
equilibrium distribution in this case. The U-shaped distribution becomes ever
more extreme, so it is not approximated by any particular Beta distribution.
You can verify this by putting the process on a log(p/(1-p)) scale and
seeing that it becomes a normal distribution that becomes flatter and
Of course you can use the diffusion equation solutions for lots of other
cases such as deleterious mutation, overdominance with mutation, etc. The
problem is that there are a lot of different possible distributions.
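The log-odds check mentioned above can be sketched roughly as follows. This is
only an illustration under assumptions that are not in the original post: an
infinite population (no drift), genic selection with no dominance, and a
selection coefficient s drawn fresh each generation from a normal distribution
with mean 0 and standard deviation sigma. The function name and parameter
values are hypothetical. Under that model p' = p(1+s)/(1+sp), so
log(p/(1-p)) changes by exactly log(1+s) each generation and performs a random
walk whose variance keeps growing.

```python
import math
import random
import statistics

def logit_variance_over_time(n_reps=500, checkpoints=(10, 100, 400),
                             sigma=0.05, p0=0.5, seed=1):
    """Variance across replicates of x = log(p/(1-p)) at chosen generations.

    Assumed model (not from the original post): no drift, genic selection,
    s ~ Normal(0, sigma) independently each generation.
    """
    rng = random.Random(seed)
    last = max(checkpoints)
    snapshots = {t: [] for t in checkpoints}
    for _ in range(n_reps):
        x = math.log(p0 / (1.0 - p0))          # start on the log-odds scale
        for t in range(1, last + 1):
            s = max(rng.gauss(0.0, sigma), -0.99)  # keep fitness 1+s positive
            # p' = p(1+s)/(1+sp) is equivalent to x' = x + log(1+s)
            x += math.log(1.0 + s)
            if t in snapshots:
                snapshots[t].append(x)
    return {t: statistics.pvariance(xs) for t, xs in snapshots.items()}

if __name__ == "__main__":
    for t, v in sorted(logit_variance_over_time().items()):
        print(f"generation {t:4d}: variance of log(p/(1-p)) = {v:.4f}")
```

In this sketch the variance grows roughly in proportion to the number of
generations, so no stationary distribution is reached on the log-odds scale.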
>should allele frequency profile follow for a non-equilibrium situation? I
>could not find if there is any theory done for that case. I'd appreciate
>any references, and hints.
There is a paper in press that is relevant. It is due out in Genetics
soon (next few months) and makes major progress on coalescents with
selection. As it is not by me or my lab, I can't give a further description
of it, but watch for it.
Of course you could always do an old-fashioned non-coalescent simulation
which simulates the whole population, and generate gene frequencies in
the subpopulations that way. Note that you can simulate a smaller population
by noting that as N --> infinity with s, u, m --> 0 such that Ns, Nu, and Nm
stay constant, one approaches a diffusion limit. Since the diffusion limit
is a good approximation to the original population which has a finite N,
you can also approximate that case by using any other one which has the
identical diffusion limit. Thus a case with N = 10^6, s = 10^(-5),
u = 10^(-6), and t = 10^6 generations of divergence is reasonably well
approximated by one with N = 100, s = 0.1, u = 0.01, and t = 100 generations
of divergence, since Ns = 10, Nu = 1, and t/N = 1 in both cases. Thus you can
simulate it economically.
Joe Felsenstein joe at genetics.washington.edu (IP No. 220.127.116.11)
Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA