We have been using RFLPs of mitochondrial DNA to look at mosquito populations
for a number of years. We have looked at about thirty species, thousands of
individuals, and twenty or so enzymes; so this post is backed up by a large
amount of data.
Mosquito mtDNA is a 15 to 16 kb circle that is 75% to 80% A+T (we have
sequenced one species and someone else sequenced another species). Therefore,
we can predict how often enzymes with different types of sites should cut,
assuming a random distribution of hexamers (let's consider only enzymes with
unambiguous sites). On a practical note, we want enzymes that cut around five
times: fewer cuts don't give enough information and more are impossible to map.
Enzymes with six base sites containing 6 A+T should cut many times.
Enzymes with six base sites containing 4 A+T should cut about four times.
Enzymes with six base sites containing 0 or 2 A+T should usually not cut.
Enzymes with four base sites containing 2 or 4 A+T should cut many times.
Enzymes with four base sites containing 0 A+T should cut about three times.
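
For anyone who wants to check these predictions, here is a rough sketch of the
arithmetic in Python. The specific numbers are assumptions picked from the
middle of the ranges above (15,500 bp, 78% A+T). It also assumes that A and T
are equally frequent, that G and C are equally frequent, and that bases are
independent. The function name is just for illustration.

```python
def expected_sites(site_length, at_bases, genome_bp=15_500, at_fraction=0.78):
    """Expected count of one specific recognition site in a circular genome.

    Assumes independent bases with freq(A) == freq(T) and freq(G) == freq(C).
    genome_bp and at_fraction are illustrative mid-range values, not measurements.
    """
    p_at = at_fraction / 2         # frequency of A (and of T)
    p_gc = (1 - at_fraction) / 2   # frequency of G (and of C)
    p_site = p_at ** at_bases * p_gc ** (site_length - at_bases)
    # A circular molecule has one possible start position per base.
    return genome_bp * p_site


# (site length, number of A+T positions in the site)
for length, at in [(6, 6), (6, 4), (6, 2), (6, 0), (4, 4), (4, 2), (4, 0)]:
    print(f"{length}-base site, {at} A+T: {expected_sites(length, at):.2f}")
```

Under these assumptions the six base sites with 4 A+T and the four base sites
with 0 A+T both come out at a handful of cuts, and everything else is either
well under one or in the tens to hundreds.
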
When we screen for enzymes that cut around five times, we end up with 6 base
cutters with 4 A+T and 4 base cutters with 0 A+T, as predicted. The mean
number of sites for both of these classes is around four. Six base cutters
with less A+T don't cut enough and the other types cut too many times. So far,
so good.
However, when we screen wild populations, we find much more variability in
the four base sites than in the six base sites. In one example, in screening
several hundred individuals of a sibling species complex consisting of five
species, we found more than 30 Hpa II (CCGG) variants. These same individuals
were screened with five six base enzymes, and we found about twenty variants
total. So although the mean number of sites per enzyme was about the same,
the variation per enzyme was about ten times greater for the four base
cutter. This has
held up with several combinations of enzymes in many different species.
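
To make the per-enzyme comparison explicit, here is the arithmetic as a small
Python sketch. It treats the rounded counts above as exact and assumes the
six base variants can simply be averaged over the five enzymes, so take the
result as a rough lower bound rather than a measurement.

```python
four_base_variants = 30   # lower bound: "more than 30" with Hpa II alone
six_base_variants = 20    # "about twenty" in total
six_base_enzymes = 5      # number of six base enzymes used

# Average number of variants detected per six base enzyme.
per_enzyme_six = six_base_variants / six_base_enzymes

# How many times more variants the single four base enzyme detected.
ratio = four_base_variants / per_enzyme_six

print(per_enzyme_six, ratio)
```

Since 30 is a floor for the Hpa II count, the ratio from these numbers is a
floor as well.
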
This makes no sense to me. If anything, I would expect to see more variation
with six base cutters, since there are more ways to create or lose a site. Am
I missing something obvious in the statistical analysis?