F.Schaap at FYS.UNIMAAS.NL wrote:
> I am (still) searching for a reliable method for
> calculating divergence time between members of a
> multigene family whose members have different
> evolutionary rates. Between orthologous sequences,
> synonymous substitution rates are fairly constant
> (3 x 10^-9 per site per year), while nonsynonymous
> substitution rates range from 0.021 to 0.76 x 10^-9.
> Between paralogous sequences, the number of synonymous
> substitutions per site cannot be accurately estimated.
> Who has any suggestions?
> Regards, Frank
Divergence time cannot be estimated from the
percent divergence alone; one also needs an estimated
rate. Two human immunodeficiency virus envelope genes
which are 5% divergent (95% identical) from one
another are estimated to have shared a common ancestor
about 5 years ago. We know the 1% per year rate because
we have studied hundreds of HIV+ patients over periods
of up to 12 years. On the other hand, if a chimpanzee
globin gene and a human globin gene are 97% identical,
our estimate of divergence time is orders of magnitude
greater, because primate genomic DNA mutates much more
slowly than the HIV RNA genome.
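The underlying arithmetic is simply time = divergence / rate. A minimal sketch, assuming the rate has been calibrated in the same units as the divergence (as with the roughly 1% per year HIV figure) and that divergence grows linearly with time, i.e. ignoring selection and saturation. The function name `divergence_time` and the `per_lineage` flag are hypothetical; the flag covers the case where a rate such as the quoted 3 x 10^-9 applies along each of the two lineages, so pairwise divergence accrues at twice that rate.

```python
def divergence_time(divergence: float, rate: float, per_lineage: bool = False) -> float:
    """Naive divergence time from observed divergence and a calibrated rate.

    divergence and rate must use the same units (percent, or substitutions
    per site). If per_lineage is True, the rate applies along each of the
    two lineages, so pairwise divergence accrues at twice that rate.
    Assumes linear accumulation: no selection effects, no saturation.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate if per_lineage else rate)


# HIV env example from the post: 5% pairwise divergence at ~1% per year.
print(divergence_time(5.0, 1.0))  # 5.0 (years)

# Hypothetical: 0.03 synonymous substitutions per site at the quoted
# 3e-9 substitutions per site per year along each lineage.
print(divergence_time(0.03, 3e-9, per_lineage=True))  # about 5 million years
```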
On top of the simple measure of % identity or
% divergence, one must take into consideration that
selection is acting to eliminate detrimental mutations
from the population, and that most mutations are
detrimental. A measure of the synonymous:non-synonymous
mutation ratio gives a rough estimate of the selective
pressure involved. We might find a gene such as that
for ribosomal elongation factor 2, where there have
been something like 97 mutations between rat and
hamster, all of which are synonymous. Thus this gene
evolves much more slowly than a pseudogene, where both the
non-synonymous and synonymous mutations could survive
in the population.
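This comparison is usually expressed as dN/dS: nonsynonymous substitutions per nonsynonymous site divided by synonymous substitutions per synonymous site (the inverse orientation of the synonymous:non-synonymous ratio above). A minimal sketch, assuming the differences and sites have already been counted (for example by a Nei-Gojobori-style procedure, not shown) and applying a Jukes-Cantor correction to each proportion. The helper names and all site counts below are hypothetical, not real EF-2 data.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site from a proportion p of
    differing sites. Returns math.inf once p >= 0.75 (apparent saturation)."""
    if p <= 0.0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from pre-counted differences and sites (hypothetical helper)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0 or math.isinf(ds):
        raise ValueError("dS is zero or saturated, so dN/dS is undefined")
    return dn / ds


# Hypothetical site counts for a strongly conserved gene: 97 synonymous
# differences and no nonsynonymous ones, as in the EF-2 example.
print(dn_ds(0, 1930, 97, 640))   # 0.0: strong purifying selection

# Hypothetical pseudogene-like case: the same proportion of differences
# at both kinds of site, so no selection is visible.
print(dn_ds(30, 1500, 10, 500))  # 1.0
```

Note that the function refuses to report a ratio once synonymous sites look saturated, which is exactly the situation described for paralogs in the question.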
On top of that, we must realize that once a
gene has reached mutational saturation, we can no
longer observe any further change. As the gene approaches
saturation, the length of time needed to observe a
1% increase in sequence divergence grows without
bound. The relationship between time and % divergence
is not linear but a curve, and that curve varies from
gene to gene and organism to organism. In an organism
with very high G+C content (or any other bias in
nucleotide composition) the curve levels off sooner,
at a lower maximum divergence, than in an organism
with nearly 25% of each of the 4 nucleotides.
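The saturation curve can be sketched with a simple substitution model. This sketch swaps in the Felsenstein (1981) equal-input model, which the post does not specify: the expected proportion of differing sites levels off at B = 1 - (sum of squared base frequencies), which is 0.75 for equal composition and lower for biased composition. The function names, the 90% G+C frequencies, and the use of 0.01 substitutions per site per year as a pairwise rate are all hypothetical choices for illustration.

```python
import math


def saturation_level(freqs):
    """Maximum expected proportion of differing sites, B = 1 - sum(pi_i^2)."""
    if not math.isclose(sum(freqs), 1.0):
        raise ValueError("base frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)


def observed_p(d, freqs):
    """Expected proportion of differing sites after d substitutions per site."""
    b = saturation_level(freqs)
    return b * (1.0 - math.exp(-d / b))


def corrected_d(p, freqs):
    """Invert observed_p; infinite once p reaches the saturation level."""
    b = saturation_level(freqs)
    if p >= b:
        return math.inf
    return -b * math.log(1.0 - p / b)


def years_for_one_more_percent(p, rate_per_year, freqs):
    """Years needed to go from divergence p to p + 0.01 at a given pairwise
    substitution rate (substitutions per site per year)."""
    return (corrected_d(p + 0.01, freqs) - corrected_d(p, freqs)) / rate_per_year


EQUAL = (0.25, 0.25, 0.25, 0.25)    # order A, C, G, T
GC_RICH = (0.05, 0.45, 0.45, 0.05)  # hypothetical 90% G+C genome

# 60 years at 0.01 substitutions per site per year gives ~41%, not 60%.
print(observed_p(60 * 0.01, EQUAL))  # about 0.41

# The time needed for each extra 1% grows as p nears saturation,
# and grows faster for the biased composition.
for p in (0.0, 0.30, 0.50, 0.55):
    eq = years_for_one_more_percent(p, 0.01, EQUAL)
    gc = years_for_one_more_percent(p, 0.01, GC_RICH)
    print(f"{p:.2f}  equal: {eq:6.2f} yr   GC-rich: {gc:6.2f} yr")
```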
Looking at HIV as an example again, we
do not expect that after 60 years of 1% change
per year we will have 60% sequence divergence.
The saturation of mutable sites might be
reached at 40% divergence, at which point there
will be as many back mutations as forward mutations.
For example, if the common ancestor of one codon is
CTG, in 30 years we might observe CCG and TTG
which are only 33% identical to each other.
In 40 years these may go on to become TCG
and TCG respectively, now with 100% identity
to each other, and 33% identity to the true
common ancestor codon CTG. There is no such
thing as two sequences that are 100% divergent.
Any two random sequences aligned without any gaps
will match at about 25% of the sites, and if we
put in a few gaps we can do better than that.
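The codon arithmetic and the random-sequence baseline can be checked directly. A minimal sketch with hypothetical helper names: for ungapped random sequences the expected identity is the sum of squared base frequencies, i.e. 25% for equal composition, and higher (41%) for the hypothetical 90% G+C composition used here, which is one way composition bias shows up.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, equal-length, ungapped positions that match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def expected_random_identity(freqs):
    """Expected % identity of two unrelated sequences: 100 * sum(pi_i^2)."""
    return 100.0 * sum(f * f for f in freqs)


def simulated_random_identity(freqs, n=100_000, seed=1):
    """% identity between two random sequences drawn with base frequencies
    freqs (order A, C, G, T)."""
    rng = random.Random(seed)
    s1 = "".join(rng.choices("ACGT", weights=freqs, k=n))
    s2 = "".join(rng.choices("ACGT", weights=freqs, k=n))
    return percent_identity(s1, s2)


# Codon example from the post.
print(percent_identity("CCG", "TTG"))  # about 33.3
print(percent_identity("TCG", "TCG"))  # 100.0
print(percent_identity("TCG", "CTG"))  # about 33.3

# Random baseline: close to 25% for equal composition, ~41% if G+C-rich.
print(simulated_random_identity((0.25, 0.25, 0.25, 0.25)))
print(simulated_random_identity((0.05, 0.45, 0.45, 0.05)))
```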
On top of all this, we have a problem with
scoring insertions and deletions. We also have a
problem with scoring frameshifted regions, which then
undergo more rapid evolution.
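To see why indels are a scoring problem, compare two simple conventions for gapped columns: drop them (pairwise deletion) or count each gapped position as a difference. A minimal sketch; `p_distance` and the toy alignment are hypothetical, and neither convention is presented as correct. Treating a multi-base indel as a single event would be yet another choice, not shown.

```python
def p_distance(a: str, b: str, gap: str = "-", gaps_as_differences: bool = False) -> float:
    """Proportion of differing aligned columns under one of two gap conventions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # column is empty in both sequences
        if x == gap or y == gap:
            if gaps_as_differences:
                compared += 1
                diffs += 1
            continue  # pairwise deletion: skip the gapped column
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


a = "ATG---CTGAAA"
b = "ATGGCCCTGAAT"
print(p_distance(a, b))                            # 1/9: gaps ignored
print(p_distance(a, b, gaps_as_differences=True))  # 1/3: gaps counted
```

The same pair of sequences is about 11% or about 33% divergent depending only on how the three gapped columns are scored.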
I'm not sure where you got the idea that
orthologous genes evolve at different rates than
paralogous genes, or that the synonymous and
non-synonymous rates are fixed. The rate
and ratio will change over time in any one gene,
and will vary between genes and organisms. For
example, the synonymous sites may become saturated
with mutations, at which point we cannot measure
any further increase in divergence at those sites,
while the non-synonymous sites are still far from
saturation.
Brian T. Foley                  btf at t10.lanl.gov
HIV Database                    (505) 665-1970
Los Alamos National Lab         http://hiv-web.lanl.gov/index.html
Los Alamos, NM 87544 U.S.A.     http://hiv-web.lanl.gov/~btf/home.html