multigene family

Brian Foley btf at t10.lanl.gov
Fri Feb 14 17:23:53 EST 1997

F.Schaap at FYS.UNIMAAS.NL wrote:
> I am (still) searching for a reliable method for calculating divergence
> time between members of a multigene family, with the members having
> different evolutionary rates. Between orthologous sequences synonymous
> substitution rates are fairly constant (3*10^-9 per site per year),
> while nonsynonymous substitution rates range from 0.021-0.76*10^-9.
> Between paralogous sequences the number of synonymous substitutions
> per site cannot be accurately estimated.
> Who has any suggestions?
> Regards, Frank

	Divergence time cannot be estimated from the
percent divergence alone; one also needs an estimated
rate.  Two human immunodeficiency virus envelope genes
which are 5% divergent (95% identical) from one
another are estimated to have shared a common ancestor
about 5 years ago.  We know the 1% per year rate, because
we have studied hundreds of HIV+ patients over times
ranging up to 12 years.  On the other hand, if a chimpanzee
globin gene and a human globin gene are 97% identical,
our estimate of divergence time is orders of magnitude
greater, because primate genomic DNA mutates much more 
slowly than the HIV RNA genome.
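
The arithmetic in the paragraph above can be sketched in a few lines of
Python.  This is only an illustration under a constant-rate assumption.
The function name and the primate rate are assumptions made for the
example: the rate simply reuses the 3*10^-9 per site per year synonymous
figure quoted in the question, applied directly as a pairwise rate, and
is not a measured value for globin genes.

```python
def divergence_time_years(percent_divergence, percent_per_year):
    """Years needed to accumulate percent_divergence at a constant rate."""
    if percent_per_year <= 0:
        raise ValueError("rate must be positive")
    return percent_divergence / percent_per_year


# HIV envelope example from the text: 5% divergent at about 1% per year.
hiv_years = divergence_time_years(5.0, 1.0)

# ASSUMPTION for illustration only: 3e-9 substitutions per site per year
# (the figure quoted in the question) equals 3e-7 percent per year.
ASSUMED_PRIMATE_RATE = 3e-7
globin_years = divergence_time_years(3.0, ASSUMED_PRIMATE_RATE)

print(f"HIV env: about {hiv_years:.0f} years")
print(f"globin:  about {globin_years:.1e} years")
```

The same percent divergence gives very different times depending on the
rate, which is why the rate has to be estimated independently.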
	On top of the simple measure of % identity or
% divergence, one must take into consideration that
selection is acting to eliminate detrimental mutations
from the population, and that most mutations are
detrimental.  A measure of the synonymous:non-synonymous
mutation ratio gives a rough estimate of the selective
pressure involved.  We might find a gene such as that
for ribosomal elongation factor 2, where there have
been something like 97 mutations between rat and 
hamster, all of which are synonymous.  Thus this gene
evolves much more slowly than a pseudogene, where both the
non-synonymous and synonymous mutations could survive
in the population.
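
A crude way to tally synonymous versus non-synonymous changes between
two aligned coding sequences is sketched below.  It assumes the standard
genetic code, gap-free in-frame sequences, and it only classifies codons
that differ at a single position; codons with two or three differences
are reported separately because the order of the changes is unknown.
The function name is made up for the example, and raw counts like these
are not per-site rates, so treat this as a first look rather than a
proper estimate.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq1, seq2):
    """Return (synonymous, nonsynonymous, multi_hit) codon counts."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, in frame")
    syn = nonsyn = multi = 0
    for pos in range(0, len(seq1), 3):
        c1 = seq1[pos:pos + 3].upper()
        c2 = seq2[pos:pos + 3].upper()
        n_diff = sum(x != y for x, y in zip(c1, c2))
        if n_diff == 0:
            continue
        if n_diff > 1:
            multi += 1          # path between the codons is ambiguous
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1            # same amino acid: synonymous
        else:
            nonsyn += 1         # amino acid changed: non-synonymous
    return syn, nonsyn, multi


# CTG->CTA keeps Leu (synonymous); AAA->AGA changes Lys to Arg.
print(count_codon_changes("CTGAAAGGT", "CTAAGAGGT"))
```

A gene under strong purifying selection would show mostly synonymous
changes in such a tally, while a pseudogene would show both kinds.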
	On top of that, we must realize that once a
gene has reached mutational saturation, we can no
longer observe any further change.  As the gene approaches
saturation, the length of time needed to observe a
1% increase in sequence divergence increases to
infinity.  There is not a linear relationship between
time and % divergence, but a curve that varies from
gene to gene and organism to organism.  In an organism
with very high G+C content (or any other bias in 
nucleotide composition) the curve is steeper than
in an organism with nearly 25% of each of the 4
nucleotides.
	Looking at HIV as an example again, we
do not expect that after 60 years of 1% change
per year we will have 60% sequence divergence.
The saturation of mutatable sites might be
reached at 40% divergence, at which point there
will be as many back mutations as forward mutations.
For example, if the common ancestor of one codon is
CTG, in 30 years we might observe CCG and TTG,
which are only 33% identical to each other.
In 40 years these may go on to become TCG
and TCG respectively, now with 100% identity 
to each other, and 33% identity to the true
common ancestor codon CTG.  There is no such
thing as two sequences that are 100% divergent.
Any two random sequences aligned without any gaps
will match at about 25% of the sites, and if we
put in a few gaps we can do better than that.
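
The codon example and the flattening curve can both be checked with a
short Python sketch.  The saturation curve here uses the Jukes-Cantor
model, which is a substitute chosen for illustration and is not named in
the text above.  It assumes equal base frequencies and that every site
is free to change, so its ceiling is 75% observed difference (the 25%
chance match between random sequences); a gene in which only some sites
can vary would level off lower, as in the 40% figure suggested for HIV.

```python
import math


def percent_identity(a, b):
    """Percent of matching positions in two gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def jc_observed_difference(d):
    """Expected fraction of differing sites after d substitutions per
    site between two sequences, under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The codon example from the text.
for label, x, y in [
    ("CCG vs TTG (30 years)", "CCG", "TTG"),
    ("TCG vs TCG (40 years)", "TCG", "TCG"),
    ("TCG vs ancestor CTG", "TCG", "CTG"),
]:
    print(f"{label}: {percent_identity(x, y):.0f}% identical")

# Observed difference rises ever more slowly as real change accumulates.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    observed = 100.0 * jc_observed_difference(d)
    print(f"{d:3.1f} substitutions/site -> {observed:.1f}% observed")
```

The second loop shows why time and % divergence are not linearly
related: each further unit of real change adds less observable change.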
	On top of all this, we have a problem with 
scoring insertions and deletions.  We also have a
problem with scoring frameshifted regions, which then 
undergo more rapid evolution.  
	I'm not sure where you got the idea that
orthologous genes evolve at different rates than
paralogous genes, or that the synonymous and 
non-synonymous rates were fixed.  The rate
and ratio will change over time in any one gene,
and will vary between genes and organisms.  For
example, the synonymous sites will be saturated
with mutations, at which point we cannot measure
any further increase in divergence of those sites,
while the non-synonymous sites are still quite
|Brian T. Foley                btf at t10.lanl.gov                      |
|HIV Database                  (505) 665-1970                        |
|Los Alamos National Lab       http://hiv-web.lanl.gov/index.html    |
|Los Alamos, NM 87544  U.S.A.  http://hiv-web.lanl.gov/~btf/home.html|

More information about the Mol-evol mailing list