
testing if seqs. are in same phylo tree

Doug Eernisse DEernisse at fullerton.edu
Tue Nov 14 02:50:21 EST 1995

In article <485nog$6ke at mark.ucdavis.edu>, ez005139 at chip.ucdavis.edu
(Daniel Mcgoldrick) wrote:

> Hello David,
>         If the sequences that you describe are proteins with the same function
> but not derived from the same ancestral gene then perhaps they would be 
> distributed on a different portion of the linkage map. I don't know if 
> it is feasible in your system or what the copy number is, but if you 
> perform a cross you can observe how the proteins segregate and assort and/or
> try a complementation test if you have recessive phenotypes.
> If the proteins fail to complement in trans then they are the same gene 
> at the same locus, but if the proteins assort independently then they are at 
> different loci. If the proteins occupy the same locus then aren't they 
> homologous? Otherwise they have either duplicated or are not homologous 
> - perhaps recruited from another ancestral gene. Anyway, maybe you can take 
> advantage of something external to the protein's own sequence to attack 
> the question of homology.


You might agree that questions of gene homology do not strictly depend on
the degree of sequence "similarity." "G" and "C" are homologous in two
organisms if they occupy the same site in the same gene, even though
they are fundamentally dissimilar. It is further possible (albeit unlikely)
that a gene or peptide sequence could be 0% similar to a homologous gene
or peptide sequence in another genome. This is why the earlier tendency
in molecular biology to speak of the "percent homology" of two sequences
is inappropriate: similarity can be expressed as a percentage, but
homology cannot.
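To make the distinction concrete, here is a minimal Python sketch (the function name `percent_identity` and the toy sequences are hypothetical) that computes percent identity over two sequences assumed to be already aligned. The number it returns measures similarity only; it says nothing by itself about whether the aligned sites are homologous.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, ungapped sites at which two sequences share a state.

    Assumes `a` and `b` are already aligned (same length, '-' marks a gap).
    This is a similarity score, not a measure of homology.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# A G/C difference at an aligned site lowers identity, even though the
# two states may still be homologous (same site, same gene).
print(percent_identity("ACGT", "ACCT"))  # 75.0
```

Note that two sequences can score 0.0 here and still be homologous site for site, which is exactly the point made above.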

Having said that, we won't recognize the homology between a
"G" and "C" unless we have some other context to suggest that these
differing states may correspond due to ancestry. For example,
flanking sites may have an identical or highly similar distribution 
of states. It may be possible to estimate the likelihood that the
observed similarity or pattern of similarity in the sequence overall
is not due to chance. Rather than reinvent the wheel, I would second
someone's earlier suggestion to try the Gibbs sampling algorithm 
(Lawrence et al., '93) to assess the statistical significance of 
blocks of amino acid property and pattern similarity (implemented by Greg 
Schuler in Macaw). I have a feeling, though, that you are more interested
in the situation where the sequences would likely fail such a test.
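The Gibbs sampler itself is too involved to show here. As a much simpler, swapped-in illustration of the underlying question (how often would chance alone produce the observed similarity?), the Python sketch below runs a shuffle test. One sequence is repeatedly shuffled, which keeps its residue composition but destroys its order, and the fraction of shuffles that match or beat the observed identity serves as an empirical p-value. The function names, the number of shuffles, and the use of raw identity counts as the score are all assumptions; this is not the Lawrence et al. method or the MACAW implementation.

```python
import random


def identity_count(a: str, b: str) -> int:
    """Number of aligned positions holding the same residue (no gaps assumed)."""
    return sum(1 for x, y in zip(a, b) if x == y)


def shuffle_test(a: str, b: str, n_shuffles: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: fraction of shuffles of `b` scoring >= the observed score.

    Hypothetical helper. Shuffling preserves the composition of `b`, so the test
    asks whether the observed matches exceed what composition alone would give.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    observed = identity_count(a, b)
    residues = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # in-place random permutation of b's residues
        if identity_count(a, "".join(residues)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


seq = "MKTAYIAKQRQISFVKSHFSRQ"
print(shuffle_test(seq, seq, n_shuffles=500))
```

A small p-value suggests the match is unlikely to come from composition alone. A large one does not rule out homology, which is the situation described above where the sequences would likely fail such a test.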

Daniel (above) has excellent suggestions for assessing the positional
correspondence between the genes in their respective genomes. Out of
curiosity, are your beetles comparable to flies in being amenable to 
linkage studies? 

There have also been attempts by chemists/physicists to claim genes 
are homologous despite low (< 20%) identity, based on how the sequences 
conform to a model of higher-level structure (someone from an EMBL research 
group, sorry, I can't remember who). These claims of homology were disputed
by others who thought it equally or more likely that there are 
only so many ways to fold or twist a string of amino acids (i.e., 
convergence of structural features). As far as I know, it is still
an open debate whether all proteins can be traced back to
relatively few ancestral proteins, or whether new proteins arise
frequently and might share similar structural properties with existing
proteins.

Doug Eernisse <DEernisse at fullerton.edu>
Dept. Biological Science MH282
California State University
Fullerton, CA 92634

More information about the Mol-evol mailing list
