correlation between codon and protein secondary structures

Dennis Farr defarr at use.usit.net
Fri Jan 2 15:07:34 EST 1998

In 1989, I gathered information on the secondary structure of a few proteins 
and the corresponding DNA sequences coding for them, wherever I could find both 
sets of data for a protein. That gave me around twenty proteins to work with.

I then checked the correlation between codon and secondary structure type for 
each amino acid that can be represented by multiple codons. (There are 20 
standard amino acids and 64 codons, 61 of which code for an amino acid.) I found 
significant correlation coefficients for several of them.
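
For anyone who wants to repeat the check, here is a minimal sketch of one way to 
test codon against structure type for a single amino acid. It is not the code 
from the 1989 study: it swaps in a chi-square statistic and Cramer's V on a 
codon-by-structure contingency table, and the function name, the record layout 
(amino acid, codon, structure label) and the toy records are all assumptions 
made up for illustration.

```python
from collections import Counter


def codon_structure_chi2(records, amino_acid):
    """Chi-square test statistic for codon vs. structure type.

    records: iterable of (amino_acid, codon, structure) tuples, one per
    residue (a hypothetical layout, not taken from the original study).
    Returns (chi2, degrees_of_freedom, cramers_v) for one amino acid.
    """
    # Count residues of this amino acid by (codon, structure) pair.
    counts = Counter((codon, ss) for aa, codon, ss in records if aa == amino_acid)
    codons = sorted({c for c, _ in counts})
    structs = sorted({s for _, s in counts})
    total = sum(counts.values())

    # Marginal totals for each codon (row) and each structure type (column).
    row = {c: sum(counts[(c, s)] for s in structs) for c in codons}
    col = {s: sum(counts[(c, s)] for c in codons) for s in structs}

    # Sum of (observed - expected)^2 / expected over every cell.
    chi2 = 0.0
    for c in codons:
        for s in structs:
            expected = row[c] * col[s] / total
            chi2 += (counts[(c, s)] - expected) ** 2 / expected

    dof = (len(codons) - 1) * (len(structs) - 1)
    k = min(len(codons), len(structs)) - 1
    cramers_v = (chi2 / (total * k)) ** 0.5 if k > 0 else 0.0
    return chi2, dof, cramers_v


# Made-up toy records: alanine codons GCT and GCC, helix (H) or strand (E).
independent = ([("A", "GCT", "H")] * 10 + [("A", "GCT", "E")] * 10
               + [("A", "GCC", "H")] * 10 + [("A", "GCC", "E")] * 10)
associated = [("A", "GCT", "H")] * 10 + [("A", "GCC", "E")] * 10
```

On the toy data, the first list gives a statistic of zero (codon tells you 
nothing about structure) and the second gives Cramer's V of 1 (codon determines 
structure). Real data would fall somewhere in between, and the statistic would 
have to be compared against a chi-square distribution with the returned degrees 
of freedom, with a correction for testing many amino acids at once.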

This was an admittedly very small sample. I believe it would be quite easy to 
repeat my study using currently available datasets and come up with a much 
larger sample size with very little effort. 

I am seeking information on whether or not someone has done similar work, or has 
the resources to do so. I am no longer able to spend the kind of spare time I 
put into the first study, but would be glad to help out or provide additional 
details to anyone interested.

Caveats: I know the correlation I found is supposed to be impossible. I do not 
propose a direction for the arrow from cause to effect for the correlation I 
found, if it holds up under additional scrutiny. I am a computer programmer by 
trade, and a mathematician by training, not a molecular biologist. 

I believe the phenomenon I seem to have discovered, or at least conjectured, 
should be investigated. The cost of doing so is low. Even a small correlation 
between structure and codon would mean an improvement in protein structure 
prediction. Using codon rather than amino acid sequence as input to a protein 
structure prediction algorithm adds almost 2 bits of information per amino acid 
to the input. If the additional information is at all relevant, the resulting 
predicted structure should be better.
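
The "almost 2 bits" figure can be checked with a short calculation. The sketch 
below assumes the standard genetic code and, as a simplification, that all 61 
sense codons are used equally often; real codon usage is biased, so the true 
figure for any organism would be somewhat lower. The function name is my own.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, written in the usual T, C, A, G ordering
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def extra_bits_uniform(table):
    """Average bits gained by knowing the codon once the amino acid is known.

    Assumes every sense codon is equally likely, so an amino acid with k
    codons contributes log2(k) bits, weighted by its k out of 61 codons.
    """
    sense = {codon: aa for codon, aa in table.items() if aa != "*"}
    degeneracy = Counter(sense.values())  # number of codons per amino acid
    n = len(sense)
    return sum(k * math.log2(k) for k in degeneracy.values()) / n


print(round(extra_bits_uniform(CODON_TABLE), 2))  # about 1.79
```

Amino acids with a single codon (Met, Trp) contribute nothing, while those with 
six codons (Leu, Ser, Arg) contribute about 2.6 bits each, which is how the 
average lands a little under 2.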

