3D-protein structure according to available DNA-sequence

David Jones jones at bsm.bioc.ucl.ac.uk
Mon Jan 30 17:41:40 EST 1995

Andrej Sali (sali at tamika.rockefeller.edu) wrote:
: This is much too pessimistic. About one third of all currently known
: sequences are related to at least one currently known structure. About one  
: half of these sequences are related at about 30% sequence identity or  
: more. When sequence identity is about 40% or more, you can get a model by  
: comparative modeling that is essentially equivalent to a medium resolution  

I fully agree with Andrej's comments here, but I would like to point out
a second, less optimistic estimate of the fraction of sequences related
to known structures. The estimate that a third of currently known sequences
are related to known structures is accurate, but it should be realised
that the majority of these relationships come from a small number of large
(and rather over-represented) families. For example, of the 40000 or so
sequences in the current SWISSPROT, 665 are globins.
A somewhat more conservative estimate of the probability of finding a
related 3-D structure for your particular protein may be obtained by
calculating the fraction of currently known protein _families_ for which
a related 3-D structure exists (i.e. the 665 globins above would be treated
as a single family). The result of this calculation is that only
1 in 25 (4%) of currently known families have related structures in the
structure database.
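
To make the difference between the two estimates concrete, here is a
minimal Python sketch of both calculations. The function name coverage(),
the record layout and the toy data are all assumptions made for
illustration: the 665 globins come from the figure above, but the 24
single-member families are invented and are not real SWISSPROT counts.

```python
from collections import Counter


def coverage(records):
    """Compare per-sequence and per-family structure coverage.

    records: iterable of (family, has_related_structure) pairs, one pair
    per sequence. A family counts as covered if any of its members is
    flagged, and every sequence in a covered family counts as covered.
    Returns (fraction_of_sequences, fraction_of_families).
    """
    seqs_per_family = Counter()
    family_covered = {}
    for family, has_structure in records:
        seqs_per_family[family] += 1
        family_covered[family] = family_covered.get(family, False) or has_structure

    if not seqs_per_family:
        raise ValueError("no sequences supplied")

    total_seqs = sum(seqs_per_family.values())
    covered_seqs = sum(n for fam, n in seqs_per_family.items() if family_covered[fam])
    covered_fams = sum(1 for fam in seqs_per_family if family_covered[fam])
    return covered_seqs / total_seqs, covered_fams / len(seqs_per_family)


# Hypothetical toy database: one large family with a known structure
# (665 globins) and 24 invented single-member families without one.
toy = [("globin", True)] * 665
toy += [(f"hypothetical_family_{i}", False) for i in range(24)]

seq_frac, fam_frac = coverage(toy)
print(f"sequences with a related structure: {seq_frac:.1%}")
print(f"families with a related structure:  {fam_frac:.1%}")
```

In this toy case nearly all sequences appear to be covered, yet only
1 family in 25 is, which is the effect of over-represented families
described above.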

This message was written, produced and executively directed by Dr David Jones
Email: jones at bsm.bioc.ucl.ac.uk         |     JANET: jones at uk.ac.ucl.bioc.bsm
Address: Dept. of Biochemistry          |       Tel: +44 71 387 7050 x3879
and Molecular Biology, University       |       Fax: +44 71 380 7193
College, London WC1E 6BT, U.K.          |
