Andrej Sali (sali at tamika.rockefeller.edu) wrote:
: This is much too pessimistic. About one third of all currently known
: sequences are related to at least one currently known structure. About one
: half of these sequences are related at about 30% sequence identity or
: more. When sequence identity is about 40% or more, you can get a model by
: comparative modeling that is essentially equivalent to a medium resolution
I fully agree with Andrej's comments here, but I would like to point out
another less optimistic estimate of the fraction of sequences related
to known structures. The estimate that a third of currently known sequences
are related to known structures is quite true, but it should be realised
that the majority of these relationships come from a small number of large
(and rather over-represented) families. For example, of the 40000 or so
sequences in the current SWISSPROT, 665 are globins.
A somewhat more conservative estimate of the probability of finding a
related 3-D structure for your particular protein may be obtained by
calculating the fraction of currently known protein _families_ for which
a related 3-D structure exists (i.e. the 665 globins above would be treated
as a single family). The result of this calculation is that only
1 in 25 (4%) of currently known families have related structures in the
structure database.
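The gap between the two estimates comes entirely from how you count: weighting by sequence lets a few huge families dominate, while weighting by family counts each one once. Below is a minimal Python sketch that contrasts the two calculations. The `Family` record, the function names and all the counts are hypothetical toy values. They were chosen only so that the two measures land near the "one third" and "1 in 25" figures discussed above. They are not real SWISSPROT or structure-database numbers.

```python
from dataclasses import dataclass


@dataclass
class Family:
    """Hypothetical record: one protein family and whether it has a known 3-D structure."""
    name: str
    n_sequences: int
    has_structure: bool


def sequence_weighted_fraction(families):
    """Fraction of individual sequences that belong to a family with a known structure."""
    total = sum(f.n_sequences for f in families)
    covered = sum(f.n_sequences for f in families if f.has_structure)
    return covered / total


def family_weighted_fraction(families):
    """Fraction of families with a known structure; each family counts once, whatever its size."""
    return sum(1 for f in families if f.has_structure) / len(families)


# Toy data (invented for illustration): 4 large, over-represented families
# with structures, plus 96 small families without any.
toy = [Family(f"large_{i}", 240, True) for i in range(4)] + [
    Family(f"small_{i}", 20, False) for i in range(96)
]

print(f"by sequence: {sequence_weighted_fraction(toy):.3f}")
print(f"by family:   {family_weighted_fraction(toy):.3f}")
```

With these made-up counts, 960 of 2880 sequences (one third) fall in a family with a structure, yet only 4 of 100 families (4%) have one. Collapsing each family to a single entry, as with the 665 globins, is what drives the estimate down.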
>---------------------------------------------------------------------------<
This message was written, produced and executively directed by Dr David Jones
Email: jones at bsm.bioc.ucl.ac.uk | JANET: jones at uk.ac.ucl.bioc.bsm
Address: Dept. of Biochemistry | Tel: +44 71 387 7050 x3879
and Molecular Biology, University | Fax: +44 71 380 7193
College, London WC1E 6BT, U.K. |
Disclaimer: STANDARD > KEYWORDS : OPINIONS MY OWN NOBODY ELSE'S WHATSOEVER