sali at tamika.rockefeller.edu (Andrej Sali) wrote:
> In article <3gbenc$and at mserv1.dl.ac.uk> <bionet at cgmvax.cgm.cnrs-gif.fr>
> writes:
> > >
> > >This is much too pessimistic. About one third of all currently known
> > >sequences are related to at least one currently known structure.
> > >
> >
> > ??? You really mean that 15,000 sequences from Swissprot (for example) are
> > related to at least one entry in the PDB? I'd be interested in getting a
> > reference on this subject.
> >
> > Cheers,
> >
> > Jean-Loup
> >
> >
> > ---------------------------------------------------------------------
> > Jean-Loup Risler Tel: (33 1) 69 82 31 34
> > CNRS Fax: (33 1) 69 07 49 73
> > Centre de Genetique Moleculaire Email: risler at cgmvax.cgm.cnrs-gif.fr
> > 91198 Gif sur Yvette Cedex France
> > ---------------------------------------------------------------------
> >
> >
> I did not mean exactly what you said, because I wanted to be
> conservative, but it is close enough. Many sequence-structure pairs that
> are actually related cannot be detected as such (yet) because the
> usual sequence alignments and even threading techniques are not perfect
> (yet). You can get the hard numbers in the very nice paper by Orengo,
> Jones, Thornton, Nature 372, p. 631, 1994. Sander, Holm et al. also did
> some nice work along these lines.
>
> Andrej
>
> P.S. My own hard number related to this argument is that about one third
> of currently deposited PDB structures have significant sequence similarity
> (>30%) to at least one already deposited PDB structure.
This of course assumes that the currently determined PDB structures
are a random sample of protein sequences, which I think is
unlikely, but someone can correct me if he/she has evidence.
(Nor do I have any idea how robust this sort of extrapolation would
be once you start to deviate from a Normal distribution.)
ewan