IUBio

Comments about secondary structure prediction server

VICTOR B. STRELETS STRELETS at SCRI.FSU.EDU
Thu May 4 16:38:56 EST 1995


Hi,

A new server for secondary structure prediction was recently
announced (bio at margit.scri.fsu.edu). In response to the large number
of related requests, including questions about prediction quality, I
would like to add some comments that were absent from the original
announcement (or were not noticed by users).

1. The method predicts secondary structure in a 3-state model. The
   states are helix, beta and coil/other ([H], [B] and [-] in reply
   files). The coil/other prediction is a real prediction, not
   just the absence of other predictions in particular parts of
   the sequence.

2. The accuracy of the prediction was evaluated position by
   position, including the coil/other state.

3. The algorithm was trained on the contents of PDB-65 (as
   translated into NRL3D in PIR-41). Features other than helices
   or beta were classified as coil/other.

4. Originally the work was directed toward revealing a
   stereochemical code connected with the formation of secondary
   structures, i.e. if in sequence ... in ... configuration,
   then secondary structure is ... in ... configuration. Such a
   code was revealed, though of course only at some level of
   partial positional description. The corresponding patterns
   (rules) hold in more than 95% of the occurrences of the
   particular sequence patterns. The model represents secondary
   structure organization in general, after filtering/joining of
   the corresponding code elements.

5. The prediction implementation is something additional to this
   result. It was a simple idea: if the code exists, why not try
   to connect the database to some prediction program, and why
   not open it to the public?

6. Point 5 was probably a bad idea, because some problems
   immediately popped up:
    - the mechanism for joining and overlapping code elements and
      converting the code into a prediction was not fully developed;
    - the prediction is built from an analysis of the 3 resulting
      weight profiles (one per state), but there are problems with
      the weighting of different patterns in the code (unexpected..),
      and the profile analysis is rather simple and unsophisticated,
      which leads to errors even in comparison with manual analysis
      of the profiles;
    - there are holes in the model for some subsequences (lack of
      data?);
    - the model on the server is only a copy (2-4 days old) of the
      working model.

I decided to keep the server active, but people should take into
account its experimental character (this was specifically mentioned
in the original announcement). Starting next week, reply files will
contain FIRST the three weight profiles, then the automatic
classification. On the profiles, sharp regions clearly indicate
structures and their boundaries, even if the relative ranking of the
profiles (due to the weighting problems) does not allow the right
conclusions to be drawn.
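
To illustrate how three weight profiles can be turned into an automatic
classification, here is a minimal sketch in Python. It is NOT the
server's actual analysis, which is not described in this message. It
assumes the simplest possible rule (take the state with the largest
weight at each position), and the function name `classify` and all the
weights are invented for illustration.

```python
STATES = "HB-"  # helix, beta, coil/other, as in the reply files


def classify(helix, beta, coil):
    """Assign each position the state whose profile weight is largest.

    Assumed rule for illustration only; ties go to the first state
    in the order helix, beta, coil/other.
    """
    if not (len(helix) == len(beta) == len(coil)):
        raise ValueError("profiles must have the same length")
    result = []
    for weights in zip(helix, beta, coil):
        best = max(range(3), key=lambda i: weights[i])
        result.append(STATES[best])
    return "".join(result)


# Invented weights for a 6-residue fragment, not real server output.
helix = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
beta = [0.1, 0.1, 0.2, 0.3, 0.8, 0.9]
coil = [0.2, 0.3, 0.4, 0.6, 0.3, 0.2]
print(classify(helix, beta, coil))  # HHH-BB
```

A per-position maximum like this depends directly on how the three
profiles are scaled against each other, so a weighting problem in one
profile can change the assigned state even where the shape of the
profiles is clear.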

Note: the code-revealing algorithm was organized so that any parts of
the code obtained from peptides with significant homology were
deleted from the model. So it was oriented toward really novel
proteins (or a really universal code, if you like). Therefore the
model, as applied to prediction (profiles?), could be tested even on
the same learning data.

Note2: the prevalence of helices mentioned by some users will be
studied after the final model is built. No comment right now, apart
from the problem with the weighting of code parts.

Note3: any helpful comments and discussions are still welcome!

Note4: if you would like to use the method in real work, please wait
one month - some problems will have been resolved by then and the
model will probably be complete.

And a final comment (many requests, especially from disappointed
people) about accuracy: tests on the whole bank, on new proteins and
on structures specially excluded from learning (although homology
influence was blocked) will be done after final model construction,
but tests on two partial sets (the excluded set and a set of those
used in building) show a mean positional 3-state accuracy of approx.
90%. Any concerns about biased sets etc. are unlikely to correspond
to the real situation (I checked the representation in these sets),
but will be resolved after final testing.
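
For readers unsure what a positional 3-state figure means, here is a
minimal sketch in Python of how such a number is commonly computed: the
fraction of positions where the predicted state equals the observed
one, with coil/other counted like any other state. The function name
`q3_accuracy` and the example strings are invented for illustration.
This is not the evaluation code used for the figure above.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Made-up 10-residue example using the reply-file symbols H, B and -.
predicted = "HHHH--BBB-"
observed = "HHH---BBBB"
print(q3_accuracy(predicted, observed))  # 0.8 (8 of 10 positions agree)
```

A mean over a set of proteins can then be taken per protein or over all
positions pooled together. Which of the two was used for the figure
above is not stated here.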

And still I think that it was an error to open it before finishing..
I regret it, despite some excellent replies..

Regards,

Victor B.Strelets




More information about the Proteins mailing list
