[Protein-analysis] 3D Structural Properties: From DSSP or PDB or ?

Pooja Jain via proteins%40net.bio.net (by pcxpj1 from nottingham.ac.uk)
Wed Apr 23 15:27:28 EST 2008


To continue this discussion, I am very interested in what others
think is the best approach for obtaining 3-D structural properties of
disordered regions or secondary structure elements, for example to
train a machine-learning algorithm or to guide MD simulations of a
protein deemed homologous but of unknown structure.

Should it be DSSP, PDB, or something else that I am not aware of?

Thank you.
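For the DSSP route, the per-residue assignment can be read straight from DSSP's classic output. Below is a minimal sketch under stated assumptions:

- It assumes the classic fixed-column format. Per-residue records follow the header line beginning `  #  RESIDUE`. The amino acid is in column 14 and the secondary-structure code is in column 17, counting from 1.
- It collapses DSSP's eight states to three using one common convention: H, G and I become H; E and B become E; everything else becomes C. Other reductions are in use.
- The function name `dssp_three_state` is hypothetical.

As the quoted reply argues, labels like these belong on the target side of a training set, never among the inputs.

```python
# Map DSSP's eight states to three (one common convention; others exist).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices
    "E": "E", "B": "E",            # strand and isolated beta bridge
}  # T, S and blank (loop) fall through to "C"


def dssp_three_state(dssp_text: str) -> tuple[str, str]:
    """Return (sequence, three-state string) from classic DSSP output."""
    seq: list[str] = []
    ss: list[str] = []
    in_residues = False
    for line in dssp_text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_residues = True  # per-residue records start after this header
            continue
        if not in_residues or len(line) < 17:
            continue
        aa = line[13]  # column 14 (1-based): one-letter amino acid code
        if aa == "!":  # chain-break marker, not a real residue
            continue
        if aa.islower():  # DSSP labels disulfide-bonded cysteines a, b, c, ...
            aa = "C"
        seq.append(aa)
        ss.append(EIGHT_TO_THREE.get(line[16], "C"))  # column 17: DSSP state
    return "".join(seq), "".join(ss)
```

The returned sequence comes from the residues DSSP actually saw in the coordinates. It may therefore differ from the full deposited sequence when some residues were not observed.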


On 23 Apr 2008, at 19:20, Kevin Karplus wrote:

> Narges Habibi wrote:
>> I'm doing a project on "Protein Contact Map Prediction", and I use
>> some features as neural-network inputs, including the secondary
>> structure of a given amino acid. There are several ways to get it:
>> 1- getting the DSSP file for each PDB file (from the FTP server)
>> 2- extracting it from the PDB file (the HELIX, SHEET and TURN sections)
>> 3- getting the ss file from www.pdb.org (as far as I can see, the
>> sequences in this file don't match the PDB files; why?)
>> What do you suggest? Which method is more accurate?
> None of the above.
> Predicting contact maps using known structure is cheating.  You should
> be predicting the local structure, not extracting it from known
> structures.  Any way that data from known structures can creep into
> your inputs invalidates your testing and makes it impossible to say
> with confidence that your method does anything useful.  Given the
> rather low quality of contact prediction at the current state of the
> art, even small amounts of information from the real structure can
> make a big difference.
> The following paper by my student is a pretty good summary of the
> best method as of CASP7; improvements since then have been modest:
> George Shackelford and Kevin Karplus.
> Contact Prediction using Mutual Information and Neural Nets.
> Proteins: Structure, Function, and Bioinformatics,
> 69(S8):159-164, 2007. (CASP7 special issue).
> doi:10.1002/prot.21791
> I see a lot of "prediction" work that is complete garbage, because the
> authors fooled themselves by using data that could only come from
> knowing the real structures.  The even more common problem is
> insufficient separation of train and test sets, in which computer
> scientists assume that a random partition of a data set is all that
> is needed, but the data sets we have aren't independent samples, so
> one has to go to some effort to ensure that the test set does not
> contain examples that are very close to training set examples.
> ------------------------------------------------------------
> Kevin Karplus 	karplus from soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
> Professor of Biomolecular Engineering, University of California, Santa Cruz
> Undergraduate Director, Bioinformatics
> (Senior member, IEEE)	(Board of Directors & Chair of Education Committee, ISCB)
> Affiliations for identification only.
> https://lists.sdsc.edu/mailman/listinfo.cgi/pdb-l .
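Karplus's train/test point can be illustrated with a cluster-then-split approach. Sequences are grouped by similarity, and whole clusters are assigned to one side, so near-duplicates never straddle the split. This is a toy sketch, not the method his group used:

- `crude_identity`, `greedy_clusters` and `cluster_split` are hypothetical names.
- `crude_identity` is built on Python's `difflib.SequenceMatcher`. Its ratio is a rough stand-in, not alignment-based percent identity.
- In practice a dedicated tool such as CD-HIT or BLAST-based filtering would do the clustering.
- The threshold and test fraction are arbitrary illustrative values.

```python
import random
from difflib import SequenceMatcher


def crude_identity(a: str, b: str) -> float:
    # Hypothetical stand-in for alignment-based percent identity.
    # autojunk=False stops difflib from discarding frequent residues
    # in sequences of 200+ characters.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_clusters(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy clustering: each sequence (longest first) joins the first
    cluster whose representative it matches at >= threshold, else it
    starts a new cluster with itself as representative."""
    reps: list[str] = []
    clusters: list[list[str]] = []
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for i, rep in enumerate(reps):
            if crude_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[i].append(name)
                break
        else:
            reps.append(name)
            clusters.append([name])
    return clusters


def cluster_split(seqs: dict[str, str], threshold: float = 0.5,
                  test_fraction: float = 0.2, seed: int = 0):
    """Assign whole clusters to test until it holds about test_fraction
    of the sequences; all remaining clusters go to train."""
    clusters = greedy_clusters(seqs, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(seqs)
    train: list[str] = []
    test: list[str] = []
    for cluster in clusters:
        (test if len(test) < target else train).extend(cluster)
    return train, test
```

Clustering only against representatives does not guarantee that every test sequence is dissimilar to every training sequence. A final all-against-all check of test versus train is therefore worth adding. Similarity at the structure level (same fold or family) can also leak information even when sequence identity is low.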
