karsten at asl.uni-bielefeld.de writes:
I'm looking for a databank which contains protein secondary-structure data.
I'd like to implement a neural network which predicts secondary structure, and
I need a large amount of data for the training.
As noted previously, the NRL-3D dataset (available from the UH server) is one
way to get structural information. Another is to get the PDB dataset itself
directly from Brookhaven, either by email server (fileserv at pb1.pdb.bnl.gov) or
anonymous ftp (pdb.pdb.bnl.gov). I believe there is also a gopher hole at
Brookhaven. Write pdb at chm.chm.bnl.gov for more information. If you are going
to use these "raw" sources of data, I would strongly recommend getting Laura
Lynn Walsh's PDB info file, which has very useful annotation of each structure.
It is available from her by writing lwalsh at nemo.life.uiuc.edu.
It is also possible to get precisely the same dataset that Qian & Sejnowski
used for their neural network secondary structure prediction paper [J. Mol.
Biol. (1988) 202:865-884]. It is available via the University of California,
Irvine machine learning archive: anonymous ftp to host ics.uci.edu, directory
/pub/machine-learning-databases/molecular-biology/protein-secondary-structure/
I believe this archive is mirrored by the hosts cs.dal.ca in Canada, and by
src.doc.ic.ac.uk in the UK.
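
If you would rather script the retrieval than type the ftp commands by hand,
something like the following Python sketch may help. It uses only the standard
ftplib module. The host and directory are the ones given above and may not be
reachable from your site; the function names list_archive and fetch_file are
illustrative choices of mine, not part of any archive tool.

```python
from ftplib import FTP

# Host and directory as quoted in this message; availability is an assumption.
HOST = "ics.uci.edu"
DIRECTORY = "/pub/machine-learning-databases/molecular-biology/protein-secondary-structure/"


def list_archive(host=HOST, directory=DIRECTORY, timeout=30):
    """Log in anonymously and return the file names found in `directory`."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()          # login() with no arguments is an anonymous login
        ftp.cwd(directory)
        return ftp.nlst()


def fetch_file(name, dest, host=HOST, directory=DIRECTORY, timeout=30):
    """Download one file from the archive, in binary mode, to local path `dest`."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()
        ftp.cwd(directory)
        with open(dest, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
```

Call list_archive() first to see what the directory holds, then pass one of
the returned names to fetch_file(). To try a mirror instead, pass its host
name and directory as arguments; I have not checked the mirror paths.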
You may also want to check Zhang, Mesirov and Waltz's "Hybrid Systems for
Protein Secondary Structure Prediction," [J. Mol. Biol. (1992) 225:1049-1063]
for the training and test sets that they used, which were quite carefully
selected.
Best of luck,
Larry
--
Lawrence Hunter, PhD.
National Library of Medicine
Bldg. 38A, MS-54
Bethesda, MD 20894 USA
tel: +1 (301) 496-9300
fax: +1 (301) 496-0673
internet: hunter at nlm.nih.gov
encryption: PGP 2.0 public key via "finger hunter at work.nlm.nih.gov"