
Looking for a protein secondary-structure databank

POSTMASTER at NBRF.GEORGETOWN.EDU
Fri Nov 20 11:18:18 EST 1992


Karsten Quast in message <9211201109.AA11655 at net.bio.net> pleads

> I'm looking for a databank which contains protein secondary-structure data.
> I'd like to implement a neural network which predicts secondary-structure
> and need very much data for the training.

The PIR's NRL_3D database integrates protein sequence and secondary-structure
information drawn from all of the Brookhaven Protein Data Bank.  The caveat is
that the annotated secondary structure is only what appears in the Brookhaven
Protein Data Bank.  Since that information was provided by the depositors, it is
not necessarily complete (not all sequences are annotated for all features, so
the absence of a feature for a particular sequence can't be taken to mean that
the structure is not present) or consistent (some depositors may interpret
essentially the same structure in different ways).  The following description
of the feature annotations in the NRL_3D database is taken from one of our
recent announcements.  The current version of NRL_3D is 10.00; it corresponds
to Brookhaven Protein Data Bank Release 61 and contains 1,457 sequences with
244,804 residues.
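For anyone building neural-network training data from NRL_3D, the caveat above
matters in practice: a residue with no annotation should be treated as unknown,
not as coil.  A minimal Python sketch, assuming the helix and strand ranges for
one chain have already been extracted and mapped to 1-based sequence positions.
The labels "H", "E", "C" and the function names are illustrative choices, not
part of NRL_3D.

```python
from typing import Optional

# (kind, first, last): 1-based inclusive sequence positions for one chain.
# "H" (helix) and "E" (strand) are labels chosen for this sketch only.
Range = tuple[str, int, int]


def residue_labels(
    length: int, ranges: list[Range], trust_gaps: bool = False
) -> list[Optional[str]]:
    """Per-residue labels for one chain of `length` residues.

    Residues outside every annotated range get None ("unknown"), because a
    missing annotation does not prove the structure is absent. Pass
    trust_gaps=True to label them "C" (coil) instead.
    """
    fill = "C" if trust_gaps else None
    labels: list[Optional[str]] = [fill] * length
    for kind, first, last in ranges:
        # Clip each range to the chain so bad annotations cannot overflow.
        for pos in range(max(first, 1), min(last, length) + 1):
            labels[pos - 1] = kind
    return labels


def trainable(labels: list[Optional[str]]) -> list[tuple[int, str]]:
    """(0-based index, label) pairs, skipping unknown positions."""
    return [(i, lab) for i, lab in enumerate(labels) if lab is not None]


labels = residue_labels(6, [("H", 2, 4)])
print(labels)             # [None, 'H', 'H', 'H', None, None]
print(trainable(labels))  # [(1, 'H'), (2, 'H'), (3, 'H')]
```

Positions left as None would simply be excluded from the training loss;
trust_gaps=True restores the common but riskier convention of calling every
unannotated residue coil.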
------------------------------------------------------------------------
2. NRL_3D Release 9.1 Has Feature Information from Brookhaven Data Bank

The NRL_3D Database of sequence information extracted from the Brookhaven
Protein Data Bank (PDB) has been upgraded to release 9.1.  This new version
includes feature annotations extracted from PDB HELIX, SHEET, TURN, SITE, and
SSBOND records along with special ATOM and HETATM records.  New algorithms
have been implemented to construct and name chains and fragments, to recognize
non-standard residues and to discard entries with completely unknown sequence.
NRL_3D release 9.1 corresponds to PDB release 60 (May 1992) and contains
1,380 sequences with 229,099 residues.
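The PDB HELIX and SHEET records mentioned above are fixed-column text lines.  A
minimal Python sketch of pulling chain and residue ranges out of them follows.
The column positions come from the current wwPDB format specification and are
an assumption for 1992-era releases; insertion codes are ignored, and TURN,
SITE and SSBOND records are left out.

```python
# Fixed-column slices (0-based, half-open) for chain id, first residue number
# and last residue number, per the wwPDB format spec (v3.3). Assumed, not
# verified, for older PDB releases.
COLUMNS = {
    "HELIX ": (slice(19, 20), slice(21, 25), slice(33, 37)),
    "SHEET ": (slice(21, 22), slice(22, 26), slice(33, 37)),
}


def parse_ranges(lines):
    """Return (record, chain, first, last) for every HELIX/SHEET line.

    Insertion codes are ignored; all other record types are skipped.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n").ljust(80)  # pad short lines to full width
        cols = COLUMNS.get(line[0:6])
        if cols is None:
            continue
        chain_col, first_col, last_col = cols
        out.append((
            line[0:6].strip(),
            line[chain_col],
            int(line[first_col]),  # int() tolerates the leading blanks
            int(line[last_col]),
        ))
    return out
```

Because the function only iterates over `lines`, an open PDB file object can be
passed in directly.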

The inclusion of this feature information in NRL_3D allows PDB entries to be
recovered through the FEATURE command.  For example, the commands
  USE BASES NRL_3D
  FEATURE TURN "TYPE I "
will list all entries in the NRL_3D database with a "type I" turn annotated
in their corresponding PDB entry.
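To make the FEATURE example concrete, here is a rough offline analogue in
Python.  It is not the PIR command: the in-memory layout (entry id mapped to
(feature kind, description) pairs), the function name, the example entries and
the plain substring matching are all assumptions made for illustration.

```python
# Hypothetical stand-in for NRL_3D feature annotations (not the real format):
# entry id -> list of (feature kind, description) pairs.
Features = dict[str, list[tuple[str, str]]]


def entries_with_feature(db: Features, kind: str, text: str) -> list[str]:
    """Sorted entry ids having a `kind` feature whose description contains `text`.

    Plain substring matching is an assumption of this sketch.
    """
    return sorted(
        entry
        for entry, feats in db.items()
        if any(k == kind and text in desc for k, desc in feats)
    )


# Hypothetical example entries.
db: Features = {
    "ENTRY_A": [("TURN", "TYPE I TURN")],
    "ENTRY_B": [("TURN", "TYPE II TURN")],
    "ENTRY_C": [("HELIX", "RIGHT-HANDED ALPHA")],
}
print(entries_with_feature(db, "TURN", "TYPE I "))  # ['ENTRY_A']
print(entries_with_feature(db, "TURN", "TYPE I"))   # ['ENTRY_A', 'ENTRY_B']
```

Under this substring assumption, the trailing space in "TYPE I " is what keeps
a "TYPE II" description from matching.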

Release 9.1 of NRL_3D is available through the PIR Network Request Server,
through the PIR On-Line Access System and by FTP from the University of Houston 
server at ftp.bchs.uh.edu in the files
  /pub/gene-server/incoming/pir33/nrl_3d-9.1-vms
  /pub/gene-server/incoming/pir33/nrl_3d-9.1-ascii

Our thanks to Bill Pearson and Dan Davison for their efforts in providing FTP
access to the PIR databases.
------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET
                                 POSTMASTER at NBRF.GEORGETOWN.EDU



More information about the Proteins mailing list