PDBtoGCG (for VMS)

Fri Dec 3 10:46:26 EST 1993

In message <9312030024.AA02665 at net.bio.net> David Mathog
(mathog at seqvax.bio.caltech.edu) says
> Sometimes users insist on having the sequences out of PDB files (even
> though most of these can be retrieved from other databases).  It's a bit
> of a pain to do manually, so here are a short script and a program
> it uses that carry out this conversion automatically.  It only works
> on protein sequences, and then only if the sequence(s) are properly
> recorded in the SEQRES records (so sue me).  Each strand's sequence
> is placed in a separate file.
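The conversion described in the quoted message can be sketched in Python. This is a minimal illustration, not Mathog's script or program, which are not reproduced here. It assumes the standard fixed-column SEQRES layout: the chain identifier is in column 12 and residue names start at column 20. It maps only the twenty standard amino acids and writes 'X' for anything else. It also returns one sequence per chain instead of writing files. The helper names `seqres_chains` and `gcg_checksum` are hypothetical. The checksum follows the usual GCG scheme, in which position weights cycle from 1 to 57 and the total is taken modulo 10000; this is the value reported in the `Check:` field of a GCG sequence file's header line.

```python
# Standard three-letter to one-letter amino-acid codes (20 standard residues only).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_chains(lines):
    """Collect one-letter sequences per chain from PDB SEQRES records.

    Assumes fixed columns: chain ID in column 12 (index 11) and residue
    names from column 20 (index 19) onward. Unknown residues become 'X'.
    Consecutive SEQRES lines for the same chain are concatenated in order.
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue  # ignore every other record type
        chain_id = line[11] if len(line) > 11 else " "  # blank in old single-chain entries
        residues = line[19:].split()
        seq = "".join(THREE_TO_ONE.get(r.upper(), "X") for r in residues)
        chains[chain_id] = chains.get(chain_id, "") + seq
    return chains


def gcg_checksum(seq):
    """GCG-style checksum: weights cycle 1..57, total taken modulo 10000."""
    total = 0
    for i, ch in enumerate(seq):
        total += ((i % 57) + 1) * ord(ch.upper())
    return total % 10000


if __name__ == "__main__":
    demo = [
        "HEADER    EXAMPLE",
        "SEQRES   1 A    5  GLY ALA SER LYS VAL",
        "SEQRES   1 B    3  MET TRP CYS",
    ]
    for chain, seq in seqres_chains(demo).items():
        print(chain, seq, gcg_checksum(seq))  # prints "A GASKV 1180", then "B MWC 452"
```

Each chain's sequence in the returned dictionary could then be written to its own file, as the quoted script does.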

The protein sequences from the Brookhaven Protein Data Bank entries have
already been extracted into a sequence file format and are provided by the PIR
in the NRL_3D database.  This database also includes all the source,
bibliographic and feature information extracted from PDB HELIX, SHEET, TURN,
SITE, and SSBOND records along with special ATOM and HETATM records.  The
NRL_3D database is indexed and can be searched and queried (and even converted
to other formats) like the other PIR sequence databases.

The NRL_3D entries have titles that conform to NBRF naming rules and match the
corresponding PIR entries.  The title field of NRL_3D entries contains all the
same elements as the corresponding PIR entries and in the same order.  However,
additional elements are present in the NRL_3D entries to distinguish entries
that may be chains or fragments with different crystallographic coordinates,
have different crystallization conditions, or have different chemical
modifications.  The NRL_3D title of an entry may not correspond to the
Brookhaven Protein Data Bank COMPND record from which it was originally derived
because all Enzyme Commission numbers have been changed to conform to the
current rules, and all co-crystallized protein chains from different sources
are distinguished and correctly identified.  A file of the NRL_3D titles can be
obtained by sending the E-Mail message

Release 12.2 of NRL_3D is currently available with the regular tape and CD-ROM
distribution of the PIR.  It can be searched and queried through the PIR
Network Request Server and the PIR On-Line Access System, and it is available
by FTP from the University of Houston Gene-Server at ftp.bchs.uh.edu and from
the NCBI archives at ncbi.nlm.nih.gov.

                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Information Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMAST at GUNBRF.BITNET
                                 POSTMASTER at NBRF.GEORGETOWN.EDU
