In message <9303020601.AA17760 at net.bio.net> Lenny Bloksberg
(PreissJ at clvax1.cl.msu.edu) asks:
> There has been some talk lately about accessing the PDB via gopher.
> Right now, just want to find out if a given sequence has a known crystal
> structure. I have run FASTA of selected domains of my protein using GCG
> on my local VAX. I now want to know which proteins to look up in the library
> first. I figure that those with known crystal structures will probably have
> the most information about the likely structure and function of the domains
> that I have selected. What is the quickest way to find out this info?
Last week I answered a similar question on the proteins and xtallography boards.
As several others have already been kind enough to point out, the NRL_3D
database produced by PIR-International contains all the protein sequence
information extracted from the Brookhaven Protein Data Bank. The NRL_3D is
provided with the standard PIR distribution and it is also available both
for FASTA searching and for database queries through the PIR Network Request
Server.
The following is extracted from the Announcements of the Protein Information
Resource Network Request Service published last summer.
> 5. FASTA Searches for NRL_3D Only
> Some users had suggested that they wanted to do FASTA sequence searches
> only for the sequences with known 3-dimensional structures, the sequences
> extracted from the Brookhaven Protein Data Bank in NRL_3D. Normally our
> FASTA searches are done against all the protein databases, PIR1, PIR2, PIR3,
> the non-redundant PATCHX (described in the August announcement and in part 2
> above) and NRL_3D. Now when the command
> USE BASES NRL_3D
> is used before a SEARCH command, only the NRL_3D database will be used for
> the FASTA search. Otherwise, all the protein databases will be used.
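The rule in the announcement is simple: a USE BASES NRL_3D line placed before SEARCH restricts the FASTA search to NRL_3D, and leaving it out searches all the protein databases. A minimal sketch of composing that request text follows. The helper name `fileserv_search_body` is hypothetical, and the whitespace stripping and uppercasing of the sequence are assumptions about what the server accepts, not documented behaviour.

```python
def fileserv_search_body(sequence: str, nrl3d_only: bool = True) -> str:
    """Build the text of a PIR FILESERV FASTA request (hypothetical helper).

    A USE BASES NRL_3D line placed before SEARCH restricts the FASTA search
    to NRL_3D; without it, all the protein databases are searched.
    """
    # Collapse whitespace so the whole sequence stays on the SEARCH line, and
    # uppercase it (assumption: the server expects uppercase one-letter codes).
    seq = "".join(sequence.split()).upper()
    if not seq.isalpha():
        raise ValueError("sequence must be in single-letter code")
    lines = ["USE BASES NRL_3D"] if nrl3d_only else []
    lines.append("SEARCH " + seq)
    return "\n".join(lines) + "\n"


print(fileserv_search_body("mvlsp adktn vkaaw"), end="")
```

Passing `nrl3d_only=False` drops the USE BASES line, which corresponds to the default search against all the protein databases.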
To perform the FASTA search in NRL_3D, send an electronic mail message containing
the following lines (substituting your own sequence):
USE BASES NRL_3D
SEARCH protein_sequence_in_single_letter_code
to the PIR Network Request Service address FILESERV at NBRF.Georgetown.EDU on the
Internet or FILESERV at GUNBRF on BITNET. The server will return the result of a
FASTA search through only the protein sequences with reported atomic positions
in the Brookhaven Protein Data Bank. The first four characters of the entry
codes in the NRL_3D database correspond to the PDB entry codes. Users who have
the PIR database access programs, the NRL_3D database and the Brookhaven
Protein Data Bank can use the MATCH command to generate a VMS command procedure
that will extract the atomic coordinates of all the matched sequences in the
PDB for model building and comparison.
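Because the first four characters of an NRL_3D entry code are the PDB entry code, a list of FASTA hits can be reduced to the PDB entries worth retrieving. The sketch below does that reduction and drops duplicates, since several NRL_3D entries may share one PDB code. The function name and the example codes are hypothetical; only the four-character rule comes from the text above.

```python
def pdb_codes(nrl3d_codes):
    """Return the PDB entry codes behind a list of NRL_3D entry codes.

    The first four characters of an NRL_3D code are the PDB code, so several
    NRL_3D entries may map onto one PDB entry. Duplicates are dropped,
    keeping the order in which codes first appear.
    """
    codes = []
    for code in nrl3d_codes:
        code = code.strip().upper()
        if len(code) < 4:
            raise ValueError(f"entry code too short: {code!r}")
        codes.append(code[:4])
    # dict.fromkeys keeps insertion order, giving an order-preserving de-dup.
    return list(dict.fromkeys(codes))


# Hypothetical NRL_3D codes as they might appear in a FASTA hit list.
print(pdb_codes(["1HHOA", "1hhob", "2CYP"]))  # ['1HHO', '2CYP']
```

Keeping the first-seen order means the PDB codes stay ranked as the FASTA hits were.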
It is also possible to do a more general database query for related sequences.
The COMPND records in Brookhaven PDB entries are extracted into the title
records of NRL_3D. The server TITLE command can be used to find entries by
the words in their titles. Likewise, the SPECIES command can be used to find
entries by the source of the sequence. In some cases, the titles and species
differ between NRL_3D and the corresponding PDB entry because the information
has been updated or corrected in NRL_3D. For example, the PDB SOURCE
record occasionally lists the host species of a viral sequence rather than the
virus itself, and some PDB COMPND records carry older Enzyme Commission numbers.
This series of commands could be used to find the cytochrome oxidases in the
Brookhaven PDB:
USE BASES NRL_3D
TITLE CYTOCHROME OXIDASE
These commands would find any sequences from white-tailed deer:
USE BASES NRL_3D
SPECIES WHITE TAILED DEER
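Each of these queries is just a few command lines in the body of a mail message to the server. A minimal sketch of wrapping any such command list in a message with Python's standard `email` package is shown below. The helper name, the sender address, and the Subject text are hypothetical, and it is an assumption that the server reads commands only from the body; the FILESERV address is the one given above, written with "@".

```python
from email.message import EmailMessage

# Server address as given in the post (written with "@" for Internet mail).
FILESERV_ADDRESS = "FILESERV@NBRF.Georgetown.EDU"


def build_query_message(commands, sender, nrl3d_only=True):
    """Wrap PIR server commands in a plain-text email (hypothetical helper).

    Assumption: the server reads its commands from the message body, one per
    line, so the Subject line is only a label for the sender's own records.
    """
    lines = ["USE BASES NRL_3D"] if nrl3d_only else []
    lines.extend(commands)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = FILESERV_ADDRESS
    msg["Subject"] = "NRL_3D query"
    msg.set_content("\n".join(lines) + "\n")
    return msg


msg = build_query_message(["SPECIES WHITE TAILED DEER"], sender="user@example.org")
print(msg.get_content(), end="")
```

The sketch only builds the message; handing it to `smtplib.SMTP(...).send_message(msg)` would send it, with the mail relay host depending on the local site.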
The information in HELIX, SHEET, TURN, SITE and some ATOM and HETATM records
of PDB entries is extracted into NRL_3D feature records. To find all
author-reported segments of type I turns, this set of commands could be sent to
the server:
USE BASES NRL_3D
FEATURE TURN "TYPE I "
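The trailing space inside the quoted pattern matters if the server matches the quoted text as a plain substring of the feature description, which is an assumption here rather than documented server behaviour. Under that assumption, "TYPE I" would also hit type II and type III turns, while "TYPE I " would not. The feature descriptions in the sketch are hypothetical.

```python
def feature_matches(description: str, pattern: str) -> bool:
    # Assumption: the quoted pattern is matched as a plain substring of the
    # feature text; the real server's matching rules are not documented here.
    return pattern in description


# Hypothetical feature descriptions, for illustration only.
features = ["TYPE I TURN", "TYPE II TURN", "TYPE III TURN"]

print([f for f in features if feature_matches(f, "TYPE I")])   # all three
print([f for f in features if feature_matches(f, "TYPE I ")])  # ['TYPE I TURN']
```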
Additional information can be obtained by sending a HELP request to the PIR
Network Request Service address.
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Information Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMAST at GUNBRF.BITNET
POSTMASTER at NBRF.GEORGETOWN.EDU