Announcements of the Protein Identification Resource
Network Request Service
Highlights
1. PATCHX Supplements PIR with Sequences from Other Databases
2. NRL_3D Release 9.1 Has Feature Information from Brookhaven Data Bank
3. Complimentary CD-ROM Available with ATLAS Multidatabase Retrieval Program
4. New USE FORMAT Server Command Provides Versatile Output
5. GenBank and EMBL Database Sections
6. PIR Network Request Service Command Summary
Announcements
1. PATCHX Supplements PIR with Sequences from Other Databases
The PATCHX database is produced by MIPS at the Max Planck Institute for
Biochemistry, Martinsried, FRG. It includes all protein sequences (not
identical with or contained in sequences from PIR1, PIR2 and PIR3 release
32.2) from the following databases:
Database   Release  Date  Entries  Code  Description
MIPSOwn    33.0     6-92     1251  D     MIPS preliminary entries
PIRMOD     33.0     6-92       32  E     MIPS/PIR preliminary entries
MIPSH      32.2     6-92       65  F     MIPS yeast entries
NRL_3D     8.0      3-92      247  R     Brookhaven Data Bank Sequences
MIPSTrn    33.0     6-92     1130  G     MIPS preliminary translations
EMTrans    30.0     5-92    12756  H     EMBL automatic translations
SwissProt  21.0     2-92     1618  I     SwissProt entries
GenPept    71.0     3-92     4603  J     GenBank automatic translations
Kabat      5.0      3-92     3567  K     Kabat entries
PSeqIP     5.0      7-88      956  L     NEWAT
                                   M     PSD
                                   N     PGTrans
All sequences that are IDENTICAL within or between databases are present ONCE.
Duplicate sequences and sequences that were completely contained within others
(subsequences) have been eliminated according to the priority (top to bottom)
in the table above. The number of entries in the table reflects the number of
entries remaining from that database after elimination of duplicates and
subsequences, not the original number of entries. PATCHX still contains
numerous inexact duplicates, that is, multiple reports of the same protein that
differ by at least one amino acid residue. Many of these are cited in merged PIR
entries. The PIR3, MIPSOwn, PIRMOD and MIPSTrn databases contain preliminary
data that should be used with extreme caution.
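
The elimination procedure described above can be illustrated with a minimal
Python sketch. This is not the MIPS production method. It assumes that
sequences are plain strings, that an exact duplicate is kept only under the
highest-priority database that lists it, and that any sequence contained in a
longer one (or in a PIR reference sequence) is dropped regardless of its
source. The function name build_supplement and the toy sequences are
hypothetical.

```python
def build_supplement(reference, sources):
    """Sketch of priority-based duplicate and subsequence elimination.

    reference: sequences already present in the primary database (assumed input).
    sources:   list of (database_name, [sequence, ...]) in priority order,
               highest priority first.
    Returns a list of (database_name, sequence) pairs that survive.
    """
    reference = list(reference)

    # Pass 1: keep each distinct sequence once, credited to the first
    # (highest-priority) database in which it appears.
    seen = set()
    candidates = []
    for db_name, sequences in sources:
        for seq in sequences:
            if seq not in seen:
                seen.add(seq)
                candidates.append((db_name, seq))

    # Pass 2: drop sequences identical to or contained in a reference
    # sequence, and sequences contained in a longer candidate.
    kept = []
    for db_name, seq in candidates:
        in_reference = any(seq in ref for ref in reference)
        in_longer = any(seq != other and seq in other for _, other in candidates)
        if not (in_reference or in_longer):
            kept.append((db_name, seq))
    return kept


# Toy data, invented for illustration only.
reference = ["MKTAYIAKQR"]
sources = [
    ("MIPSOwn", ["MKTAY", "GGSLVPRGS", "WWHHEE"]),
    ("SwissProt", ["WWHHEE", "SLVPR", "AAGGSLVPRGSAA"]),
]
result = build_supplement(reference, sources)
for db_name, seq in result:
    print(db_name, seq)
```

In this toy run the duplicate WWHHEE is credited to the higher-priority
database, while the fragments MKTAY, SLVPR and GGSLVPRGS are dropped because
each is contained in a longer sequence. Inexact duplicates (one or more residue
differences) pass through such a filter untouched, which is consistent with the
caveat above.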
The PATCHX database is available through the PIR Network Request Server,
through the PIR On-Line system and on the ATLAS CD-ROM now being distributed.
Friedhelm Pfeiffer at MIPS wishes to thank Reinhard Doelz and Hans
Ullitz-Moeller for their valuable suggestions in the production of this
database.
2. NRL_3D Release 9.1 Has Feature Information from Brookhaven Data Bank
The NRL_3D Database of sequence information extracted from the Brookhaven
Protein Data Bank (PDB) has been upgraded to release 9.1. This new version
includes feature annotations extracted from PDB HELIX, SHEET, TURN, SITE, and
SSBOND records along with special ATOM and HETATM records. New algorithms
have been implemented to construct and name chains and fragments, to recognize
non-standard residues and to discard entries with completely unknown sequence.
NRL_3D release 9.1 corresponds to PDB release 60 (May 1992) and contains
1,380 sequences with 229,099 residues.
The inclusion of this feature information in NRL_3D allows PDB entries to be
recovered through the FEATURE command. For example, the commands
USE BASES NRL_3D
FEATURE TURN "TYPE I"
will list all entries in the NRL_3D database with a "type I" turn annotated
in their corresponding PDB entry.
Release 9.1 of NRL_3D is available through the PIR Network Request Server,
through the PIR On-Line Access System and by FTP from the University of Houston
server at ftp.bchs.uh.edu in the files
/pub/gene-server/incoming/pir33/nrl_3d-9.1-vms
/pub/gene-server/incoming/pir33/nrl_3d-9.1-ascii
Our thanks to Bill Pearson and Dan Davison for their efforts in providing FTP
access to the PIR databases.
3. Complimentary CD-ROM Available with ATLAS Multidatabase Retrieval Program
A preliminary version of the ATLAS CD-ROM is being distributed on a
complimentary basis as an introduction. Regular distribution of the
ATLAS CD-ROM is expected to begin in the Fall, coordinated with the quarterly
releases of the PIR-International Protein Sequence Database. To receive a
complimentary ATLAS CD-ROM, please send your name and complete mailing address
to: PIRMAIL at GUNBRF.BITNET
The ATLAS CD-ROM contains the Atlas Retrieval System, the PIR-International
Protein Sequence Database, the GenBank Gene Sequence Database, and several
related databases. The Atlas Retrieval System (ATLAS) is an information
retrieval system specifically designed to access macromolecular sequence
databases. It provides simultaneous retrieval from all (or a selected subset)
of these databases. The Atlas program is currently designed to run on PC/DOS
and VAX/VMS computer systems. Support for UNIX and MAC systems will be added.
The development of the ATLAS program was partially supported by NLM LM05206-09,
by NSF BIR-9107540, and by Digital Equipment Corporation. The ATLAS program is
copyrighted by the National Biomedical Research Foundation. The ATLAS of
Protein and Genomic Sequences is a trademark of the National Biomedical
Research Foundation.
The ATLAS program was developed from the NBRF eXperimental Query System (XQS)
and is designed along similar lines. It does not yet contain some of the utility
functions of the XQS program; these will be added later as portability permits.
VAX/VMS systems currently do not support direct access to ISO 9660 formatted
CD-ROMs. The ATLAS CD-ROM may be accessed on VAX/VMS systems by two
approaches:
(1) There is an ISO 9660 compliant device driver available from Digital
Equipment Corporation (DEC) that allows direct access to the CD-ROM
(driver part number YT-GS001-01). Please contact your DEC sales
representative for further information.
(2) There is a public domain utility for accessing ISO 9660 CD-ROMs,
called CD_ACCESS, written by Peter Stockwell, University of Otago,
New Zealand, that will allow all the files on the CD-ROM to be copied
to a magnetic disk drive. This utility can be obtained from the EMBL
E-mail server (for further information contact DataLib at EMBL-Heidelberg.DE).
When copying files using CD_ACCESS, be sure to use the /BINARY qualifier
to the copy command.
4. New USE FORMAT Server Command Provides Versatile Output
The PIR Network Server now provides a command for changing the default format
of PIR-International database entries. The default format for PIR entries
conforms to CODATA specifications. To obtain PIR entries in the format
normally presented by PIR database retrieval programs (PSQ, XQS and ATLAS)
use the command
USE FORMAT ATLAS
Subsequent GET commands will then return entries in the ATLAS format.
The command
USE FORMAT CODATA
will cause subsequent GET commands to return entries in the default CODATA
format.
5. GenBank and EMBL Database Sections
To facilitate program access, the GenBank and EMBL databases have been broken
into sections. GenBank is available in three sections, GB, GBSUP and GBNEW,
and EMBL is available in two sections, EMBL and EMBLSUP. The GBNEW section
contains the GenBank weekly update entries. The GBSUP and EMBLSUP sections
contain regular entries placed in supplemental sections (presently the primate
entries). All these databases are automatically available through all the
commands that can use them. Particular databases may be selected with the
USE BASES command. The command
USE BASES GB*
will select all the GenBank databases, and only those databases, for
subsequent database query and retrieval commands. The command
USE BASES N+GB*+EMBL*
will select all the nucleotide sequence databases for subsequent query and
retrieval commands.
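
As a rough illustration of how such a selection expression could be read, the
following Python sketch expands it against the database abbreviations listed
at the end of this announcement. The server's actual matching rules are not
documented here; the sketch assumes that "+" separates patterns, that "*"
matches any run of characters, and that matching ignores case. The function
name expand_bases is hypothetical.

```python
from fnmatch import fnmatchcase

# Abbreviations copied from the database table in this announcement.
KNOWN_BASES = [
    "PIR1", "PIR2", "PIR3", "ALN", "NRL_3D", "PATCHX",
    "N", "GB", "GBSUP", "GBNEW", "EMBL", "EMBLSUP",
]


def expand_bases(expression, known=KNOWN_BASES):
    """Expand a USE BASES style expression into database abbreviations.

    Assumed semantics (not confirmed by the announcement): patterns are
    joined with '+', and '*' is a wildcard for any run of characters.
    """
    selected = []
    for pattern in expression.upper().split("+"):
        pattern = pattern.strip()
        for name in known:
            if fnmatchcase(name, pattern) and name not in selected:
                selected.append(name)
    return selected


print(expand_bases("GB*"))
print(expand_bases("N+GB*+EMBL*"))
```

Under these assumptions the pattern N selects only the NBRF Nucleic database
and does not pick up NRL_3D, because a pattern without a wildcard must match
the whole abbreviation.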
6. PIR Network Request Service Command Summary
The National Biomedical Research Foundation Protein Identification Resource
network request service is a full-function fileserver and database query
system. It has been operating since August 1990 and is capable of handling
database queries, sequence searches and sequence submissions, in addition to
fileserver requests. To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET. The FILESERVer recognizes the following commands
sent either in a mail message, or (if the sender is on BITNET) in command
messages or in a file:
Command       Action
-------       -----------------------------------------------
ACCESSION     list entry codes and titles by accession number
AND           combine QUERY commands with Boolean AND
AUTHOR        list entry codes and titles by author
BASES         list accessible databases
CROSS         list PIR entry codes and titles corresponding to
              a particular nucleic sequence database entry
DEPOSIT       deposit entry for database submission
END DEPOSIT   terminate deposit entry
FEATURE       list entry codes and titles by feature table entry
GENE          list entry codes and titles for a gene name
GET           return entry by entry code
HELP          return HELP instructions
HOST          list entry codes and titles by host species
INDEX         list SENDable files
JOURNAL       list entry codes and titles by journal citation
KEYWORD       list entry codes and titles by keyword
MEMBER        list alignments containing entry code as a member
NOT           combine QUERY commands with Boolean NOT
OR            combine QUERY commands with Boolean OR
QUERY         begin collecting QUERY commands
END QUERY     terminate collecting commands and execute QUERY
QUIT          ignore the remaining text (E-mail signature blocks)
RETURN        change return address for gateway mail
SEARCH        search for sequence by FASTA procedure
END SEARCH    terminate sequence for searching
SEND          send file
SPECIES       list entry codes and titles by species
SUGGEST       leave suggestion or correction for PIR staff
END SUGGEST   terminate suggestion text
SUPERFAMILY   list entry codes and titles by superfamily name
TAXONOMY      report taxonomy for scientific or common name
TITLE         list entry codes and titles by title
USE           set databases, dates or formats to use
Multiple commands can be sent with one command on each line of a mail message
or file. Commands should NOT be sent on the Subject line of a mail message.
Receipt of command messages and files will be acknowledged immediately. Mail
messages will be acknowledged by return mail.
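
For users who prepare requests with a script, the rules above (one command per
line, nothing on the Subject line) can be sketched in Python as follows. This
only builds the message and does not send it. The Internet-style address form
FILESERV@GUNBRF.BITNET, the sender address and the function name build_request
are assumptions made for illustration; use whatever address form your mail
gateway requires.

```python
from email.message import EmailMessage


def build_request(commands, sender, server="FILESERV@GUNBRF.BITNET"):
    """Build a mail request with one server command per body line.

    The server address form is an assumption; the announcement gives it
    only as 'FILESERV at GUNBRF on BITNET'.
    """
    for command in commands:
        if not command.strip() or "\n" in command:
            raise ValueError("each command must be a single non-empty line")

    message = EmailMessage()
    message["From"] = sender
    message["To"] = server
    # No Subject header is set: commands must not travel on the Subject line.
    message.set_content("\n".join(commands) + "\n")
    return message


# Example using the commands shown earlier in this announcement.
request = build_request(
    ["USE BASES NRL_3D", 'FEATURE TURN "TYPE I"'],
    sender="user@example.edu",  # hypothetical address
)
print(request.get_content())
```

The resulting message object can be handed to any mail transport available on
the local system.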
For help in using any of the commands, send a request of the form
HELP topic
for example
HELP SEARCH
In addition to the commands, help instructions are also available on the
following topics:
Custom_Services
Databases
Gateway_Access
Help_en_Espanol
Help_en_francais
IBM-VM_BITNET
On-Line_Access
PIR_Distribution
VAX-VMS_BITNET
Because of network gateway communication protocols, there are limitations on
requests sent through gateways. Users not on BITNET or INTERNET who access
BITNET through local or network gateways should read and carefully follow
these instructions before sending requests. Only mail message requests
(not command messages or files) can be sent through gateways. Because
addresses posted on gateway mail do not always work for the return, before you
send requests through network gateways it is strongly recommended that you
first contact Dr. John S. Garavelli at POSTMASTER at GUNBRF on BITNET. We will
confirm a return address for you and may instruct you to use the RETURN
command to ensure that your request output will reach you. It is not usually
necessary to do this if you are on BITNET or INTERNET, unless your system
employs a local remailer or your mail program applies a non-standard return
address (for example a personal name on the FROM: line).
The BITNET network and the network gateways impose strict limits on file size.
Poorly posed database queries may result in output so extensive that it could
not be returned by network mail. Therefore, an output limit of 1000 lines for
each command and 3000 lines for each request is imposed by the PIR FILESERVer.
The DEPOSIT and QUERY commands must, and the SEARCH and SUGGEST commands may,
be followed by their respective END commands when text appears on intervening
lines. The DEPOSIT command requires, and the SEARCH command optionally uses,
parameters that appear on the same line as the command. Because these four
commands are so complex, users should obtain and carefully read the help
instructions before attempting to use them.
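
A request can be checked locally for a missing END line before it is mailed.
The Python sketch below covers only the two commands that must be terminated,
DEPOSIT and QUERY; it assumes that command words are matched without regard to
case and that every line between a block's opening command and its END line
belongs to that block. It does not validate the contents of a block, and the
function name first_unterminated is hypothetical.

```python
# Commands that, per the text above, must be closed by a matching END line.
REQUIRES_END = {"DEPOSIT", "QUERY"}


def first_unterminated(lines):
    """Return (line_number, command) for an unclosed DEPOSIT or QUERY block.

    Returns None when every DEPOSIT and QUERY block has its END line.
    Line numbers are 1-based.
    """
    open_block = None  # (line_number, command) of the block being collected
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        first = words[0].upper()
        if open_block is not None:
            # Inside a block: only the matching END line closes it.
            if first == "END" and len(words) > 1 and words[1].upper() == open_block[1]:
                open_block = None
            continue
        if first in REQUIRES_END:
            open_block = (number, first)
    return open_block


# The lines inside the QUERY block are placeholders, not documented syntax.
complete = ["QUERY", "KEYWORD example", "END QUERY", "HELP SEARCH"]
incomplete = ["HELP SEARCH", "QUERY", "KEYWORD example"]
print(first_unterminated(complete))
print(first_unterminated(incomplete))
```

Such a check is no substitute for the help instructions, which remain the
authority on what each of these commands accepts.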
The databases available through the PIR Network Server and their abbreviations
for code specification are as follows:
Abbreviation  Database                              Update Schedule
PIR1          PIR Annotated and Classified Entries  quarterly
PIR2          PIR Preliminary Entries               approximately monthly
PIR3          PIR Unverified Entries                weekly
ALN           PIR Alignment Entries                 quarterly
NRL_3D        Brookhaven Data Bank Sequences        quarterly
PATCHX        MIPS PIR-Supplementary Database       quarterly
N             NBRF Nucleic
GB            GenBank (TM)                          as received
GBSUP         GenBank (TM)                          as received
GBNEW         GenBank (TM) New Entries              weekly
EMBL          EMBL                                  as received
EMBLSUP       EMBL                                  as received
Not all commands work with all databases; please read the information returned
by the command HELP DATABASES.
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Identification Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMASTER at GUNBRF.BITNET