Announcements of PIR Network Request Service

POSTMAST at GUNBRF.BITNET
Mon Aug 3 17:06:55 EST 1992


              Announcements of the Protein Identification Resource
                            Network Request Service

Highlights
1. PATCHX Supplements PIR with Sequences from Other Databases
2. Feature Information from Brookhaven Data Bank in NRL_3D Database 9.1
3. Complimentary CD-ROM Available with ATLAS Multidatabase Retrieval Program
4. New USE FORMAT Server Command Provides Versatile Output
5. GenBank and EMBL Database Sections
6. PIR Network Request Service Command Summary

Announcements
1. PATCHX Supplements PIR with Sequences from Other Databases

The PATCHX database is produced by MIPS at the Max Planck Institute for
Biochemistry, Martinsried, FRG.  It includes all protein sequences (not
identical with or contained in sequences from PIR1, PIR2 and PIR3 release
32.2) from the following databases:

  Database   Release  Date  Entries  Code  Description

  MIPSOwn    33.0     6-92   1251    D     MIPS preliminary entries
  PIRMOD     33.0     6-92     32    E     MIPS/PIR preliminary entries
  MIPSH      32.2     6-92     65    F     MIPS yeast entries
  NRL_3D      8.0     3-92    247    R     Brookhaven Data Bank Sequences
  MIPSTrn    33.0     6-92   1130    G     MIPS preliminary translations
  EMTrans    30.0     5-92  12756    H     EMBL automatic translations
  SwissProt  21.0     2-92   1618    I     SwissProt entries
  GenPept    71.0     3-92   4603    J     GenBank automatic translations
  Kabat       5.0     3-92   3567    K     Kabat entries
  PSeqIP      5.0     7-88    956    L     NEWAT
                                     M     PSD
                                     N     PGTrans

All sequences that are IDENTICAL within or between databases are present ONCE.
Duplicate sequences and sequences that were completely contained within others
(subsequences) have been eliminated according to the priority (top to bottom)
in the table above.  The number of entries in the table reflects the number of
entries remaining from that database after elimination of duplicates and
subsequences, not the original number of entries.  Numerous inexact duplicates
remain in PATCHX: multiple reports of the same protein that differ by at least
one amino acid residue.  Many of these are cited in merged PIR
entries.  The PIR3, MIPSOwn, PIRMOD and MIPSTrn databases contain preliminary
data that should be used with extreme caution.
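
The elimination rules described above can be illustrated with a short Python
sketch.  This is not the procedure MIPS uses; the function name, the toy
sequences and the two-pass approach are assumptions made for illustration,
and the additional exclusion of sequences already present in PIR1, PIR2 and
PIR3 is omitted.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str, str]  # (database name, entry id, sequence)


def merge_by_priority(databases: List[Tuple[str, Dict[str, str]]]) -> List[Entry]:
    """Merge databases listed in priority order (highest priority first).

    Hypothetical helper: identical sequences are kept once, under the
    highest-priority database, and any sequence completely contained in
    another kept sequence is dropped.
    """
    # Pass 1: keep only the first (highest-priority) copy of each sequence.
    seen = set()
    unique: List[Entry] = []
    for db_name, entries in databases:
        for entry_id, seq in entries.items():
            if seq not in seen:
                seen.add(seq)
                unique.append((db_name, entry_id, seq))

    # Pass 2: drop subsequences.  The pairwise comparison is quadratic,
    # which is fine for a toy example but not for a real database.
    return [
        (db, eid, seq)
        for (db, eid, seq) in unique
        if not any(seq != other and seq in other for (_, _, other) in unique)
    ]


# Toy data, invented for this example.
example = [
    ("MIPSOwn", {"A1": "MKTAYIAK"}),
    ("SwissProt", {"S1": "MKTAYIAK", "S2": "TAYI", "S3": "MKTAYIAR"}),
]
merged = merge_by_priority(example)
```

In this toy run S1 is dropped as an exact duplicate of A1, S2 is dropped as a
subsequence, and S3 survives as an inexact duplicate because it differs from
A1 by one residue.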

The PATCHX database is available through the PIR Network Request Server,
through the PIR On-Line system and on the ATLAS CD-ROM now being distributed.

Friedhelm Pfeiffer at MIPS wishes to thank Reinhard Doelz and Hans
Ullitz-Moeller for their valuable suggestions in the production of this
database.


2. NRL_3D Release 9.1 Has Feature Information from Brookhaven Data Bank

The NRL_3D Database of sequence information extracted from the Brookhaven
Protein Data Bank (PDB) has been upgraded to release 9.1.  This new version
includes feature annotations extracted from PDB HELIX, SHEET, TURN, SITE, and
SSBOND records along with special ATOM and HETATM records.  New algorithms
have been implemented to construct and name chains and fragments, to recognize
non-standard residues and to discard entries with completely unknown sequence.
NRL_3D release 9.1 corresponds to PDB release 60 (May 1992) and contains
1,380 sequences with 229,099 residues.

The inclusion of this feature information in NRL_3D allows PDB entries to be
recovered through the FEATURE command.  For example, the commands
  USE BASES NRL_3D
  FEATURE TURN "TYPE I"
will list all entries in the NRL_3D database with a "type I" turn annotated
in their corresponding PDB entry.

Release 9.1 of NRL_3D is available through the PIR Network Request Server,
through the PIR On-Line Access System and by FTP from the University of Houston
server at ftp.bchs.uh.edu in the files
  /pub/gene-server/incoming/pir33/nrl_3d-9.1-vms
  /pub/gene-server/incoming/pir33/nrl_3d-9.1-ascii

Our thanks to Bill Pearson and Dan Davison for their efforts in providing FTP
access to the PIR databases.


3. Complimentary CD-ROM Available with ATLAS Multidatabase Retrieval Program

A preliminary version of the ATLAS CD-ROM is being distributed on a
complimentary basis as an introduction.  Regular distribution of the
ATLAS CD-ROM is expected to begin in the Fall, coordinated with the quarterly
releases of the PIR-International Protein Sequence Database.  To receive a
complimentary ATLAS CD-ROM, please send your name and complete mailing address
to: PIRMAIL at GUNBRF.BITNET

The ATLAS CD-ROM contains the Atlas Retrieval System, the PIR-International
Protein Sequence Database, the GenBank Gene Sequence Database, and several
related databases.  The Atlas Retrieval System (ATLAS) is an information
retrieval system specifically designed to access macromolecular sequence
databases.  It provides simultaneous retrieval from all (or a selected subset)
of these databases.  The Atlas program is currently designed to run on PC/DOS
and VAX/VMS computer systems.  Support for UNIX and Macintosh systems will be
added.

The development of the ATLAS program was partially supported by NLM LM05206-09,
by NSF BIR-9107540, and by Digital Equipment Corporation.  The ATLAS program is
copyrighted by the National Biomedical Research Foundation.  The ATLAS of
Protein and Genomic Sequences is a trademark of the National Biomedical
Research Foundation.

The ATLAS program was developed from the NBRF eXperimental Query System (XQS)
and is designed along similar lines.  It does not yet contain some of the
utility functions of the XQS program; these will be added later as portability
permits.

VAX/VMS systems currently do not support direct access to ISO 9660 formatted
CD-ROMs.  The ATLAS CD-ROM may be accessed on VAX/VMS systems by two
approaches:

(1) There is an ISO 9660 compliant device driver available from Digital
    Equipment Corporation (DEC) that allows direct access to the CD-ROM
    (driver part number YT-GS001-01).  Please contact your DEC sales
    representative for further information.

(2) There is a public domain utility for accessing ISO 9660 CD-ROMs,
    called CD_ACCESS, written by Peter Stockwell, University of Otago,
    New Zealand, that will allow all the files on the CD-ROM to be copied
    to a magnetic disk drive.  This utility can be obtained from the EMBL
    E-mail server (for further information contact DataLib at EMBL-Heidelberg.DE).
    When copying files using CD_ACCESS, be sure to use the /BINARY qualifier
    to the copy command.


4. New USE FORMAT Server Command Provides Versatile Output

The PIR Network Server now provides a command for changing the default format
of PIR-International database entries.  The default format for PIR entries
conforms to CODATA specifications.  To obtain PIR entries in the format
normally presented by PIR database retrieval programs (PSQ, XQS and ATLAS)
use the command
  USE FORMAT ATLAS
Subsequent GET commands will then return entries in the ATLAS format.
The command
  USE FORMAT CODATA
will cause subsequent GET commands to return entries in the default CODATA
format.


5. GenBank and EMBL Database Sections

To facilitate program access, the GenBank and EMBL databases have been broken
into sections.  GenBank is available in three sections, GB, GBSUP and GBNEW,
and EMBL is available in two sections, EMBL and EMBLSUP.  The GBNEW section
contains the GenBank weekly update entries.  GBSUP and EMBLSUP are
supplemental sections that hold regular entries (presently the primate
entries).  All these databases are automatically available through all the
commands that can use them.  Particular databases may be selected with the
USE BASES command.  The command
  USE BASES GB*
will select all the GenBank databases, and only those databases, for
subsequent database query and retrieval commands.  The command
  USE BASES N+GB*+EMBL*
will select all the nucleotide sequence databases for subsequent query and
retrieval commands.
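
The effect of these two commands can be mimicked with a short Python sketch.
It is not the server's implementation: the function name is hypothetical, and
the use of shell-style matching from Python's fnmatch module is an assumption
that happens to reproduce the selections described above.

```python
from fnmatch import fnmatchcase
from typing import List

# Database abbreviations as listed in the command summary of this announcement.
DATABASES = [
    "PIR1", "PIR2", "PIR3", "ALN", "NRL_3D", "PATCHX",
    "N", "GB", "GBSUP", "GBNEW", "EMBL", "EMBLSUP",
]


def select_bases(spec: str, available: List[str] = DATABASES) -> List[str]:
    """Return the databases matched by a USE BASES argument.

    Assumption: '+' separates patterns and '*' matches any run of
    characters, including none.
    """
    patterns = [p.strip().upper() for p in spec.split("+") if p.strip()]
    return [db for db in available if any(fnmatchcase(db, p) for p in patterns)]


genbank_only = select_bases("GB*")
nucleotide = select_bases("N+GB*+EMBL*")
```

Under these assumptions GB* selects GB, GBSUP and GBNEW, while the bare N
matches only the NBRF Nucleic database and not NRL_3D.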


6. PIR Network Request Service Command Summary

The National Biomedical Research Foundation Protein Identification Resource
network request service is a full-function fileserver and database query
system.  It has been operating since August 1990 and is capable of handling
database queries, sequence searches and sequence submissions, in addition to
fileserver requests.  To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET.  The FILESERVer recognizes the following commands
sent either in a mail message, or (if the sender is on BITNET) in command
messages or in a file:

  Command        Action
  -------        -----------------------------------------------
  ACCESSION      list entry codes and titles by accession number
  AND            combine QUERY commands with Boolean AND
  AUTHOR         list entry codes and titles by author
  BASES          list accessible databases
  CROSS          list PIR entry codes and titles corresponding to
                 a particular nucleic sequence database entry
  DEPOSIT        deposit entry for database submission
    END DEPOSIT  terminate deposit entry
  FEATURE        list entry codes and titles by feature table entry
  GENE           list entry codes and titles for a gene name
  GET            return entry by entry code
  HELP           return HELP instructions
  HOST           list entry codes and titles by host species
  INDEX          list SENDable files
  JOURNAL        list entry codes and titles by journal citation
  KEYWORD        list entry codes and titles by keyword
  MEMBER         list alignments containing entry code as a member
  NOT            combine QUERY commands with Boolean NOT
  OR             combine QUERY commands with Boolean OR
  QUERY          begin collecting QUERY commands
    END QUERY    terminate collecting commands and execute QUERY
  QUIT           ignore the remaining text (E-mail signature blocks)
  RETURN         change return address for gateway mail
  SEARCH         search for sequence by FASTA procedure
    END SEARCH   terminate sequence for searching
  SEND           send file
  SPECIES        list entry codes and titles by species
  SUGGEST        leave suggestion or correction for PIR staff
    END SUGGEST  terminate suggestion text
  SUPERFAMILY    list entry codes and titles by superfamily name
  TAXONOMY       report taxonomy for scientific or common name
  TITLE          list entry codes and titles by title
  USE            set databases, dates or formats to use

Multiple commands can be sent with one command on each line of a mail message
or file.  Commands should NOT be sent on the Subject line of a mail message.
Receipt of command messages and files will be acknowledged immediately.  Mail
messages will be acknowledged by return mail.
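
As an illustration of these rules, the following Python sketch assembles a
mail request with one command per line in the body and no Subject line.  The
function name, the sender address and the "@" form of the server address are
assumptions; the actual address syntax depends on the local mail system and
gateway.

```python
from email.message import EmailMessage
from typing import List


def build_request(sender: str, commands: List[str],
                  recipient: str = "FILESERV@GUNBRF.BITNET") -> EmailMessage:
    """Build a request message (hypothetical helper, not part of any PIR software).

    The recipient default is an assumed rendering of "FILESERV at GUNBRF
    on BITNET".
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # Deliberately no Subject header: commands on the Subject line are not read.
    msg.set_content("\n".join(commands) + "\n")
    return msg


request = build_request(
    "user@example.edu",              # placeholder sender address
    [
        "USE BASES NRL_3D",
        'FEATURE TURN "TYPE I"',
        "QUIT",                      # the server ignores any text after QUIT
    ],
)
```

Ending the body with QUIT keeps an automatically appended signature block
from being read as commands.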

For help in using any of the commands, send a request of the form
  HELP topic
for example
  HELP SEARCH

In addition to the commands, help instructions are also available on the
following topics:
  Custom_Services
  Databases
  Gateway_Access
  Help_en_Espanol
  Help_en_francais
  IBM-VM_BITNET
  On-Line_Access
  PIR_Distribution
  VAX-VMS_BITNET

Because of network gateway communication protocols, there are limitations on
requests sent through gateways.  Users not on BITNET or INTERNET who access
BITNET through local or network gateways should read and carefully follow
these instructions before sending requests.  Only mail message requests
(not command messages or files) can be sent through gateways.  Because
addresses posted on gateway mail do not always work for the return, before you
send requests through network gateways it is strongly recommended that you
first contact Dr. John S. Garavelli at POSTMASTER at GUNBRF on BITNET.  We will
confirm a return address for you and may instruct you to use the RETURN
command to ensure that your request output will reach you.  It is not usually
necessary to do this if you are on BITNET or INTERNET, unless your system
employs a local remailer or your mail program applies a non-standard return
address (for example a personal name on the FROM: line).

The BITNET network and the network gateways impose strict limits on file size.
Poorly posed database queries may result in output so extensive that it could
not be returned by network mail.  Therefore, an output limit of 1000 lines for
each command and 3000 lines for each request is imposed by the PIR FILESERVer.
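
How the two limits interact can be sketched in Python as follows.  The
announcement does not say whether excess output is truncated or the request
is rejected; this sketch assumes simple truncation, and the function name is
hypothetical.

```python
from typing import List

PER_COMMAND_LIMIT = 1000   # lines returned for any one command
PER_REQUEST_LIMIT = 3000   # lines returned for a whole request


def apply_limits(outputs: List[List[str]]) -> List[List[str]]:
    """Truncate per-command output so both limits hold (assumed behaviour)."""
    remaining = PER_REQUEST_LIMIT
    limited = []
    for lines in outputs:
        # A command gets at most its own cap, and never more than what is
        # left of the request-wide budget.
        allowed = min(PER_COMMAND_LIMIT, remaining)
        kept = lines[:allowed]
        limited.append(kept)
        remaining -= len(kept)
    return limited


# Four commands producing 1500, 800, 1000 and 500 lines respectively.
raw = [["a"] * 1500, ["b"] * 800, ["c"] * 1000, ["d"] * 500]
sizes = [len(part) for part in apply_limits(raw)]
```

Whatever the server actually does at the limit, narrowing a query with USE
BASES or a more specific search term is the practical way to stay under it.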

The DEPOSIT and QUERY commands must, and the SEARCH and SUGGEST commands may,
be followed by their respective END commands when text appears on intervening
lines.  The DEPOSIT command requires, and the SEARCH command optionally uses,
parameters that appear on the same line as the command.  Because these four
commands are so complex, users should obtain and carefully read the help
instructions before attempting to use them.

The databases available through the PIR Network Server and their abbreviations
for code specification are as follows:

  Abbreviation  Database                              Update Schedule
  ------------  ------------------------------------  ---------------------
  PIR1          PIR Annotated and Classified Entries  quarterly
  PIR2          PIR Preliminary Entries               approximately monthly
  PIR3          PIR Unverified Entries                weekly
  ALN           PIR Alignment Entries                 quarterly
  NRL_3D        Brookhaven Data Bank Sequences        quarterly
  PATCHX        MIPS PIR-Supplementary Database       quarterly
  N             NBRF Nucleic
  GB            GenBank (TM)                          as received
  GBSUP         GenBank (TM)                          as received
  GBNEW         GenBank (TM) New Entries              weekly
  EMBL          EMBL                                  as received
  EMBLSUP       EMBL                                  as received

Not all commands work with all databases; please read the information returned
by the command HELP DATABASES.

------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET


