Announcements of the Protein Information Resource
Network Request Service
Highlights
1. Summaries for PIR-International Release 35 and NRL_3D Release 11
2. The ALN Database of Protein Sequence Alignments
3. Confidentiality of Requests Submitted to the Network Request Server
4. PIR-International Technical Development Bulletin
5. GenBank and EMBL Database Sections
6. PIR Network Request Service Command Summary
Announcements
1. Summaries for PIR-International Release 35 and NRL_3D Release 11
Release 35.00 of the PIR-International databases, Release 11.00 of the
NRL_3D database (corresponding to Brookhaven Protein Data Bank Release 62),
and Release 3.00 of the ALN database of protein sequence alignments are now
available through the PIR On-line system and Network Request Server.
Distribution of the tapes of the new release has been completed and the
CD-ROMs are due to be shipped shortly.
Database   Release   Sequences    Residues   Description
PIR1        35.00      10,928    3,761,590   Annotated and Classified Entries
PIR2        34.00      16,662    4,453,825   Preliminary Entries
PIR3        34.00      19,644    5,660,425   Unverified Entries
NRL_3D      11.00       1,550      272,744   Sequences from Brookhaven PDB
ALN          3.00         715 (entries)      Protein Sequence Alignments
Growth of the PIR databases is documented in the file DBGROWTH.LIS available
through the Network Request Server. The following files are also available
through the Server:
entries added since Release 34.00 are listed in PADD.LIS,
entries revised since Release 34.00 are listed in PREV.LIS,
superfamilies recorded in PIR1 and PIR2 are listed in SUPERFAM.LIS,
keywords employed in PIR1 and PIR2 are listed in KEYWORDS.LIS,
features cataloged in PIR1 and PIR2 are listed in FEATURES.LIS,
recognized journal abbreviations are listed in JOURNALS.LIS,
a description of the ALN database is in ALNBASE.LIS,
titles in the ALN database are listed in ALNTITLE.LIS,
titles in the NRL_3D Database are listed in NRLTITLE.LIS.
2. The ALN Database of Protein Sequence Alignments
The Protein Information Resource (PIR) is developing a system for construction,
storage and retrieval of alignments of protein sequences. The objective is a
database of characteristic domain alignments with their known properties that
might be useful for characterizing proteins of unknown structure and function
as well as for describing the evolutionary relationships of multidomain
proteins.
In the initial phase, we have constructed a database of alignments of
homologous protein sequences that are less than 55% different from each other.
Groups of at least three sequences of comparable length and more than 50%
identity were selected from Section 1 (Annotated and Classified Entries) of the
PIR-International Protein Sequence Database (PIR1). The ClustalV program of
Des Higgins at EMBL was used to align the sequences initially. The alignments
were checked by senior staff members at PIR and corrections were incorporated
wherever necessary using the ALNED program developed at PIR.
Other alignments developed as part of research projects at PIR, as well as
alignments of domains and repeats, have also been included. The database
currently has 715 entries and can be accessed through the PIR On-line system,
the Network Request Service and the ATLAS retrieval system being developed at
PIR.
Description of an ALN database entry
Each entry consists of a variable number of consecutive records. The
information contained in these records is divided into six sections. The
sections are listed below in the order in which they occur in the entry.
1. TITLE
The title of the alignment.
2. DATE
Creation and revision dates.
3. MEMBERS
The sequence identification codes of the sequences used in the
alignment.
4. MEMBERS TITLES
The members title lines, as found in the Protein Sequence Database.
5. ALIGNMENT (variable number of records)
The alignment of sequences. The completely conserved residues are
marked by '*' and partially conserved residues are marked by '.'
at the bottom of the alignment.
6. MATRIX
The matrix of percent differences.
The upper portion of the matrix gives the number of differences
between the sequences while the lower portion represents the same
as percent differences.
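
As an illustration of sections 5 and 6, the following Python sketch derives
the '*' marker line and a difference matrix from a small, made-up alignment.
It is not the ClustalV or ALNED procedure. The function names, the use of '-'
for gaps, and the choice to skip gapped positions when counting differences
are assumptions made for this example. The rule for the '.' marker is not
defined above, so it is left out.

```python
from itertools import combinations


def conservation_line(rows):
    """Return '*' under columns where every sequence has the same residue.

    Columns containing a gap ('-', an assumed gap symbol) are never marked.
    """
    marks = []
    for column in zip(*rows):
        if "-" not in column and len(set(column)) == 1:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


def difference_matrix(rows):
    """Upper triangle: number of differences; lower triangle: percent."""
    n = len(rows)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Assumption: positions with a gap in either sequence are ignored.
        pairs = [(a, b) for a, b in zip(rows[i], rows[j])
                 if a != "-" and b != "-"]
        diffs = sum(1 for a, b in pairs if a != b)
        matrix[i][j] = diffs
        matrix[j][i] = round(100.0 * diffs / len(pairs), 1) if pairs else 0.0
    return matrix


# Hypothetical aligned sequences, all of the same length.
rows = ["ACDEF", "ACDQF", "AC-EF"]
marks = conservation_line(rows)
matrix = difference_matrix(rows)
```

Here `matrix[0][1]` holds the count of differences between the first two
sequences and `matrix[1][0]` holds the same comparison as a percentage.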
3. Confidentiality of Requests Submitted to the Network Request Server
All requests submitted to the PIR Network Request Server, including protein and
nucleotide sequences submitted for FASTA search against the PIR-International
protein sequence databases, are confidential within the following limitations.
The requests are stored in files in a directory that is not accessible to the
public through either network communication or the on-line system. Network
access is only possible through the server daemon and then only in response to
the network request, coded by date, time and address, that generated the file.
The files are not accessible to PIR personnel except those with the computer
system privileges necessary to conduct computer hardware and software
maintenance. The files, other than those generated by PIR staff members, are
examined only for accounting purposes and to monitor and ensure correct
software performance. Accounting summaries are generated for user address
distribution, numbers of requests, and numbers of commands on a monthly basis.
The files may be retained for up to one month for this accounting and are then
deleted. This confidentiality does not, of course, apply to protein sequences
submitted through the Server for inclusion in the PIR-International database.
4. PIR-International Technical Development Bulletin
We have on-going efforts to standardize the PIR databases, improving their
parsability and compliance with CODATA and other format standards. During
the next year the combined staffs of the PIR-International will be imposing
and enforcing many new rules and requirements on the distributed versions of
the database. Some of these rules and requirements may affect the currently
existing software designed to read the PIR databases in "NBRF format".
Notification of the broader aspects of these changes will be placed in our
newsletters and in announcements posted on the BioSci Newsgroups PROTEINS and
BIONEWS. However, some people may wish to be informed about the technical
aspects of these changes before they appear in a database release. For that
reason we will be setting up an electronic mailing list to inform software
developers and others interested in the technical aspects of these database
changes.
This electronic bulletin serves as an "early warning system" for people who
are concerned about changes in the format and standards for PIR database
entries. The first bulletin was posted on 22 January. Hereafter, they should
appear approximately quarterly. The first bulletin may be obtained by sending
the request SEND PIRTECH.LIS to the PIR Network Request Server.
If you would be interested in being placed on this mailing list, please send
a brief electronic mail note to me at POSTMAST at GUNBRF.BITNET or
POSTMASTER at NBRF.Georgetown.Edu.
5. GenBank and EMBL Database Sections
The GenBank and EMBL entries available on the On-line system and the Network
Request Server are now divided into the standard 13 libraries. The GBNEW
section contains the GenBank weekly update entries. All these databases are
automatically available on the Server through all the commands that can use
them. Particular databases may be selected with the USE BASES command
described at the end of the Server command summary.
6. PIR Network Request Service Command Summary
The National Biomedical Research Foundation Protein Information Resource
network request service is a full-function fileserver and database query
system. Operating since August 1990, it is capable of handling database
queries, sequence searches and sequence submissions, in addition to
fileserver requests. To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET or FILESERV at NBRF.Georgetown.EDU on Internet.
The server recognizes the following commands sent either in a mail message,
or (if the sender is on BITNET) in a command message or a file:
Command       Action
-----------   -----------------------------------------------
ACCESSION     list entry codes and titles by accession number
AND           combine QUERY commands with Boolean AND
AUTHOR        list entry codes and titles by author
BASES         list accessible databases
CROSS         list PIR entry codes and titles corresponding to
              a particular nucleic sequence database entry
DEPOSIT       deposit entry for database submission
END DEPOSIT   terminate deposit entry
FEATURE       list entry codes and titles by feature table entry
GENE          list entry codes and titles for a gene name
GET           return entry by entry code
HELP          return HELP instructions
HOST          list entry codes and titles by host species
INDEX         list SENDable files
JOURNAL       list entry codes and titles by journal citation
KEYWORD       list entry codes and titles by keyword
MEMBER        list alignments containing entry code as a member
NOT           combine QUERY commands with Boolean NOT
OR            combine QUERY commands with Boolean OR
QUERY         begin collecting QUERY commands
END QUERY     terminate collecting commands and execute QUERY
QUIT          ignore the remaining text (E-mail signature blocks)
RETURN        change return address for gateway mail
SEARCH        search for matching sequences by FASTA procedure
END SEARCH    terminate sequence for searching
SEND          send file
SPECIES       list entry codes and titles by species
SUGGEST       leave suggestion or correction for PIR staff
END SUGGEST   terminate suggestion text
SUPERFAMILY   list entry codes and titles by superfamily name
TAXONOMY      report taxonomy for scientific or common name
TITLE         list entry codes and titles by title
USE           set databases, dates or formats to use in limited searches
Multiple commands can be sent with one command on each line of a mail message
or file. Commands should NOT be sent on the Subject line of a mail message.
Receipt of command messages and files will be acknowledged immediately. Mail
messages will be acknowledged by return mail.
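
For users who prepare requests with a script, the layout described above (one
command per line in the message body, nothing on the Subject line) might be
assembled as in the following Python sketch. The function name build_request
and the sender address are hypothetical, and the sketch only builds the
message; it does not send it.

```python
from email.message import EmailMessage

# Internet address of the server as given in this announcement.
SERVER_ADDRESS = "FILESERV@NBRF.Georgetown.EDU"


def build_request(sender, commands):
    """Build a mail message with one server command per body line."""
    lines = [command.strip() for command in commands if command.strip()]
    if not lines:
        raise ValueError("at least one command is required")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = SERVER_ADDRESS
    # No Subject header is set: commands belong in the body only.
    message.set_content("\n".join(lines) + "\n")
    return message


# Hypothetical sender; both commands appear elsewhere in this announcement.
request = build_request("user@example.org", ["HELP SEARCH", "SEND PIRTECH.LIS"])
```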
For help in using any of the commands, send a request of the form
HELP topic
for example
HELP SEARCH
In addition to the commands, help instructions are also available on the
following topics:
Custom_Services
Databases
FTP
Gateway_Access
Help_en_Espanol
Help_en_francais
Hints
IBM-VM_BITNET
On-Line_Access
PIR_Distribution
VAX-VMS_BITNET
Because of network gateway communication protocols, there are limitations on
requests sent through gateways. Users not on BITNET or INTERNET who access the
server through local or network gateways should read and carefully follow these
instructions before sending requests. Only mail message requests (not command
messages or files) can be sent through gateways. Because addresses posted on
gateway mail do not always work for the return, before you send requests
through network gateways it is strongly recommended that you first contact
John S. Garavelli (POSTMAST at GUNBRF on BITNET, POSTMASTER at NBRF.Georgetown.EDU on
Internet). We will confirm a return address for you and may instruct you to
use the RETURN command to ensure that your request output will reach you. It
is not usually necessary to do this if you are on BITNET or INTERNET, unless
your system employs a local remailer or your mail program applies a
nonstandard return address (for example a personal name on the FROM: line).
The BITNET network and the network gateways impose strict limits on file size.
Poorly posed database queries may produce output too extensive to be
returned by network mail. Therefore, the PIR server imposes an output limit of
1000 lines for each command and 3000 lines for each request.
The DEPOSIT and QUERY commands, and the SEARCH and SUGGEST commands (in their
multiline form) must be followed by their respective END commands after the
text appearing on the intervening lines. The DEPOSIT command requires, and the
SEARCH command optionally uses, parameters that appear on the same line as the
command. Because these four commands are so complex, users should obtain and
carefully read the help instructions before attempting to use them.
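
Before sending a request, a script could check that a DEPOSIT or QUERY block
has been closed by its END command. The Python sketch below is one possible
check, not part of the server. It covers only DEPOSIT and QUERY, because
SEARCH and SUGGEST need an END line only in their multiline form, which this
sketch does not try to detect. The function name and the argument text in the
example lines are illustrative assumptions.

```python
# Commands that must always be followed by a matching END line.
REQUIRES_END = ("DEPOSIT", "QUERY")


def missing_end(lines):
    """Return the END command still owed at the end of the request, or None."""
    open_block = None
    for line in lines:
        words = line.upper().split()
        if not words:
            continue
        if open_block is None:
            # Outside a block, only DEPOSIT or QUERY opens one.
            if words[0] in REQUIRES_END:
                open_block = words[0]
        elif words[:2] == ["END", open_block]:
            # Inside a block, only the matching END line closes it.
            open_block = None
    return None if open_block is None else "END " + open_block


# Illustrative request lines; the arguments are made up for the example.
complete = missing_end(["QUERY", "KEYWORD kinase", "END QUERY"])
incomplete = missing_end(["HELP SEARCH", "QUERY", "KEYWORD kinase"])
```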
The databases available through the PIR Network Server and their abbreviations
for code specification are as follows:
Abbreviation   Database                               Update Schedule
PIR1           PIR Annotated and Classified Entries   approximately biweekly
PIR2           PIR Preliminary Entries                approximately weekly
PIR3           PIR Unverified Entries                 weekly
ALN            PIR Alignment Entries                  semiannually
NRL_3D         Brookhaven Data Bank Sequences         quarterly
PATCHX         MIPS PIR-Supplementary Database        quarterly
N              NBRF Nucleic
GB*            GenBank (TM)                           as received
GBNEW          GenBank (TM) New Entries               weekly
EMBL*          EMBL                                   as received
In the FASTA output of the SEARCH command the abbreviation for PATCHX is
shortened to PATX and NRL_3D is shortened to NR3D; the longer abbreviation
should be used to retrieve an entry with the GET command. Not all commands
work with all databases; please read the information returned by the command
HELP DATABASES.
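
A script that reads FASTA output and then retrieves entries needs to map the
shortened abbreviations back to the full ones. A minimal Python sketch of that
mapping follows. The function name is hypothetical, and the sketch returns
only the database abbreviation, since the exact form of the GET argument is
described in the help instructions rather than here.

```python
# Shortened abbreviations used in FASTA output, mapped to the full forms.
FASTA_TO_FULL = {"PATX": "PATCHX", "NR3D": "NRL_3D"}


def full_database_name(abbreviation):
    """Return the abbreviation to use with GET for a FASTA output label."""
    key = abbreviation.strip().upper()
    return FASTA_TO_FULL.get(key, key)
```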
The GenBank (TM) (GB) and EMBL databases are now divided into sections
corresponding to the sections of their standard releases:
-BCT   Bacterial Sequences
-EST   EST Sequences
-INV   Invertebrate Sequences
-MAM   Other Mammalian Sequences
-PHG   Phage Sequences
-PLN   Plant Sequences
-PRI   Primate Sequences
-RNA   Structural RNA Sequences
-ROD   Rodent Sequences
-SYN   Synthetic Sequences
-UNA   Unannotated Sequences
-VRL   Viral Sequences
-VRT   Other Vertebrate Sequences
These databases may be individually accessed with the USE BASES command
with the database abbreviation and the section abbreviation, for example
USE BASES GBPRI
or all sections of a given database may be accessed with the database
abbreviation and an asterisk, for example
USE BASES PIR*
or
USE BASES GB*
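
The naming pattern above can be sketched in Python as follows. The function
name use_bases is hypothetical. Only the GBPRI, PIR* and GB* forms are shown
in this announcement; forming EMBL section names in the same way (for example
EMBLPRI) is an assumption by analogy and should be checked against HELP
DATABASES.

```python
# Section abbreviations listed above for the GenBank and EMBL databases.
SECTIONS = ("BCT", "EST", "INV", "MAM", "PHG", "PLN", "PRI",
            "RNA", "ROD", "SYN", "UNA", "VRL", "VRT")
SECTIONED_DATABASES = ("GB", "EMBL")


def use_bases(database, section=None):
    """Return a USE BASES line for one section, or for all sections."""
    db = database.strip().upper()
    if section is None:
        # An asterisk selects every section of the database.
        return "USE BASES " + db + "*"
    sec = section.strip().upper()
    if db not in SECTIONED_DATABASES:
        raise ValueError("database has no listed sections: " + db)
    if sec not in SECTIONS:
        raise ValueError("unknown section: " + sec)
    return "USE BASES " + db + sec
```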
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Information Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMAST at GUNBRF.BITNET
POSTMASTER at NBRF.Georgetown.Edu