Distribution-File:
BIONEWS at genbank.bio.net,
PROTEINS at genbank.bio.net
Announcement of the Protein Identification Resource
Network Request Service
Two commands and access to one new database have been added to the PIR network
request service. The new commands, CROSS and GENE, are described below in more
detail. For reasons of program access, the GenBank database is broken into
three components, GB, GBSUP and GBNEW. The GBNEW database contains the
GenBank weekly update entries. GBSUP contains regular GenBank entries in
a supplemental database (presently these are the GenBank Primate entries)
and GB contains all the other regular GenBank entries. All these GenBank
databases are automatically available through all the commands that can use
them. Particular databases may be selected with the USE BASES command, and
the command
USE BASES GB*
will select all the GenBank databases, and only those databases, for
subsequent database query and retrieval commands.
The National Biomedical Research Foundation Protein Identification Resource
network request service is a full-function fileserver and database query
system. It has been operating since August 1990 and is capable of handling
database queries, sequence searches and sequence submissions, in addition to
fileserver requests. To use this server, request commands should be sent to
FILESERV at GUNBRF on BITNET. The FILESERVer recognizes the following commands
sent either in a mail message, or (if the sender is on BITNET) in command
messages or in a file:
Command       Action
-------       -----------------------------------------------
ACCESSION     list entry codes and titles by accession number
AND           combine QUERY commands with Boolean AND
AUTHOR        list entry codes and titles by author
BASES         list accessible databases
CROSS         list PIR entry codes and titles corresponding to
              a particular nucleotide sequence database entry
DEPOSIT       deposit entry for database submission
END DEPOSIT   terminate deposit entry
FEATURE       list entry codes and titles by feature table entry
GENE          list entry codes and titles for a gene name
GET           return entry by entry code
HELP          return HELP instructions
HOST          list entry codes and titles by host species
INDEX         list SENDable files
JOURNAL       list entry codes and titles by journal citation
KEYWORD       list entry codes and titles by keyword
MEMBER        list alignments containing entry code as a member
NOT           combine QUERY commands with Boolean NOT
OR            combine QUERY commands with Boolean OR
QUERY         begin collecting QUERY commands
END QUERY     terminate collecting commands and execute QUERY
QUIT          ignore the remaining text (E-mail signature blocks)
RETURN        change return address for gateway mail
SEARCH        search for sequence by FASTA procedure
END SEARCH    terminate sequence for searching
SEND          send file
SPECIES       list entry codes and titles by species
SUGGEST       leave suggestion or correction for PIR staff
END SUGGEST   terminate suggestion text
SUPERFAMILY   list entry codes and titles by superfamily name
TAXONOMY      report taxonomy for scientific or common name
TITLE         list entry codes and titles by title
USE           set databases or dates to use in limited searches
Multiple commands can be sent with one command on each line of a mail message
or file. Commands should NOT be sent on the Subject line of a mail message.
Receipt of command messages and files will be acknowledged immediately. Mail
messages will be acknowledged by return mail.
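For readers who assemble requests by program, here is a minimal Python sketch of a mail request that follows the rules above: one command per line in the message body, nothing on the Subject line, and a final QUIT line so that an automatically appended signature is ignored. The Internet-style address FILESERV@GUNBRF.BITNET is an assumption about how your mailer writes "FILESERV at GUNBRF on BITNET"; the sender address and Subject text are placeholders, and actually transmitting the message (for example with smtplib) is left out.

```python
from email.message import EmailMessage

# Assumption: Internet-style rendering of "FILESERV at GUNBRF on BITNET".
# Confirm the correct form for your own mailer or gateway before use.
FILESERV_ADDRESS = "FILESERV@GUNBRF.BITNET"


def build_request(sender, commands):
    """Build a FILESERVer mail request with one command per body line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = FILESERV_ADDRESS
    # Commands must not be sent on the Subject line; this text is a placeholder.
    msg["Subject"] = "PIR request"
    # A final QUIT makes the server ignore everything after it, such as a
    # signature block that the mail program appends automatically.
    msg.set_content("\n".join(list(commands) + ["QUIT"]) + "\n")
    return msg
```

For example, build_request("user@example.edu", ["HELP SEARCH", "BASES"]) produces a body of three lines: HELP SEARCH, BASES and QUIT.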
For help in using any of the commands, send a request of the form
HELP topic
for example
HELP SEARCH
In addition to the commands, help instructions are also available on the
following topics:
Custom_Services
Databases
Gateway_Access
Help_en_Espanol
Help_en_francais
IBM-VM_BITNET
On-Line_Access
PIR_Distribution
VAX-VMS_BITNET
Because of network gateway communication protocols, there are limitations
on requests sent through gateways. Users not on BITNET or INTERNET who
access BITNET through local or network gateways should read and carefully
follow these instructions before sending requests. Only mail message
requests (not command messages or files) can be sent through gateways.
Because the return addresses posted on gateway mail do not always work,
before you send requests through network gateways it is strongly recommended
that you first contact Dr. John S. Garavelli at POSTMASTER at GUNBRF on BITNET.
We will confirm a return address for you and may instruct you to use the
RETURN command to ensure that your request output will reach you. It is not
usually necessary to do this if you are on BITNET or INTERNET, unless your
system employs a local remailer or your mail program applies a non-standard
return address (for example a personal name on the FROM: line).
The BITNET network and the network gateways impose strict limits on file size.
Poorly posed database queries may result in output so extensive that it could
not be returned by network mail. Therefore, an output limit of 1000 lines for
each command and 3000 lines for each request is imposed by the PIR FILESERVer.
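The interaction of the two limits can be illustrated with a short Python sketch. It is not the FILESERVer's code: it assumes, purely for illustration, that lines beyond a limit are simply dropped and that the per-command limit is applied before the per-request limit. Under those assumptions, at most three commands in a single request can each return a full 1000 lines.

```python
# Limits stated in this announcement.
MAX_LINES_PER_COMMAND = 1000
MAX_LINES_PER_REQUEST = 3000


def apply_output_limits(outputs):
    """Truncate each command's output, then the request as a whole.

    outputs: one list of output lines per command, in request order.
    Hypothetical model: excess lines are dropped, and the per-command
    limit is applied before the per-request limit.
    """
    remaining = MAX_LINES_PER_REQUEST
    limited = []
    for lines in outputs:
        kept = lines[:MAX_LINES_PER_COMMAND][:remaining]
        limited.append(kept)
        remaining -= len(kept)
    return limited
```

With four commands that would each produce 1500 lines, this model keeps 1000, 1000, 1000 and 0 lines respectively.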
The DEPOSIT and QUERY commands must, and the SEARCH and SUGGEST commands may,
be followed by their respective END commands when text appears on intervening
lines. The DEPOSIT command requires, and the SEARCH command optionally uses,
parameters that appear on the same line as the command. Because these four
commands are so complex, users should obtain and carefully read the help
instructions before attempting to use them.
Here is a brief synopsis of each server command.
ACCESSION number
This command will return a list of entry codes and titles for entries with
accession numbers matching the left portion of the accession number provided.
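Matching on the "left portion" of an accession number amounts to a prefix match. The Python sketch below illustrates the idea; the accession numbers in the example are hypothetical, and ignoring case is an assumption rather than documented server behavior.

```python
def match_accession(query, accessions):
    """Return the accession numbers whose left portion equals the query.

    Case-insensitive comparison is an assumption of this sketch.
    """
    prefix = query.strip().upper()
    return [acc for acc in accessions if acc.upper().startswith(prefix)]


# Hypothetical accession numbers, for illustration only.
print(match_accession("A123", ["A12345", "A12399", "B12345"]))
# prints ['A12345', 'A12399']
```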
AND
This command performs a Boolean AND operation in a QUERY, using the set of
entries collected by the preceding commands and selecting those that
additionally meet the condition specified in the next command.
AUTHOR name
This command will return a list of entry codes and titles for entries with an
author matching the portion of the author name provided.
BASES
This command will return a list of the accessible databases and the number of
entries each contains. The databases available through the PIR Network Server
and their abbreviations for code specification are as follows:
Abbreviation  Database                              Update Schedule
PIR1          PIR Annotated and Classified Entries  quarterly
PIR2          PIR Preliminary Entries               approximately bimonthly
PIR3          PIR Unverified Entries                weekly
ALN           PIR Alignment Entries                 quarterly
NRL_3D        Brookhaven Data Bank Sequences        quarterly
N             NBRF Nucleic
GB            GenBank (TM)                          as received
GBSUP         GenBank (TM)                          as received
GBNEW         GenBank (TM) New Entries              weekly
EMBL          EMBL                                  as received
Access to these and additional databases can be provided to on-line users.
CROSS number
Use this command to find PIR entries that are the translation products of
nucleotide sequence database entries. This command searches the PIR databases
only and will return a list of entry codes and titles for entries that
cross-reference the nucleotide sequence database entry with the accession
number provided.
DEPOSIT FORM or DEPOSIT AUTHORIN
submission text
END DEPOSIT
This command will allow the submission of protein sequence entries prepared in
a standard format. The PIR accepts submissions in the electronic version of
the GenBank/EMBL/PIR Data Submission Form, or in the Transaction Protocol
Format of the GenBank AUTHORIN program. This command MUST be followed on the
same line by either FORM or AUTHORIN to indicate the type of deposit, and by
the END DEPOSIT command at the end of the text of the entry. Only one DEPOSIT
command should be sent with each request. A separate form must be submitted
for each sequence. Forms with more than one sequence and requests with more
than one DEPOSIT command cannot be accepted.
It is important that nucleotide sequences, including the authors' protein
sequence translations, be submitted only to GenBank or EMBL, as appropriate,
and not
to the PIR FILESERVer. GenBank and EMBL forward protein sequences to the PIR
International with no further effort required on the part of the author.
FEATURE feature-name
This command searches the PIR databases only and will return a list of entry
codes and titles for entries with a feature table entry matching the portion
of the feature name provided. A list of the features currently in the database
can be obtained by the command SEND FEATURES.
GENE gene-name
This command searches the PIR databases only and will return a list of entry
codes and titles for entries with a gene name field entry matching a portion
of the gene name provided. At least three characters must be provided, and
case is ignored. A name of fewer than three characters can be supplied by
padding it with spaces to three characters and enclosing it in quotation
marks.
GET code
This command will return the full text of an entry with the code matching
the code provided. These codes are found in the lists returned by one of
the query commands (ACCESSION, AUTHOR, JOURNAL, FEATURE, HOST, KEYWORD,
SPECIES, SUPERFAMILY or TITLE) or the MEMBER command. The format of the
code is a database abbreviation, a colon, and four to ten alphanumeric
characters. Inside a QUERY, a GET ALL command can be used to return all
the entries selected as a result of the query commands.
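The entry-code format can be checked before a GET request is sent. The Python sketch below accepts a database abbreviation from the BASES table above, a colon, and four to ten alphanumeric characters. Normalizing to upper case is an assumption, and the codes used in the example are illustrative only.

```python
import re

# Database abbreviations from the BASES table in this announcement.
KNOWN_BASES = {"PIR1", "PIR2", "PIR3", "ALN", "NRL_3D", "N",
               "GB", "GBSUP", "GBNEW", "EMBL"}

# Abbreviation, colon, then four to ten alphanumeric characters.
CODE_PATTERN = re.compile(r"([A-Za-z0-9_]+):([A-Za-z0-9]{4,10})")


def parse_code(code):
    """Split an entry code into (database, identifier), validating its form."""
    match = CODE_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a valid entry code: {code!r}")
    base, ident = match.group(1).upper(), match.group(2).upper()
    if base not in KNOWN_BASES:
        raise ValueError(f"unknown database abbreviation: {base}")
    return base, ident
```

For example, parse_code("gbnew:abcd12") returns ("GBNEW", "ABCD12"), while "PIR1:ABC" is rejected because its identifier is shorter than four characters.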
HOST host-name
This command searches the PIR databases only and will return a list of entry
codes and titles for entries with a host name matching the portion of the host
name provided.
INDEX
This command will return a list of the files that can currently be sent by the
PIR FILESERVer using the SEND command.
JOURNAL citation
This command will return a list of entry codes and titles for entries with a
journal citation matching the portion of the citation provided.
KEYWORD words
This command will return a list of entry codes and titles for entries with
any keyword, or portion of a keyword, matching the words provided. You may
provide any number of groups of three or more alphanumeric characters
expected in a single keyword entry.
MEMBER code
The MEMBER command searches the Alignment Database for any alignments that
contain the sequence entry with the code provided.
NOT
This command performs a Boolean NOT operation in a QUERY, using the set of
entries collected by the preceding commands and removing from them the entries
that meet the condition specified in the next command.
OR
This command performs a Boolean OR operation in a QUERY, using the set of
entries collected by the preceding commands and adding to them the set of
entries that meet the condition specified in the next command.
QUERY
commands
END QUERY
This is a multi-line command to search for database entries that meet several
criteria simultaneously. The commands between the QUERY command and the
END QUERY command are combined with Boolean operators to form a single database
query. Any of the following commands can be used to form a query: ACCESSION,
AUTHOR, FEATURE, HOST, JOURNAL, KEYWORD, SPECIES, SUPERFAMILY, TITLE.
Each command selects a set of entries from the available databases, then one of
the Boolean operations, AND, OR, NOT, is used to combine that set of entries
with the set selected by the next query command. The USE command can be used
to limit the databases to be searched and the dates of the entries.
CAUTION --- poorly posed and inappropriately formed queries can easily select
a very large number of entries. Please carefully read the help instructions
for this command before attempting to use it.
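Read as set operations, a QUERY starts from the entries selected by its first command and then folds in each later selection with AND (intersection), OR (union) or NOT (difference). The Python sketch below models that evaluation with sets of entry codes. It is not the server's implementation; strict left-to-right evaluation with no operator precedence is an assumption, and the codes in the example are placeholders.

```python
def evaluate_query(first, rest):
    """Combine query selections left to right with Boolean operators.

    first: entry codes selected by the first query command.
    rest:  list of (operator, selection) pairs, operator in AND, OR, NOT.
    """
    result = set(first)
    for op, selected in rest:
        if op == "AND":
            result.intersection_update(selected)  # keep entries also selected
        elif op == "OR":
            result.update(selected)               # add newly selected entries
        elif op == "NOT":
            result.difference_update(selected)    # remove selected entries
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# Placeholder entry codes, for illustration only.
print(sorted(evaluate_query({"A", "B", "C"},
                            [("AND", {"B", "C", "D"}), ("OR", {"E"}), ("NOT", {"C"})])))
# prints ['B', 'E']
```

Because each operator narrows or widens the running set, placing a selective command first and using AND early is what keeps a query from collecting a very large number of entries.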
QUIT
If you use a mail program which automatically attaches a signature block to
every message, use this command to inform the server that all the following
lines should be ignored.
RETURN address
If you are sending mail from a non-BITNET network through a gateway, you may
need to provide a return address different from the one posted on the message
in order for your output to be sent to you correctly. The RETURN command
will allow you to correct your return address.
SEARCH parameters sequence
or
SEARCH parameters
sequence text
END SEARCH
This command will allow a sequence to be compared in a FASTA search
(see W.R. Pearson & D.J. Lipman, PNAS (1988) 85:2444-2448) with the PIR
databases. You may send either protein or nucleotide sequences in the IUPAC
standard single-letter code; however, only the PIR protein sequence databases
will be searched. Nucleotide sequences will be translated in six reading
frames according to a selectable genetic code, and those translated protein
sequences will be compared against the PIR protein sequence databases. The
SEARCH command may be used in two forms, either on a single line with
parameters and sequence, or on multiple lines with the parameters on the line
with the SEARCH command, followed by lines with the sequence and an END SEARCH
command on the line following the end of the sequence. There are two optional
parameters for the SEARCH command, KTUP and NUC. The KTUP parameter sets the
ktup value for the FASTA program. The NUC parameter specifies that the sequence
is a nucleotide sequence, and can select the genetic code to be used for the
translation of that sequence.
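The six-frame translation step can be sketched in Python. The block uses the standard genetic code only (NCBI translation table 1); the alternative codes selectable with the NUC parameter, the FASTA comparison itself and the server's handling of IUPAC ambiguity codes are not modeled, and mapping any codon containing an ambiguity code to X is an assumption of this sketch.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the first base varying slowest, each position in the order T, C, A, G.
NUCLEOTIDES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(NUCLEOTIDES)
    for j, b2 in enumerate(NUCLEOTIDES)
    for k, b3 in enumerate(NUCLEOTIDES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, code=STANDARD_CODE):
    """Translate an upper-case DNA string codon by codon.

    Stop codons appear as '*'; codons not in the table (for example, ones
    containing ambiguity codes) become 'X'. A trailing partial codon is ignored.
    """
    return "".join(code.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(seq, code=STANDARD_CODE):
    """Return the three forward and three reverse-complement translations."""
    dna = seq.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:], code)
            for strand in (dna, reverse)
            for offset in range(3)]


print(six_frames("ATGGCCTAA"))
# prints ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```

Each of the six translated sequences would then be compared against the protein databases, which is why a nucleotide query costs roughly six protein searches.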
SEND filename
This command will instruct the FILESERVer to send, by separate electronic
transmission, the specified file. A list of the currently available files
can be obtained by using the INDEX command.
SPECIES name
This command will return a list of entry codes and titles for entries with
the species matching the portion of a species name provided. The species
name may be the Latin genus and/or species name, or a common name. Because
the names of some viruses contain the common name of the host species, entry
codes and titles for entries with the species of viruses infecting a species
may also be listed.
Please note: this is not an efficient command for performing a general query
of the PIR databases, especially for extensively studied species. For
well-studied species, the TITLE command will be more efficient.
SUGGEST text
or
SUGGEST
text
END SUGGEST
This command will submit the text of your message to an NBRF staff member.
You may use it to suggest modifications or improvements to our FILESERVer
or corrections to the PIR database. You may either place the text on the
same line with the SUGGEST command, or you may use any number of lines for
the text followed by the END SUGGEST command on the line after the last line
of the text.
SUPERFAMILY superfamily-name
This command searches the PIR databases only and will return a list of
entry codes and titles for entries that belong to that superfamily. Since
the domains of
some multidomain proteins are not completely classified, the SUPERFAMILY
command will not necessarily produce a complete list of all entries in a
specific superfamily. A list of the superfamilies currently in the database
can be obtained by the command SEND SUPERFAM.
TAXONOMY taxonomic-name
or
TAXONOMY common-name
This command will report taxonomies for entries in the taxonomic database
currently being used by the PIR and shared with GenBank (TM) and EMBL.
This database is maintained by Dr. Andrzej Elzanowski at the Max-Planck-
Institut fur Biochemie.
You should provide a fully or partially specified name of 1 to 8 words with
a minimum length of 3 letters each. Names at all taxonomic levels containing
those words will be reported. The organelles containing genetic material of
some higher organisms also have entries in this database.
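The naming rule for TAXONOMY can be checked locally with a few lines of Python. This is a sketch; the function name taxonomy_command is hypothetical, and counting each word's length in characters only approximates the "3 letters" rule.

```python
def taxonomy_command(name):
    """Build a TAXONOMY command from 1 to 8 words of at least 3 characters."""
    words = name.split()
    if not 1 <= len(words) <= 8:
        raise ValueError("provide a name of 1 to 8 words")
    too_short = [w for w in words if len(w) < 3]
    if too_short:
        raise ValueError(f"words shorter than 3 letters: {too_short}")
    return "TAXONOMY " + " ".join(words)
```

For example, taxonomy_command("Homo sapiens") returns TAXONOMY Homo sapiens, while "E coli" is rejected because "E" is too short.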
TITLE title
This command will return a list of entry codes and titles for entries with
any portion of a title matching the word provided. You may provide any
number of groups of three or more alphanumeric characters expected in a
single title. PIR titles include protein names, species names and Enzyme
Commission numbers; consequently, this command is generally the most efficient
way to perform a general query of the PIR databases.
USE
The USE command is used to select particular databases or dates to be used in
limited searches. Three parameters may be set: the BEFORE date, the AFTER
date and the BASES database list. The corresponding commands are:
USE BEFORE date
USE AFTER date
USE BASES database [ + database...]
Dates must be in the form YYMMDD, where YY represents the last two digits
of the year, MM represents two digits of the month (with a leading zero, if
necessary), and DD represents the two digits of the day of the month (with a
leading zero, if necessary).
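Converting a calendar date to the YYMMDD form is a single strftime call in Python, since %y, %m and %d all produce zero-padded two-digit fields. The sketch below also assembles the USE AFTER and USE BEFORE lines; the helper names are hypothetical.

```python
from datetime import date
from typing import Optional


def yymmdd(d):
    """Format a date as YYMMDD (two-digit year, month and day, zero-padded)."""
    return d.strftime("%y%m%d")


def use_date_commands(after: Optional[date] = None, before: Optional[date] = None):
    """Return the USE AFTER / USE BEFORE command lines for the given dates."""
    commands = []
    if after is not None:
        commands.append(f"USE AFTER {yymmdd(after)}")
    if before is not None:
        commands.append(f"USE BEFORE {yymmdd(before)}")
    return commands


print(use_date_commands(after=date(1990, 8, 1), before=date(1991, 1, 5)))
# prints ['USE AFTER 900801', 'USE BEFORE 910105']
```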
For the USE BASES command the set of databases to be used must be entered
as abbreviations on a single line connected by plus signs, "+".
All PIR databases can be used with "PIR*", all GenBank databases can be used
with "GB*", and all databases can be used with "*".
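The database selection behaves like wildcard matching on the abbreviations in the BASES table. The Python sketch below builds a USE BASES line and shows which databases each pattern would be expected to select. Treating "*" as a shell-style wildcard via the standard fnmatch module is an assumption about the server's matching; the list of abbreviations is copied from the BASES table in this announcement.

```python
from fnmatch import fnmatchcase

# Abbreviations from the BASES table in this announcement.
DATABASES = ["PIR1", "PIR2", "PIR3", "ALN", "NRL_3D", "N",
             "GB", "GBSUP", "GBNEW", "EMBL"]


def use_bases_command(*patterns):
    """Join database abbreviations or wildcards with plus signs on one line."""
    return "USE BASES " + " + ".join(patterns)


def expand_bases(patterns):
    """List the databases that the given patterns would be expected to select."""
    return [name for name in DATABASES
            if any(fnmatchcase(name, p.upper()) for p in patterns)]


print(use_bases_command("PIR*", "GB*"))  # prints USE BASES PIR* + GB*
print(expand_bases(["GB*"]))             # prints ['GB', 'GBSUP', 'GBNEW']
```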
Not all commands work with all databases; please read the information returned
by the command HELP DATABASES.
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Identification Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMASTER at GUNBRF.BITNET