Unique Numerical Protein Representation

Tue Jun 2 15:39:00 EST 1992

The PIR staff have had several inquiries recently about a supposedly large
number of redundant sequences in the PIR database.  These questions arose
from an apparently common misinterpretation of a table in a message posted
to GENBANK-BB by Warren Gish at the NCBI on 13 May 1992.

Note that the table prepared by Warren Gish stated that the statistics were
progressive and that SWISS-PROT was read before PIR.  That is, all the entries
in the SWISS-PROT database were checked (23,742 entries with 12 redundancies)
and stored first.  Only then were the PIR databases checked, and of those
40,298 entries, 21,056 were redundant AGAINST SWISS-PROT and the PIR itself.
In fact, in a subsequent analysis kindly performed by Warren, the PIR checked
against itself had 2,872 redundancies.  Most of those redundancies are in the
unmerged PIR3 database, with some known redundancies arising when identical
sequences occur in different biological species.  Of the 23,742 entries in
SWISS-PROT, somewhere between 2,674 and 5,546 were not found identically in
any of the PIR databases, depending on the number of doubly redundant
sequences.
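
The effect of read order, and the arithmetic behind the two bounds, can be
sketched in a few lines of Python.  This is only an illustration under stated
assumptions, not the procedure actually used at the NCBI: the function name
progressive_redundancy and the toy sequences are invented here, and the bound
calculation is one reading of the figures that happens to reproduce the
quoted 2,674 and 5,546.

```python
from typing import Dict, List, Tuple


def progressive_redundancy(
    databases: List[Tuple[str, List[str]]]
) -> Dict[str, int]:
    """Count, per database, the sequences already seen earlier in the scan.

    Databases are read in the order given. A sequence is counted as redundant
    if an identical one was already stored, whether from the same database or
    from one read before it. (Hypothetical helper, for illustration only.)
    """
    seen = set()
    counts = {}
    for name, sequences in databases:
        redundant = 0
        for seq in sequences:
            if seq in seen:
                redundant += 1
            else:
                seen.add(seq)
        counts[name] = redundant
    return counts


# Toy data, invented for illustration; these are not real sequences.
swissprot_toy = ["MKV", "GAL", "TTP"]
pir_toy = ["MKV", "GAL", "QQR", "QQR"]

# The same two databases give different per-database counts depending on
# which one is read first.
swiss_first = progressive_redundancy(
    [("SWISS-PROT", swissprot_toy), ("PIR", pir_toy)]
)
pir_first = progressive_redundancy(
    [("PIR", pir_toy), ("SWISS-PROT", swissprot_toy)]
)

# Figures quoted in the message.
SWISSPROT_ENTRIES = 23_742
SWISSPROT_INTERNAL = 12    # redundancies within SWISS-PROT
PIR_PROGRESSIVE = 21_056   # PIR redundancies against SWISS-PROT and PIR
PIR_INTERNAL = 2_872       # PIR redundancies against PIR alone

swissprot_distinct = SWISSPROT_ENTRIES - SWISSPROT_INTERNAL

# Lower figure: assumes all 21,056 progressive PIR redundancies are matches
# against SWISS-PROT.
not_in_pir_min = swissprot_distinct - PIR_PROGRESSIVE

# Upper figure: assumes the 2,872 PIR-internal redundancies are all among the
# 21,056 and that none of them also matches SWISS-PROT.
not_in_pir_max = swissprot_distinct - (PIR_PROGRESSIVE - PIR_INTERNAL)

print(swiss_first, pir_first)
print(not_in_pir_min, not_in_pir_max)
```

With the toy data, PIR shows three redundancies when SWISS-PROT is read first
but only one when PIR is read first, which is the kind of order dependence
that makes a progressive table easy to misread.
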
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET
