The PIR staff have had several inquiries recently about a supposedly large
number of redundant sequences in the PIR database. These questions arose
from an apparently common misinterpretation of a table in a message posted
to GENBANK-BB by Warren Gish at the NCBI on 13 May 1992.
In the table he prepared, Warren Gish stated that the statistics were
progressive and that SWISS-PROT was read before PIR. That is, all the entries
in the SWISS-PROT database were checked and stored first (23,742 entries with
12 redundancies). The PIR databases were checked only after that, and of those
40,298 entries, 21,056 were redundant AGAINST SWISS-PROT and the PIR itself
combined. In fact, in a subsequent analysis kindly performed by Warren, the
PIR checked against itself had 2,872 redundancies.
Most of those redundancies are in the unmerged PIR3 database, and some are
known redundancies that arise when identical sequences occur in different
biological species. Of the 23,742 entries in SWISS-PROT, somewhere between
2,674 and 5,546 were not found identically in one of the PIR databases. The
exact figure depends on the number of doubly redundant sequences, that is,
PIR entries that are redundant both against SWISS-PROT and within the PIR.
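
For readers who want to retrace the figures, here is a minimal Python sketch
of what a progressive redundancy count looks like and of the arithmetic behind
the range quoted above. It is an illustration only and not the program Warren
Gish used: the function name, the toy sequences, and the reading of the two
bounds are assumptions made for this example. Only the four counts are taken
from this message.

```python
def progressive_redundancies(databases):
    """Count redundant entries per database, reading databases in order.

    Hypothetical helper for illustration. An entry counts as redundant if an
    identical sequence was already seen, either in an earlier database or
    earlier in the same database.
    """
    seen = set()
    counts = {}
    for name, sequences in databases:
        redundant = 0
        for seq in sequences:
            if seq in seen:
                redundant += 1
            else:
                seen.add(seq)
        counts[name] = redundant
    return counts


# Toy data, invented for illustration. SWISS-PROT is read first, so PIR
# entries are charged with duplicates of SWISS-PROT as well as of PIR itself.
toy = [
    ("SWISS-PROT", ["MKT", "GAV", "MKT"]),
    ("PIR", ["GAV", "LLS", "LLS", "MKT"]),
]
toy_counts = progressive_redundancies(toy)

# Counts quoted in this message.
SWISS_PROT_ENTRIES = 23_742
SWISS_PROT_REDUNDANT = 12
PIR_PROGRESSIVE_REDUNDANT = 21_056  # against SWISS-PROT and PIR together
PIR_SELF_REDUNDANT = 2_872          # PIR checked against itself only

swiss_prot_distinct = SWISS_PROT_ENTRIES - SWISS_PROT_REDUNDANT

# Assumed lower bound: every progressive PIR redundancy accounts for a
# different SWISS-PROT sequence.
lower = swiss_prot_distinct - PIR_PROGRESSIVE_REDUNDANT

# Assumed upper bound: all PIR self-redundancies are part of the progressive
# count and account for no additional SWISS-PROT sequence.
upper = lower + PIR_SELF_REDUNDANT

print(toy_counts)
print(lower, upper)
```

Under these assumptions the two bounds reproduce the 2,674 and 5,546 given
above, and the toy run shows why reading order matters: the database read
second is charged with every sequence it shares with the first.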
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Identification Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMASTER at GUNBRF.BITNET