In article <9309141413.AA10686 at ugene1.abbott.com>, bollingt at ugene1.abbott.com
(Tim Bolling) writes:
>> GeneSeq (Nucleic Acid/Peptide Databanks) (11) each have about 31,000
> GenBank Patent (78) has about 2,700 sequences.
GeneSeq (produced by Derwent and Intelligenetics) has been available for a few
years and should be fairly well advanced, or even complete, with respect to
the backlog of patent documents. EMBL and NCBI are not so far along: at our
upcoming September release (EMBL 36) we expect to have 5000 nucleotide
sequences and about the same number of amino acid sequences.
> GeneSeqN (Nucleic Acid db) starts with N00000 and ends with Q40111.
> GenBank Patent starts with A00001 and ends with I07267.
The accession numbers quoted in GeneSeq have no correspondence whatsoever
with those in the DDBJ/EMBL/GenBank/SwissProt accession number scheme. There
will doubtless be entries in GeneSeq with the same acc# as entries in the EMBL
database that refer to completely different data. The entries with 'A'
accession numbers originate from EMBL.
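To make the collision problem concrete, here is a minimal sketch in Python,
assuming you merge entries from both sources into one local index. The
records, descriptions and function names are made up for illustration; they
are not real entries or part of any EMBL or GeneSeq software.

```python
# Hypothetical records: (source database, accession number, description).
# The accession values are invented purely to illustrate a collision.
records = [
    ("GeneSeq", "N00001", "patent sequence held by GeneSeq"),
    ("EMBL", "N00001", "unrelated entry held by EMBL"),
    ("EMBL", "A00001", "patent sequence originating from EMBL"),
]


def index_by_accession(records):
    """Naive index keyed on accession alone: later records overwrite earlier ones."""
    return {acc: desc for _, acc, desc in records}


def index_by_source_and_accession(records):
    """Safer index keyed on (source, accession), so both namespaces survive."""
    return {(src, acc): desc for src, acc, desc in records}


def colliding_accessions(records):
    """Return accession numbers that occur in more than one source database."""
    sources = {}
    for src, acc, _ in records:
        sources.setdefault(acc, set()).add(src)
    return sorted(acc for acc, srcs in sources.items() if len(srcs) > 1)
```

Keying on the accession number alone silently drops one of the two N00001
records, whereas keying on the (source, accession) pair keeps both.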
> And lastly, I did a few test searches and sequences which I found in GeneSeq's
> database were not found in GenBank's and sequences found in GenBank's were
> not found in GeneSeq's (that I really don't understand).
GenBank & EMBL (and the Japanese Patent Office) simply haven't finished the
job yet - EMBL are processing patents from the European Patent Office under
a contract which ends in about 6 months' time.
We cannot simply extract data from the GeneSeq database to populate
EMBL/GenBank, because of copyright issues.
EMBL Data Library