GENESEQ Vs. GenBank & EMBL patented sequences

stoehr at embl-heidelberg.de
Wed Sep 15 09:41:37 EST 1993

In article <9309141413.AA10686 at ugene1.abbott.com>, bollingt at ugene1.abbott.com
(Tim Bolling) writes:
> GeneSeq (Nucleic Acid/Peptide Databanks) (11) each have about 31,000
> sequences.
> GenBank Patent (78) has about 2,700 sequences.

GeneSeq (produced by Derwent and Intelligenetics) has been available for a few
years and should be fairly well-advanced or even complete with respect to
the backlog of patent documents. EMBL and NCBI are not so far along: at our
upcoming September release (EMBL 36) we expect to have 5000 nucleotide
sequences and about the same number of amino acid sequences.

> GeneSeqN (Nucleic Acid db) starts with N00000 and ends with Q40111.
> GenBank Patent starts with A00001 and ends with I07267.

The accession numbers quoted in GeneSeq have no correspondence whatsoever
with those in the DDBJ/EMBL/GenBank/SwissProt accession number scheme. There
will doubtless be entries in GeneSeq with the same acc# as entries in the EMBL
database that refer to completely different data. The entries with 'A'
accession numbers originate from EMBL.
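
As a rough illustration of why the two numbering schemes must be kept apart,
here is a minimal Python sketch that indexes entries by the pair (database,
accession number) rather than by accession number alone. The records, their
descriptions, and the function names are hypothetical placeholders made up
for this example; they are not real entries or part of any EMBL, GenBank or
GeneSeq software.

```python
from collections import defaultdict

# Hypothetical records: (source database, accession number, description).
# The descriptions are placeholders, not real database entries.
RECORDS = [
    ("GeneSeq", "N00000", "placeholder GeneSeq patent entry"),
    ("EMBL", "N00000", "placeholder EMBL entry, unrelated data"),
    ("EMBL", "A00001", "placeholder EMBL patent entry"),
]


def build_index(records):
    """Index records by (database, accession) so equal accessions never collide."""
    index = {}
    for database, accession, description in records:
        index[(database, accession)] = description
    return index


def shared_accessions(records):
    """Return accession numbers that occur in more than one database."""
    seen = defaultdict(set)
    for database, accession, _ in records:
        seen[accession].add(database)
    return {acc: sorted(dbs) for acc, dbs in seen.items() if len(dbs) > 1}


index = build_index(RECORDS)
clashes = shared_accessions(RECORDS)
```

Keying on the accession number alone would let one entry silently overwrite
the other; with the database name in the key, both survive, and `clashes`
lists the accession numbers that need care when merging search results.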

> And lastly, I did a few test searches and sequences which I found in GeneSeq's
> database where not found in GenBank's and sequences found in GenBank's were
> not found in GeneSeq's (that I really don't understand).

GenBank & EMBL (and the Japanese Patent Office) simply haven't finished the
job yet - EMBL are processing patents from the European Patent Office under
a contract which ends in about 6 months' time.
We cannot simply extract data from the GeneSeq database to populate
EMBL/GenBank, because of copyright issues.

Peter Stoehr
EMBL Data Library

More information about the Info-gcg mailing list