A retrieval system for nucleic acid sequence banks

PERRIERE at cism.univ-lyon1.fr
Thu Sep 6 09:50:00 EST 1990


  We wrote ACNUC, a database structure and retrieval software for use with
either the GenBank or EMBL nucleic acid sequence data collections. The
nucleotide and textual data furnished by each collection are restructured into
a database that allows sequence retrieval on a multi-criterion basis. The main
selection criteria are: species (or higher-order taxon), keyword, reference,
journal, author, and organelle; all logical combinations of these criteria can
be used. Direct access to sequence regions that code for a specific product
(protein, tRNA or rRNA) is provided. A versatile extraction procedure copies
selected sequences, or fragments of them, from the database to user files in
GCG format, suitable for analysis by user-supplied application programs. A
detailed help mechanism is available to aid the user at any time during the
retrieval session.
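  One way to picture "all logical combinations of these criteria" is as set
algebra over the lists of entries selected by each criterion: AND becomes
intersection, OR becomes union, and NOT becomes difference. The Python sketch
below illustrates only that idea, under that assumption. It is not ACNUC's
FORTRAN implementation, and the index layout, the `select` helper and the
entry names are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical index: each (criterion, value) pair maps to the set of entry
# names it selects. The contents are invented for illustration only.
INDEX: Dict[Tuple[str, str], Set[str]] = {
    ("species", "Escherichia coli"): {"SEQ_A", "SEQ_B", "SEQ_C"},
    ("keyword", "operon"): {"SEQ_A", "SEQ_B", "SEQ_D"},
    ("author", "Smith"): {"SEQ_B", "SEQ_E"},
}


def select(criterion: str, value: str) -> Set[str]:
    """Return a copy of the entries matching one criterion (empty if none)."""
    return set(INDEX.get((criterion, value), set()))


# AND -> intersection: E. coli entries that also carry the keyword "operon".
ecoli_operons = select("species", "Escherichia coli") & select("keyword", "operon")

# NOT -> difference: remove the entries authored by Smith from that list.
not_smith = ecoli_operons - select("author", "Smith")

# OR -> union: entries authored by Smith or carrying the keyword "operon".
either = select("author", "Smith") | select("keyword", "operon")

print(sorted(ecoli_operons))  # ['SEQ_A', 'SEQ_B']
print(sorted(not_smith))      # ['SEQ_A']
```

  Because every combination yields another set of entries, results can be fed
back into further combinations, which is what makes arbitrary nesting of
criteria straightforward in this model.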

  We can send by e-mail a package containing the source code of the retrieval
program (query) and a subset of GenBank release 64 (new format) containing all
E. coli sequences. The version furnished is intended for UNIX systems with a
FORTRAN compiler. It works without modification on SUN workstations under
SunOS 4.0 to 4.1. On other computers some modifications may be necessary; in
that case, a set of adaptation instructions is furnished in the

  The whole database is split into ten archive files of about 220 Kbytes each,
which together make up a single compressed and uuencoded (binhexed) tar file.

  A vertebrate subset of GenBank release 64 is also available. Because this
subset is around 60 Mbytes, please send us a standard 600-foot QIC cartridge
tape if you would like a copy of it.

  Guy Perriere

 Guy Perriere                                e-mail : perriere at frcism51.bitnet
 Laboratoire de Biometrie (Bat. 741)
 Universite Claude Bernard - Lyon 1
 69100 Villeurbanne (France)

