Genbank key search & fetch thru IUBio Gopher hole (long)

doelz at urz.unibas.ch
Mon Feb 17 01:58:07 EST 1992

In article <1992Feb17.023313.4530 at bronze.ucs.indiana.edu>, gilbertd at sunflower.bio.indiana.edu (Don Gilbert) writes:

> The currently installed Genbank is release 0.01 (January 1992) from
> NCBI, which has some 62,807 sequence entries (nearly 200 megabytes
> of sequence and descriptive data).  This is based on release 70 of
> Genbank plus many entries from Medline added at NCBI.  It was
> obtained by anonymous ftp to ncbi.nlm.nih.gov, cd ncbi-genbank.
I'm just curious: do these 'many' entries reflect additions to the
annotation, or are they 'real' entries?

> The fields that are indexed from the Genbank Flatfile format are:
>   Locus, Accession, Description, Keywords, Source, Organism, Authors,
>   and Title.
> The index files take up about 40 megabytes, compared to 190 megabytes
> for the sequence files.  It takes about 15-20 minutes on a Sparcstation2
> to index the sequences.  A search for a unique keyword like locus name
> or accession number takes no perceptible time.  A typical keyword query 
> with a handful of matches will take a few seconds, a bit longer if you request
> hundreds or more matches.  This compares to about 4 hours for the GCG 
> program stringsearch running on the same machine with the same query.
Sounds *very* interesting to me. I have the EMBL CD-ROM installed, and
would certainly like to make indexes available this way as well.
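
As a rough illustration of the kind of field index described in the quoted
text (this is not Don Gilbert's software, whose internals are not described
here), a minimal sketch in Python might look like the following. It assumes
the usual GenBank flat-file layout: keywords in the first 12 columns,
continuation lines left blank in those columns, and entries terminated by
"//". "Description" is assumed to mean the DEFINITION line. The names
build_index and search are hypothetical, and the whole index is held in
memory, which a real installation of this size would probably not do.

```python
import re
from collections import defaultdict

# Fields named in the quoted post; "Description" is assumed to be DEFINITION.
INDEXED = {"ACCESSION", "DEFINITION", "KEYWORDS",
           "SOURCE", "ORGANISM", "AUTHORS", "TITLE"}
WORD = re.compile(r"[A-Za-z0-9_]+")


def build_index(lines):
    """Map each lowercased word in an indexed field to the set of locus names."""
    index = defaultdict(set)
    locus = None   # locus name of the entry being read
    field = None   # indexed field currently open, or None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):          # end of entry
            locus, field = None, None
            continue
        key = line[:12].strip()            # keyword columns (assumed width 12)
        text = line[12:]
        if key == "LOCUS":
            parts = text.split()
            locus = parts[0] if parts else None
            field = None                   # only the locus name itself is indexed
            if locus:
                index[locus.lower()].add(locus)
            continue
        if locus is None:
            continue
        if key:                            # a new keyword opens or closes a field
            field = key if key in INDEXED else None
        if field:                          # keyword line or its continuation
            for word in WORD.findall(text):
                index[word.lower()].add(locus)
    return index


def search(index, query):
    """Return the sorted locus names that contain every word of the query."""
    words = [w.lower() for w in WORD.findall(query)]
    if not words:
        return []
    hits = set(index.get(words[0], ()))
    for w in words[1:]:
        hits &= index.get(w, set())
    return sorted(hits)
```

With an index built once from the flat file, for example
build_index(open(path)) where path is wherever the flat file lives, a
lookup such as search(index, "homo sapiens") is a few dictionary accesses,
which is consistent with the short query times quoted above compared to
scanning every entry for each query.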

> This software may be of interest to anyone with a Genbank flatfile 
> on disk, and a few spare megabytes for indexing, to give thought to 
> installing Gopher with this indexing software.   

Are you willing to share, sell or otherwise disclose the code? I would
appreciate hearing about it.

