In article <1992Feb17.023313.4530 at bronze.ucs.indiana.edu>, gilbertd at sunflower.bio.indiana.edu (Don Gilbert) writes:
...
> The currently installed Genbank is release 0.01 (January 1992) from
> NCBI, which has some 62,807 sequence entries (nearly 200 megabytes
> of sequence and descriptive data). This is based on release 70 of
> Genbank plus many entries from Medline added at NCBI. It was
> obtained by anonymous ftp to ncbi.nlm.nih.gov, cd ncbi-genbank.
>
I'm just curious ... do these 'many' entries reflect additions to the
annotation, or are these 'real'?
> The fields that are indexed from the Genbank Flatfile format are:
> Locus, Accession, Description, Keywords, Source, Organism, Authors,
> and Title.
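
As a rough illustration of what indexing those fields could involve, here is a minimal Python sketch. It is not the Gopher indexing software described above, just an inverted word index built under stated assumptions: that "Description" corresponds to the flatfile DEFINITION line, that field keywords sit in the first 12 columns, and that each entry ends with a "//" line. The names parse_records and build_index are made up for this example.

```python
import re
from collections import defaultdict

# Flatfile keywords for the fields listed above. Mapping "Description"
# to the DEFINITION line is an assumption of this sketch.
INDEXED = {"LOCUS", "ACCESSION", "DEFINITION", "KEYWORDS",
           "SOURCE", "ORGANISM", "AUTHORS", "TITLE"}


def parse_records(lines):
    """Yield one {keyword: text} dict per entry (entries end with '//')."""
    record, current = {}, None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if record:
                yield record
            record, current = {}, None
            continue
        # Assumed layout: keyword in columns 1-12, value from column 13.
        key, value = line[:12].strip(), line[12:].strip()
        if key:
            # New keyword or subkeyword; ignore the ones we do not index.
            current = key if key in INDEXED else None
            if current:
                record[current] = (record.get(current, "") + " " + value).strip()
        elif current:
            # Blank keyword column: continuation of the current field.
            record[current] += " " + value
    if record:
        yield record  # tolerate a final entry with no '//' terminator


def build_index(records):
    """Map each lowercased word to the set of locus names containing it."""
    index = defaultdict(set)
    for rec in records:
        parts = rec.get("LOCUS", "").split()
        if not parts:
            continue  # no usable entry name, skip
        locus = parts[0]
        for field, text in rec.items():
            if field == "LOCUS":
                text = locus  # index only the name, not length or date
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(locus)
    return index
```

With a flatfile on disk, something like build_index(parse_records(open(path))) would return the index, where path is whatever file you have installed. A lookup such as index["kinase"] is then a single dictionary access, which fits the "no perceptible time" behaviour reported for unique keys, although the real software presumably keeps its index on disk rather than in memory.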
>
> The index files take up about 40 megabytes, compared to 190 megabytes
> for the sequence files. It takes about 15-20 minutes on a Sparcstation2
> to index the sequences. A search for a unique keyword like locus name
> or accession number takes no perceptible time. A typical keyword query
> with a handful of matches will take a few seconds, a bit longer if you request
> hundreds or more matches. This compares to about 4 hours for the GCG
> program stringsearch running on the same machine with the same query.
>
Sounds *very* interesting to me. I have the EMBL CD-ROM installed and
would certainly like to make indexes available this way as well.
> This software may be of interest to anyone with a Genbank flatfile
> on disk, and a few spare megabytes for indexing, to give thought to
> installing Gopher with this indexing software.
>
Are you willing to share, sell, or otherwise disclose the code? I would
appreciate hearing about it.
Regards
Reinhard