Searching for accession numbers with GCG package

Don Katz - Genetics Computer Group dkatz at GCG.COM
Mon Jun 18 09:31:14 EST 1990

    As of Version 6.2 of the GCG package, our database formatting utilities
prepend the accession number to the definition line of each entry. This lets
users search the definition lines for the accession number with STRINGS,
which is very fast. Sites that receive the GenBank flat files can reformat the
databases with the new utilities to obtain this capability. Sites that receive
our database update service have already received one update (containing
release 63 of GenBank) with the accession number on the definition line.
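
    To illustrate the idea of a definition-line lookup, here is a minimal
Python sketch. It is not GCG code and does not show how STRINGS works
internally. It assumes a hypothetical in-memory list of (entry name,
definition line) pairs in which the accession number is the first
whitespace-separated token of the definition line. The entry names, accession
numbers, and the function name find_by_accession are invented for illustration.

```python
def find_by_accession(definitions, accession):
    """Return names of entries whose definition line starts with `accession`.

    `definitions` is an iterable of (entry_name, definition_line) pairs.
    Assumption: the accession number is the first token on the line.
    """
    wanted = accession.upper()
    hits = []
    for name, definition in definitions:
        tokens = definition.split()
        # Only the first token is compared; the rest of the line is ignored.
        if tokens and tokens[0].upper() == wanted:
            hits.append(name)
    return hits


# Hypothetical entries, made up for this sketch.
SAMPLE_DEFINITIONS = [
    ("ENTRYA", "A00001 Hypothetical entry A, complete sequence."),
    ("ENTRYB", "A00002 Hypothetical entry B, partial sequence."),
    ("ENTRYC", "Entry without an accession prefix on its definition line."),
]

if __name__ == "__main__":
    print(find_by_accession(SAMPLE_DEFINITIONS, "a00002"))
```

Only one short line per entry is examined, rather than the complete record,
which is why this kind of search is quick.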

    If the databases at a site do not contain the accession number on the
definition line, a search for entries by accession number must use
STRINGS to search the complete database records. Searching all the entries in
GenBank and EMBL this way is time-consuming, but the time can be reduced by
searching only a single section of GenBank. New entries in GenBank,
found in the Unannotated section, contain the accession number in the entry
name, so the command

$ names UN:*XXXXX

where XXXXX is the accession number, may find the desired entry.
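
    The wildcard in that command matches any entry name that ends with the
accession number. A rough Python sketch of the same suffix match follows. It
uses the standard fnmatch module rather than anything from GCG, and the
entry names and the function name names_matching are hypothetical.

```python
from fnmatch import fnmatchcase


def names_matching(entry_names, accession):
    """Return entry names that end with `accession` (pattern '*XXXXX')."""
    # Assumption: accession numbers are plain alphanumerics, so they contain
    # no characters that fnmatch treats as wildcards.
    pattern = "*" + accession.upper()
    return [name for name in entry_names if fnmatchcase(name.upper(), pattern)]


# Hypothetical entry names, made up for this sketch.
SAMPLE_NAMES = ["UNA00001", "UNA00002", "ENTRYC"]

if __name__ == "__main__":
    print(names_matching(SAMPLE_NAMES, "a00002"))
```

A suffix match can also return unrelated names that happen to end in the same
characters, so any hits should be checked against the entry itself.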

    Version 7.0, our next major release, will allow accession numbers to
be used to identify the sequence anywhere that the name of the entry would be
used now. 

Don Katz

Donald Katz, Ph.D.                    email: dkatz at gcg.com (Internet)
Genetics Computer Group               phone: (608) 231-5200                 
575 Science Dr., Suite B              fax:   (608) 231-5202
Madison, WI 53711
