Questions on build Genbank files for Blast search

Brian Fristensky frist at cc.umanitoba.ca
Tue Feb 23 14:43:10 EST 1999

gmei at genetics.com wrote:
> We are planning to build a local copy of Genbank database to do Blast search
> in house on an Alpha Digital Unix machine. We do not have GCG package and are
> not planning to buy it (at least for now).
A free set of programs for handling databases such as GenBank
and PIR is the XYLEM package:


which can take the .seq files from GenBank and split them into
separate files for annotation, sequence, and an index that
tells the locations of sequences in these files by name
and accession number. The sequence files are in FASTA
format, which, I believe, is readable by BLAST (and, obviously,
the FASTA programs). Key programs include:
  SPLITDB - reformats GenBank or PIR files, as described above.
  FINDKEY - searches for keywords in annotation, and retrieves
        a list of sequence names
  FEATURES - extracts any features (e.g. exon, intron, CDS, promoter)
        from a list of GenBank entries and returns the corresponding
        DNA sequences.
  RIBOSOME - translates batches of DNA sequences
  SHUFFLE - randomizes sequences for statistical studies
The XYLEM package is written in C, translated from Pascal.
Most programs are run from C-shell scripts.
It has compiled successfully under Sun Solaris, HP-UX, and
IBM AIX, although I haven't had an opportunity to test it
under Digital Unix.
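
To make the splitting step described above concrete, here is a
minimal Python sketch of the same idea. It is not SPLITDB and is
not taken from XYLEM: the function name split_genbank, the
tab-separated index layout, and the use of stream offsets as
"locations" are all assumptions made for illustration. It also
assumes every entry has an ORIGIN section; entries without one
would go to the annotation output only.

```python
def split_genbank(handle, ann_out, seq_out, idx_out):
    """Split GenBank flat-file text into annotation, FASTA and index streams.

    Hypothetical helper for illustration only (not part of XYLEM).
    Returns the number of entries processed.
    """
    name = accession = ""
    in_entry = False      # True between a LOCUS line and the closing "//"
    in_sequence = False   # True between ORIGIN and the closing "//"
    count = 0
    for line in handle:
        if line.startswith("LOCUS"):
            name = line.split()[1]
            accession = ""
            in_entry, in_sequence = True, False
            ann_offset = ann_out.tell()   # where this entry's annotation starts
            ann_out.write(line)
        elif not in_entry:
            continue                      # skip the release header before the first entry
        elif line.startswith("//"):
            ann_out.write(line)
            in_entry = in_sequence = False
            count += 1
        elif in_sequence:
            # Sequence lines look like "        1 gatcctccat ..."; keep letters only.
            seq_out.write("".join(ch for ch in line if ch.isalpha()) + "\n")
        elif line.startswith("ORIGIN"):
            seq_offset = seq_out.tell()   # where this entry's FASTA record starts
            seq_out.write(f">{name} {accession}\n")
            idx_out.write(f"{name}\t{accession}\t{ann_offset}\t{seq_offset}\n")
            in_sequence = True
        else:
            if line.startswith("ACCESSION"):
                fields = line.split()
                accession = fields[1] if len(fields) > 1 else ""
            ann_out.write(line)
    return count
```

The three outputs can be ordinary files opened for writing; the
FASTA output is the one you would then hand to the BLAST database
formatter.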

If you are installing XYLEM, you may wish to download
the BIRCH framework, a hierarchical directory structure
that we use for our BIRCH Bioinformatics facility.


In addition to giving you a ready-made directory
structure, the BIRCH framework has a number of
handy scripts for administrative tasks such as
setting up user accounts to use the central
database and programs.

Importantly, BIRCH includes scripts which automate
the downloading and formatting of GenBank and
PIR. Just make sure you have enough disk space,
start the script, and the download will run
overnight, reformatting as it goes. 
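
As a rough picture of what "reformatting as it goes" can look
like, here is a small Python sketch. It is not the BIRCH script:
the names enough_space and update_database are hypothetical, and
the fetch and reformat steps are left as functions you supply
yourself (for example, an FTP download and a splitting step).
The point is only that each raw file is fetched, reformatted,
and deleted before the next one starts, with a free-space check
before every download.

```python
import os
import shutil


def enough_space(path, required_bytes):
    """Return True if the filesystem holding `path` has that much free space."""
    return shutil.disk_usage(path).free >= required_bytes


def update_database(names, fetch, reformat, workdir, required_bytes):
    """Fetch and reformat one file at a time, deleting each raw file afterwards.

    `fetch(name, raw_path)` and `reformat(raw_path)` are caller-supplied
    callables; both are assumptions of this sketch.
    """
    done = []
    for name in names:
        if not enough_space(workdir, required_bytes):
            raise RuntimeError(f"not enough free space in {workdir} for {name}")
        raw_path = os.path.join(workdir, name)
        fetch(name, raw_path)    # download step, writes raw_path
        reformat(raw_path)       # reformatting step, writes its own output files
        os.remove(raw_path)      # keep only the reformatted output
        done.append(name)
    return done
```

Processing one file at a time keeps the peak disk usage near the
size of the reformatted database plus one raw file, rather than
two full copies.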

> Guang Mei
> gmei at 3rdmill.com
