[Bio-software] Re: please help me understand BLAST tool structure and files

Scott Harper harper at vt.edu
Tue Jul 18 07:55:14 EST 2006

On Mon, 17 Jul 2006 20:34:07 -0700, Kevin Karplus wrote:

> On 2006-07-17, Brannon <brannonking at yahoo.com> wrote:
>> What I really want to know is the file format of the stage two files --
>> the output of the BLAST tools before they do the sequence alignment.
>> Where do I get that information?
> There are two different versions of BLAST, with two different file
> structures. There is "wu-blast" from Washington University, and NCBI Blast
> from NCBI.

If you are looking for information regarding the internals of the formatdb
output, this page <http://blast.wustl.edu/blast/dbfmts.html> contains
pointers to both the NCBI and WU formats.  As noted in Kevin's response,
the formats are subject to change and may differ slightly from the listed

The formatdb command takes FASTA input and turns it into the binary input
files for BLAST alignment with bl2seq or blastall (or one of several
other tools that take BLAST binary format input).  It produces a set of
database-like files that include a sequence file, a header (sequence name)
file, and an index file.  The index provides guidance when accessing the
sequence and header files.  The indicated file format documentation
does a pretty good job of describing the index and sequence files, but
falls a bit short documenting the header file.  If you are interested:
at the offset position indicated by the index, the header data looks
something like 0x(30 80 30 80), followed by 0x1a (perhaps with some other
data), then the name length in one or two bytes, the ASCII name, and
finally some extra (fill?)
 . Dr. Scott Harper
 . Adaptive Genomics Corp.
 . 620 N. Main St, Suite 103
 . Blacksburg, VA 24060
 . Scott.Harper at AdaptiveGenomics.com, 540-552-2700
