[Bio-software] Re: please help me understand BLAST tool structure and files

Kevin Karplus karplus at cheep.cse.ucsc.edu
Mon Jul 17 22:34:07 EST 2006

On 2006-07-17, Brannon <brannonking at yahoo.com> wrote:
> I'm confused on BLAST file formats and somewhat on the BLAST tool
> structure itself. I have no experience with BLAST, but I recognize
> BLAST can read several input formats including FASTA.
> Assume I'm using the latest version of BLAST. It seems to me there
> would be three file stages. First would be the input files to be
> processed with some heuristical program. Second would be the output
> files from that tool; these output files would also be the input files
> to a tool that would produce the exact alignment. So the third stage
> files would be the alignment files themselves. Is that even remotely
> close to reality?
> What I really want to know is the file format of the stage two files --
> the output of the BLAST tools before they do the sequence alignment.
> Where do I get that information?

There are two different implementations of BLAST, each with its own file
structure: WU-BLAST from Washington University and NCBI BLAST from NCBI.

I believe that the NCBI version handles bigger databases and has been
upgraded more assiduously than WU-BLAST.  I used to use both, but have
switched to using NCBI BLAST exclusively.

The formatdb command converts FASTA files into a set of database files
(the formats differ for nucleic acids and proteins).
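
As a rough illustration of that step, here is a minimal Python sketch that
builds a formatdb command line and lists the files one would expect it to
write.  It assumes the legacy NCBI formatdb with its -i (input FASTA) and
-p (T for protein, F for nucleotide) options, a database small enough to
stay in a single volume, and no -n or -o options.  The function names are
hypothetical helpers invented for this example, not part of any NCBI
package.

```python
import shutil
import subprocess

# Extensions formatdb is assumed to write (header, index, sequence files)
# for a single-volume database built without the -o option.
DB_EXTENSIONS = {
    True: (".phr", ".pin", ".psq"),   # protein database
    False: (".nhr", ".nin", ".nsq"),  # nucleotide database
}


def formatdb_command(fasta_path, is_protein):
    """Build the argument list for a legacy formatdb call (hypothetical helper)."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if is_protein else "F"]


def expected_outputs(fasta_path, is_protein):
    """List the database files expected next to the input FASTA file."""
    return [fasta_path + ext for ext in DB_EXTENSIONS[is_protein]]


def build_database(fasta_path, is_protein):
    """Run formatdb if it is installed and return the expected file names."""
    if shutil.which("formatdb") is None:
        raise RuntimeError("formatdb not found on PATH")
    subprocess.run(formatdb_command(fasta_path, is_protein), check=True)
    return expected_outputs(fasta_path, is_protein)
```

Calling build_database("mydb.fasta", True) would run formatdb on a protein
FASTA file.  These files are the preprocessed database that the search
programs read, so they should be treated as opaque rather than parsed by
hand.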

I found formatdb.html on the web with the following information:

    DISCLAIMER: The internal structure of the BLAST databases is
    subject to change with little or no notice.  The readdb API should
    be used to extract data from the BLAST databases.  Readdb is part
    of the the NCBI toolkit
    (ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools/), readdb.h contains a
    list of supported function calls.

(the double "the" is in the original)

Kevin Karplus 	karplus at soe.ucsc.edu	http://www.soe.ucsc.edu/~karplus
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
(Senior member, IEEE)	(Board of Directors & Chair of Education Committee, ISCB)
life member (LAB, Adventure Cycling, American Youth Hostels)
Effective Cycling Instructor #218-ck (lapsed)
Affiliations for identification only.
