
Solaris 64-bit version of BLAST's formatdb

Gary Williams gwilliam at hgmp.mrc.ac.uk
Mon Nov 22 10:33:18 EST 1999


I'm forwarding this announcement on to b.s with Tom's approval.

Gary Williams               Tel: +44 1223 494522  Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK


> Hi,
>         we've compiled a (Solaris 2.7) 64-bit version of formatdb and it's
> on our FTP site at ftp://ncbi.nlm.nih.gov/blast/temp/formatdb64.sol
> This binary successfully formatted a FASTA file containing 2.6 billion
> base-pairs and blastall was able to search it.  I used the following
> command-line options:
> 
> formatdb -i ntest -p F -v 2000000000
> 
> Note that the '-v 2000000000' option is required for this and that you will then have
> more than one set of BLAST databases, as well as an alias file describing
> how to search these databases.  The reason for using database volumes,
> as opposed to simply making the indices in the BLAST databases large 
> enough to handle all conceivable databases with an eight-byte 'integer',
> is that this would have doubled the size of the indices for all
> searches no matter how small the database.  Hence we decided to break down
> very large FASTA files into a couple of databases.  This process can also
> be inverted; a user could manually write an alias file (with a name like
> 'ntest.nal') to combine two databases behind the scenes.  You can look
> at the format of the *.nal file that is produced when you run formatdb.
> 
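
The manual alias file described above can be sketched in a few lines of Python. This is a minimal sketch, not NCBI's code. It assumes the alias file is plain text with a TITLE line and a DBLIST line, and that the volumes are named ntest.00 and ntest.01; the helper name write_nucleotide_alias is hypothetical. Compare the result with the *.nal file that formatdb itself produces before relying on it.

```python
from pathlib import Path


def write_nucleotide_alias(alias_name, volumes, title, directory="."):
    """Write <alias_name>.nal listing the given database volumes.

    Assumption: the alias file holds a TITLE line and a DBLIST line.
    Check this against an alias file written by formatdb.
    """
    if not volumes:
        raise ValueError("at least one database volume is required")
    path = Path(directory) / f"{alias_name}.nal"
    lines = [
        f"TITLE {title}",
        "DBLIST " + " ".join(volumes),  # space-separated volume names
    ]
    path.write_text("\n".join(lines) + "\n")
    return path
```

Calling write_nucleotide_alias("ntest", ["ntest.00", "ntest.01"], "ntest") would write ntest.nal in the current directory, naming the two (assumed) volumes to be searched together.
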
> There are still a couple of limitations (or bugs) that I can think of.
> One is that our command-line parser only works on integers up to about
> 2 billion (2**31), which explains why I used 2000000000 rather than 
> 4000000000 above.  We hope to lift this limitation by the next toolkit
> release.  Formatdb also handles many numbers as unsigned 4-byte integers,
> limiting anything it can process to 2**32.  We've lifted this limitation
> on blastall so I'm sure we'll be able to do it for formatdb also, hopefully
> by the next release.
> 
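
The two integer limits mentioned above can be checked with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, and volumes_needed assumes that the -v value acts as an upper bound on bases per volume, which is an inference from the message rather than documented behaviour. Real volumes are split on sequence boundaries, so the count is an estimate.

```python
INT32_MAX = 2**31 - 1    # largest signed 4-byte integer
UINT32_MAX = 2**32 - 1   # largest unsigned 4-byte integer


def fits_signed_32(n):
    """True if n fits in a signed 4-byte integer (the parser limit)."""
    return -2**31 <= n <= INT32_MAX


def fits_unsigned_32(n):
    """True if n fits in an unsigned 4-byte integer."""
    return 0 <= n <= UINT32_MAX


def volumes_needed(total_bases, bases_per_volume):
    """Estimate the volume count by ceiling division (assumed behaviour)."""
    if bases_per_volume <= 0:
        raise ValueError("bases_per_volume must be positive")
    return -(-total_bases // bases_per_volume)


if __name__ == "__main__":
    print(fits_signed_32(2_000_000_000))    # value used with -v
    print(fits_signed_32(4_000_000_000))    # too large for the parser
    print(fits_unsigned_32(2_600_000_000))  # size of the test database
    print(volumes_needed(2_600_000_000, 2_000_000_000))
```

Under these assumptions, 2000000000 passes the signed 4-byte check while 4000000000 does not, and a 2.6 billion base-pair file with -v 2000000000 comes out at two volumes.
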
> Please let 'blast-help' know if this addresses your immediate concerns
> and we might start including 64-bit formatdb in the archives.  Unfortunately
> these binaries will not work under Solaris 2.6.
> 
> regards,
> 
> Tom Madden, Ph.D.
> National Center for Biotechnology Information
> National Library of Medicine
> National Institutes of Health
> Bldg. 38A, Rm. 8N-805
> 8600 Rockville Pike
> Bethesda, Maryland  20894  USA
> 
> 301-435-5994
> 301-480-9241 FAX
> 
> madden at ncbi.nlm.nih.gov
> 
> 
> Addendum:
> 
> We'll obviously need to do the same thing for IRIX.  I'm not sure if we
> could do the same for LINUX (I've never heard of LINUX64, but maybe I
> just don't know where to look).  DEC Alpha should work without a change. 
> 
> Tom