I'm forwarding this announcement on to b.s with Tom's approval.
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK
> Hi,
> we've compiled a (Solaris 2.7) 64-bit version of formatdb and it's
> on our FTP site at ftp://ncbi.nlm.nih.gov/blast/temp/formatdb64.sol
>
> This binary successfully formatted a FASTA file containing 2.6 billion
> base-pairs and blastall was able to search it. I used the following
> command-line options:
>
> formatdb -i ntest -p F -v 2000000000
>
> Note that the '-v 2000000000' setting is required for this and that you will then have
> more than one set of BLAST databases, as well as an alias file describing
> how to search these databases. The reason for using database volumes,
> as opposed to simply making the indices in the BLAST databases large
> enough to handle all conceivable databases with an eight-byte 'integer',
> is that this would have doubled the size of the indices for all
> searches no matter how small the database. Hence we decided to break down
> very large FASTA files into a couple of databases. This process can also
> be inverted; a user could manually write an alias file (with a name like
> 'ntest.nal') to combine two databases behind the scenes. You can look
> at the format of the *.nal file that is produced when you run formatdb.
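As a rough illustration of the volume arithmetic described above, here is a minimal Python sketch. It assumes volumes are filled up to the '-v' limit, so the count is only a lower bound if the real tool splits at sequence boundaries. The volume names ('ntest.00', 'ntest.01') and the TITLE/DBLIST keywords in the alias text are assumptions for illustration; compare them against the *.nal file that formatdb actually writes before relying on them.

```python
def volume_count(total_bases: int, volume_size: int) -> int:
    """Lower bound on the number of volumes: ceiling of total / size."""
    return -(-total_bases // volume_size)


def alias_file_text(title: str, volumes: list[str]) -> str:
    """Build the text of a hypothetical alias file listing the volumes."""
    # TITLE and DBLIST are assumed keywords; check a generated *.nal file.
    return f"TITLE {title}\nDBLIST {' '.join(volumes)}\n"


total = 2_600_000_000       # bases in the FASTA file, from the message
per_volume = 2_000_000_000  # the value passed to -v

n = volume_count(total, per_volume)
# Assumed naming scheme: base name plus a two-digit volume number.
names = [f"ntest.{i:02d}" for i in range(n)]

print(n)
print(alias_file_text("ntest", names), end="")
```

With the numbers from the message this gives two volumes, which matches the "more than one set of BLAST databases" described above.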
>
> There are still a couple of limitations (or bugs) that I can think of.
> One is that our command-line parser only works on integers up to about
> 2 billion (2**31), which explains why I used 2000000000 rather than
> 4000000000 above. We hope to lift this limitation by the next toolkit
> release. Formatdb also handles many numbers as unsigned 4-byte integers,
> limiting anything it can process to 2**32. We've lifted this limitation
> on blastall so I'm sure we'll be able to do it for formatdb also, hopefully
> by the next release.
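The two limits mentioned above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not of how the toolkit parses or stores numbers; the helper names are made up for illustration.

```python
INT32_MAX = 2**31 - 1   # largest signed 4-byte integer: 2147483647
UINT32_MAX = 2**32 - 1  # largest unsigned 4-byte integer: 4294967295


def fits_signed_32(n: int) -> bool:
    """True if n can be held in a signed 4-byte integer."""
    return -2**31 <= n <= INT32_MAX


def fits_unsigned_32(n: int) -> bool:
    """True if n can be held in an unsigned 4-byte integer."""
    return 0 <= n <= UINT32_MAX


# The volume size used on the command line fits the signed limit;
# 4000000000 does not, which is why it could not be used.
print(fits_signed_32(2_000_000_000))    # True
print(fits_signed_32(4_000_000_000))    # False

# The 2.6 billion base-pair test file is still below 2**32,
# but anything past 4294967295 would not be.
print(fits_unsigned_32(2_600_000_000))  # True
print(fits_unsigned_32(5_000_000_000))  # False
```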
>
> Please let 'blast-help' know if this addresses your immediate concerns
> and we might start including 64-bit formatdb in the archives. Unfortunately
> these binaries will not work under Solaris 2.6.
>
> regards,
>
> Tom Madden, Ph.D.
> National Center for Biotechnology Information
> National Library of Medicine
> National Institutes of Health
> Bldg. 38A, Rm. 8N-805
> 8600 Rockville Pike
> Bethesda, Maryland 20894 USA
>
> 301-435-5994
> 301-480-9241 FAX
>
> madden at ncbi.nlm.nih.gov
>
>
> Addendum:
>
> We'll obviously need to do the same thing for IRIX. I'm not sure if we
> could do the same for LINUX (I've never heard of LINUX64, but maybe I
> just don't know where to look). DEC Alpha should work without a change.
>
> Tom
---