
blast 1.4 "too short" for the size of genbank and embl

Catherine Letondal letondal at pasteur.fr
Tue Aug 4 07:25:48 EST 1998


In article <6q6u66$j2v$1 at pegasus.csx.cam.ac.uk>, tjrc1 at mole.bio.cam.ac.uk (Tim Cutts) writes:
>In article <6q6jrt$h8p$1 at desdemone.pasteur.fr>,
>Catherine  Letondal <letondal at pasteur.fr> wrote:
>>
>>Hi,
>>
>>We maintain a copy of genbank (release + updates) as well as embl for
>>blast 1.4 searches. The problem is that these databases have reached a
>>size that exceeds the capacity of a 32-bit integer, and this version of
>>blast relies mainly on integers of that type (more exactly 31 bits, since
>>the integers are signed). As a result, malloc ends up being called with
>>negative sizes.
>>
>>Of course, we also have the blast 2 NCBI and Washington-Univ. versions.
>>We are aware that this version of blast is not maintained anymore at NCBI, but
>>we keep the "old" 1.4 version of blast for compatibility reasons
>>with blast output parsers (like bob, tbob, blast2html, ...). 
>>
>>Updating the blast 1.4 sources by replacing the 32-bit types with long
>>integers seems very hazardous ...
>>
>>Does anyone here have the same problem, and a solution?
>
>You don't specify what operating system you are using.  I haven't
>looked at the BLAST 1.4 sources, but file offsets should always use
>the type off_t, rather than int.

Sorry, I forgot. It's a DECalpha 4000 5/466 running Digital UNIX V4.0B.
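
To make the off_t suggestion concrete, here is a minimal sketch. It is not
taken from the blast 1.4 sources: the helper names file_size_bytes and
off_t_bits are made up for this example, and a POSIX <sys/stat.h> is assumed.
It asks the OS for a file's size and keeps the result in off_t rather than int.

```c
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of blast 1.4): return the size of `path`
 * in bytes as an off_t, or (off_t)-1 if stat() fails. */
off_t file_size_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return (off_t)-1;
    return st.st_size;          /* st_size is declared as off_t */
}

/* Hypothetical helper: width of off_t in bits on the current platform,
 * so one can check whether offsets beyond 2 GB are representable. */
unsigned off_t_bits(void)
{
    return (unsigned)(sizeof(off_t) * CHAR_BIT);
}
```

Calling off_t_bits() on the machine that runs the searches shows whether
offsets beyond 2 GB can be represented there at all.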


>
>On IRIX at least, off_t is defined in <sys/types.h>.
>
>If you use this, your program will automatically be able to handle
>file size offsets as large as the OS can handle (for example in
>Solaris 2.x for x<6, off_t is 32 bits, which is why Solaris 2.5 and
>earlier had a maximum file size of 2 GB (approximately)).
>
>Similarly, malloc actually takes an argument of type size_t, not int.
>
>If you change the source to use that, you know that you will not
>end up giving malloc negative numbers, since (at least on IRIX),
>size_t is always an unsigned integer of some size (usually 32 or 64
>bits, depending on your operating system).

On Digital Unix: typedef unsigned long   size_t;

(We are just not sure about the consequences of changing the blast 1.4 source code...)
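
As a rough illustration of the size_t point, a sketch only: alloc_records and
fits_in_int32 are hypothetical names, not functions from blast 1.4, and
SIZE_MAX and INT32_MAX assume a C99 <stdint.h>. The idea is to do the size
arithmetic in size_t and check for wrap-around before calling malloc, so that
a negative or truncated byte count never reaches it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of blast 1.4): allocate room for `count`
 * records of `record_size` bytes each. The multiplication is done in
 * size_t, and requests whose byte count would wrap around are refused. */
void *alloc_records(size_t count, size_t record_size)
{
    if (record_size != 0 && count > SIZE_MAX / record_size)
        return NULL;            /* count * record_size would overflow */
    return malloc(count * record_size);
}

/* Hypothetical helper: returns 1 if `total` still fits in a signed
 * 32-bit counter (the "31 bits" mentioned above), 0 otherwise. */
int fits_in_int32(unsigned long long total)
{
    return total <= (unsigned long long)INT32_MAX;
}
```

A check like fits_in_int32 could be run on the database size before a search,
to fail with a clear message instead of a bad malloc call.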

>
>Tim.
>
>
>-- 
>--------------------------------------------------------------------------
>Dr T J R Cutts                                        Tel: +44 1223 333596
>Dept. of Biochemistry, 80 Tennis Court Rd.
>Cambridge, CB2 1GA, UK
>


Catherine LETONDAL,	Institut Pasteur    Service d'Informatique Scientifique
letondal at pasteur.fr	25 rue du Docteur Roux    75724 Paris CEDEX 15 - FRANCE

tel: +33 (1) 40 61 31 91
fax: +33 (1) 40 61 30 80



