
blast 1.4 "too short" for the size of genbank and embl

Tim Cutts tjrc1 at mole.bio.cam.ac.uk
Tue Aug 4 07:18:14 EST 1998


In article <6q6jrt$h8p$1 at desdemone.pasteur.fr>,
Catherine  Letondal <letondal at pasteur.fr> wrote:
>
>Hi,
>
>We maintain a copy of genbank (release + updates) as well as embl for
>blast 1.4 searches. The problem is that these databases have reached a 
>size that exceeds a 32 bits integer capacity - and this version of blast is
>mainly based on such types of integer (more exactly 31 bits, for the
>integers are not unsigned). As a result, malloc of negative 
>numbers occur.
>
>Of course, we also have the blast 2 NCBI and Washington-Univ. versions.
>We are aware that this version of blast is not maintained anymore at NCBI, but
>we keep the "old" 1.4 version of blast for compatibility reasons
>with blast output parsers (like bob, tbob, blast2html, ...). 
>
>Updating blast 1.4 sources by replacing the 32 bits types by long integers
>seem very hazardous ...
>
>Does someone here have the same problem, and some solution ?

You don't specify what operating system you are using.  I haven't
looked at the BLAST 1.4 sources, but file offsets should always use
the type off_t, rather than int.

On IRIX at least, off_t is defined in <sys/types.h>.

If you use this, your program will automatically be able to handle
file offsets as large as the OS supports.  For example, in Solaris 2.x
for x<6, off_t is a signed 32-bit type, which is why Solaris 2.5 and
earlier had a maximum file size of approximately 2 GB.
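
As a rough illustration of the off_t advice (not taken from the BLAST
sources, which I haven't read), here is a minimal sketch of seeking in a
database file with an off_t offset.  The helper names db_seek and db_tell
are made up for this example, and it assumes a system that provides the
POSIX fseeko/ftello calls, which take and return off_t instead of long.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the fseeko/ftello declarations */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>           /* off_t */

/* Hypothetical helper: move to a byte offset in a database file.
   The offset is an off_t, so it can be as large as the OS allows. */
int db_seek(FILE *fp, off_t offset)
{
    assert(fp != NULL);
    return fseeko(fp, offset, SEEK_SET);   /* 0 on success, -1 on error */
}

/* Hypothetical helper: report the current offset, or (off_t)-1 on error. */
off_t db_tell(FILE *fp)
{
    assert(fp != NULL);
    return ftello(fp);
}
```

The point of the design is that no int or long ever holds the offset, so
the code picks up whatever width off_t has on the machine it is compiled on.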

Similarly, malloc actually takes an argument of type size_t, not int.

If you change the source to use that, and do the size arithmetic in
size_t as well, you know that you will not end up giving malloc
negative numbers, since size_t is always an unsigned integer type
(usually 32 or 64 bits, depending on your operating system).  An
oversized request should then fail with a NULL return, which the
code can check for.
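
To make the size_t point concrete, here is a small sketch of an
allocation wrapper.  The name alloc_records and its parameters are
invented for illustration and are not from BLAST.  It assumes a C99
compiler for SIZE_MAX.  Because unsigned arithmetic wraps around
silently, it checks the multiplication before calling malloc.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc, size_t */

/* Hypothetical helper: allocate room for `count` records of
   `record_size` bytes each.  Returns NULL if the total would not
   fit in a size_t, or if malloc itself fails. */
void *alloc_records(size_t count, size_t record_size)
{
    assert(record_size > 0);
    if (count > SIZE_MAX / record_size)
        return NULL;                  /* count * record_size would wrap */
    return malloc(count * record_size);
}
```

A caller would test the result against NULL and report that the database
is too large, rather than passing a wrapped or negative size to malloc.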

Tim.


-- 
--------------------------------------------------------------------------
Dr T J R Cutts                                        Tel: +44 1223 333596
Dept. of Biochemistry, 80 Tennis Court Rd.
Cambridge, CB2 1GA, UK




More information about the Bio-soft mailing list
