
big databases

Dave Matthews matthews at greengenes.cit.cornell.edu
Thu Sep 12 06:08:24 EST 2002


Hi folks,

I'm interested in trading war stories and tips about how to deal with
big ACEDB databases.  GrainGenes recently got big and we feel like
we're bumping our heads.  database/block*.wrm totals about 2.5 GB, mainly
due to 500K Sequence records.

Today I added a small new .ace file and ran into a Unix "too many open files"
error, where the default limit is 64.  I increased the limit and loaded the
file, no problem.  But then I wondered whether increasing the size of the
block*.wrm files would be a better solution.  This is defined in
wspec/database.wrm, e.g.

//Keyword : type hostname partition_name file_name          max_size offset
PART :       1   local    ACEDB          block1.wrm  5000      0
PART :       1   local    ACEDB          block2.wrm  20000     0
PART :       1   local    ACEDB          block3.wrm  50000     0
PART :       1   local    ACEDB          block4.wrm  50000     0
...

Fifty files are defined in the distribution (4_9i) wspec, each with
max_size 50000 (except the first two).  So I changed them to 100000 and
reinitialized.
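
For anyone who would rather script that edit than change fifty lines by
hand, here is a minimal Python sketch.  It assumes every PART line has
exactly the whitespace-separated layout shown in the sample above; the
function name raise_max_size is just something made up for illustration,
not part of ACEDB.  Note that the longer number shifts the column
alignment by one character, which I'm assuming is harmless.

```python
import re

# Matches PART lines laid out as in the sample:
# PART : type hostname partition_name file_name max_size offset
PART_RE = re.compile(
    r"^(PART\s*:\s*\S+\s+\S+\s+\S+\s+\S+\s+)(\d+)(\s+\d+\s*)$"
)

def raise_max_size(text, old=50000, new=100000):
    """Return text with max_size changed from old to new on PART lines.

    Lines whose max_size is not `old` (e.g. the first two block files)
    and non-PART lines such as comments are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        m = PART_RE.match(line)
        if m and int(m.group(2)) == old:
            line = m.group(1) + str(new) + m.group(3)
        out.append(line)
    return "\n".join(out)

sample = """\
//Keyword : type hostname partition_name file_name          max_size offset
PART :       1   local    ACEDB          block1.wrm  5000      0
PART :       1   local    ACEDB          block2.wrm  20000     0
PART :       1   local    ACEDB          block3.wrm  50000     0
PART :       1   local    ACEDB          block4.wrm  50000     0
"""

print(raise_max_size(sample))
```

To apply it for real you would read wspec/database.wrm into a string,
pass it through the function, and write the result back after keeping a
copy of the original.  The database still has to be reinitialized
afterwards, as described above.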

The result to report is that the time for reloading the database improved
from 7.5 hr to 4.25 hr.  (On a Sun Ultra10, 300MHz, 1 GB RAM, ca. 450 MB
memory used by the ACEDB process during loading.)
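
On the "too many open files" side, the other workaround was simply to
raise the per-process limit.  As a rough sketch of what that looks like
from a script, the following uses Python's standard resource module
(Unix only) to raise the soft limit on open file descriptors, capped at
the hard limit.  The function name and the value 256 are arbitrary
choices for illustration.  The change applies only to the Python process
and anything it launches afterwards, so this would only help if the
loader were started from the same script.

```python
import resource

def raise_open_file_limit(wanted):
    """Raise the soft open-files limit toward `wanted`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unlimited soft limit needs no change.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # The soft limit may not exceed the hard limit for an ordinary user.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)

    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)

soft, hard = raise_open_file_limit(256)
print("open-file limit now: soft=%s hard=%s" % (soft, hard))
```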

Other factors I'm curious about, and have been tinkering with a little,
include wspec/cachesize.wrm, ?Text vs. Text, and prodigal XREFing.  Any
suggestions welcome!

- Dave
---





More information about the Acedb mailing list
