I'm interested in trading war stories and tips about how to deal with
big ACEDB databases. GrainGenes recently got big and we feel like
we're bumping our heads. database/block*.wrm totals about 2.5 GB, mainly
due to 500K Sequence records.
Today I added a small new .ace file and ran into a Unix "too many open files"
error; the default limit here is 64. I increased the limit and the file
loaded with no problem. But then I wondered whether increasing the size of the
block*.wrm files might be a better solution. This is defined in
//Keyword : type hostname partition_name file_name max_size offset
PART : 1 local ACEDB block1.wrm 5000 0
PART : 1 local ACEDB block2.wrm 20000 0
PART : 1 local ACEDB block3.wrm 50000 0
PART : 1 local ACEDB block4.wrm 50000 0
Fifty files are defined in the distribution (4_9i) wspec, with max_size 50000
each (except the first two). The unit is presumably 1 KB blocks rather than
bytes, since the files total about 2.5 GB. So I changed them to 100000 and
reinitialized.
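
To make the arithmetic concrete, here is a rough sketch in Python that prints
PART lines in the format shown above and estimates how many block files a
database of a given size needs. The function names are hypothetical, and it
assumes max_size is counted in 1 KB blocks (my reading, not something I have
checked in the ACEDB source), so 2.5 GB is taken as roughly 2,500,000 units.

```python
def part_lines(n_files, max_size, first_sizes=(5000, 20000)):
    """Build PART lines in the wspec format quoted above.

    first_sizes holds the sizes of the leading files (5000 and 20000 in the
    distribution wspec); every later file gets max_size.
    """
    lines = []
    for i in range(1, n_files + 1):
        size = first_sizes[i - 1] if i <= len(first_sizes) else max_size
        lines.append(f"PART : 1 local ACEDB block{i}.wrm {size} 0")
    return lines


def files_needed(total_units, max_size, first_sizes=(5000, 20000)):
    """Count the block files needed to hold total_units.

    total_units is in the same units as max_size (assumed: 1 KB blocks).
    """
    count = 0
    capacity = 0
    while capacity < total_units:
        capacity += first_sizes[count] if count < len(first_sizes) else max_size
        count += 1
    return count


if __name__ == "__main__":
    # Print the first few lines of a wspec with the larger file size.
    for line in part_lines(4, 100000):
        print(line)
    # Compare file counts for a database of about 2.5 GB (assumed units).
    for size in (50000, 100000):
        print(size, files_needed(2_500_000, size))
```

Under those assumptions, doubling max_size roughly halves the number of
block files, which is also the number of files that may need to be open at
once.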
The result: the time for reloading the database dropped from 7.5 hr to
4.25 hr. (On a Sun Ultra10, 300 MHz, 1 GB RAM; the ACEDB process used
ca. 450 MB of memory during loading.)
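
On the "too many open files" error mentioned earlier: in the shell the limit
is raised with `ulimit -n` (sh) or `limit descriptors` (csh). For anyone who
drives loads from a script, here is a minimal sketch of the same thing using
Python's standard `resource` module. The function name and the value 256 are
only illustrative; the new limit applies to the calling process and to any
child processes it starts afterwards.

```python
import resource


def raise_nofile_limit(wanted):
    """Raise the soft open-files limit to `wanted`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot go above the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Only raise the limit; never lower it, and leave "unlimited" alone.
    if soft != resource.RLIM_INFINITY and soft < wanted:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_limit(256))
```

This only works around the symptom; fewer, larger block files reduce the
number of descriptors needed in the first place.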
Other factors I'm curious about and have been tinkering a little with include
wspec/cachesize.wrm, ?Text vs. Text, prodigal XREFing. Any suggestions