SRS indexing in parallel on distributed machines

ICHIYANAGI Yoshihiro ichan-Yoshihiro at im.ac.cn
Thu Nov 9 08:25:22 EST 2000

Dear all,

We are now using SRS on Bio-mirror to provide a search function
for the Bio-mirror databases.
SRS requires indices to be built for each database.
Building them takes a long time for large databases such as
DDBJ (48GB), GENBANK (39GB) and EMBL (37GB).
Indexing quickly in parallel needs a powerful computer with many
CPUs, which is very expensive.
So I tried writing programs that build the indices of SRS
databanks concurrently on distributed machines, using Java,
HORB (a kind of Java Object Request Broker), NFS, Perl and
shell scripts on RedHat 6.2.

I know that SRS has a function that parallelizes indexing
on a multiprocessor machine. I made use of this function
(parallelType:files in the Icarus files) to create a prototype
system on distributed computers, and got pretty good results.
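
The post does not say how files were assigned to the machines. One
plausible scheme, purely an assumption here, is a shared work queue
from which each agent takes the next file as soon as it is free, so
that faster machines end up indexing more files. The sketch below
shows only that idea, in Python rather than the Java/HORB of the real
system. The names run_agents and index_file are hypothetical, and
index_file stands in for whatever per-file indexing step is actually
run; it is not an SRS call.

```python
import queue
import threading


def run_agents(files, agents, index_file):
    """Let every agent pull file names from one shared queue until it is empty.

    files      -- list of databank file names to index
    agents     -- list of agent (machine) names
    index_file -- hypothetical callable(agent, file_name) doing the real work
    Returns a dict mapping each agent to the list of files it handled.
    """
    work = queue.Queue()
    for name in files:
        work.put(name)

    done = {agent: [] for agent in agents}

    def agent_loop(agent):
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # no files left for this agent
            index_file(agent, name)
            done[agent].append(name)  # each agent only touches its own list

    threads = [threading.Thread(target=agent_loop, args=(a,)) for a in agents]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

Each file is handed out exactly once, and the per-agent counts depend
on how quickly each agent finishes its files.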

This time I built the DDBJNEW indices with four Linux
machines. There are 102 DDBJNEW files on our ftp site.
I also built the Genbank indices with three Linux
machines. There are 75 Genbank files on our ftp site.

The results were as follows.

DDBJNEW (four machines):
Linux No.1: 1 CPU (PIII 733MHz),    mem 512M, created 28 files' indices
Linux No.2: 1 CPU (PIII 600MHz),    mem 512M, created 38 files' indices
Linux No.3: 1 CPU (Celeron 500MHz), mem 128M, created 29 files' indices
Linux No.4: 1 CPU (PIII 450MHz),    mem 128M, created  7 files' indices
                                              total 102 files

Indexing in parallel on four machines (TIME) : 03 hours 52 minutes
Merging indices on Linux No.1         (TIME) : 00 hours 50 minutes

Genbank (three machines):
Linux No.1: created 25 files' indices
Linux No.2: created 28 files' indices
Linux No.3: created 22 files' indices
            total 75 files

Indexing in parallel on three machines (TIME) : 03 hours 54 minutes
Merging indices on Linux No.1          (TIME) : 00 hours 39 minutes

No.1 is the SRS web server, No.2 is an ftp and web server,
No.3 is a mail server and No.4 is just my PC.
In this system No.1 is the destination server and the others are
remote agent machines that do the indexing.

Previously, indexing took more than 13 hours on machine No.1 alone.
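
To put the timings side by side, the small calculation below adds the
indexing and merging times for each run and compares the total with
the 13-hour figure. It assumes that the 13 hours can be compared with
either run, which the numbers above do not state, and since the old
time was "more than" 13 hours the ratios are only lower bounds.

```python
def minutes(hours, mins):
    """Convert an hours/minutes pair to minutes."""
    return hours * 60 + mins


def fmt(total):
    """Format a number of minutes as e.g. 4h42m."""
    return f"{total // 60}h{total % 60:02d}m"


# (indexing time, merging time) as reported above
runs = {
    "DDBJNEW (4 machines)": (minutes(3, 52), minutes(0, 50)),
    "Genbank (3 machines)": (minutes(3, 54), minutes(0, 39)),
}

# "more than 13 hours" on one machine: treated as a lower bound (assumption)
baseline = minutes(13, 0)

totals = {}
for name, (indexing, merging) in runs.items():
    totals[name] = indexing + merging
    ratio = baseline / totals[name]
    print(f"{name}: total {fmt(totals[name])}, at least {ratio:.2f}x faster")
```

The totals come to 4h42m for DDBJNEW and 4h33m for Genbank, so under
that assumption the distributed runs are roughly 2.8 times faster.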

For the remote agent machines, the destination server must export
the data directories of the DATABANKS to be indexed, the $SRSICA and
$SRSDB configuration files so that indexing runs with the same
configuration, and the $SRSINX directory so that the distributed
indices do not have to be gathered afterwards. The agents mount
all of these. This is a bit messy to set up.
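
Because this setup is easy to get wrong, a small check run on each
agent before indexing can help. The sketch below assumes that SRSICA,
SRSDB and SRSINX are set as environment variables holding paths on
the agent, and that the data directories are passed in by the caller.
The function name check_agent_environment is hypothetical and not
part of SRS.

```python
import os

# Variables named in the post; assumed here to hold paths visible on the agent.
REQUIRED_VARS = ("SRSICA", "SRSDB", "SRSINX")


def check_agent_environment(env=os.environ, data_dirs=()):
    """Return a list of problems found; an empty list means the agent looks ready."""
    problems = []
    for var in REQUIRED_VARS:
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.exists(path):
            # Most likely the export from the destination server is not mounted.
            problems.append(f"{var} points to a missing path: {path}")
    for path in data_dirs:
        if not os.path.isdir(path):
            problems.append(f"data directory is missing: {path}")
    return problems
```

An agent could call check_agent_environment() at start-up and refuse
to take any files while the returned list is not empty.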

Please give me some advice if you are interested in this subject
or know of approaches similar to mine.

Institute of Microbiology,
Chinese Academy of Sciences


More information about the Bio-srs mailing list
