William R. Pearson <wrp at alpha0.bioch.virginia.edu> wrote:
> We have not looked into SMP vs Beowulf exhaustively, but we have quite
> a bit of experience.
>
> (1) SMP is far easier to configure and run than PVM (or MPI or
> others). You just run the program; if it's threaded SMP, it runs
> faster. SMP programs are also much easier to develop and debug.
There are a couple of points to make here. 1) MPI is far more
efficient than PVM. No-one should be using PVM these days. 2) MPI is
more flexible than threads in that an MPI version of a program can still
be run on an SMP machine, as well as on a distributed network.
Programs like BLAST and FASTA have a problem in that their I/O
requirements are large, and this can be a real performance problem on a
distributed network.
For example, you could implement your parallel program by giving each
MPI process part of the database to work on. The problem there is the
large overhead of getting the database to each processor: Ethernet is
too slow, and the transfer will destroy any performance gain from the
parallel code.
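A back-of-envelope model makes the Ethernet argument concrete. This is a minimal sketch, assuming the database is block-partitioned across MPI processes and that every slice is shipped from one node over a single shared link, so the whole database crosses the wire no matter how many processes there are. The 1 GB database, 100 Mbit/s link and 60 s serial search time are hypothetical numbers chosen for illustration, not measurements.

```python
def block_slices(n_seqs, n_procs):
    """Split n_seqs database entries into n_procs contiguous (start, end) ranges."""
    base, extra = divmod(n_seqs, n_procs)
    slices, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks take one extra entry each.
        size = base + (1 if rank < extra else 0)
        slices.append((start, start + size))
        start += size
    return slices


def transfer_seconds(db_bytes, link_bits_per_s):
    """Time to push the whole database over one shared link (assumed model)."""
    return db_bytes * 8 / link_bits_per_s


def speedup(serial_search_s, transfer_s, n_procs):
    """Serial time divided by (transfer time + perfectly parallel search time)."""
    return serial_search_s / (transfer_s + serial_search_s / n_procs)


if __name__ == "__main__":
    # Hypothetical figures: 1 GB database over 100 Mbit/s Ethernet.
    t_xfer = transfer_seconds(db_bytes=1e9, link_bits_per_s=100e6)
    print("slices for 10 entries on 3 ranks:", block_slices(10, 3))
    print("transfer time (s):", t_xfer)
    for p in (2, 8, 32):
        print(p, "processes -> speedup", round(speedup(60.0, t_xfer, p), 3))
```

With these numbers the transfer alone costs 80 s, longer than the 60 s serial search, so the "parallel" run is slower than the serial one and its speedup can never exceed 60/80 = 0.75, however many processes are added.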
A better solution, easier to implement and probably more useful for
most purposes, is a workstation farm in which each node holds a local
copy of all the target databases and runs ordinary single-threaded
BLAST. For large-scale work you typically want to BLAST lots of
sequences against several databases, so such coarse-grained
parallelisation is fine. You just need some way of distributing the
BLAST jobs to your farm. You can either do this with some fairly
trivial Perl scripting, or use a more flexible commercial offering.
I can highly recommend Platform Computing's LSF package. It's
expensive, but it is extremely good at managing workstation farms, in
particular at cycle stealing from machines when they're idle.
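The "fairly trivial" scripting route can be sketched as a shared job queue that idle hosts pull from. This is a minimal sketch in Python rather than Perl, and it is not how LSF works internally: the host names, load figures, idle threshold and the legacy `blastall` command line are assumptions for illustration, and the default `run_job` only builds the command string (a dry run) instead of launching it.

```python
import threading
from itertools import product
from queue import Empty, Queue


def idle_hosts(load_averages, threshold=0.5):
    """Crude stand-in for cycle stealing: only use hosts whose load is low."""
    return sorted(h for h, load in load_averages.items() if load < threshold)


def make_jobs(queries, databases):
    """Coarse-grained work units: every query against every database."""
    return list(product(queries, databases))


def dry_run(host, job):
    """Hypothetical command for one job; a real script would execute it."""
    query, db = job
    return f"ssh {host} blastall -p blastp -d {db} -i {query}"


def run_farm(jobs, hosts, run_job=dry_run):
    """One worker thread per host; each pulls the next job as soon as it is free."""
    pending = Queue()
    for job in jobs:
        pending.put(job)
    results, lock = [], threading.Lock()

    def worker(host):
        while True:
            try:
                job = pending.get_nowait()
            except Empty:
                return  # queue drained: this host is done
            output = run_job(host, job)
            with lock:
                results.append((host, job, output))

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    loads = {"ws01": 0.1, "ws02": 2.3, "ws03": 0.0}  # hypothetical load averages
    hosts = idle_hosts(loads)
    jobs = make_jobs(["q1.fa", "q2.fa", "q3.fa"], ["nr", "swissprot"])
    for host, job, cmd in run_farm(jobs, hosts):
        print(host, cmd)
```

Because hosts pull work rather than receiving a fixed share up front, a slower machine simply completes fewer jobs. A real farm manager also keeps monitoring host load while jobs run, which this sketch does not attempt.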
Using LSF at the University of Cambridge, I got 100% CPU utilisation on
a 20-workstation farm. These were interactive workstations too; people
doing NMR spectrum assignment at the workstations weren't even aware
their machines were also running highly CPU-intensive analysis jobs
in the background. Efficient use of the workstations like this
ultimately saved money, since they realised that they no longer needed
to buy further machines.