what software do biologists need?

S S Sturrock sss at castle.ed.ac.uk
Thu Mar 18 06:23:19 EST 1993

In article <1993Mar17.033208.22078 at cs.sandia.gov> mccurley at cs.sandia.gov (Kevin S. McCurley) writes:
>In article <1o4nt5INNo48 at srvr1.engin.umich.edu> cash at geneva.csmil.umich.edu (Howard Cash) writes:
>>I say, by all means, let's discuss our wish lists.  Some wishes may come
>You said it.  There are even those of us computer scientists who read
>the news group for the purpose of trying to figure out what the
>concerns of biologists are.  I would encourage you to also think in
>terms of imaginary teraflop computers - they're coming soon and
>biologists should be ready with their wish lists.  This is not to
>exclude your more mundane desires of course.

Just one thing for the computer scientists reading this: for the most part,
Teraflops (or Gigaflops at the moment) are really not very useful for the
day-to-day biology task, i.e. sequence analysis.  Sure, for modelling etc. they
become quite handy, but integer arithmetic is where it is still at.  I
originally wrote MPsrch for the CM-200 we have here at Edinburgh (oddly
enough it was called CMsrch in those days, nothing like being predictable)
and although the CM-200 claims to have 8 Gigaflops performance (I am sick
and tired of people claiming it is the most powerful supercomputer in the
UK) it could barely manage 8 million cell updates per second (8K PEs) for the
Smith-Waterman algorithm.  That equates to the same performance as was attained
on a 1024 PE DAP back in 1988!!  And the DAP dates back to the late 70s!
Considering the CM-200 has 8K PEs this is really dreadful.  On the other
hand, porting exactly the same C code onto a smallish MasPar with only 4K
PEs brought about a 4 fold improvement in performance immediately, and
within weeks the code was performing at just under 100 million cell
updates.  This from a machine that costs *FAR* less than the CM-200.  Sure,
for some tasks the CM (and other high Flop rating machines) is great but
what we really need is a machine that can perform huge numbers of integer
operations as well as floating point.  Now that I know a little more about
programming the MasPar machines I have hiked the speed up to 180 million
cell updates for proteins and 285 million for nucleic acid (4K PEs), scaling
perfectly across all MasPar machines (1-16K PEs).  Ah the power of mips V
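
For readers unfamiliar with the "cell update" figure quoted above, here is a
minimal serial sketch of the Smith-Waterman recurrence in Python.  It is not
the MPsrch code: the function name, the scoring values (match=2, mismatch=-1)
and the linear gap penalty are all assumptions chosen for illustration.  It
does show that each cell of the dynamic programming matrix needs only integer
additions and comparisons, which is why floating point ratings say little
about this workload.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Hypothetical illustration only: scoring values and the linear gap
    penalty are assumptions, not the parameters used by MPsrch.
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    updates = 0
    for ca in a:
        curr = [0]              # column 0 is always zero
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap          # gap in sequence b
            left = curr[j - 1] + gap    # gap in sequence a
            score = max(0, diag, up, left)  # local alignment floors at 0
            curr.append(score)
            best = max(best, score)
            updates += 1        # one cell update: integer ops only
        prev = curr
    return best, updates


if __name__ == "__main__":
    score, cells = smith_waterman_score("ACGT", "ACGT")
    print(score, cells)
```

Comparing a query of length m against a database of n residues costs m*n
cell updates, so dividing that count by the elapsed time gives the cell
update rate of the kind quoted above.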

MPsrch is still available on blitz at de.embl-heidelberg for protein searches
and the nucleic acid version is just about ready, just have to arrange
exactly how to offer it as a mail service.

Shane Sturrock, Biocomputing Research Unit, Darwin Building, Mayfield Road,
University of Edinburgh, Scotland, Commonwealth of Independent Kingdoms.  :-)

Civilisation is a Haggis Supper with salt and sauce and a bottle of Irn Bru.
