open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

John S. J. Anderson jacobs+usenet at genehack.org
Tue Aug 8 07:22:46 EST 2000

karplus at cse.ucsc.edu (Kevin Karplus) writes:

> The programs I have been writing lately for bioinformatics are more
> like 13,000 lines, not counting the library we've developed (about
> another 52,000 lines).  

This isn't enough information in the terms we've been talking about. I
don't think anybody wants to peer review your interface code -- just
the core routines that do the heavy lifting. 

How many lines there?

> I suspect that many other bioinformatics programs are that long or
> longer, and many are poorly written.  

This is really the point. All code has bugs, but poorly written code
has _more_ bugs. I might be doing research based on results from this
poorly written code, or (Ghu forbid!) be forced to *use* the stuff --
and that makes me nervous.

> No one is going around checking that the lab protocols described in
> papers are precisely executed---it is assumed that the experimenters
> have normal competence.

Richard Grant dealt with this, and I agree with him -- this is
spurious reasoning. 'Coding' != 'protocol execution' in this
case. 'Coding' == 'protocol _design_'. Protocol execution would be
typing in the command line, or double clicking the icon to start the
program. I _am_ willing to believe that your standard experimenter can
do this; it's the other part that worries me.

> Nor is it required that someone publishing a sequence submit their
> sequencer hardware and software to peer review.  Why are people
> advocating an impossibly higher standard of peer review for
> bioinformatics publications?

Well, speaking only for myself, I *would* like to see the software
that's driving sequencers, and microarray readers, and
PhosphorImagers, and $BIG_BEIGE_BOX_USED_IN_BIOINFO undergo some sort
of review process. 

> I suggest it is because some people do not want to pay for software
> (though they seem to have no objection to paying enormous amounts for
> reagents and lab equipment).  We routinely see requests on bionet
> newsgroups for help in stealing software.

That is certainly not the case for me, and IIRC, I started us down
this thread. It's not about free software, in either the libre or
gratis senses of the word, it's about being able to trust the results
presented in journals without having to figure in some sort of fudge
factor based on the quality of some piece of code that's never seen
the light of day, which very well could be full of bugs due to having
been written by some grad. student while hung-over/pissed at her
boss/upset with his girlfriend/pick-your-minor-tragedy.

This issue has zero to do with the sub-population that's so clueless
that they come to bionet.software looking for w4r3z.

> I applaud the programmers who provide open source and am glad to
> encourage more open source.  But requiring that all scientific code
> follow a particular philosophy is a sure way to stifle innovation.

I'm not asking for 'all scientific code' to be open.

I _am_ saying that (a) when results are based on a piece of code, that
code should be reviewed before the results are published in a
peer-reviewed journal and (b) when people are purchasing
$BIG_PIECE_OF_KIT that depends on some closed software, they should
ask $KIT_VENDOR about what steps have been taken to make sure that the
code driving the thing is accurate and of high quality.


-- 
----------------------------------------------------------------------------
           [ John S Jacobs Anderson ]------><URL:mailto:jacobs at genehack.org>
[ Genehack: Not your daddy's weblog ]------><URL:http://genehack.org>