open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

Ted Byers rtbyers at bconnex.net
Tue Aug 1 16:47:53 EST 2000

John S. J. Anderson <jacobs+usenet at genehack.org> wrote in message
news:87wvi3k1wc.fsf at genehack.org...
> >>>>> "Don" == Don Gilbert <gilbertd at bio.indiana.edu> writes:
> Don> But that still doesn't argue that pressure need be put on authors
> Don> to conform to providing full access to their work.  Better that
> Don> we argue the merits of sharing code, and try to find incentives
> Don> to promote sharing, than that we try to remove incentives for new
> Don> works in the area.
> Don, what are your feelings about journals requiring submission of
> source code for peer review as a prerequisite for publication?
> (Ignoring, for the moment, all the potential complications about code
> theft, subsequent sale of the code, etc.)
Hi guys,

I'm not Don  :-), but I have relevant experience you might find useful.  I
have occasionally been asked, in a commercial setting, to review code for a
variety of purposes.  Since I was not employed by the company that developed
the code, the lawyers who arranged the request simply had me sign a
non-disclosure agreement.  It is not so hard to protect code while
concomitantly managing a peer review process, and one of the
responsibilities I would place on journal editors is that they arrange for
such protections as a matter of policy, and that they publish
such a policy.  Therefore, I would suggest two things.  First, if you don't
trust your peers, or at least those responsible for doing peer reviews,
don't publish a paper based primarily on code unless you are comfortable, in
the worst case scenario, with giving your code away.  Second, when it is
necessary to both do a peer review and provide some range of legal
protections for the code, have the reviewers sign an appropriate document
that provides the required protections (and, of course, provides the
relevant legal remedies if it is violated).

Algorithm patents are rarely used and never welcomed - none of the
developers I know like them at all (and usually describe those who get them
in terms that ought not be repeated in a public forum).  A good example that
comes to mind is the Unisys patent on LZW, the compression algorithm used to
produce GIFs.  While having a negative dimension to it, with regard
to producing software that must store graphics, it spurred on development to
find a suitable replacement, leading to the PNG format, which I am told is
technically superior to the GIF.  I'll bet that anyone who patents an
algorithm will annoy a number of developers enough that it will not be long
before some of them have developed a better algorithm.

In my own work, I try to keep experimental code separate from commercial
development (i.e. my experimental code, produced for the purpose of
scientific publication is normally created to be narrowly focussed on the
science of interest, and would never be appropriate for commercial
development, while my application development either uses well established
methods or adaptations of my research results to a commercial situation).
It would normally be a major undertaking to produce a commercial application
from code that I would publish for a scientific paper even though it would
be more than adequate for the purpose of the scientific paper.  This is
possible because the functional requirements for programs produced in
computational experiments are generally quite different from those required
for commercial purposes (the user interface, for example, is almost an
afterthought for my R&D code, whereas it is the place to start when
developing commercial software, where the normal procedure is to begin with
use case scenarios modelling what tasks the user needs to do and how, and
only then add in the code required to support those tasks).

A perfect example involves a library I developed for analyzing the dynamics
of nonlinear systems.  At present, it remains unpublished because of a
couple of unresolved issues that lead me to regard it as incomplete.  To use
it, you would have to be an expert programmer, well familiar with numerical
methods, various abstraction types (or design paradigms, to use Coplien's
terminology), nonlinear dynamics and modelling.  This is well beyond the
capabilities of even those graduate students I have known in life science
departments who could write their own code for doing multivariate statistical
analyses.  To take this code and produce a commercial application would be
an enormous undertaking, even for me, and I already developed the library.
For this kind of application, developing a user interface that could readily
be used by those biologists I know would be very difficult, and these are
typically brilliant, computer literate biologists (because of the sort of
research and researcher I tend to gravitate toward).  Even if I were to
publish it in its entirety, it would take several years to develop a
commercial application from it.

Just a thought.



rtbyers at bconnex.net
