open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

John S. J. Anderson jacobs+usenet at genehack.org
Mon Aug 14 22:25:01 EST 2000

jkb at arran.mrc-lmb.cam.ac.uk (James Bonfield) writes:

> In <87snsed3f4.fsf at genehack.org> "John S. J. Anderson"
> <jacobs+usenet at genehack.org> writes:

(Apologies for the delay in responding...)

[ gap4 codebase size ]
> > How much 'core' code versus 'interface'?
> The main gap4 directory has about 90K lines of C (perhaps 10K of
> that is interface) and 18K lines of Tcl (nearly all interface).
> The other bits are libraries, some of which is interface, but most
> of which isn't (eg file formats, dynamic arrays, IO, database
> handling, etc).

So, about 25% interface, 75% 'real code'. Is that fair?
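For the record, here's the arithmetic behind that estimate as a quick Python sketch. It assumes the Tcl is entirely interface (you said "nearly all"), and it leaves out the separate libraries, since no split was given for those:

```python
# Approximate gap4 line counts, taken from the quoted message above.
c_total = 90_000        # lines of C in the main gap4 directory
c_interface = 10_000    # "perhaps 10K of that is interface"
tcl_total = 18_000      # lines of Tcl, assumed here to be all interface

total = c_total + tcl_total          # 108K lines in the main directory
interface = c_interface + tcl_total  # C interface code plus all the Tcl
core = total - interface             # everything else counts as 'real code'

interface_pct = 100 * interface / total
core_pct = 100 * core / total

print(f"total: {total} lines")
print(f"interface: {interface} lines ({interface_pct:.0f}%)")
print(f"core: {core} lines ({core_pct:.0f}%)")
```

That works out to roughly 26% interface and 74% core, so "about 25/75" holds up.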

> Anyway - none of these separate figures are really important except
> the total; that's what would need to be "reviewed" after all.

Well, that's going to vary from case to case, really. Yes, if you
publish a paper that's dependent on the whole 108 kLOC, then it should
all be reviewed. If you publish several papers, documenting the
ongoing assembly of this large piece of software, then it's a more
manageable job.

I'm also not convinced that "it's going to be really hard" is a
persuasive counterargument to the following logic:

   Given that:
   Peer review is done to make sure the conclusions of papers are
   valid.

   And given that:
   For some papers, software plays a critical role in determining the
   conclusions of the paper.

   And, finally, given that:
   It is not possible (or, at least, it is orders of magnitude more
   difficult) to determine if a given piece of software produces the
   'correct'/intended results without access to the source code.

   Access to the source code of software used in reaching the
   conclusion(s) of a paper is required in order for a proper,
   thorough peer review of the paper.

What am I missing?

> Ok, I'm inclined to agree. There are some bits of code which end up
> being used in ways not originally thought of and yet still work
> perfectly. To me that's a good indication of good design (although
> it may still look ugly). As a person with interests in IOCCC
> (obfuscated C) I know I _can_ write ugly code, but I hope this also
> teaches me what to avoid. There is no 'deliberately' obfuscated code
> in our software :-)

Well, deliberate obfuscation is in some ways better: if you know it
was fscked up on purpose, you can at least assume a certain level of
competence (and maliciousness) on the part of the coder. When you're
not sure, you can't tell if the coder was terribly brilliant, or just


-- 
-----------------------------------------------------------------------------
           [ John S Jacobs Anderson ]------><URL:mailto:jacobs at genehack.org>
[ Genehack: Not your daddy's weblog ]------><URL:http://genehack.org>


More information about the Bio-soft mailing list

Send comments to us at biosci-help [At] net.bio.net