open-source software for bioinformatics (was Re: Unix vs Linux - the movie.)

James Bonfield jkb at arran.mrc-lmb.cam.ac.uk
Thu Aug 10 05:59:30 EST 2000

In <87snsed3f4.fsf at genehack.org> "John S. J. Anderson" <jacobs+usenet at genehack.org> writes:

> jkb at arran.mrc-lmb.cam.ac.uk (James Bonfield) writes:
> > Indeed. Our assembly editor, gap4, is more like 200,000 lines
> > (including a few of our own libraries). Undoubtedly parts are poorly
> > written too; and I contend that some parts are well written -
> > perhaps by fluke :-)
> How much 'core' code versus 'interface'?

It's not immediately easy to tell, as much of the code resides in the same
directories (although obviously in different files). In general though, most
C code is core and most of the interface is Tcl. The glue logic (C) would
probably count as interface too.

The main gap4 directory has about 90K lines of C (perhaps 10K of that is
interface) and 18K lines of Tcl (nearly all interface).
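
For anyone wanting comparable figures for their own tree, here is a rough
sketch of splitting a line count by file extension. It is written in Python
purely for illustration; the function name and the choice of extensions
(.c, .h, .tcl) are assumptions, and it is not how the numbers above were
produced. It counts raw lines, including comments and blank lines.

```python
import os
from collections import Counter


def count_lines_by_extension(root, extensions=(".c", ".h", ".tcl")):
    """Walk `root` and total the raw line count for each listed extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in extensions:
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes so unusual encodings in old sources cannot raise.
            with open(path, "rb") as fh:
                totals[ext] += sum(1 for _ in fh)
    return totals
```

Calling count_lines_by_extension(".") from the top of a source tree returns
a per-extension tally. It says nothing about core versus interface within
the C code; that split still has to be estimated by hand.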

The other bits are libraries, some of which are interface, but most of which
aren't (e.g. file formats, dynamic arrays, IO, database handling, etc.).

Anyway - none of these separate figures is really important; only the total
matters, as that's what would need to be "reviewed" after all.

> And I'm sure a few people would argue that well written code doesn't
> happen by fluke... 8^)

Ok, I'm inclined to agree. There are some bits of code which end up being used 
in ways not originally thought of and yet still work perfectly. To me that's a 
good indication of good design (although it may still look ugly). As a person
with interests in IOCCC (obfuscated C) I know I _can_ write ugly code, but I
hope this also teaches me what to avoid. There is no 'deliberately'
obfuscated code in our software :-)

> Command lines don't _have_ to be horrible, and can actually be quite
> nice -- if you're the type of person who understands pipelines, and
> why they are a Good Thing. OTOH, authors who don't dump some
> documentation in response to a 'foo -h' or 'foo --help' should be
> smacked about the head and shoulders.

I couldn't agree more. Unfortunately most of our users would not.
We tried to solve this by combining the two - a good user interface with an
underlying command-line scripting language. It's not always easy though and
documentation often suffers (for the command line stuff).
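
To make the 'foo -h' point concrete, here is a minimal sketch of a
command-line tool that documents itself and plays well in a pipeline. It is
in Python using the standard argparse module; the tool name "foo" and its
options are hypothetical, and none of this is taken from gap4.

```python
import argparse


def build_parser():
    """Build the parser for a hypothetical tool called 'foo'."""
    parser = argparse.ArgumentParser(
        prog="foo",
        description="Hypothetical filter: read records from a file or stdin.",
    )
    # argparse adds -h/--help automatically, so usage text is never missing.
    parser.add_argument(
        "input",
        nargs="?",
        default="-",
        help="input file, or '-' for stdin (the default)",
    )
    parser.add_argument(
        "-o",
        "--output",
        default="-",
        help="output file, or '-' for stdout (the default)",
    )
    return parser
```

Defaulting both ends to stdin and stdout is what lets such a tool sit in the
middle of a pipeline, and build_parser().print_help() shows what a user
would see for 'foo -h'.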

James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
