open-source software for bioinformatics (was Re: Unix vs Linux- the movie.)

Ted Byers rtbyers at bconnex.net
Tue Aug 1 15:12:06 EST 2000


You are correct.  When you are working on a program where much of the code
is experimental, you will find a much greater proportion of your design
changing as you proceed than is the case with most COTS.  With most custom
and commercial software, the majority of your design changes (assuming
capable engineers to begin with) will be due to changing functional
requirements, with only a few due to bugs in or limitations of the
development tools.  With experimental code, in many cases, you discover
limitations inherent in the algorithms and various kinds of abstractions
tested, so a much larger proportion of your code is likely to change.
Therefore, it is all the more essential not only to begin with decent design
documents, but to maintain them, including a changelog with the rationale
for the changes, as the project proceeds.

Unfortunately, in many labs, the focus is on the natural science and
computer programming is seen as just a routine exercise to get useful
results.  I don't think I have seen a lab where the development of the
programs used is as carefully documented as the lab or field work.  I have
seen some where they don't even keep backup copies of data and programs
used.  A colleague of mine saved his former Ph.D. supervisor considerable
embarrassment because, prior to completing his degree, he designed a system
for creating and maintaining backups for his lab.  After he left, another
fellow was hired who claimed to be a biostatistician.  He wrote a SAS
program and produced a draft for a paper that the supervisor could not
understand, so he contacted my colleague for assistance.  When he examined
the program this other fellow wrote, he discovered several things.  First,
the analysis was completely inappropriate for the study (this so-called
statistician had never seen a cow and never visited the study site, so he
had no idea that the assumptions of his favourite analysis were not
satisfied).  Second, he was a very sloppy programmer: about a third of the
way through his program he overwrote his original dataset, and consequently
corrupted all of the datasets involved in the analysis.
Needless to say, the draft was filed under 'g'.  This was a big, expensive
study, costing many millions of dollars just to collect the data.  Had my
colleague not established a backup system for the lab, and instructed the
support staff on how to maintain it, all of the data would have been lost.
My colleague was able to get the raw data from the backup copy the support
staff made, and he did the correct analysis as a favour to his former
supervisor.  The scary thing, apart from the potential disaster that would
have occurred had there not been a proper backup system, is that this
so-called statistician is now teaching statistics at a small university in the
US Midwest.  I am sure that, for most of the folk in that lab, computer
skills extend no further than using the machine as a fancy typewriter.  I
think many biologists do not take their computers as
seriously as they take their other equipment.  I was even told by one
professor that I should not even try to become an able programmer: that
instead I should find someone who knows what they are doing to write my
programs for me.  After all, I am a biologist and real biologists don't
write their own programs.  I didn't listen to him, because I didn't agree.



R.E. Byers

