Building Mol Biol programs and the NCBI toolkit...

Reinhard Doelz doelz at comp.bioz.unibas.ch
Sun Jul 24 14:17:19 EST 1994

Tony Nugent (tnugent at gucis.cit.gu.edu.au) wrote:

: I am in the early stages of developing a program that will analyse
: molecular sequences (nucleotide and amino acids).  What functionality
: it will finally have will be largely a matter of time and resources
: (this is for a third-year university project subject, but has every
: possibility of growing in a post-grad honours project too).


Given your later comments, I conclude that you are starting with fairly 
basic knowledge and without significant help. I would therefore be very 
careful about writing yet-another-cute-demo which will be useless for 
people who need _complete_ packages. You would be reinventing the wheel, 
and you would fail to produce a package. However, don't give up too 
early. Make it a success by 
	o starting with the _design_
		(platform and language independent!)
	o continuing with the _documentation_
		(in real English, no // comments or *.h, but REAL TEXT)
	o defining your approach: _why_ did you do it this way
		(in contrast to the others which you analyzed before starting)?

Only in the later or last stages, start coding. Be realistic - C is a 
language which is usable on nearly any platform. Do _you_ have the license 
and the compiler to run C++ on the current molbio zoo of more than 10 
different operating systems? Do you have the time to look into 
least-common-denominator code which gets you off the #ifdefs? Can you afford
to install all these (even GNU C++ takes significant effort and disk space!)?
Or wouldn't it be better to give up on that, state clearly that this is 
too much for a start, and do it real, right, and complete on two 
examples (such as a PC and a UNIX, or a VMS and a Mac, etc.)?
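
As a small illustration of what least-common-denominator code can look 
like, here is a sketch in plain ISO C. It assumes that sequence files 
reach you from foreign systems with their own line-end conventions, and 
it handles all of them in one routine instead of one #ifdef per platform. 
The function name strip_line_end is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove any trailing CR and LF characters from a
   line in place and return the new length. The same code handles LF,
   CR LF, and CR line ends, so no per-platform #ifdef is needed. */
size_t strip_line_end(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
    return len;
}
```

The point is not this particular routine but the habit: stay inside the 
standard library wherever you can, and the platform list stops mattering 
for most of the code.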

[... on NCBI documentation]

: This is an *AWESOME* document!  (The size and scope of the ncbi
: toolkit is likewise rather awesome:).  It is largely a specification
: document for the format of the data structures that programs
: that do this sort of thing should use.

What do you think _your_ documentation will look like? I consider the 
NCBI documentation excellent for its purpose if you compare it 
with other documentation in the public domain. At least you can RTFM 
instead of RTFS (read the fine manual and read the fine source, 
respectively). 

:     I want to write my code in C++, describing sequences as true
:     OBJECTS (OOP).

Don Gilbert has put effort into doing this. On his node you'll find 
useful material for this purpose. Be careful if you define a sequence 
object - many other approaches are around. In particular a genomic 
sequence will blast your structures to hell. Why don't you start a step 
below and ask for a language to describe a sequence object dynamically 
at run-time rather than defining it?  
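
One possible reading of "describe a sequence object at run-time" is a 
record whose fields are named and attached while the program runs, 
rather than being fixed in a struct or class at compile time. The sketch 
below is only that: all names (SeqRecord, seq_set, seq_get, seq_free) are 
invented for the example and come from no existing toolkit, and a linked 
list of strings is the simplest thing that works, not a recommendation 
for genomic-size data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One named field of a record; fields are created at run time. */
typedef struct SeqField {
    char *name;
    char *value;
    struct SeqField *next;
} SeqField;

/* A sequence record is just a list of fields (hypothetical design). */
typedef struct {
    SeqField *fields;
} SeqRecord;

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* Set or replace a field. Returns 0 on success, -1 if out of memory. */
int seq_set(SeqRecord *rec, const char *name, const char *value)
{
    SeqField *f;
    char *v;

    assert(rec != NULL && name != NULL && value != NULL);
    v = copy_string(value);
    if (v == NULL) return -1;
    for (f = rec->fields; f != NULL; f = f->next) {
        if (strcmp(f->name, name) == 0) {   /* field exists: replace */
            free(f->value);
            f->value = v;
            return 0;
        }
    }
    f = malloc(sizeof *f);
    if (f == NULL) { free(v); return -1; }
    f->name = copy_string(name);
    if (f->name == NULL) { free(v); free(f); return -1; }
    f->value = v;
    f->next = rec->fields;                  /* new field: prepend */
    rec->fields = f;
    return 0;
}

/* Look up a field by name; returns NULL if the record has no such field. */
const char *seq_get(const SeqRecord *rec, const char *name)
{
    const SeqField *f;

    assert(rec != NULL && name != NULL);
    for (f = rec->fields; f != NULL; f = f->next) {
        if (strcmp(f->name, name) == 0) return f->value;
    }
    return NULL;
}

/* Release all fields and leave the record empty. */
void seq_free(SeqRecord *rec)
{
    SeqField *f, *next;

    assert(rec != NULL);
    for (f = rec->fields; f != NULL; f = next) {
        next = f->next;
        free(f->name);
        free(f->value);
        free(f);
    }
    rec->fields = NULL;
}
```

A caller would start from an empty record (SeqRecord rec = { NULL };), 
attach whatever fields the input happens to carry with seq_set, and ask 
for them by name with seq_get, so a new kind of annotation needs no 
change to the type definition.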

[... on GUIs]

:     Can anybody point me in the right direction for such publically
:     available code (easy to use, and mostly in C++)?

There are FAQs on this issue (look in news.answers, or ftp to any good 
site which stores the FAQs). You might either go for commercial products, 
which are excellent in some instances (there were some reviews in recent 
computer journals) but, being commercial, are rather expensive, or you 
should seriously consider using the "Vibrant" toolkit from the NCBI. 
We have successfully used the latter on VMS, OSF/1, IRIX, SunOS, Windows
and Mac, which covers a reasonable subset of the molbio zoo. You will 
have difficulties finding any other PD product out there which is as 
useful as Vibrant. It might not show the nicest, richest, fanciest appearance
but it is truly cross-platform compatible (in C). 
One last thing: Don't forget the ASCII guys. Two thirds of all molecular
biologists live in a vt100, computerwise. This might become 50% over 
the years but the mass of non-X, non-ethernet sites is still significant. 
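
To make the point about ASCII users concrete, here is a minimal sketch of 
terminal-friendly output in plain C: a sequence printed in fixed-width 
lines, each prefixed with the position of its first residue. With a width 
of 60 every line stays within the 80 columns of a vt100. The function name 
and the layout are assumptions made for this example, not an existing 
format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write seq to out in lines of `width` residues,
   each prefixed by the 1-based position of its first residue.
   Returns the number of lines written. */
long print_sequence_ascii(FILE *out, const char *seq, size_t width)
{
    size_t len, pos;
    long lines = 0;

    assert(out != NULL && seq != NULL && width > 0);
    len = strlen(seq);
    for (pos = 0; pos < len; pos += width) {
        size_t n = (len - pos < width) ? len - pos : width;
        /* 8 columns of position, 2 spaces, then at most `width` residues */
        fprintf(out, "%8lu  %.*s\n",
                (unsigned long)(pos + 1), (int)n, seq + pos);
        lines++;
    }
    return lines;
}
```

Calling print_sequence_ascii(stdout, seq, 60) needs nothing but a 
character terminal, so the same routine serves the X user and the vt100 
user alike.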

Maybe that helps. 
(I am on vacation currently, out of reach of any computer, and type this
only accidentally. I will not be able to answer comments and flames before


  |    Dr. Reinhard Doelz     | Tel. x41 61 2672247    Fax x41 61 2672078 |
  |      Biocomputing         | electronic Mail       doelz at urz.unibas.ch |
  |Biozentrum der Universitaet+-------------------------------------------+

More information about the Bio-soft mailing list