Tony Nugent (tnugent at gucis.cit.gu.edu.au) wrote:
[...]
: I am in the early stages of developing a program that will analyse
: molecular sequences (nucleotide and amino acids). What functionality
: it will finally have will be largely a matter of time and resources
: (this is for a third-year university project subject, but has every
: possibility of growing in a post-grad honours project too).
[...]
Given your later comments, I conclude that you are starting with fairly
basic knowledge and without significant help. I would therefore be very
careful about writing yet-another-cute-demo which will be useless
for people who need _complete_ packages. You would be reinventing the
wheel and would still fail to produce a package. However, don't
give up too early. Make it a success by
  o starting with the _design_
    (platform and language independent!)
  o continuing with the _documentation_
    (in real English, no // comments or *.h, but REAL TEXT)
  o defining your approach: _why_ did you do it this way
    (in contrast to the others which you analyzed before starting)
Start coding only in the later or last stages. Be realistic - C is a
language which is usable on nearly any platform. Do _you_ have the license
and the compiler to run C++ on the current molbio zoo of more than 10
different operating systems? Do you have the time to look into
least-common-denominator code which gets you off the #ifdefs? Can you afford
to install all these (even GNU C++ takes significant effort and disk space!)?
Or wouldn't it be better to give up on that, state clearly that this
is too much for a start, and do it properly, correctly, and completely
on two examples (such as a PC and a UNIX, or a VMS and a Mac, etc.)?
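To make the least-common-denominator point concrete, here is a minimal
sketch in plain standard C. It reads text lines from a sequence file
regardless of whether the file came from a UNIX ("\n"), a PC ("\r\n") or
a Mac ("\r"), without a single #ifdef. The function name
read_line_portable and its return convention are made up for this
illustration; they are not taken from any existing toolkit.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (name and interface are assumptions for this sketch).
   Reads one line from fp into buf, storing at most size-1 characters.
   Accepts "\n", "\r\n" and a bare "\r" as line terminators.
   Returns the number of characters stored, or -1 if end of file was
   reached before anything could be read. Overlong lines are truncated
   and the excess characters are discarded. */
long read_line_portable(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int got_any = 0;
    int c;

    if (size == 0)
        return -1;

    while ((c = getc(fp)) != EOF) {
        got_any = 1;
        if (c == '\n')
            break;
        if (c == '\r') {
            /* Swallow the '\n' of a "\r\n" pair; push anything else back. */
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got_any ? (long)n : -1;
}
```

The only library calls used are getc and ungetc, which any C compiler
should provide, so the same source should compile unchanged on each
platform you decide to support.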
[... on NCBI documentation]
: This is an *AWESOME* document! (The size and scope of the ncbi
: toolkit is likewise rather awesome:). It is largely a specification
: document for the format of the data structures that programs
: that do this sort of thing should use.
What do you think _your_ documentation will look like? I consider the
NCBI documentation excellent for its purpose if you compare it
with other documentation in the public domain. At least you can RTFM
instead of RTFS (read the fine manual and read the fine source, respectively).
: I want to write my code in C++, describing sequences as true
: OBJECTS (OOP).
Don Gilbert has put effort into doing this. On his node you'll find
useful material for this purpose. Be careful when you define a sequence
object - many other approaches are around. In particular, a genomic
sequence will blast your structures to hell. Why don't you start a step
below and ask for a language to describe a sequence object dynamically
at run-time rather than defining it statically?
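As a very small illustration of the run-time idea, the following C sketch
keeps a sequence record as a list of named attributes that are attached
while the program runs, instead of a struct whose fields are fixed when
you compile. It is not a description language, only the simplest form of
the same thought, and every name in it (SeqRecord, seq_set, seq_get,
seq_free) is invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* One named attribute, e.g. "organism" = "E. coli". */
typedef struct Attr {
    char *name;
    char *value;
    struct Attr *next;
} Attr;

/* A record whose set of fields is decided at run time (hypothetical type). */
typedef struct {
    Attr *attrs;            /* singly linked list, newest entry first */
} SeqRecord;

/* Return a heap-allocated copy of s, or NULL if allocation fails. */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Set or replace an attribute. Returns 0 on success, -1 if out of memory. */
int seq_set(SeqRecord *rec, const char *name, const char *value)
{
    Attr *a;
    char *v = copy_string(value);
    if (v == NULL)
        return -1;

    for (a = rec->attrs; a != NULL; a = a->next) {
        if (strcmp(a->name, name) == 0) {
            free(a->value);          /* replace the existing value */
            a->value = v;
            return 0;
        }
    }

    a = malloc(sizeof *a);
    if (a == NULL) {
        free(v);
        return -1;
    }
    a->name = copy_string(name);
    if (a->name == NULL) {
        free(a);
        free(v);
        return -1;
    }
    a->value = v;
    a->next = rec->attrs;
    rec->attrs = a;
    return 0;
}

/* Look up an attribute; returns NULL if it was never set. */
const char *seq_get(const SeqRecord *rec, const char *name)
{
    const Attr *a;
    for (a = rec->attrs; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a->value;
    return NULL;
}

/* Release all attributes and leave the record empty. */
void seq_free(SeqRecord *rec)
{
    Attr *a = rec->attrs;
    while (a != NULL) {
        Attr *next = a->next;
        free(a->name);
        free(a->value);
        free(a);
        a = next;
    }
    rec->attrs = NULL;
}
```

A program would start from an empty record (SeqRecord rec = { NULL };)
and call seq_set(&rec, "topology", "circular") for whatever the input
file happens to describe. A new kind of annotation then costs no change
to the data structure, which is the point of describing the object at
run-time.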
[... on GUIs]
: Can anybody point me in the right direction for such publically
: available code (easy to use, and mostly in C++)?
There are FAQs on this issue (look in news.answers, or ftp to any good
site which stores the FAQs). You might either go for commercial products,
which are excellent in some instances (there were some reviews in recent
computer journals) but, being commercial, are rather expensive, or you
should seriously consider using the "Vibrant" toolkit from the NCBI.
We have successfully used the latter on VMS, OSF/1, IRIX, SunOS, Windows
and Mac, which covers a reasonable subset of the molbio zoo. You will
have difficulties finding any other PD product out there which is as
useful as Vibrant. It might not have the nicest, richest, fanciest appearance,
but it is truly cross-platform compatible (in C).
One last thing: Don't forget the ASCII guys. Two thirds of all molecular
biologists live in a vt100, computerwise. This might become 50% over
the years but the mass of non-X, non-Ethernet sites is still significant.
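Serving the vt100 crowd need not be much work. As a rough sketch, the
following C function prints a sequence as plain text, 60 residues per
line in blocks of 10, with the position of the first residue in front.
The function name print_sequence_ascii and the chosen layout are
assumptions made for this example only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (name and layout are assumptions for this sketch).
   Prints seq to out in lines of 60 residues, split into blocks of 10,
   each line prefixed with the 1-based position of its first residue.
   For positions up to 9999999 a line is at most 8 + 60 + 5 = 73 columns
   wide, so it fits an 80-column terminal. */
void print_sequence_ascii(FILE *out, const char *seq)
{
    size_t len = strlen(seq);
    size_t i, j;

    for (i = 0; i < len; i += 60) {
        fprintf(out, "%7lu ", (unsigned long)(i + 1));
        for (j = i; j < i + 60 && j < len; j++) {
            /* Insert a blank between blocks of 10 residues. */
            if (j > i && (j - i) % 10 == 0)
                putc(' ', out);
            putc(seq[j], out);
        }
        putc('\n', out);
    }
}
```

Calling print_sequence_ascii(stdout, seq) needs nothing beyond stdio, so
the same output routine works on a terminal, in a pipe, or in a file.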
Maybe that helps.
(I am currently on vacation, out of reach of any computer, and am typing this
only by chance. I will not be able to answer comments and flames before
mid-August).
regards
Reinhard
--
+---------------------------+-------------------------------------------+
| Dr. Reinhard Doelz | Tel. x41 61 2672247 Fax x41 61 2672078 |
| Biocomputing | electronic Mail doelz at urz.unibas.ch |
|Biozentrum der Universitaet+-------------------------------------------+