readseq in C++/Java - comments?

Don Gilbert gilbertd at bio.indiana.edu
Mon Jul 14 18:03:02 EST 1997


I would like to hear from biocomputing managers and server 
maintainers and others who use sequence glue tools like readseq and
need to compile from scratch.  Do you have concerns about C++
as a portable source?

I disagree with David's comments below.  In my experience
C++ is very portable.  I've been using C++ on multiple systems
for several years now and haven't had any problems getting the source 
to compile with the same compiler set I use for C, including
  Macintosh:  Codewarrior C/C++, MPW C/C++, Symantec C/C++
  MSWindows:  Symantec C/C++, Borland C/C++
  Solaris (Sparc/X86): Sun cc/CC
  SGI Irix: SGI cc/CC
  Dec OSF/ Dec Unix:  cc/cxx
  All unix platforms (e.g., Linux): Gnu gcc/g++
  
None of these current compilers uses the old AT&T Cfront
preprocessor, which did have various problems.  Even that
provided a standard C++ compiler; you just had to avoid
some of the later extensions to C++, such as templates and exceptions.

The only hassle I have with keeping C++ portable is that
some of the above Unix compilers are very insistent that you use
a limited set of file suffixes for your C++ code, and the
suffix sets do not intersect (e.g., SGI's CC requires a different
suffix than Sun's), so my make scripts have to rename a
bunch of files.  How stupid.  The most compatible suffix is '.cc'.

As for a Java version of readseq, Java is becoming the single
best choice for any software with a user interface.  As a
language it offers some very good new features that are lacking
in C++ (and many many more interesting things than tired old C).

These include runtime reference resolution such that I can
potentially make readseq a portable component that any non-programmer
or interested biologist could drop into an application container
and link together with other program parts to form a new
application.  This is part of what Java Beans is about.  One
can't do this with C++ or C code.  It is something new and
potentially very enabling for the average scientist or
bioinformatician.
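
To make the component idea concrete, here is a minimal sketch of what
a bean-style wrapper might look like.  The class name SeqConverterBean
and its "sequence" property are hypothetical, made up for illustration;
only the java.beans classes are real.  A container discovers the
property from the get/set naming pattern and wires listeners to it
without recompiling anything.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

// Hypothetical bean; not taken from any actual readseq source.
class SeqConverterBean implements Serializable {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String sequence = "";

    // Beans are expected to offer a no-argument constructor.
    public SeqConverterBean() { }

    // The get/set pair is what exposes "sequence" as a bean property.
    public String getSequence() { return sequence; }

    public void setSequence(String newValue) {
        String old = sequence;
        sequence = newValue;
        // Notify whatever other components the container linked to this one.
        changes.firePropertyChange("sequence", old, newValue);
    }

    public void addPropertyChangeListener(PropertyChangeListener l) {
        changes.addPropertyChangeListener(l);
    }

    public void removePropertyChangeListener(PropertyChangeListener l) {
        changes.removePropertyChangeListener(l);
    }
}

class BeanDemo {
    public static void main(String[] args) {
        SeqConverterBean bean = new SeqConverterBean();
        StringBuilder log = new StringBuilder();
        // Stand-in for another component that a container would link up.
        bean.addPropertyChangeListener(evt ->
            log.append(evt.getPropertyName()).append('=').append(evt.getNewValue()));
        bean.setSequence("ACGT");
        System.out.println(log);   // prints "sequence=ACGT"
    }
}
```

The listener here plays the part of a second component; in a real
container the linking would be done by the tool rather than by hand.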

And of course when you or someone has a new file
format, one will be able to add that to the object oriented
C++ or Java version of readseq without nearly as much heck to go 
thru as with a procedural C version.
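
As a rough illustration of why this is easier, here is a small sketch
of one way an object-oriented reader could be organized.  Every name
in it (SeqFormat, FastaFormat, PlainFormat, FormatRegistry) is
hypothetical and is not readseq's actual design.  The point is only
that a new format becomes one new class plus one register() call,
with no changes to the existing readers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; each file format gets its own implementation.
interface SeqFormat {
    String name();
    boolean recognizes(String text);   // sniff the format from the raw text
    String sequenceOf(String text);    // return the residues only
}

class FastaFormat implements SeqFormat {
    public String name() { return "fasta"; }
    public boolean recognizes(String text) { return text.startsWith(">"); }
    public String sequenceOf(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\n")) {
            if (!line.startsWith(">")) sb.append(line.trim());  // skip the header line
        }
        return sb.toString();
    }
}

// The "new format" added later: raw residues with arbitrary whitespace.
class PlainFormat implements SeqFormat {
    public String name() { return "plain"; }
    public boolean recognizes(String text) { return true; }     // fallback, so register it last
    public String sequenceOf(String text) { return text.replaceAll("\\s+", ""); }
}

class FormatRegistry {
    private final List<SeqFormat> formats = new ArrayList<>();

    void register(SeqFormat f) { formats.add(f); }

    // Formats are tried in registration order; the first match wins.
    SeqFormat detect(String text) {
        for (SeqFormat f : formats) {
            if (f.recognizes(text)) return f;
        }
        throw new IllegalArgumentException("unknown sequence format");
    }
}

class FormatDemo {
    public static void main(String[] args) {
        FormatRegistry reg = new FormatRegistry();
        reg.register(new FastaFormat());
        reg.register(new PlainFormat());   // adding a format touches only this line
        String input = ">seq1 test\nACGT\nTTGA\n";
        SeqFormat f = reg.detect(input);
        System.out.println(f.name() + ": " + f.sequenceOf(input));  // prints "fasta: ACGTTTGA"
    }
}
```

In a procedural C version the same change usually means editing the
format-detection switch, the reader, and the writer in separate places.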

The relative slowness of some Java p-code interpreters is not 
a big factor for many tools and uses such as for readseq.  I don't 
think anyone uses it for batch conversion of Genbank-sized files
-- for that you can go to a C++ tool.  Java certainly is faster
than Perl, and note how many use Perl glue for many similar tasks.
Besides, your next computer will likely have an OS (or microprocessor)
tuned to Java.


The first version of readseq was written in Pascal.  Then
as C became a dominant language, I converted it to that (it
still shows marks of the automatic conversion tool that took
it from Pascal to C).  For a short time I maintained Pascal and C
versions, but I lack the time to do that for very long, and there
are no automatic tools to convert readily among these languages.

I will do what I can to put some of the new parts of the C++/Java
version into the C version.  The more feedback I get from people
who will use a new version in any particular language, the
better I'll be able to direct this where you all need it.  But
it was and is a tool to help programmers (me) handle sequence
data.  As I see programming in bioinformatics, it is and should
be moving away from procedural languages like C (and aged Pascal)
toward C++ and Java.

-- Don



In article <5pts57$jrd at gap.cco.caltech.edu>,
 <mathog at seqaxp.bio.caltech.edu> wrote:
>In article <5pqvae$9hd$1 at dismay.ucs.indiana.edu>, you write:
>>I am working on a new version of readseq in conjunction with a
>>new SeqPup.  It will likely come in C++ and Java source, but not 
>>C source.
>
>Ugh, double ugh, no, triple ugh.  PLEASE check the C++ version on multiple 
>compilers before releasing it. I've had miserable luck porting C++ code to
>run outside of the original compiler (/platform) environments.  In fact,
>the situation is so bad that I currently consider C++ to be a nonportable
>language.  Take the average g++ program and try to build it with DEC C++
>(Unix or OpenVMS), you'll succeed maybe 10% of the time.  Even C++ packages
>that are carefully maintained to be cross platform, such as Amulet, won't
>work from release to release on some compilers (i.e., the new Metrowerks for
>Mac/Windows will no longer build it).
>
>Java is by design portable, but on the flip side, is slower than molasses
>for anything big. So the Java version will probably be ok for a single
>sequence conversion, but I'd hate to have to funnel a few thousand conversions
>through a Java readseq. 
>
>Regards,
>
>David Mathog
>mathog at seqaxp.bio.caltech.edu
>Manager, sequence analysis facility, biology division, Caltech 
>**************************************************************************
>*Affordable VMS? See:  http://seqaxp.bio.caltech.edu:8000/www/pcvms.html *
>**************************************************************************

--
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd at bio.indiana.edu



