
Availability of EGCG9?

mathog at seqaxp.bio.caltech.edu
Tue Feb 10 13:00:00 EST 1998

In article <subyazk99w4.fsf at unst.sanger.ac.uk>, Peter Rice <pmr at sanger.ac.uk> writes:
>The biggest problem is that I am now working on the successor to EGCG,
>a sequence analysis suite called EMBOSS, and for various reasons any
>further work on EGCG, from my point of view, will be wasted
>effort. This is despite long and frustrating negotiations to try to
>link the two projects together.

I'm game for a GCG replacement.  Let's look back at GCG's history and 
consider what went right, and what went wrong.

GCG's first phase was academic.  A group at the University of Wisconsin
gathered together existing molbio programs, made all of the interfaces
look the same, and then provided the sanitized result, complete with source 
code to those who wanted it. This was a big improvement over the bad old 
days, when an end user had to write to each program's author to get a 

In the second phase GCG became a private company, added a considerable 
amount of their own code to the package, and made the packaging a bit 
slicker. But from the end user perspective it was pretty much the same deal
- a nice package with consistent documentation, support, and source code. 

In the third and, for us, final phase, GCG withdrew the source code from 
the package.  While I understand the reasoning behind this, since as a 
private company they had a right to protect their intellectual property,
it had the side effect of making the GCG package much, much less attractive.
In fact, since it also wiped out EGCG availability, the new GCG was so 
unattractive that we simply dropped our contract with them and stayed at 8.1.
Moreover, in effect it laid claim to the intellectual property of a lot of 
other people as well - those who had written the programs GCG started with 
back in the first phase.

Meanwhile, computing in general went 
through about 5 paradigm shifts, and these turned up some technical 
problems in the GCG approach.

1.  There was no automatic way to map the command line/prompted variables
for each program into new interfaces, and no way for the interface to
determine if the supplied variables were valid or not before passing them 
to the program.  (Even GCG's own command line method was notoriously
insensitive to bad commands, i.e. 


when the correct command should have been


would not generate any error or warning.)  This seriously hindered bundling 
the GCG programs with both X11 and Web interfaces.
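
To make the point concrete, here is a minimal sketch of what "valid before
passing them to the program" could look like.  It is written in Python purely
for brevity; the qualifier names and the SPEC table are invented for this
illustration and are not GCG's actual syntax.  The idea is only that each
program declares its variables in one place, so any front end can reject a
misspelled or malformed command up front.

```python
# Hypothetical declaration of one program's qualifiers (not real GCG syntax).
SPEC = {
    "begin": int,
    "end": int,
    "outfile": str,
}

def check_command(spec, args):
    """Return (values, errors) for args given as 'name=value' strings."""
    values, errors = {}, []
    for arg in args:
        name, sep, raw = arg.partition("=")
        if not sep:
            errors.append("missing '=' in: " + arg)
        elif name not in spec:
            # A misspelled qualifier is reported instead of silently ignored.
            errors.append("unknown qualifier: " + name)
        else:
            try:
                values[name] = spec[name](raw)
            except ValueError:
                errors.append("bad value for " + name + ": " + raw)
    return values, errors
```

An X11 or Web front end could call check_command with the same SPEC the
command line uses, so all three would agree on what a valid command is.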

2.  Computational power shifted out from single servers to myriad PCs
and Macs, and GCG never provided a means to let the processing migrate
to the desktops, a means for load balancing across multiple machines,
or an automatic means for detecting when to shift the processing to the
data (as for a FASTA search through Genbank, which should not be done
by downloading all of Genbank across the local network!)
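
As a rough sketch of the last point, the decision of where to run a job can
be reduced to comparing how many bytes each choice moves over the network.
The function below is hypothetical and deliberately crude: it assumes network
transfer is the dominant cost and ignores CPU load on either machine.

```python
def run_where(query_bytes, database_bytes, result_bytes, database_is_local):
    """Pick the side that moves the fewest bytes over the network."""
    if database_is_local:
        return "desktop"
    # Option A: copy the whole database to the desktop and search there.
    cost_pull_data = database_bytes
    # Option B: send the query to the server and bring the results back.
    cost_push_job = query_bytes + result_bytes
    return "desktop" if cost_pull_data <= cost_push_job else "server"
```

For a small query against a very large remote database, this picks the
server, which is the FASTA-through-Genbank case described above.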

3.  The GCG graphics model never improved significantly, and it is
positively archaic now.  In particular, the default use of stroke fonts
and the absence of area fills have been major problems.  The former made it
very, very difficult to import GCG graphics into standard drawing programs
for final touch up work - one either had to go through HPGL and the HP2PICT
hypercard stack, or when we got sick of that, through a special CGM driver
I wrote (http://seqaxp.bio.caltech.edu/pub/SOFTWARE/GCGCGM.ZIP; of course,
if you've moved to 9.x, you can't use the CGM driver).  The lack of fills
resulted in PRETTYBOX being PostScript only, which meant that there was no
way to get it into a drawing program whatsoever. 

So, I don't know exactly what Peter and company have planned for EMBOSS, 
but I'd suggest that at least the following be adopted.

1.  Write it all in ANSI C.  Not that I want to start a religious war, but
at this time ANSI C is clearly the best cross platform language, at least
for algorithmic programming.  "Best" in the sense that a strictly ANSI C
program is more portable than anything else I've encountered (including
ANSI C++ and Fortran 77.)  Mixed language programming on a cross 
platform project should be avoided like the plague.

2.  GPL everything that goes in, and arrange it as a bunch of libraries.
The freely available product would ship with a command line interface to those 
libraries.  If some company wants to provide paid support for this package
at some point, that could still happen (as it does for Linux, for
instance).  One of the reasons that GCG protected its source code was
(apparently) that some of their source found its way into a competitor's
product.  Well folks, if you've priced any of the molbio software packages
lately you know that they are all very expensive, and a big chunk of that is 
because each must write the same core code - a total waste of effort.  What
I would like to see happen is for a GPL'd core to develop, around which a
commercial vendor could wrap a proprietary interface, if they so wanted.
(I think that is legal, so long as the vendor doesn't sell you the library.)
You would buy the interface from the vendor, but the core would be freely
available under the GPL.  This should decrease the price of all such software (since the
vendors need only provide interfaces, not core routines).  Perhaps more
importantly, it would provide a means for patching the core routines, and
most likely, adding one's own routines, to otherwise "proprietary"

3.  Define rigorous interface standards, including methods for interfaces 
to determine at run time the variables and conditions among variables that 
various modules require.  So when the next cool interface blows in, we
can get an interface to this package up and running without hideous amounts 
of work.  Provide a simple means for adding prompts/help text
in alternate languages.
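
One possible shape for such a standard, sketched in Python for brevity: each
module carries a description that an interface can read at run time, listing
its variables, the conditions among them, and prompts in more than one
language.  Everything here (the DESCRIPTION layout, the variable names, the
two helper functions) is an assumption made for illustration, not anything
planned for EMBOSS.

```python
# Hypothetical module description; every name here is invented for illustration.
DESCRIPTION = {
    "name": "example_module",
    "variables": {
        "begin": {"type": int,
                  "prompt": {"en": "Start position", "fr": "Position initiale"}},
        "end":   {"type": int,
                  "prompt": {"en": "End position", "fr": "Position finale"}},
    },
    # Conditions among variables: (message, test on the supplied values).
    "conditions": [
        ("begin must not exceed end", lambda v: v["begin"] <= v["end"]),
    ],
}

def prompts(description, language):
    """Prompts an interface would show, falling back to English."""
    return {
        name: var["prompt"].get(language, var["prompt"]["en"])
        for name, var in description["variables"].items()
    }

def violations(description, values):
    """Messages for missing or mistyped variables and for failed conditions."""
    problems = []
    for name, var in description["variables"].items():
        if name not in values:
            problems.append("missing variable: " + name)
        elif not isinstance(values[name], var["type"]):
            problems.append("wrong type for: " + name)
    if problems:
        return problems  # conditions are only checked on well-formed input
    return [msg for msg, test in description["conditions"] if not test(values)]
```

A new interface would then need only to read DESCRIPTION, show the prompts in
the user's language, and call violations before starting the module; adding a
language means adding one more entry to each prompt table.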

(4. Unless EMBOSS is to be owned by EMBO, which is, I think, not a great 
idea, change the name to something else.)

Anyway, you get the drift.  I'll be happy to help with this project in any 
way I can.


David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 

More information about the Info-gcg mailing list
