Rick Westerman wrote:
> Michael Black wrote:
>> > And I hope they push forward with version 11.0 quickly - some of the
>> > size/length limitations in the current GCG are ridiculously restrictive in
>> > today's genomic age.
>> Amen to that.
And guess what, having the SDK doesn't help as much as you'd think. We'd been
running an old copy of 8.1 on VMS for years that was modified to bump
MAXSEQLEN up to 500k. We recently migrated to 10.2 on Solaris, and we bought
the SDK (days before they decided to stop selling it) specifically so that we
could do the same on the new version. Except that it isn't so easy! Sure, I
could bump up MAXSEQLEN and rebuild everything, but then the dozen or so GCG
programs that they supply only as binaries and that aren't in the SDK, most
notably Seqlab, blew up when I ran them. They weren't compatible with the new
.so libraries, apparently because building with 500k instead of 350k resulted
in different offsets into data structures.
To get around this I had to modify the build procedure so that everything I
had source for used lib_lcl_blah.so and left the original libblah.so alone.
Apparently I'm also going to have to modify the database build programs so
that they act as if MAXSEQLEN is 350k, or Seqlab is going to blow up when it
hits the first entry over 350k. So we have a system where most of the programs
can now go to 500k, but others cannot.
Memory is so cheap these days that it makes no sense to restrict sequence
sizes so severely. If it were all C code there'd be no reason to limit the
size at all, but given that there's still a lot of Fortran in there, a decent
compromise would be to bump MAXSEQLEN to 1Mb. It would probably also make
sense in GCG 11 to distinguish among MAXCSEQLEN (which should be eliminated in
the C programs or at least raised by a huge amount), MAXDBSEQLEN (the biggest
sequence that can be stored in a GCG database), and MAXFORTRANSEQLEN (the
biggest sequence that can be handled by the Fortran programs). Currently these
are all implicitly the same: MAXSEQLEN = 350k.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech