Split sequences in GCG 8.1

Jack Leunissen jackl
Tue Sep 5 08:58:39 EST 1995


After installing GCG V8.1 and re-indexing my databases, I noticed strange
behaviour of the software when trying to read the (full) EMBL database:
programs that read the whole database (like FASTA, DBSTATS, EQUICKINDEX, ...)
stopped reading with the following error message:

 *** ERROR in SQNext. Sequence reading is out of synch.

This appeared to happen only in those database sections that contained split
sequences, i.e. EM_BA, EM_FUN, and EM_PR. The sequences were split in the
old GCG way: when longer than 350k bases, they were split into 350k chunks.

However, when I reformatted these sections according to the new GCG scheme 
(i.e. when longer than 350k, split into 100k chunks, plus a 10k overlap), 
and reindexed them, it all worked perfectly again.
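
To make the difference between the two schemes concrete, here is a small
Python sketch. It is an illustration only, not GCG code: the function name
split_ranges and its parameters are hypothetical, and it assumes that each
chunk starts on a 100k boundary and runs 10k into the next one. The way
GCG actually places the overlap may differ.

```python
def split_ranges(length, threshold=350_000, chunk=100_000, overlap=10_000):
    """Return 0-based, half-open (start, end) ranges for one sequence.

    Assumption: a sequence longer than `threshold` is cut into pieces that
    start every `chunk` bases and extend `overlap` bases into the next
    piece. Shorter sequences are returned as a single range.
    """
    if length <= threshold:
        return [(0, length)]

    ranges = []
    start = 0
    while True:
        end = min(start + chunk + overlap, length)
        ranges.append((start, end))
        if end == length:
            # The end of the sequence is covered; stop here so that no
            # piece lying entirely inside the previous one is emitted.
            break
        start += chunk
    return ranges


if __name__ == "__main__":
    # New scheme (as assumed above) for a 400k sequence:
    # [(0, 110000), (100000, 210000), (200000, 310000), (300000, 400000)]
    print(split_ranges(400_000))

    # Old scheme: 350k chunks, no overlap.
    print(split_ranges(800_000, chunk=350_000, overlap=0))
```

Calling it with chunk=350_000 and overlap=0 mimics the old layout, which
makes it easy to compare the piece boundaries of both schemes for a given
sequence length.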

So, if you do your own database conversions, reformat EMBL from flatfile
before indexing under V8.1! EMBLTOGCG will do the trick. If, however, you
insist on having your databases in NBRF format (like I do), then pick up
the latest version of "embl2nbrf" from my FTP-site:


Best regards,

      Jack A.M. Leunissen, Ph.D. | CAOS/CAMM Center, Univ. of Nijmegen
      Email: jackl at caos.kun.nl   | Toernooiveld
      Tel. : +31 80 65 22 48     | 6525 ED Nijmegen, The Netherlands
      Fax  : +31 80 65 29 77     | URL=http://www.caos.kun.nl/

More information about the Info-gcg mailing list
