A bug in the GCG SEQED program? How to solve the problem

Peter Rice pmr at unst.sanger.ac.uk
Fri Feb 3 05:06:57 EST 1995

In article <Pine.3.89.9502021944.A1082252951-0100000 at SLUVCA.SLU.EDU> HUANGY at SLUVCA.SLU.EDU writes:
>	   While I was using the GCG SEQED program, one strange thing
>   happened to me. When I created a new sequence file with SEQED,
>   put two or more consecutive periods (.) into the header or comment
>   part, and exited after entering some sequence, SEQED would say that the
>   file was not in GCG format when I reopened it.
>	   I spent quite a while figuring out how this could be happening.
>   Finally, I realized that the problem was caused by the two or more
>   periods in the header part.
>	   The fix is very simple: open the sequence file in a text editor
>   and put a space between any two consecutive periods. But DON'T put
>   any letter or space between the two periods after "Month day year
>   hour:minute Check: xxxx". Then SEQED happily accepts the files
>   which it had created and then refused to accept.

Definitely a bug, but not new in 8.0. Even GCG 7.3 on OpenVMS seems to
suffer from it.
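A minimal Python sketch of why the reopened file is rejected, assuming (as the symptoms suggest) that the reader takes the first line containing ".." as the heading/sequence divider and expects that line to carry a "Check:" field. The function name `find_divider`, the acceptance test, and the example file are made up for illustration; this is not GCG's actual parsing code.

```python
def find_divider(lines):
    """Return the index of the divider line, or None if the file looks invalid.

    Assumed behaviour, not GCG's code: the first line containing ".." is
    taken as the divider, and it must also carry a "Check:" field.
    """
    for i, line in enumerate(lines):
        if ".." in line:
            # The first ".." wins, even if it sits in the free-text heading.
            return i if "Check:" in line else None
    return None


# Made-up example file with ".." in the heading (checksum computed for ACGTACGT).
bad = [
    "Sites 10..250 from pUC19",
    " MYSEQ  Length: 8  February 3, 1995 10:00  Check: 2644  ..",
    "",
    "       1  ACGTACGT",
]
# The hand fix from the quoted message: split the heading's "..".
good = [bad[0].replace("..", ". ."), *bad[1:]]
```

With the heading's ".." left in place, the heading line is mistaken for the divider and the file is rejected; once it is split, the real divider on the second line is found.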

SEQED should expand the ".." to ". ." just as you did. It only needs
to call StrAerate on each heading line when writing it out.
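The expansion itself is simple. Below is a Python approximation of what StrAerate is used for here: inserting a space between adjacent periods so that no heading line contains "..". The names `aerate` and `write_heading` are hypothetical, and the real GCG routine may differ in detail.

```python
import re


def aerate(line: str) -> str:
    """Insert a space between every pair of adjacent periods.

    Approximation only; GCG's StrAerate may behave differently in detail.
    """
    # Each period that is followed by another period gets a space after it,
    # so ".." becomes ". ." and "..." becomes ". . ."
    return re.sub(r"\.(?=\.)", ". ", line)


def write_heading(heading_lines):
    """Aerate every heading line before it is written out."""
    return [aerate(line) for line in heading_lines]
```

The lookahead handles runs of three or more periods in one pass, whereas a single `str.replace("..", ". .")` would turn "..." into ". .." and leave a ".." behind.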

Most other GCG programs are paranoid about ".." and always expand it.
EMBLTOGCG, for example, expands ".." in the feature table to ". .",
which upsets SRS output. The expansion is redundant anyway, because
every access to the sequence databases by GCG repeats the check
(necessary because, for example, PIR can be obtained in GCG format with
".." in the author lists).

Because of this effect, I recently turned off the ".." to ". ." conversion
in EMBLTOGCG to make SRS happy when using GCG format databases. So far it
seems to work fine in GCG.

I also changed the order of the hacked DE line to be:

acnumber description species

instead of GCG's "acnumber species description" which means that
database searches always say something useless like:

Hs11_Arath X12345 Arabidopsis thaliana (mouse-ear cress) 17.4 kd clas . . .

when the SwissProt entry name already tells me the species :-)
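The difference between the two DE line orders shows up as soon as a listing truncates the line. In this Python sketch the 60-column width, the ". . ." truncation marker (modelled on the listing above), the function name `de_line`, and the example description are all hypothetical.

```python
def de_line(acnumber: str, species: str, description: str,
            species_first: bool = False, width: int = 60) -> str:
    """Build a one-line DE summary and cut it to `width` columns (hypothetical)."""
    if species_first:
        parts = [acnumber, species, description]   # GCG's order
    else:
        parts = [acnumber, description, species]   # reordered as in the post
    line = " ".join(parts)
    # Over-long lines are cut and marked with ". . ." so the total is exactly `width`.
    return line if len(line) <= width else line[: width - 6] + " . . ."


species = "Arabidopsis thaliana (mouse-ear cress)"
desc = "hypothetical protein description"
# With species first, most of the visible width goes to the species name.
print(de_line("X12345", species, desc, species_first=True))
# With description first, the informative text survives the cut.
print(de_line("X12345", species, desc))
```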

As for SEQED, I gave up using it years ago. I can type faster in a
normal editor. Just put a ".." line after the heading, and use REFORMAT
to write the true ".." divider line.
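The workaround relies on REFORMAT rebuilding the divider line. A rough Python sketch of that step follows, assuming a divider of the form `Length: n  <date>  Check: c  ..` as quoted earlier in the thread; the field spacing, the 50-per-line sequence layout, the `name` and `date` parameters, and the helper names are assumptions, and real REFORMAT output may contain further fields. The checksum follows the commonly documented GCG algorithm (as implemented, for example, in Biopython's `Bio.SeqUtils.CheckSum.gcg`): each residue's uppercase character code is weighted by its position, with the weight cycling from 1 to 57, and the total is taken modulo 10000.

```python
import re


def gcg_checksum(seq: str) -> int:
    """GCG checksum: position-weighted sum of character codes, modulo 10000."""
    total = 0
    for i, ch in enumerate(seq):
        # Weights run 1..57 and then start again at 1.
        total += (i % 57 + 1) * ord(ch.upper())
    return total % 10000


def format_seq(seq: str) -> list[str]:
    """Lay out the sequence 50 residues per line in blocks of 10 (approximate GCG style)."""
    out = []
    for start in range(0, len(seq), 50):
        chunk = seq[start:start + 50]
        blocks = " ".join(chunk[j:j + 10] for j in range(0, len(chunk), 10))
        out.append(f"{start + 1:>8}  {blocks}")
    return out


def rebuild(text: str, name: str, date: str) -> str:
    """Hypothetical REFORMAT-like step: turn a bare '..' line into a full divider line."""
    lines = text.splitlines()
    # The hand-typed divider is the first line consisting only of "..".
    cut = next((i for i, l in enumerate(lines) if l.strip() == ".."), None)
    if cut is None:
        raise ValueError("no '..' line after the heading")
    # Expand ".." in the heading so it cannot be mistaken for the divider.
    heading = [re.sub(r"\.(?=\.)", ". ", l) for l in lines[:cut]]
    # Keep only residue letters (and '*') from the sequence part.
    seq = re.sub(r"[^A-Za-z*]", "", "".join(lines[cut + 1:]))
    divider = f" {name}  Length: {len(seq)}  {date}  Check: {gcg_checksum(seq)}  .."
    return "\n".join(heading + ["", divider, ""] + format_seq(seq)) + "\n"
```

For example, `rebuild("My clone 10..20\n..\n acgt\n", "TEST", "February 3, 1995 10:00")` yields a heading of "My clone 10. .20" followed by a divider reporting `Length: 4` and `Check: 748`.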
Peter Rice                           | Informatics Division
E-mail: pmr at sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr    | England

More information about the Info-gcg mailing list
