We have detected a minor bug in readseq and offer a simple one-line
fix. The problem occurs when the last line of an input sequence file does
not end with an end of line character. Although this is NOT a likely
occurrence, it can happen when moving files between platforms, cutting and
pasting, using Web browsers, etc. - it happened at our site.
Two examples which exhibit the bug are given here:
Example 1:
Input sequence is in fasta format with 1419 bases. The sequence has
50 characters per line but NO end of line character on the last line:
% cat -e example.one
>HIL15COM Human interleukin 15 (IL15) mRNA, complete cds., 1419 bases, 99E46F98 checksum.$
TGTCCGGCGCCCCCCGGGAGGGAACTGGGTGGCCGCACCCTCCCGGCTGC$
GGTGGCTGTCGCCCCCCACCCTGCAGCCAGGACTCGATGGAGAATCCATT$
-CUT-
TAATTTAGTTATTGATGTATAAAGCAACTGTTATGAAATAAAGAAATTGC$
AATAAAAAAAAAAAAAAAA
% readseq -f5 example.one -pipe > example_one.gcg
% cat example_one.gcg
HIL15COM Human interleukin 15 (IL15) mRNA, complete cds.
HIL15COM Length: 1400 (today) Check: 5081 ..
1 TGTCCGGCGC CCCCCGGGAG GGAACTGGGT GGCCGCACCC TCCCGGCTGC
51 GGTGGCTGTC GCCCCCCACC CTGCAGCCAG GACTCGATGG AGAATCCATT
-CUT-
1351 TAATTTAGTT ATTGATGTAT AAAGCAACTG TTATGAAATA AAGAAATTGC
NOTE: The sequence is truncated at 1400 bases (the end of the last line
that has an end of line character). The final 19 bases are dropped.
Example 2:
Input sequence is in fasta format with 1419 bases. The entire sequence is
on a single line with no end of line character.
% cat -e example.two
>HIL15COM Human interleukin 15 (IL15) mRNA, complete cds., 1419 bases, 99E46F98 checksum.$
TGTCCGGCGCCCCCCGGGAGGGAACTGGGTGGCCGCACCCTCCCGGCTGC -CUT- ATAAAAAAAAAAAAAAAA
% readseq -f5 example.two -pipe > example_two.gcg
% cat example_two.gcg
IL15COM Human interleukin 15 (IL15) mRNA, complete cds.
HIL15COM Length: 1275 (today) Check: 1094 ..
1 TGTCCGGCGC CCCCCGGGAG GGAACTGGGT GGCCGCACCC TCCCGGCTGC
51 GGTGGCTGTC GCCCCCCACC CTGCAGCCAG GACTCGATGG AGAATCCATT
-CUT-
1201 TAATGCTGCA GGTCAACAGC TATGCTGGTA GGCTGAACCA CTGACTACTG
1251 GCTCCCATTG ACTTCCTTCA TAAGC
NOTE: The sequence is truncated at 1275 bases (the last multiple of 255,
the number of characters returned by each full "fgets" call:
5 x 255 = 1275). The remaining 144 bases are dropped.
A fix, provided by Rao Parasa of our group, is to add the line that follows
the comment below to the readline routine in ureadseq.c:
    Local void readline(FILE *f, char *s, long *linestart)
    {
      char *cp;

      *linestart = ftell(f);
      if (NULL == fgets(s, 256, f))
        *s = 0;
      else {
        cp = strchr(s, '\n');
        if (cp != NULL) *cp = 0;
        /*
         * Following line fixes BUG when last line in the input does not have
         * an EOL character
         */
        if (feof(f)) clearerr(f);
      }
    }
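
The effect of the added line can be reproduced outside readseq with a
small sketch. The read_line function below mirrors the patched routine, with
a flag to switch the fix on and off. The caller, count_bases_kept, is a
hypothetical stand-in and NOT readseq's actual loop: it assumes a caller that
tests feof() after every read and discards the buffer once end of file is
flagged, which would be consistent with the truncation seen in both examples.

```c
#include <stdio.h>
#include <string.h>

/* Same logic as the patched readline; apply_fix toggles the added line. */
static void read_line(FILE *f, char *s, int apply_fix)
{
    char *cp;

    if (NULL == fgets(s, 256, f)) {
        *s = 0;
    } else {
        cp = strchr(s, '\n');
        if (cp != NULL) *cp = 0;
        /* A last line with no EOL makes fgets hit end of file while it
         * still returns data, so the EOF flag is set on a good read. */
        if (apply_fix && feof(f)) clearerr(f);
    }
}

/*
 * HYPOTHETICAL caller (an assumption, not readseq code): checks feof()
 * after each read and drops whatever is in the buffer once it is set.
 * Returns the number of characters kept, or -1 if no temp file is available.
 */
static long count_bases_kept(const char *text, int apply_fix)
{
    char buf[256];
    long kept = 0;
    FILE *f = tmpfile();

    if (f == NULL) return -1;
    fputs(text, f);
    rewind(f);

    for (;;) {
        read_line(f, buf, apply_fix);
        if (feof(f) || ferror(f)) break;  /* buffer contents are discarded */
        kept += (long)strlen(buf);
    }
    fclose(f);
    return kept;
}
```

Under that assumption, count_bases_kept("AAAA\nCCCC\nGG", 0) keeps 8
characters and loses the final "GG", while the same call with the fix enabled
keeps all 10. With the fix, the next fgets call returns NULL and sets the
EOF flag again, so the loop still terminates.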
--
--------
John Powell phone: (301) 496-2963
Building 12A, Room 2033 FAX: (301) 402-2867
National Institutes of Health
Bethesda, MD 20892 Internet: jip at helix.nih.gov