In article <24ar34$sdn at mserv1.dl.ac.uk>, lhb at s-crim1.dl.ac.uk (L.H. Bell) writes:
...
|> Symbol comparison table: /exe1/gcg/gcgcore/data/rundata/swgappep.cmp
|> Symbol comparison table: /exe1/gcg/gcgcore/data/rundata/swgapdna.cmp
..
|> The problem is that it thinks the short sequence is DNA.
|> The bug is that including -protein on the command line, to force it to think
|> protein, does not work. The only way round this is to fetch the protein
|> comparison table and explicitly declare that in the command line.
|>
Single GCG sequences can carry the wrong 'Type', and so can sequence
databases if you formatted the database yourself. Check the file
swissprot.header; it should read (wrapped for clarity)
NAME:swissprotdir:SWISS LN: SWISSPROT SN:SWISS REL:26 RELDATE:1993
FORDATE:1993 TYPE:P FORMAT:NBRF

TYPE:P       MUST be P (Protein)
FORMAT:NBRF  may vary, can also be DATA or GCG depending on the
             procedure used
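If you do not want to read the whole header line by eye, something like
the following might do. It is only a sketch: it assumes a POSIX shell with
sed, and that the field is written TYPE:<letter> as in the sample above.
header_type is a made-up helper name, not a GCG command.

```shell
# header_type FILE: print the value of the TYPE: field found in a GCG
# database header file. Hypothetical helper, not part of GCG.
header_type() {
    # -n together with the p flag prints only lines where TYPE: was found
    sed -n 's/.*TYPE:\([A-Za-z]*\).*/\1/p' "$1"
}
```

Calling header_type swissprot.header should then print P for a protein
database; anything else means the database was formatted with the wrong type.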
Check the sequence itself as well. You may have a short peptide (which
YOU know is a peptide), but FROMSTADEN or REFORMAT was confused by its
composition and called it a DNA fragment instead:
biox > cat > test.sdn
DGTGADDACT
biox > reformat test.sdn test.seq
REFORMAT rewrites sequence file(s), symbol comparison table(s), or enzyme
data file(s) so that they can be read by GCG programs.
No ".." divider
biox > grep -i Type test.seq
test.seq Length: 10 August 11, 1993 15:55 Type: N Check: 3918 ..
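One plausible reason for the N in the REFORMAT run above: every letter of
DGTGADDACT is also a legal IUPAC nucleotide symbol (D stands for "A, G or
T"). The sketch below only illustrates that ambiguity. How REFORMAT really
decides is not shown here, so treat the rule as an assumption;
looks_like_dna is a made-up name.

```shell
# looks_like_dna SEQ: succeed (exit 0) if every symbol in SEQ is also a
# legal IUPAC nucleotide code. Illustration only; the actual test used by
# REFORMAT is an unknown here and may differ.
looks_like_dna() {
    case "$1" in
        "")
            return 1 ;;   # empty string: nothing to decide
        *[!ACGTURYSWKMBDHVNacgturyswkmbdhvn]*)
            return 1 ;;   # contains a symbol that is not a nucleotide code
        *)
            return 0 ;;   # all symbols could be nucleotides
    esac
}
```

looks_like_dna DGTGADDACT succeeds, while a peptide containing, say, L or
E would fail the test. Short peptides made only of such shared letters
cannot be told apart by composition, which is why the type has to be
stated explicitly.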
If you use FROMSTADEN instead,
biox > fromstaden
FROMSTADEN changes a sequence from Staden format into GCG format.
If the file contains a nucleotide sequence, the ambiguity codes are
translated as shown in Appendix III of the PROGRAM MANUAL.
FROMSTADEN of what Staden sequence file ? test.sdn
What should I call the output file (* test.seq *) ? test.seq1
biox > grep -i Type test.seq1
test.seq1 Length: 10 August 11, 1993 15:57 Type: N Check: 3904 ..
... NOTE THE DIFFERENT CHECKSUM! This is because FROMSTADEN is
'clever' enough to treat D as a Staden ambiguity code and convert it
into c, so the sequence it writes is no longer the one you typed.
The right way of doing it would be
biox > reformat -pep test.sdn test.seq2
REFORMAT rewrites sequence file(s), symbol comparison table(s), or enzyme
data file(s) so that they can be read by GCG programs.
No ".." divider
biox > grep -i Type test.seq2
test.seq2 Length: 10 August 11, 1993 16:00 Type: P Check: 3918 ..
and here we go with the beauty of a peptide sequence:
biox > more test.seq2
REFORMAT of: test.sdn check: -1 from: 1 to: 10 August 11, 1993 16:00
(No documentation)
test.seq2 Length: 10 August 11, 1993 16:00 Type: P Check: 3918 ..
1 DGTGADDACT
(note the P in the Type: field of the header line above)
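To check a whole directory of sequence files in one go rather than
grepping them one at a time, a small loop could be used. This is a sketch
under the assumption that each file has a header line of the form shown
above (Type: followed by one letter, ending in the ".." divider); gcg_type
and report_types are made-up names, not GCG programs.

```shell
# gcg_type FILE: print the Type letter (N or P) taken from the GCG header
# line, i.e. the line that ends with the ".." divider. Hypothetical helper.
gcg_type() {
    sed -n 's/.*Type: *\([A-Za-z]\).*\.\. *$/\1/p' "$1" | head -n 1
}

# report_types FILE...: print "file<TAB>type" for each file given.
report_types() {
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(gcg_type "$f")"
    done
}
```

For example, report_types *.seq lists every sequence file with its type,
so a peptide that was silently stored as N stands out at once.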
Regards
Reinhard
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
ftp mirror at nic.switch.ch