
Problem with BESTFIT

Reinhard Doelz doelz at comp.bioz.unibas.ch
Wed Aug 11 09:21:38 EST 1993


In article <24ar34$sdn at mserv1.dl.ac.uk>, lhb at s-crim1.dl.ac.uk (L.H. Bell) writes:
...
|>          Symbol comparison table: /exe1/gcg/gcgcore/data/rundata/swgappep.cmp
|>          Symbol comparison table: /exe1/gcg/gcgcore/data/rundata/swgapdna.cmp
..
|> The problem is that it thinks the short sequence is DNA. 
|> The bug is that including -protein on the command line, to force it to think 
|> protein, does not work. The only way round this is to fetch the protein
|> comparison table and explicitly declare that in the command line.
|> 

A single GCG sequence file can carry the wrong 'Type', and so can a
sequence database if you formatted the database yourself. Check the file
swissprot.header; it should read (wrapped for clarity):

NAME:swissprotdir:SWISS LN: SWISSPROT SN:SWISS REL:26 RELDATE:1993 FORDATE:1993
             TYPE:P FORMAT:NBRF
                  ^         ^
                  |         may vary, can also be DATA or GCG depending on
   MUST be P (Protein)      the procedure used
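
If you prefer to check the header from a script rather than by eye, a
minimal Python sketch could look like the following. The function name
database_type is made up for illustration, and the only assumption about
the file is that it contains a TYPE:<letter> field as in the line quoted
above.

```python
import re

def database_type(header):
    """Return the value of the TYPE: field in a header line, or None."""
    match = re.search(r"\bTYPE:\s*(\S+)", header)
    return match.group(1) if match else None

# The header line quoted above, joined back into one string.
HEADER = ("NAME:swissprotdir:SWISS LN: SWISSPROT SN:SWISS REL:26 "
          "RELDATE:1993 FORDATE:1993 TYPE:P FORMAT:NBRF")

kind = database_type(HEADER)
print(kind, "(protein)" if kind == "P" else "(not flagged as protein)")
```

Anything other than P here means the database is not flagged as protein.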


Look at the sequence itself as well. You may have a short peptide (which
YOU know is a peptide), but FROMSTADEN or REFORMAT was confused by its
composition and called it a DNA fragment instead:

 biox > cat > test.sdn
DGTGADDACT
 biox > reformat test.sdn test.seq

REFORMAT rewrites sequence file(s), symbol comparison table(s), or enzyme 
data file(s) so that they can be read by GCG programs. 
 No ".." divider

 biox > grep -i Type test.seq
 test.seq  Length: 10  August 11, 1993  15:55  Type: N  Check: 3918  ..
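
Why the confusion? Every letter in DGTGADDACT is also a legal IUPAC
nucleotide symbol (D stands for "A, G or T"). The sketch below is NOT the
test REFORMAT actually applies, which I do not reproduce here; it is only
an illustrative composition check (the function name is hypothetical)
showing that such a short peptide is indistinguishable from DNA by its
alphabet alone.

```python
# IUPAC nucleotide symbols, including the ambiguity codes.
IUPAC_NUCLEOTIDE = set("ACGTUMRWSYKVHDBN")

def looks_like_nucleotide(seq):
    """True if every letter of seq is a valid IUPAC nucleotide symbol."""
    letters = [c for c in seq.upper() if c.isalpha()]
    return bool(letters) and all(c in IUPAC_NUCLEOTIDE for c in letters)

print(looks_like_nucleotide("DGTGADDACT"))  # every letter is a nucleotide code
print(looks_like_nucleotide("MKVLAAGIVG"))  # L and I are not nucleotide codes
```

The first call returns True, the second False, so a peptide made only of
such letters needs an explicit hint about its type.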

If you use FROMSTADEN instead:
 biox > fromstaden

FROMSTADEN changes a sequence from Staden format into GCG format. 
If the file contains a nucleotide sequence, the ambiguity codes are
translated as shown in Appendix III of the PROGRAM MANUAL. 

 FROMSTADEN of what Staden sequence file ?  test.sdn

 What should I call the output file (* test.seq *) ?  test.seq1


 biox > grep -i Type test.seq1
test.seq1  Length: 10  August 11, 1993  15:57  Type: N  Check: 3904  ..

... NOTE THE DIFFERENT CHECKSUM! This is because FROMSTADEN is 'clever'
enough to treat the D symbol as a Staden ambiguity code and convert it
into c.
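
The two checksums can be reproduced by hand. The sketch below assumes the
commonly documented GCG checksum rule (position weight cycling 1..57,
times the ASCII code of the upper-cased symbol, summed modulo 10000); the
function name is mine. Replacing each D by C lowers the sum by
1 + 6 + 7 = 14, the positions of the three D's.

```python
def gcg_checksum(seq):
    """GCG-style checksum: weighted sum of upper-case ASCII codes, mod 10000."""
    total = 0
    for i, ch in enumerate(seq):
        total += ((i % 57) + 1) * ord(ch.upper())
    return total % 10000

original = "DGTGADDACT"                   # as written to test.sdn
converted = original.replace("D", "C")    # what FROMSTADEN makes of it

print(gcg_checksum(original))   # 3918, as REFORMAT reports
print(gcg_checksum(converted))  # 3904, as FROMSTADEN reports
```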

The right way of doing it would be:
 biox > reformat -pep test.sdn test.seq2

REFORMAT rewrites sequence file(s), symbol comparison table(s), or enzyme 
data file(s) so that they can be read by GCG programs. 
 No ".." divider

 biox > grep -i Type test.seq2
test.seq2  Length: 10  August 11, 1993  16:00  Type: P  Check: 3918  ..

and here we go with the beauty of a peptide sequence:



 biox > more test.seq2
 REFORMAT of: test.sdn  check: -1  from: 1  to: 10  August 11, 1993  16:00

 (No documentation)

test.seq2  Length: 10  August 11, 1993  16:00  Type: P  Check: 3918  ..

       1  DGTGADDACT 
                                                     ^
                                                     |
                                                  note the P
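
If you have many files to check, the grep above can be replaced by a
small script. This is only a sketch: it assumes the header is the first
line ending in "..", laid out as in the output shown above, and the
function name gcg_header_fields is hypothetical.

```python
import re

def gcg_header_fields(text):
    """Return Length, Type and Check from the first line ending in '..'."""
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            m = re.search(
                r"Length:\s*(\d+).*Type:\s*(\S+)\s+Check:\s*(\d+)", line)
            if m:
                return {"Length": int(m.group(1)),
                        "Type": m.group(2),
                        "Check": int(m.group(3))}
    return None

# Contents of test.seq2 as displayed above.
SAMPLE = """\
 REFORMAT of: test.sdn  check: -1  from: 1  to: 10  August 11, 1993  16:00

 (No documentation)

test.seq2  Length: 10  August 11, 1993  16:00  Type: P  Check: 3918  ..

       1  DGTGADDACT
"""

fields = gcg_header_fields(SAMPLE)
print(fields)
if fields and fields["Type"] != "P":
    print("not flagged as protein; consider reformat -pep")
```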

Regards
Reinhard

+----------------------------------+-------------------------------------+
|    Dr. Reinhard Doelz            | RFC     doelz at urz.unibas.ch         |
|      Biocomputing                | DECNET  20579::48130::doelz         |
|Biozentrum der Universitaet       | X25     022846211142036::doelz      |
|   Klingelbergstrasse 70          | FAX     x41 61 261- 6760 or 267- 2078     
|     CH 4056 Basel                | TEL     x41 61 267- 2076 or 2247    |   
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
                     ftp mirror at nic.switch.ch 
               -----------------------------------------


