
MSF problem?

Keith Robison robison at lipid.harvard.edu
Mon Aug 1 20:59:08 EST 1994

L. H. Bell (lhb at s-ind1.dl.ac.uk) wrote:
: In article (B.L.Cohen) writes:
: |> I'm having "Cannot Open" or "Cannot read" problems with MSF files produced
: |> by the Export Foreign format function of GDE (i.e. Gilbert's Readseq) and
: |> entered as input as follows: Plotsim filename.msf{*}
: |> 
: |> The files look like OK MSF DNA multiple alignment files to our eyes.   
: |> 
: |> Can anyone point to possible problems?   

: This looks interesting. When you output proteins from GDE as MSF, they 
: load fine into GCG programs but when you try outputting DNA, you get 
: this error. However running the corrupt MSF sequence through readseq
: again (readseq -a -p -form=msf <dna.msf >dna2.msf) produces a working 
: MSF file that loads into GCG. The test file contained 3 DNA sequences,
: all 550 bases long. The first sequence was named SYNR. 

: My gde fix would be to create a new file item, 'export as msf' as 
: itemmethod:readseq -pipe -all -form=msf < in1 > out1 ; readseq -p -a 
: -form=msf <out1 > $OUTPUTFILE
: and remove the msf option from the 'export foreign format'.

I hope Dr. Cohen doesn't mind me spilling the beans, but he did figure out
that the problem is that GDE outputs the gaps as '.' and GCG demands
'-'.  Putting sed in between GDE and readseq (or awk or perl) should cure
the problem by converting all the '.' to '-' (remembering that '.' is the
wildcard for a single character and '\.' is the escaped form!)

itemmethod:sed 's/\./-/g' in1 |readseq -all -pipe -form=msf >$OUTFILE

is probably close to the correct GDE menu entry (now to go fix
this myself...)
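
To see why the dot has to be escaped with a backslash, here is a rough
sketch on a made-up alignment line. The helper name dots_to_dashes and the
sample sequence are illustrative only; neither comes from GDE or readseq.

```shell
# Hypothetical helper: replace every literal '.' read from stdin with '-'.
dots_to_dashes() {
    sed 's/\./-/g'
}

# Escaped dot: only the literal '.' gap characters are converted.
printf 'ACGT..ACGT\n' | dots_to_dashes      # prints ACGT--ACGT

# Unescaped dot: '.' matches any single character, so the bases go too.
printf 'ACGT..ACGT\n' | sed 's/./-/g'       # prints ----------
```

Note that this converts every '.' in the input, not just gap characters, so
it is worth checking that sequence names and any other fields survive the
conversion before wiring it into a menu entry.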

: s-crim1:lhb 140> diff dna.msf dna2.msf 
: 2c2
: <  gde26558_1  MSF: 550  Type: N  January 01, 1776  12:00  Check: 8511 ..
: ---
: >  dna2.msf  MSF: 550  Type: N  January 01, 1776  12:00  Check: 5077 ..
: 4c4
Nice to know our founding fathers used GCG!!! (see the 1776 date stamp above)

Keith Robison
Harvard University
Department of Cellular and Developmental Biology
Department of Genetics / HHMI

robison at mito.harvard.edu 

More information about the Info-gcg mailing list
