In article <gbga13-300794123016 at mac7-43.genetics.gla.ac.uk>, gbga13 at udcf.gla.ac.uk (B.L.Cohen) writes:
|> I'm having "Cannot Open" or "Cannot read" problems with MSF files produced
|> by the Export Foreign format function of GDE (i.e. Gilbert's Readseq) and
|> entered as input as follows: Plotsim filename.msf{*}
|> The files look like OK MSF DNA multiple alignment files to our eyes.
|> Can anyone point to possible problems?
This looks interesting. When you export proteins from GDE as MSF, they
load fine into GCG programs, but when you export DNA, you get this
error. However, running the corrupt MSF file through readseq again
(readseq -a -p -form=msf <dna.msf >dna2.msf) produces a working MSF
file that loads into GCG. The test file contained 3 DNA sequences, all
550 bases long; the first sequence was named SYNR. A diff of the two
files looks like this:
s-crim1:lhb 140> diff dna.msf dna2.msf
2c2
< gde26558_1 MSF: 550 Type: N January 01, 1776 12:00 Check: 8511 ..
---
> dna2.msf MSF: 550 Type: N January 01, 1776 12:00 Check: 5077 ..
4c4
< Name: SYNR Len: 600 Check: 9901 Weight: 1.00
---
> Name: SYNR Len: 550 Check: 6467 Weight: 1.00
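The Check values in the diff can be reproduced by hand. Below is a rough Python sketch of the GCG checksum as I understand it (each character upper-cased, multiplied by a position weight cycling 1 to 57, total taken mod 10000), with the header-line Check assumed to be the sum of the per-sequence Checks mod 10000. The function names are made up for illustration; this is not GCG's or readseq's own code.

```python
from typing import Iterable


def gcg_checksum(seq: str) -> int:
    """GCG-style checksum (assumed): each character's upper-case code is
    multiplied by a weight cycling 1, 2, ..., 57, 1, 2, ...; total mod 10000."""
    check = 0
    for i, ch in enumerate(seq):
        check += ((i % 57) + 1) * ord(ch.upper())
    return check % 10000


def msf_header_check(seq_checks: Iterable[int]) -> int:
    """Header-line Check (assumed): sum of the per-sequence Checks, mod 10000."""
    return sum(seq_checks) % 10000


if __name__ == "__main__":
    print(gcg_checksum("ACGT"))            # 65*1 + 67*2 + 71*3 + 84*4 = 748
    # Other two sequences lumped together as 8610 (see note below).
    print(msf_header_check([9901, 8610]))  # 8511, as in dna.msf
    print(msf_header_check([6467, 8610]))  # 5077, as in dna2.msf
```

Both header Checks are consistent with only SYNR changing: 8511 - 9901 and 5077 - 6467 both come to -1390, i.e. the other two sequences contribute 8610 (mod 10000) in either file.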
My GDE fix would be to create a new File menu item, 'export as msf', with

    itemmethod:readseq -pipe -all -form=msf < in1 > out1 ; readseq -p -a -form=msf <out1 > $OUTPUTFILE

and to remove the MSF option from 'Export Foreign Format'.
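Until the cause is known, one way to tell whether a given export needs the second readseq pass is to compare each Name line's Len with what is actually in the alignment. A rough Python sketch, assuming the usual MSF layout (Name/Len lines in the header, a line containing only //, then blocks of lines starting with the sequence name); check_msf_lengths is a hypothetical helper, not part of readseq, GDE or GCG.

```python
import re
from typing import Dict, List

# Matches header lines such as " Name: SYNR Len: 600 Check: 9901 Weight: 1.00"
NAME_RE = re.compile(r"Name:\s*(\S+)\s+Len:\s*(\d+)")


def check_msf_lengths(text: str) -> List[str]:
    """Hypothetical helper: compare each sequence's declared Len with the
    number of characters (residues plus gap symbols) in the alignment body."""
    lines = text.splitlines()
    # The header ends at the first line consisting only of "//".
    try:
        split_at = next(i for i, line in enumerate(lines) if line.strip() == "//")
    except StopIteration:
        raise ValueError("no '//' separator found; not an MSF file?") from None

    declared: Dict[str, int] = {}
    for line in lines[:split_at]:
        m = NAME_RE.search(line)
        if m:
            declared[m.group(1)] = int(m.group(2))

    counted: Dict[str, int] = {name: 0 for name in declared}
    for line in lines[split_at + 1:]:
        fields = line.split()
        # Skip blank lines and position-numbering lines; only count lines
        # whose first field is a sequence name from the header.
        if fields and fields[0] in counted:
            counted[fields[0]] += sum(len(f) for f in fields[1:])

    return [
        f"{name}: header Len {length}, alignment body has {counted[name]}"
        for name, length in declared.items()
        if counted[name] != length
    ]


if __name__ == "__main__":
    sample = (
        "x.msf MSF: 8 Type: N Check: 0 ..\n\n"
        " Name: SYNR Len: 10 Check: 0 Weight: 1.00\n"
        " Name: OTHER Len: 8 Check: 0 Weight: 1.00\n\n//\n\n"
        "SYNR  ACGT ACGT\n"
        "OTHER ACGT ACGT\n"
    )
    print(check_msf_lengths(sample))  # ['SYNR: header Len 10, alignment body has 8']
```

On dna.msf above, this should flag SYNR (header Len 600) if, as the second readseq pass suggests, its alignment row really holds 550 characters.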
Does anybody know why readseq behaves like this, and so can suggest a more
elegant fix? GDE pipes the sequences into readseq in GenBank format.
Hope this was helpful,
Lachlan Bell