extract_seq - is it broken?

James Bonfield jkb at mrc-lmb.cam.ac.uk
Mon Feb 10 05:38:48 EST 1997

Andy Law wrote:

>What I had intended doing was the following. Each sequence is an
>ABI-derived trace file. I can see within that file that there are
>restriction sites that may indicate multiple inserts. Since I want the
>information from the traces to be available for each insert, but each
>insert needs to be assembled separately, I figured the best way to do it
>would be to let pregap run on the file as normal. Then, having generated a
>single experiment file to encompass the whole sequence, I would duplicate
>the experiment file, modifying the reading name slightly (readings A and B,
>for example) and adding a CS or similar tagging line to each to mask out
>the non-required sequence. Both experiment files would point to the same
>template, clone, and (more importantly) trace file. Since each points to
>the same file, the cut-offs should remain the same. Will *that* work?
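The duplication step in the quoted plan can be sketched as follows. This is a hypothetical helper, not part of the Staden Package. It assumes the usual "TAG   value" experiment file layout, that ID and EN carry the reading name, that CS takes a "start..end" range, and that the SQ ... // sequence block should stay at the end of the file. Every other line, including the trace (LN) and template (TN) lines, is copied unchanged, so both readings point at the same trace.

```python
from typing import List, Tuple


def duplicate_reading(exp_lines: List[str], new_name: str,
                      cs_range: Tuple[int, int]) -> List[str]:
    """Return a renamed copy of an experiment file with an extra CS line.

    Hypothetical helper. Assumes the 'TAG   value' line layout, that ID and
    EN carry the reading name, that CS takes a 'start..end' range, and that
    the SQ ... // sequence block should stay at the end of the file.
    Any CS line already present (e.g. from vector clipping) is left alone.
    """
    start, end = cs_range
    cs_line = f"CS   {start}..{end}"
    out: List[str] = []
    placed = False
    for line in exp_lines:
        tag = line[:2]
        if tag in ("ID", "EN"):
            # Rename the copy so the two readings are distinct (X -> XA, XB).
            out.append(f"{tag}   {new_name}")
            continue
        if tag == "SQ" and not placed:
            # Put the masking line just before the sequence block.
            out.append(cs_line)
            placed = True
        # Everything else (LN trace file, TN template, ...) is shared.
        out.append(line)
    if not placed:
        out.append(cs_line)
    return out


original = [
    "ID   X",
    "EN   X",
    "LN   X.scf",
    "TN   tmplX",
    "SQ",
    "     ACGTACGTAC",
    "//",
]
reading_a = duplicate_reading(original, "XA", (6, 10))  # keep bases 1..5
reading_b = duplicate_reading(original, "XB", (1, 5))   # keep bases 6..10
```

Note that this only handles renaming and masking; the caveats in the reply below (quality clips, ON and AV lines) still apply to the copies.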

The principle sounds OK, but there are a few specifics that may cause problems.

a) Gap4 will tag the CS regions, but currently won't mark that sequence as
hidden data (the code for this exists but is disabled). The reason is that
keeping the sequence visible in the contig selector makes it easy to recognise
your two end contigs.

b) The quality cutoffs mark the sequence quality clips, not the trace quality
clips, although the two are usually the same. If you split sequence X into X1
and X2, you may end up with a QL for X2 that is < 1 and a QR for X1 that is
> the X1 sequence length. I'm unsure what will happen in such cases.
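To see how these out-of-range values arise, and one way to keep them in range, here is a small sketch. It assumes X1 holds original bases 1..split and X2 holds bases split+1..length renumbered from 1, and that QL = 0 and QR = length + 1 mean "nothing clipped". The function name and the clamping choice are illustrative assumptions; they do not settle how Gap4 itself treats boundary values.

```python
from typing import Tuple


def split_clips(ql: int, qr: int, split: int,
                length: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Translate QL/QR into the coordinates of X1 and X2.

    Hypothetical helper. Assumes X1 = original bases 1..split and
    X2 = bases split+1..length renumbered from 1, and that QL = 0 and
    QR = len + 1 mean "nothing clipped".
    """
    len1, len2 = split, length - split
    # X1 keeps its numbering; only stop QR running past X1's last base.
    ql1 = min(ql, len1)
    qr1 = min(qr, len1 + 1)
    # X2 shifts everything down by `split`, so QL (and QR, if the right
    # clip starts inside X1) can drop below 1; clamp both into range.
    ql2 = max(ql - split, 0)
    qr2 = min(max(qr - split, 1), len2 + 1)
    return (ql1, qr1), (ql2, qr2)


print(split_clips(20, 550, 300, 600))  # ((20, 301), (0, 250))
```

Without the clamping, the same call would give X2 a QL of -280 and X1 a QR of 550, which are exactly the cases described above.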

c) You need to specify ON (original numbering) and AV (accuracy values) lines.
The ON line specifies a mapping from sequence positions to original base
numbers in the traces. In the above example, this is only needed for X2. Also,
if you use accuracy values you'll need to specify AV for X2 too; otherwise
they will be read from the wrong component of the SCF file. See the output of
Extract Readings (with quality info enabled) in Gap4 for examples of valid ON
and AV formats.
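A sketch of building the two lines for X2 follows. The formats here are assumptions to be checked against the Extract Readings output mentioned above: ON is written as a single "first..last" range of original base numbers (so no bases have been edited), and AV as one space-separated accuracy value per base on a single line. The function name is hypothetical.

```python
from typing import List, Tuple


def x2_on_av(av_values: List[int], split: int,
             length: int) -> Tuple[str, str]:
    """Build ON and AV lines for X2 (original bases split+1..length).

    Hypothetical helper. Assumed formats, to be checked against Gap4's
    Extract Readings output: ON as a single 'first..last' range of original
    base numbers, and AV as one space-separated value per base on one line.
    """
    # X2's base 1 is original base split+1; its last base is `length`.
    on_line = f"ON   {split + 1}..{length}"
    # Take X2's slice of the per-base accuracy values (0-based list indices).
    av_line = "AV   " + " ".join(str(v) for v in av_values[split:length])
    return on_line, av_line


on, av = x2_on_av([10, 12, 20, 30, 30, 25, 40, 40, 35, 15], split=6, length=10)
print(on)  # ON   7..10
print(av)  # AV   40 40 35 15
```

X1 needs no ON line under this scheme because its bases already start at original base 1.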

d) It's probably not possible to bring up two trace displays for the same
trace file but different sequence components.

There may well be other pitfalls too.

James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
