estimation of base accuracy and putting N instead of - in sequence files

James Bonfield jkb at mrc-lmb.cam.ac.uk
Tue Jul 29 02:53:55 EST 1997


In article <5ril0e$ieu at mserv1.dl.ac.uk> rifat at icr.ac.uk writes:
>	Recently I had a problem with some sequences in that it was 
>giving me a noisy sequence upto 150bp and then good sequence from 
>150bp to 500bp. But the problem is that eba program clipped the whole 
>sequence and excluded it from the database. How can I change the eba 

Firstly, eba does no clipping. It assigns quality values to trace files.

The clipping program is either "clip" or "trace_clip". "clip" looks
purely at the frequency of unknown bases in a sequence, and clips when
this matches a given threshold. It can do this for both 5' and 3'
data, and starts its search at a given offset. To control this
behaviour, set clip_args to something in your .pregaprc file. Eg:

	echo "clip_args='-s 150'" >> .pregaprc
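As a rough illustration of the idea only (this is not the clip source code), frequency-based clipping can be sketched as a sliding window that counts unknown bases and stops once a window exceeds a threshold. The helper name clip_point, the window length, and the threshold argument are all hypothetical; only the notion of a start offset comes from the description above.

```shell
# Hypothetical helper: report a 3' clip point for each sequence line on stdin.
# Arguments: start offset, window length, max unknown bases allowed per window.
# This is an illustrative sketch, not the algorithm used by the real clip program.
clip_point() {
    awk -v start="$1" -v win="$2" -v max="$3" '
    {
        n = length($0)
        # Slide a window rightwards, beginning just after the start offset.
        for (i = start + 1; i + win - 1 <= n; i++) {
            w = substr($0, i, win)
            # gsub returns the number of replacements, i.e. unknown bases.
            bad = gsub(/[-N]/, "", w)
            if (bad > max) { print i - 1; next }
        }
        # No window exceeded the threshold: keep the whole sequence.
        print n
    }'
}

# A read with a noisy start: searching from offset 0 clips everything,
# while skipping the first 4 bases keeps the whole read.
printf '%s\n' '--N-ACGTACGTACGT' | clip_point 0 4 1   # prints 0
printf '%s\n' '--N-ACGTACGTACGT' | clip_point 4 4 1   # prints 16
```

In this toy model, starting the search past the noisy region is what stops a bad start from causing the whole read to be rejected.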

Trace_clip is the alternative; it looks at the shapes of the traces
to perform clipping. It has two measures for this, which the user can
weight independently and for which separate thresholds can be set. The
scale_trace_clip program lets you clip a set of files manually and
then analyses them to produce trace_clip parameters that attempt to
reproduce your own clipping criteria. To make pregap use trace_clip,
use (eg):

	echo "clip=trace_clip" >> .pregaprc
	echo "clip_args='-b -s 150'" >> .pregaprc

For more instructions, try "man clip", "man trace_clip" and "man
scale_trace_clip".

>Also, why does Staden change the Ns in the sequence to -  Is there 
>anyway I can leave the Ns in the sequence, because when I blast the 
>sequences the BLAST software gives a warning about the - 

It's purely for visual purposes. We feel (and our users have also told
us this) that dashes are much easier to see when perusing the sequence
than just another uppercase letter, hence making editing less
tiring. If they cause a problem, it's trivial to change them. Simply
use a Unix command like the following:

	tr - N < old_file > new_file
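Note that tr rewrites every dash in the file, including any that appear in header lines. If that matters, a variant along the following lines may help. It is a minimal sketch that assumes the sequences are in FASTA format, with header lines starting with ">" (the file format is an assumption here, so adjust the pattern for other formats); dash_to_n is a hypothetical helper name.

```shell
# Hypothetical helper: turn "-" into "N" on sequence lines only.
# Assumes FASTA input on stdin; lines starting with ">" are left untouched.
dash_to_n() {
    sed '/^>/!s/-/N/g'
}

# Example: the dash in the header survives, the dashes in the sequence do not.
printf '%s\n' '>read-1' 'AC-GT--A' | dash_to_n
```

With real files this would be used as: dash_to_n < old_file > new_file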

James
--
James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/


