Staden scripting

jkb at mrc-lmb.cam.ac.uk jkb at mrc-lmb.cam.ac.uk
Thu Jun 28 12:31:02 EST 2001

In <sot*EyTZo at news.chiark.greenend.org.uk> Tim Cutts <timc at chiark.greenend.org.uk> writes:

> In article <3B3B47A6.281EE8C at bbsrc.ac.uk>,
> Simon Andrews  <simon.andrews at bbsrc.ac.uk> wrote:
> >1) open a database of a given name
> load_package gap
> set io [open_db -name foobar -version 0 -access rq -create 1]

It's good to see I can rely on you Tim to come up with the goods :-)

> >2) add in exp files from pregap.passed using normal shotgun assembly
> Not sure here, but it might be somethign like:
> assemble_new_contigs -io $io -files {read1 read2 read3}

Take a look at the $STADENROOT/src/scripts/assemble4 file. It contains pretty
much an example of what you're trying to do. It's age-old though so I don't
know if it still works, but it is still a worthwhile example to look at.

> >5) if there is one contig extract its consensus (with all gaps removed)
> get_consensus -io $io -contigs contigname -outfile filename ...
> there are a few more options to that to precisely control the way the
> consensus is generated.

There's also a calc_consensus function which doesn't have as many twiddly bits 
as get_consensus (it's "nearer the algorithm" if you see what I mean), but it
has the bit advantage of not needing to save to a file. Calc_quality has the
same interface, but it returns the confidence values. Remember to set the
global variables controlling the consensus algorithm first.


set consensus_cutoff 0.01
set quality_cutoff 0
set consensus_mode 2
set seq  [calc_consensus -io $io -contigs $clist]
set conf [calc_quality -io $io -contigs $clist]

Conf is a binary string. Tcl handles strings with nuls fine. To convert to
something more readable you can use:

binary scan $conf c* conf_str
puts $conf_str

I have a useful strip pads function too:

proc Strip_Pads {cons qual new_cons_var new_qual_var} {
    upvar $new_cons_var new_cons
    upvar $new_qual_var new_qual

    set new_cons ""
    set new_qual ""
    set pos 0
    foreach c [split $cons *] {
	#set new_cons $new_cons$c
	set length [string length $c]
	set new_cons $new_cons$c
	set new_qual $new_qual[string range $qual $pos [expr $pos+$length-1]]
	incr pos [expr $length+1]

Note though that in the next release (soon, I promise), which will be based on 
Tcl 8.3.x (probably) the "string map" command can do most of this in one
step, although it won't be able to strip pads out of the quality string at the 
same time.

> >Can anyone help me out on this one?  Even examples of working Staden
> >scripts would be useful so I can sort out the information in James'
> >scripting manual.

Sorry that the scripting manual is out of date. It took such an effort to
write it that I really should have kept it up to date, but there's never quite 
enough hours in the day to contemplate such things!

Good luck,

James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/

More information about the Staden mailing list

Send comments to us at biosci-help [At] net.bio.net