IUBio

estimation of base accuracy and putting N instead of - in sequence files

Andy Law Andy.Law at bbsrc.ac.uk
Wed Jul 30 05:01:28 EST 1997


Rifat,

For the benefit of anyone else who may find this trick useful,
I've also posted this to bionet.software.staden.

Rifat wrote:

 >      I would be grateful if you could tell me the
 >  implemenation you made regarding modifying the pregap script
 >  to cope with bad sequence at the beginning.
 >  
 >  Do you open up each .exp file and check QL/QR then offer
 >  interactive clipping?
 >  
 >  In my case if I have bad sequence upto 150bp then good
 >  sequence till 500bp, does QL give an indication at 150 and
 >  QR at 500? If so then I can deal with that.
 >  


I modified (amongst other things) section 5 of the pregap script. This
is the quality clipping stage. My version reads like this (modified from
version 1.9 so you may need to tinker with it a bit).

#-----------------------------------------------------------------------------
# Fifth phase - perform quality clipping
#
# Input : ${fofn}.passed        File of experiment file names
# Output: ${fofn}.passed        File of passed files (rewrite)
#         ${fofn}.suspect-clip  File of those files failing the clip specs.
#         ${fofn}.failed        Append any failures

if ${do_qual_clip}
then
    $echo "Quality clipping              \c"
    rm "${fofn}.passed_tmp" 2>/dev/null

    exec 3<"${fofn}.passed"
    touch "${fofn}.suspect-clip"

    while read file <&3
    do
    if "${clip}" ${clip_args} "${file}"
        then
        $echo "$file" >> "${fofn}.passed_tmp"
        $echo '.\c'
        if [ ${QL_def} ]
            then
            exp_line "${file}" QL "${QL_def}"
        fi
        thisQR=`${STADENROOT}/bin/lookup "${file}" QR 1`
        thisQL=`${STADENROOT}/bin/lookup "${file}" QL 1`
        if [  "${thisQR}" -lt "${clip_QR_min}" ]
            then
            $echo "$file     QR below minimum value   ${thisQR} vs
${clip_QR_min}" >> "${fofn}.suspect-clip"
            
        elif [ "${thisQL}" -gt "${clip_QL_max}" ]
            then
            $echo "$file     QL above maximum value   ${thisQL} vs
${clip_QL_max}" >> "${fofn}.suspect-clip"        
        fi
    else
        $echo "$file    $clip failed" >> "${fofn}.failed"
        $echo '!\c'
        fi
    done

    exec 3<&-

    $echo
    mv "${fofn}.passed_tmp" "${fofn}.passed"
fi
#-----------------------------------------------------------------------------




You need to define (in .pregaprc somewhere along the line) the
variables clip_QR_min and clip_QL_max. The files are processed
normally with files that run through the clip program being added to
the passed file ready to progress. However, having been run through
the check, the QR and QL fields are pulled from the .exp file and
compared to the preset cuttoff values. If QL is greater than
clip_QL_max (ie the left hand clipping extends further into the
sequence than you would like) or if QR is less than clip_QR_min (ie
the right hand clipping extends further into the sequence than you
would like) then the file is added to the "suspect-clip" file. At the
end of this section, there is a file of file names that need to have
their quality clipping inspected manually before progressing.

Then between sections 5 and 6, I have inserted section 5a which does
the trev interactive stuff on the necessary files

#-----------------------------------------------------------------------------
# Fifth phase part 2 - interactive quality clipping
# (i) trev
#
# This invokes the trace editor, trev, on the file so that we can reset
# any suspicious quality clippings prior to sequence vector analysis
#
# Input : ${fofn}.suspect-clip        File of experiment files
# Output: none                        (the files have passed clip anyway)
#
if ${do_qual_clip} && ${qual_clip_interactive}
then
    $echo "Interactive quality clipping  \c"
    
    exec 3<"${fofn}.suspect-clip"

    while read file rest <&3
    do
    ${trev} ${trev_args} "${file}"
    $echo '.\c'
    done

    exec 3<&-
    $echo
fi

rm "${fofn}.suspect-clip" 2>/dev/null


#-----------------------------------------------------------------------------


I have also modified the interactive clipping phase at section 7 so that only
sequences that have an "unexpected" SL clipping value are examined. We know
that our sequencing vector will account for the first 40-60 bases of the file.
If the sequence clipping program determines the left end of the sequence to 
be before a preset value (clip_SL_min - set in the .pregaprc file as above)
then the file is presented for checking, otherwise it is passed over.

#-----------------------------------------------------------------------------
# Seventh phase - interactive quality clipping
# (i) trev
#
# This invokes the trace editor, red, on the file.
#
# Input : ${fofn}.passed        File of experiment files
# Output: none
#
if ${do_qual_clip} && ${qual_clip_interactive}
then
    
    $echo "Interactive quality clipping  \c"
    
    exec 3<"${fofn}.passed"

    while read file <&3
    do
    if [ `${STADENROOT}/bin/lookup "${file}" SL 1` -lt ${clip_SL_min} ]
        then
        ${trev} ${trev_args} "${file}"
        $echo '.\c'
    fi
    done

    exec 3<&-
    $echo
fi

#-----------------------------------------------------------------------------

Hope this is useful to someone. I'm open to comments on it as well.

Later,

Andy Law
------------------
( Andy.Law at bbsrc.ac.uk )
( Big Nose in Edinburgh )



More information about the Staden mailing list

Send comments to us at biosci-help [At] net.bio.net