Algorithm for Restriction Enzyme Analysis

Brian Fristensky frist at cc.umanitoba.ca
Mon Jan 4 18:38:44 EST 1999

Gabor E. Tusnady wrote:
> Hello,
> >Hi All,
> >I am in trouble. I am developing an algorithm for restriction enzyme analysis
> >but it is taking too long. The reason is because there are so many degenerate
> >bases in the DNA sequence and thus it takes very long to analyze for all of
> >them considering the possible combinations they make. What is more, the
> >recognition sequences also have degenerate bases. Is there anybody out there to
> >help me optimize the algorithm? Yes, there is. So thanking in advance to all
> >those who respond.
> >Ravi Gupta.
> >Research Scholar
> There is a little program, called restenzyme on the net:
> http://www.enzim.hu/~tusi/restric/restenzyme.html
> It uses a very simple table:
>       nwryhmksdacgt
> ACGT n+++++++++++++
> AT   w+++++++-++--+
> GA   r+++-++++++-+-
> CT   y++-+++-++-+-+
> CAT  h+++++++++++-+
> AC   m++++++-++++--
> TG   k+++++-+++--++
> CG   s+-+++++++-++-
> GAT  d++++++++++-++
> A    a+++-++--++---
> C    c+--+++-+--+--
> G    g+-+---+++--+-
> T    t++-++-+-+---+
> so if you have a degenerate sequence and a degenerate recognition site for
> a restriction enzyme, this table shows whether the two symbols can match.
> For example the sequence:   ...acacnwk...
>            the rec. site:   ...dynctyh...
> characters from the table:  ...+++++++...
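
The lookup table quoted above does not have to be typed in by hand. Every
entry follows from one rule: two codes are compatible when the sets of bases
they stand for overlap. The following is only a minimal sketch of that rule
in Python, not the restenzyme source; the names IUPAC, compatible, table_row
and site_can_match are made up for this illustration.

```python
# Codes from the quoted table, mapped to the bases each one stands for.
# (The quoted table has no b or v column, so they are left out here too.)
IUPAC = {
    "n": "acgt", "w": "at", "r": "ga", "y": "ct", "h": "cat",
    "m": "ac", "k": "tg", "s": "cg", "d": "gat",
    "a": "a", "c": "c", "g": "g", "t": "t",
}


def compatible(x, y):
    """True if codes x and y share at least one base."""
    return bool(set(IUPAC[x]) & set(IUPAC[y]))


def table_row(code, columns="nwryhmksdacgt"):
    """Rebuild one row of the +/- table for the given code."""
    return "".join("+" if compatible(code, c) else "-" for c in columns)


def site_can_match(seq, site):
    """True if every position of seq is compatible with site."""
    return len(seq) == len(site) and all(
        compatible(s, p) for s, p in zip(seq, site)
    )


print(table_row("w"))                        # +++++++-++--+
print(site_can_match("acacnwk", "dynctyh"))  # True
```

Under this rule a "+" means only that the two symbols could stand for the
same base, not that they must.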

Most of the time this will work, but what if there is a degenerate
base in the query sequence? For example, if you have y in the
query sequence, it should still match a y in the restriction site.

In the FSAP package, BACHREST and INTREST use Pascal sets to simplify
ambiguity comparisons:

      (* Array NUCSET holds sets of nucleotides  *)
      (* using the conventions of IUPAC-IUB      *)
      NUCSET[A]:= [A];NUCSET[C]:= [C];NUCSET[G]:= [G];NUCSET[T]:= [T];
      NUCSET[R]:= [A,G,R];NUCSET[Y]:= [C,T,Y];NUCSET[S]:= [C,G,S];
      NUCSET[M]:= [A,C,M];NUCSET[K]:= [G,T,K];NUCSET[W]:= [A,T,W];
      NUCSET[B]:= [C,G,T,B,Y,K,S];NUCSET[D]:= [A,G,T,W,R,D,K];
      NUCSET[H]:= [A,C,T,M,Y,W,H];NUCSET[V]:= [A,C,G,M,R,S,V];
      NUCSET[N]:= [A..N];
      AMBIGUOUS:= NUCSET[N] - [A,C,G,T];

Since set operations are built into the language, this
makes dealing with ambiguities quick and easy. For example,

          if SEQ[Sk] in PATTERN[Pj] then (*CHARACTER MATCHED*)

decides if the sequence element Sk matches the restriction pattern
element Pj. 
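
For readers without a Pascal compiler, the same membership test can be
sketched in Python. This is an illustration under stated assumptions, not
the BACHREST or INTREST code. BASES and find_sites are names invented here,
and NUCSET is derived from the base sets rather than written out by hand.
A sequence symbol belongs to a pattern's set when all the bases it stands
for are allowed by the pattern symbol.

```python
# Bases represented by each IUPAC-IUB symbol.
BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "M": "AC", "K": "GT", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NUCSET[p] holds every symbol whose bases all fall within those of p,
# e.g. NUCSET["Y"] == {"C", "T", "Y"}, as in the Pascal array.
NUCSET = {
    p: frozenset(s for s in BASES if set(BASES[s]) <= set(BASES[p]))
    for p in BASES
}


def find_sites(seq, site):
    """Return the 0-based start positions where site matches seq."""
    seq, site = seq.upper(), site.upper()
    pattern = [NUCSET[p] for p in site]
    hits = []
    for start in range(len(seq) - len(site) + 1):
        # Same test as: if SEQ[Sk] in PATTERN[Pj]
        if all(seq[start + j] in pattern[j] for j in range(len(site))):
            hits.append(start)
    return hits


print(find_sites("AAGATTCGGAYTCC", "GANTC"))  # [2, 8]
print(find_sites("GARTTC", "GAATTC"))         # []
```

The second call shows how strict this test is: an R in the sequence is not
reported as a match for A in the pattern, because the R might be a G.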

If I were writing it today, I'd probably implement the ambiguity
relationships as Java classes, and then check whether
a sequence element is a member of the class.

Brian Fristensky                |  
Department of Plant Science     |  
University of Manitoba          |  All kings is mostly rapscallions.
Winnipeg, MB R3T 2N2  CANADA    |  
frist at cc.umanitoba.ca           |
Office phone:   204-474-6085    |  Mark Twain (1835-1910)  
FAX:            204-261-5732    |

More information about the Bio-soft mailing list