Gabor E. Tusnady wrote:
> Hello,
>
> >Hi All,
> >I am in trouble. I am developing an algorithm for restriction enzyme analysis,
> >but it is taking too long. The reason is that there are so many degenerate
> >bases in the DNA sequence, so it takes very long to analyze all of
> >the possible combinations they make. What is more, the
> >recognition sequences also have degenerate bases. Is there anybody out there to
> >help me optimize the algorithm? Yes, there is. So thanks in advance to all
> >those who respond.
> >Ravi Gupta.
> >Research Scholar
>
> There is a little program called restenzyme on the net:
> http://www.enzim.hu/~tusi/restric/restenzyme.html
>
> It uses a very simple table:
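>
>         n w r y h m k s d a c g t
> ACGT n  + + + + + + + + + + + + +
> AT   w  + + + + + + + - + + - - +
> GA   r  + + + - + + + + + + - + -
> CT   y  + + - + + + + + + - + - +
> CAT  h  + + + + + + + + + + + - +
> AC   m  + + + + + + - + + + + - -
> TG   k  + + + + + - + + + - - + +
> CG   s  + - + + + + + + + - + + -
> GAT  d  + + + + + + + + + + - + +
> A    a  + + + - + + - - + + - - -
> C    c  + - - + + + - + - - + - -
> G    g  + - + - - - + + + - - + -
> T    t  + + - + + - + - + - - - +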
>
> So if you have a degenerate sequence and a degenerate recognition site of
> a restriction enzyme, this table shows whether they can be the same.
> For example:
>   the sequence:              ...acacnwk...
>   the rec. site:             ...dynctyh...
>   characters from the table: ...+++++++...
>
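A table of this kind does not have to be typed in by hand. As a minimal sketch (not restenzyme's actual code, which I have not seen), each IUPAC code can be stored as a 4-bit mask over A, C, G, T; two codes "can be the same" when their masks share at least one bit. The names MASK, compatible and site_row are made up for illustration, and b and v are included even though the table above leaves them out.

```python
# Hypothetical sketch: bitmask version of a "can these two codes be the same?" table.
# All names here are illustrative; none come from restenzyme.

# Bases covered by each IUPAC nucleotide code.
IUPAC_BASES = {
    "a": "A", "c": "C", "g": "G", "t": "T",
    "r": "AG", "y": "CT", "s": "CG", "w": "AT", "k": "GT", "m": "AC",
    "b": "CGT", "d": "AGT", "h": "ACT", "v": "ACG", "n": "ACGT",
}

# One bit per unambiguous base.
BIT = {"A": 1, "C": 2, "G": 4, "T": 8}

# Precompute a 4-bit mask for every code, e.g. y (C or T) -> 2 | 8 = 10.
MASK = {code: sum(BIT[b] for b in bases) for code, bases in IUPAC_BASES.items()}


def compatible(x, y):
    """True if codes x and y share at least one base (a '+' in the table)."""
    return (MASK[x.lower()] & MASK[y.lower()]) != 0


def site_row(seq, site):
    """Return the '+'/'-' string for an aligned sequence window and site."""
    return "".join("+" if compatible(s, p) else "-" for s, p in zip(seq, site))


print(site_row("acacnwk", "dynctyh"))
```

Because each comparison is a single AND, there is no need to enumerate the combinations that the degenerate bases could expand to.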
Most of the time this will work, but what if there is a degenerate
base in the query sequence? For example, if you have y in the
query sequence, it should still match a y in the restriction
site.
In the FSAP package
http://home.cc.umanitoba.ca/~psgendb/FSAP.html
BACHREST and INTREST use Pascal sets to simplify
ambiguity comparisons:
(* Array NUCSET holds sets of nucleotides *)
(* using the conventions of IUPAC-IUB *)
NUCSET[A]:= [A];NUCSET[C]:= [C];NUCSET[G]:= [G];NUCSET[T]:= [T];
NUCSET[R]:= [A,G,R];NUCSET[Y]:= [C,T,Y];NUCSET[S]:= [C,G,S];
NUCSET[M]:= [A,C,M];NUCSET[K]:= [G,T,K];NUCSET[W]:= [A,T,W];
NUCSET[B]:= [C,G,T,B,Y,K,S];NUCSET[D]:= [A,G,T,W,R,D,K];
NUCSET[H]:= [A,C,T,M,Y,W,H];NUCSET[V]:= [A,C,G,M,R,S,V];
NUCSET[N]:= [A..N];
AMBIGUOUS:= NUCSET[N] - [A,C,G,T];
Since set operations are built into the language, this
makes dealing with ambiguities quick and easy. For example,
if SEQ[Sk] in PATTERN[Pj] then (*CHARACTER MATCHED*)
decides if the sequence element Sk matches the restriction pattern
element Pj.
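For readers without a Pascal compiler, the same membership test can be sketched in Python. This is an illustration only, not the BACHREST or INTREST source. It assumes that a pattern symbol accepts every sequence symbol whose bases are a subset of its own, which gives the same sets as the NUCSET assignments listed above; NUCSET[N] is assumed to contain every symbol. The function find_sites is a made-up name.

```python
# Hypothetical sketch of set-based ambiguity matching, modelled on the NUCSET idea.
# find_sites and BASES are illustrative names, not taken from FSAP.

# Bases covered by each IUPAC-IUB symbol.
BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "M": "AC", "K": "GT", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# NUCSET[P] = all sequence symbols that pattern symbol P accepts:
# those whose bases are a subset of P's bases, e.g. NUCSET["R"] = {A, G, R}.
NUCSET = {
    p: {s for s in BASES if set(BASES[s]) <= set(BASES[p])}
    for p in BASES
}


def find_sites(seq, pattern):
    """Return 0-based start positions where every sequence symbol is in the
    set of the corresponding pattern symbol."""
    seq, pattern = seq.upper(), pattern.upper()
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        if all(seq[start + j] in NUCSET[pattern[j]] for j in range(len(pattern))):
            hits.append(start)
    return hits


print(find_sites("AAGATTCGAYTC", "GANTC"))
```

Note that this test is one-directional: a Y in the sequence matches an N or a Y in the pattern, but an N in the sequence does not match a Y in the pattern, since N could also be A or G.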
If I were writing it today, I'd probably implement the ambiguity
relationships as Java classes, and then check to see whether
a sequence element was an element of the class.
===============================================================================
Brian Fristensky |
Department of Plant Science |
University of Manitoba | All kings is mostly rapscallions.
Winnipeg, MB R3T 2N2 CANADA |
frist at cc.umanitoba.ca |
Office phone: 204-474-6085 | Mark Twain (1835-1910)
FAX: 204-261-5732 |
http://home.cc.umanitoba.ca/~frist/
===============================================================================