IUBio

Sellers Pattern-matching algorithm

Allen Delaney allen at sparc.brc.ubc.ca
Mon Mar 22 11:23:21 EST 1993


The concept of distance is non-directional, i.e. d(a,b) = d(b,a).

This is not necessarily what you want in a pattern matching algorithm.
When searching a data base for a target, you want to penalize for
a longer probe than a target, otherwise every short sequence in the
data base will give you a low-distance match, drowning out all the
interesting matches. You do, however, want to see matches where your
probe sequence matches a subsequence of a longer target sequence.

The algorithm you are using seems to be designed with this in mind, with
no penalty for terminal target sequence gaps. If the algorithm is adjusted
to have the same penalty for all terminal gaps, then inserting "confused
PhD" will have the same penalty as deleteing "confusedPhD", and d(a,b)=d(b,a).

Even in evolutionary trees there is directionality, ancestor --> descendant.
Most parsimony algorithms assume non-directional distance, but I don't know
that this reflects reality.

To obtain non-directional distances:

1) use D = min( d(a,b),d(b,a))
2) use if length(a) < length(b) then D=d(a,b) else D=d(b,a)
3) fix the algorithm terminal gap penalties.


-- 
Allen Delaney (allen at brc.ubc.ca)
Biomedical Research Centre, U.B.C., Vancouver, B.C. Canada, V6T 1Z3




More information about the Bio-soft mailing list

Send comments to us at biosci-help [At] net.bio.net