Are there any Dynamic Programmers out there able to give
a puzzled PhD student help with the following problem ?
Sellers' pattern matching algorithm enables the
calculation of the best global alignment between two
sequences . In addition it allows the comparison of this
alignment with other suboptimal alignments ( cf Sellers
in "Time warps, String edits and Macromolecules: The theory
and practice of sequence comparison", Sankoff and Kruskal
(eds), Addison-Wesley (1983) ). Comparison of alignments
is based on a distance metric, especially useful for
evolutionary comparisons.
As such, I thought it would be ideal for drawing trees.
Unfortunately, the algorithm as specified in the
aforementioned book does not seem to behave quite as well
as I expected it to : in particular, the distance between
sequence(a) and pattern(b) is not equivalent to the
distance between sequence(b) and pattern(a). This is a
direct consequence of the fact that the pattern can start
anywhere in the length of the sequence. For instance, when
the sequence is "puzzledPhDstudent" and the pattern is
"student" the distance is 0. When the sequence and pattern
are switched, the distance is 10 (using Identity Distance
Matrix, gap penalty = insertion penalty = 1). The increased
distance is due to the cost of inserting "puzzledPhD".
Can anyone suggest a way of modifying the algorithm so that
d(a,b) = d(b,a)?
Yours Hopefully,
Kerr Hatrick.