Those working on evolutionary trees and multiple alignment may be
interested in:
%A L. Allison
%A C. S. Wallace
%T An Information Measure for the String to String Correction Problem with
Applications.
%J Australian Computer Science Communications
%E G. Gupta
%V 16
%N 1
%P 659-668
%M JAN
%D 1994
%O 17th Annual Computer Science Conference, Jan 1994, Christchurch N. Z.
%K stochastic, Gibbs sampling, simulated annealing, SA, MolBio, evolution,
multiple alignment, string, sequence, evolutionary family
phylogenetic tree trees, MML MDL FSM PFSM HMM ACSC ACSC17 ACSC94
%X isbn 0-473-02313-X
Algorithms for approximate string matching are widely used to infer a
relation between two strings in situations where strings mutate,
evolve or contain noise, eg. DNA or protein strings.
Such a relation is (just) a hypothesis.
It has been shown how to calculate a probability for such hypotheses.
Unfortunately the matching algorithms do not directly extend to more than
two strings because of rapidly increasing complexity.
However, a stochastic method is one way around this difficulty.
The probabilities of competing hypotheses are used
to generate random alignments of two or more strings.
The alignments are sampled from the correct probability distribution.
Averaging over many such alignments gives good estimates of how closely
the strings are related and in what way. (Mediocrity is useful.)
In addition, sampling in an increasingly selective way gives
a simulated annealing search for an optimal multiple-alignment.
Lloyd ALLISON
Central Inductive Agency,
Department of Computer Science, email: lloyd at cs.monash.edu.au
Monash University, Clayton, tel: 61 3 905 5205
Victoria 3168, AUSTRALIA fax: 61 3 905 5146