Sequence alignment by Minimum Message Length (MML) encoding

lloyd allison lloyd at bruce.cs.monash.OZ.AU
Sun May 5 21:23:25 EST 1991


    DNA alignment based on Minimum Message Length encoding (MML).

A set of routines based on the MML principle is available for research into
the alignment problem for two strings and into the modelling of mutation.
They are available (via email) at no charge for non-commercial, non-classified
research purposes.
They derive from work started in
  Allison L. & C.N. Yee. Bull. Math. Biol. 52(3) 431-453, 1990,
and extended and improved in
  Allison L., C.S. Wallace & C.N. Yee.
      AAAI Symposium on AI+Mol. Bio., Stanford, 1990,
  and Tech. Report 90/148, Dept. Comp. Sci., Monash University, AUSTRALIA 3168.

1-, 3- and 5-state models of mutation are implemented.
They model simple, linear and piecewise-linear indel costs respectively.
The probabilities of all alignments are summed (efficiently);
among other effects, this gives a smooth cost function in all cases.
Optimal parameter values are inferred from the given strings.
The cost of stating the parameter values, to an appropriate accuracy, is
included in the message length; this allows alternative models to be compared
on an equal footing.
There is a built-in null theory and significance test.

A simple driver program is provided to make the routines usable.
Graphical routines are provided to print a probability density plot of
all alignments on a laser printer.
The routines are written in `C'.
They are moderately, but not excessively, heavy users of CPU time; for long
strings, a good workstation or more powerful machine with *hardware*
floating-point arithmetic is recommended.
They are not intended for quickly searching large data bases of sequences.

To get a `shar' script of the routines, send (e)mail to the address below;
ditto for Tech. Report 90/148, but remember to include a "real" return address.

Department of Computer Science, UUCP:lloyd at bruce.cs.monash.edu.au
Monash University, Clayton,     or  :uunet!munnari!bruce.cs.monash.edu.au!lloyd
VICTORIA 3168, AUSTRALIA        Tel :565-5205               FAX: +61 3 565 5146

