Massive multiple alignment

Richard Hughey rph at cse.ucsc.edu
Tue Apr 11 19:11:27 EST 1995

In article <1995Apr11.104221.1265 at reks.uia.ac.be>,
przemko at reks.uia.ac.be (Przemko) writes: 
   I would like to do multiple sequence alignments on samples that
   have a couple of thousand seqs. Let's say all transmembrane
   domains or all AUG

You may be interested in our hidden Markov model-based multiple
alignment software, SAM (Sequence Alignment and Modeling). Given a set
of training sequences (say, a subset of your thousands of sequences),
the system builds a statistical model of the family. Each sequence
can then be aligned to the model, producing a multiple alignment in
n^3 time after training. Of course, since sequences are not compared
pairwise, regions of close similarity in two sequences can wind up in
different columns (an optimal multiple alignment of a thousand
sequences would require on the order of n^1000 steps, though, so we
didn't implement that :-).

You may also be interested in Lawrence et al.'s Gibbs sampler and
Sean Eddy's HMM* (http://cele.mrc-lmb.cam.ac.uk/).

You can find out more about SAM, its methods, and related papers at
http://www.cse.ucsc.edu/research/compbio/sam.html or by email to
sam-info at cse.ucsc.edu (which at the moment points to me).


Richard Hughey
Assistant Professor
Computer Engineering Board
University of California
Santa Cruz, CA 95064
(408) 459-2939 Fax: (408) 459-4829
rph at ce.ucsc.edu

