In article <1995Apr11.104221.1265 at reks.uia.ac.be>,
przemko at reks.uia.ac.be (Przemko) writes:
> I would like to do multiple sequence alignements on samples that have couple
> thousands seqs. Let's say all transmembrane domains or all AUG contexts.

You may be interested in our hidden Markov-based multiple alignment
software called SAM (Sequence Alignment and Modeling). Given a set of
training sequences (say, a subset of your thousand sequences), the
system will generate a statistical model of the family. Each sequence
can then be aligned to the model, producing a multiple alignment in
n^3 time after the training. Of course, since sequences are not
compared pairwise, regions of close similarity in two sequences can wind
up in different columns (an optimal alignment of all thousand sequences
at once would take on the order of n^1000 steps, though, so we didn't
implement that :-).
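
To make the "align each sequence to the model" step concrete, here is a minimal Python sketch of Viterbi alignment against a small profile HMM with match, insert, and delete states. It is not SAM's code: the function names, the four-column DNA profile, and every probability below are made up for illustration, the transition probabilities are shared across columns to keep it short, and the training step that would normally estimate these numbers is skipped entirely. Inserted residues are simply dropped from the output so that every row has one character per model column.

```python
import math

NEG = float("-inf")
STATES = "MID"  # match, insert, delete

# Hypothetical toy model: four match columns preferring A, C, G, T.
PROFILE = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
# Hypothetical transition probabilities, keyed by "from" + "to" state.
TRANS = {
    "MM": 0.90, "MI": 0.05, "MD": 0.05,
    "IM": 0.60, "II": 0.30, "ID": 0.10,
    "DM": 0.60, "DI": 0.10, "DD": 0.30,
}

def align_to_model(seq, match_emit=PROFILE, trans=TRANS, insert_p=0.25):
    """Return seq laid out on the model's columns ('-' marks a deleted column)."""
    n, m = len(seq), len(match_emit)
    # V[s][i][j]: best log-probability of emitting seq[:i], ending in state s of column j.
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}
    V["M"][0][0] = 0.0  # the begin state is treated as match column 0

    def best(i, j, to):
        # Best (score, predecessor state) at cell (i, j) for a move into state `to`.
        return max((V[s][i][j] + math.log(trans[s + to]), s) for s in STATES)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # match: consume a residue and a column
                score, back["M"][i][j] = best(i - 1, j - 1, "M")
                emit = match_emit[j - 1].get(seq[i - 1], 1e-3)
                V["M"][i][j] = score + math.log(emit)
            if i > 0:            # insert: consume a residue only
                score, back["I"][i][j] = best(i - 1, j, "I")
                V["I"][i][j] = score + math.log(insert_p)
            if j > 0:            # delete: skip a column, no residue emitted
                V["D"][i][j], back["D"][i][j] = best(i, j - 1, "D")

    # Trace back from the best final state to the begin state.
    state = max(STATES, key=lambda s: V[s][n][m])
    i, j, row = n, m, ["-"] * m
    while (i, j) != (0, 0):
        prev = back[state][i][j]
        if state == "M":
            row[j - 1] = seq[i - 1]
            i, j = i - 1, j - 1
        elif state == "I":
            i -= 1  # inserted residue is dropped from the output row
        else:
            j -= 1
        state = prev
    return "".join(row)

def multiple_alignment(seqs):
    # Each sequence is aligned to the model independently, never to another sequence.
    return [align_to_model(s) for s in seqs]

if __name__ == "__main__":
    for row in multiple_alignment(["ACGT", "AGT", "ACGGT"]):
        print(row)
```

In this sketch each sequence costs time proportional to its length times the number of model columns, and the rows line up only because they share the model's columns. That is also why two sequences that resemble each other closely can still land in different columns: the model, not the other sequence, decides where each residue goes.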

You may also be interested in Lawrence et al.'s Gibbs sampler and
Sean Eddy's HMM* (http://cele.mrc-lmb.cam.ac.uk/).

You can find out more about SAM, its methods, and related papers at
http://www.cse.ucsc.edu/research/compbio/sam.html or by email to
sam-info at cse.ucsc.edu (which at the moment points to me).
Richard
Richard Hughey
Assistant Professor
Computer Engineering Board
University of California
Santa Cruz, CA 95064
(408) 459-2939 Fax: (408) 459-4829
rph at ce.ucsc.edu