I believe Steve Clark's MALIGNED program contains a command STATS,
which produces a tabular output file with some measures of
multi-sequence similarity. See CABIOS 8(6): 535-538.
Independent of any particular program, you probably need to think
carefully about how YOU would want the figure of merit to be calculated.
How to assess similarities involving conservatively replaced amino
acids would seem to allow many possible schemes. Will you work from
the pairwise scores in PAM matrices, or will you work from mere a.a.
family memberships? If the former, which table will be appropriate?
If the latter, which assignment of amino acids to families will be
appropriate; and will it be a grouping from the literature, or your own
"custom" grouping?
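To make the family-membership option concrete, here is a minimal Python
sketch. Everything in it is an assumption for illustration: the FAMILIES
grouping is a made-up "custom" grouping, not one taken from the literature,
and the weights (1.0 for identity, 0.5 for same family, 0 for anything
involving a gap) are arbitrary. The names pair_score and msa_similarity are
hypothetical; this is not how MALIGNED's STATS command computes anything.

```python
from itertools import combinations

# Hypothetical "custom" families; an illustrative assumption only.
FAMILIES = [
    set("AVLIMC"),  # hydrophobic / aliphatic
    set("FWY"),     # aromatic
    set("STNQ"),    # polar, uncharged
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("GP"),      # conformationally special
]

# Map each residue letter to the index of its family.
FAMILY_OF = {aa: i for i, fam in enumerate(FAMILIES) for aa in fam}


def pair_score(a, b, identity=1.0, conservative=0.5):
    """Score one aligned residue pair; the weights are arbitrary choices."""
    if a == "-" or b == "-":
        return 0.0  # any pair involving a gap scores zero in this sketch
    if a == b:
        return identity
    fam_a = FAMILY_OF.get(a)
    if fam_a is not None and fam_a == FAMILY_OF.get(b):
        return conservative  # conservative replacement within one family
    return 0.0


def msa_similarity(rows):
    """Mean pair score over every pair of sequences in every column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of the alignment must have equal length")
    total = 0.0
    n_pairs = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

For the two-sequence alignment ["AV", "AL"], the first column is an
identity and the second a same-family replacement, so the sketch gives
(1.0 + 0.5) / 2 = 0.75. Working from a PAM table instead would amount to
replacing pair_score with a lookup into the chosen matrix.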
The details can get arbitrarily complex. Presumably, the appropriate
approach depends on what ultimate use the similarity measure will have.
Also, bear in mind that it may be extremely difficult to formulate
a measure which is "metric", in the sense that any given m.s.a.'s
number can be compared to some other m.s.a.'s number for a meaningful
rating of which m.s.a. has "higher similarity" overall. Chances are,
it's not possible -- readers: correct me if I'm wrong! -- considering
all the dimensions to an m.s.a. (number of sequences, overall length
of the m.s.a., presence or absence of gaps, etc). And, if it's not
possible to formulate a sim # which is metric, then any sim # you do
conceive of will have extremely limited usefulness. (But maybe you
have some very specific goal, not general m.s.a. comparison, towards
which some sim # can be developed in good faith.)
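As a small illustration of the comparability problem, here is a Python
sketch of mean pairwise identity under two gap conventions. The function
name mean_pairwise_identity and the count_gap_columns flag are hypothetical,
and both conventions are merely assumptions I picked; the point is only
that the same alignment gets a different number depending on such choices.

```python
from itertools import combinations


def mean_pairwise_identity(rows, count_gap_columns):
    """Average fraction of identical positions over all sequence pairs.

    If count_gap_columns is True, a residue aligned against a gap counts
    as a compared (and mismatched) position; if False, it is skipped.
    Positions where both sequences have a gap are always skipped.
    """
    if len(rows) < 2:
        raise ValueError("need at least two aligned sequences")
    fractions = []
    for s, t in combinations(rows, 2):
        matches = 0
        compared = 0
        for a, b in zip(s, t):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                if count_gap_columns:
                    compared += 1
                continue
            compared += 1
            if a == b:
                matches += 1
        fractions.append(matches / compared if compared else 0.0)
    return sum(fractions) / len(fractions)
```

For the alignment ["ACD-", "ACDE"], counting the gapped column gives
3/4 = 0.75, while skipping it gives 3/3 = 1.0. Neither number is wrong,
but they cannot be compared with each other, let alone with the number
from an m.s.a. of different depth or length.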
Mark Reboul
Columbia-Presbyterian Cancer Center Computing Facility
mark at cuccfa.ccc.columbia.edu