Hint on quantifying joint similarity after m.s.a.

Thu Jul 29 08:02:53 EST 1993

I believe Steve Clark's MALIGNED program contains a command STATS, 
which produces a tabular output file with some measures of 
multi-sequence similarity. See CABIOS 8(6): 535-538.

Independent of any particular program, you probably need to think 
carefully about how YOU would want the figure of merit to be calculated.
How to assess similarities involving conservatively replaced amino 
acids would seem to allow many possible schemes. Will you work from 
the pairwise scores in PAM matrices, or will you work from mere a.a.
family memberships? If the former, which table will be appropriate?
If the latter, which assignment of amino acids to families will be 
appropriate; and will it be a grouping from the literature, or your own 
"custom" grouping?
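
To make the "family membership" option concrete, here is a minimal sketch 
in Python. It is only one of the many possible schemes mentioned above, not 
what MALIGNED or any other program does. The FAMILIES table and the 
1.0 / 0.5 / 0.0 weights are assumptions made up for illustration; substitute 
a grouping from the literature or your own. Each column is scored as the 
average over all pairs of rows, and the alignment as the average over columns.

```python
from itertools import combinations

# Hypothetical family grouping, for illustration only (an assumption,
# not taken from any particular publication or program).
FAMILIES = [
    set("AVLIMC"),  # aliphatic / hydrophobic
    set("FWY"),     # aromatic
    set("KRH"),     # basic
    set("DE"),      # acidic
    set("STNQ"),    # polar, uncharged
    set("GP"),      # small / special
]

def family_of(residue):
    """Return the index of the residue's family, or None if it has none."""
    for index, members in enumerate(FAMILIES):
        if residue in members:
            return index
    return None

def pair_score(a, b):
    """Assumed weights: identity 1.0, same family 0.5, anything else 0.0."""
    if a == "-" or b == "-":
        return 0.0  # any pair involving a gap scores zero in this sketch
    if a == b:
        return 1.0
    fa, fb = family_of(a), family_of(b)
    if fa is not None and fa == fb:
        return 0.5
    return 0.0

def column_score(column):
    """Average pair score over all pairs of residues in one column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)

def alignment_score(rows):
    """Average column score over an alignment given as equal-length strings."""
    if not rows or len(rows[0]) == 0:
        raise ValueError("alignment must contain at least one column")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    columns = list(zip(*rows))
    return sum(column_score(col) for col in columns) / len(columns)
```

Working from a PAM table instead would mean replacing pair_score with a 
lookup into whichever table you judge appropriate; the rest of the sketch 
would stay the same.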

The details can get arbitrarily complex. Presumably, the appropriate 
approach depends on what ultimate use the similarity measure will have.

Also, bear in mind that it may be extremely difficult to formulate
a measure which is "metric", in the sense that any given m.s.a.'s 
number can be compared to some other m.s.a.'s number for a meaningful
rating of which m.s.a. has "higher similarity" overall. Chances are, 
it's not possible -- readers: correct me if I'm wrong! -- considering
all the dimensions to an m.s.a. (number of sequences, overall length 
of the m.s.a., presence or absence of gaps, etc). And, if it's not 
possible to formulate a sim # which is metric, then any sim # you do
conceive of will have extremely limited usefulness. (But maybe you 
have some very specific goal, not general m.s.a. comparison, towards
which some sim # can be developed in good faith.)
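
A small illustration of the comparability problem, as a sketch in Python. 
The raw_sum_of_pairs function is a hypothetical helper written for this 
example: it just counts identical, non-gap residue pairs over all columns. 
Two alignments that are both perfectly conserved get very different numbers 
simply because one has more sequences.

```python
from itertools import combinations

def raw_sum_of_pairs(rows):
    """Count identical, non-gap residue pairs over all columns."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == b and a != "-":
                total += 1
    return total

small = ["ACDE"] * 3   # 3 identical sequences, 4 columns
large = ["ACDE"] * 6   # 6 identical sequences, 4 columns

print(raw_sum_of_pairs(small))  # 3 pairs per column * 4 columns = 12
print(raw_sum_of_pairs(large))  # 15 pairs per column * 4 columns = 60
```

Dividing by the number of pairs and columns removes this particular effect, 
but it does not settle how gaps or differing lengths should be weighed, 
which is the harder part of the question.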

Mark Reboul
Columbia-Presbyterian Cancer Center Computing Facility
mark at cuccfa.ccc.columbia.edu
