displaying alignments

westerm at aclcb.purdue.edu westerm at aclcb.purdue.edu
Tue Dec 21 14:29:52 EST 1993

Hmm, Michael Baron (BARON at AVRI.AFRC.AC.UK) has already tried all the
alignment display programs that I would have recommended:

    -- PrettyBox
    -- BoxShade
    -- PrettyPlot
    -- The prettyprint option within SeqApp

None of the programs are to his liking; they all have faults. This
is not, IMHO, uncommon among people who do protein alignments. I have
yet to find a common method of doing protein consensus that satisfies 
everyone. I can't speak for the other programs, but the statement:

>I HAVE ALREADY TRIED the GCG program Prettybox, but found that there 
>are still some bugs in it, in that it makes some pretty strange 
>decisions as to what is the consensus, and doesn't respond properly to 
>the PLURALITY setting (which *should* allow one to say that one wants 
>at least X residues to agree before there is a consensus).

shows a lack of understanding of how PrettyBox works -- perhaps not
surprising, since I've never written documentation for it. PrettyBox 
builds a consensus by 'voting' among the amino acids. The 'votes' are
the scores defined in the file 'prettypep.cmp'; that file can be
changed if one doesn't like the scores. Whichever amino acid
gathers the most votes is considered to be the consensus amino acid. 
Because of the default scores, this can sometimes lead to what some
people would consider strange results. An example:

Given 5 aligned amino acids:  Y Y Y W W

What is the consensus? Some people would say 'Y' because 3 of the 5
amino acids are 'Y'. 

However, if you look at the scoring table and tally up the votes, you will
find that 'F' gathers the most votes. Why?

Y vs. Y has a score of 1.5
Y vs. W has a score of 1.1
Y vs. F has a score of 1.4
W vs. F has a score of 1.3
W vs. W has a score of 1.5

Looking at how the aligned AAs vote:
  The 3 'Y's give 3 times 1.5 or 4.5 votes to a 'Y' consensus
  The 3 'Y's give 3 times 1.1 or 3.3 votes to a 'W' consensus
  The 3 'Y's give 3 times 1.4 or 4.2 votes to a 'F' consensus
  The 2 'W's give 2 times 1.5 or 3.0 votes to a 'W' consensus
  The 2 'W's give 2 times 1.1 or 2.2 votes to a 'Y' consensus
  The 2 'W's give 2 times 1.3 or 2.6 votes to a 'F' consensus

Totaling everything up:

  A 'Y' consensus receives 4.5 + 2.2 or 6.7 votes
  A 'W' consensus receives 3.3 + 3.0 or 6.3 votes
  A 'F' consensus receives 4.2 + 2.6 or 6.8 votes

So the 'F' consensus 'wins' and PrettyBox will shade all 5 aligned 
amino acids as 'similar' but not 'identical'. 

More complex examples can be created but the process is the same.
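
The tally above can be reproduced with a minimal Python sketch. This is
not PrettyBox's code: the names SCORES, score and tally_votes are
hypothetical, only the five scores quoted above are included (the real
'prettypep.cmp' table covers all amino acids), and only 'Y', 'W' and 'F'
are considered as candidates.

```python
# Pairwise scores quoted in the post. Illustration only: the real
# 'prettypep.cmp' table is much larger.
SCORES = {
    ("Y", "Y"): 1.5,
    ("Y", "W"): 1.1,
    ("Y", "F"): 1.4,
    ("W", "F"): 1.3,
    ("W", "W"): 1.5,
}


def score(a, b):
    """Look up a pairwise score, treating the table as symmetric."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def tally_votes(column, candidates):
    """Sum the votes each candidate receives from every residue in the column."""
    return {c: sum(score(residue, c) for residue in column) for c in candidates}


column = ["Y", "Y", "Y", "W", "W"]
votes = tally_votes(column, candidates=["Y", "W", "F"])
consensus = max(votes, key=votes.get)

for candidate, total in votes.items():
    print(f"{candidate}: {total:.1f} votes")
print("consensus:", consensus)
```

Under these assumptions 'F' comes out ahead of 'Y' by a tenth of a vote,
even though no 'F' appears in the column.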

Aside from changing the score data file, there are a couple 
of command line switches that can modify the scores. '/threshold'
will keep low scoring amino acids from voting. '/plurality'
will only consider consensuses that gather a minimum number of votes (note
that this is not the same as saying 'X residues must agree', just that
'X number of votes must be gathered'). '/simplify' can sometimes be
useful by making similar amino acids act the same.
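
To illustrate why a vote minimum differs from a residue count, here is
a rough Python sketch of a plurality-style cutoff. The function name
consensus_with_plurality and the 'greater than or equal' comparison are
assumptions made for illustration, not PrettyBox's actual behaviour; the
vote totals are the ones from the Y Y Y W W example.

```python
def consensus_with_plurality(votes, plurality):
    """Return the top-voted candidate, or None if it falls short of the minimum.

    Hypothetical helper: 'plurality' is a minimum number of votes,
    not a minimum number of agreeing residues.
    """
    winner = max(votes, key=votes.get)
    if votes[winner] >= plurality:
        return winner
    return None


# Vote totals from the Y Y Y W W example.
example_votes = {"Y": 6.7, "W": 6.3, "F": 6.8}

low_cutoff = consensus_with_plurality(example_votes, plurality=3.0)
high_cutoff = consensus_with_plurality(example_votes, plurality=7.0)

print("plurality 3.0:", low_cutoff)
print("plurality 7.0:", high_cutoff)
```

With a cutoff of 3.0, someone expecting 'at least 3 residues must agree'
would still see an 'F' consensus, because the cutoff is compared against
the 6.8 votes gathered rather than against the three 'Y' residues.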

I am, slowly, working on another version of PrettyBox which will have
even more command line switches that enable even finer control of the
consensus algorithm. But even then, I suspect, not everyone will be
